Skip to content

Conversation

@mohamedawnallah
Copy link
Contributor

@mohamedawnallah mohamedawnallah commented Jun 29, 2025

Description

  • Update the Enrichment Documentation to include Google CloudSQL Enrichment Handler integration
  • Add Python Example for Google CloudSQL (Managed PostgreSQL) Enrichment to the Documentation
  • Add Python Example for Unmanaged PostgreSQL Enrichment to the Documentation
  • Add Python Example for Unmanaged MySQL Enrichment to the Documentation
  • Add Python Example for Unmanaged Microsoft SQL Server Enrichment to the Documentation
  • Add Release Note for the added documentation and notebook example for Google CloudSQL Enrichment Handler

Closes #30773.
Closes #36095.
Towards #35046.
Prev #34398.

Website Update Demo

website_update_demo.mov

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@mohamedawnallah mohamedawnallah force-pushed the updateDocsAndAddExamplesForCloudSQLHandler branch from 76686aa to cdd1f6f Compare June 29, 2025 02:53
@github-actions
Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally LGTM, but we will probably want to wait until #34398 is merged/released to actually merge. Thanks!

…ementwise/enrichment-cloudsql.md

Co-authored-by: Danny McCormick <dannymccormick@google.com>
@damccorm
Copy link
Contributor

@mohamedawnallah would you mind fixing the conflicts here? Otherwise I think this should be good to merge

DEVELOCITY_ACCESS_KEY: ${{ secrets.DEVELOCITY_ACCESS_KEY }}
GRADLE_ENTERPRISE_CACHE_USERNAME: ${{ secrets.GE_CACHE_USERNAME }}
GRADLE_ENTERPRISE_CACHE_PASSWORD: ${{ secrets.GE_CACHE_PASSWORD }}
ALLOYDB_PASSWORD: ${{ secrets.ALLOYDB_PASSWORD }}
Copy link
Contributor Author

@mohamedawnallah mohamedawnallah Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CloudSQL instance on Google Cloud expected to fail in the examples since that workflow change haven't yet switched on. As seeing in the CI logs:

____________ EnrichmentTest.test_enrichment_with_google_cloudsql_pg ____________
[gw0] linux -- Python 3.11.12 /runner/_work/beam/beam/sdks/python/test-suites/tox/py311/build/srcs/sdks/python/target/.tox-py311-cloud/py311-cloud/bin/python
self = <apache_beam.examples.snippets.transforms.elementwise.enrichment_test.EnrichmentTest testMethod=test_enrichment_with_google_cloudsql_pg>
mock_stdout = <_io.StringIO object at 0x7f388c1adbd0>
    def test_enrichment_with_google_cloudsql_pg(self, mock_stdout):
      db_adapter = DatabaseTypeAdapter.POSTGRESQL
>     with EnrichmentTestHelpers.sql_test_context(True, db_adapter):
apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:147: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.11.12/x64/lib/python3.11/contextlib.py:137: in __enter__
    return next(self.gen)
apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:203: in sql_test_context
    result = EnrichmentTestHelpers.pre_sql_enrichment_test(
apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:246: in pre_sql_enrichment_test
    os.environ['GOOGLE_CLOUD_SQL_DB_PASSWORD'] = password
<frozen os>:684: in __setitem__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
value = None
>   ???
E   TypeError: str expected, not NoneType
<frozen os>:758: TypeError
- generated xml file: /runner/_work/beam/beam/sdks/python/test-

@damccorm
Copy link
Contributor

damccorm commented Sep 5, 2025

Looks like the coverage failure here is probably a real issue. We probably need the same change there that we're making to the examples workflow.

<testcase classname="apache_beam.examples.snippets.transforms.elementwise.enrichment_test.EnrichmentTest" name="test_enrichment_with_google_cloudsql_pg" time="2.420">
<failure message="TypeError: str expected, not NoneType">self = <apache_beam.examples.snippets.transforms.elementwise.enrichment_test.EnrichmentTest testMethod=test_enrichment_with_google_cloudsql_pg> mock_stdout = <_io.StringIO object at 0x781158679f70> def test_enrichment_with_google_cloudsql_pg(self, mock_stdout): db_adapter = DatabaseTypeAdapter.POSTGRESQL > with EnrichmentTestHelpers.sql_test_context(True, db_adapter): apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:147: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /opt/hostedtoolcache/Python/3.9.22/x64/lib/python3.9/contextlib.py:119: in __enter__ return next(self.gen) apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:203: in sql_test_context result = EnrichmentTestHelpers.pre_sql_enrichment_test( apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:246: in pre_sql_enrichment_test os.environ['GOOGLE_CLOUD_SQL_DB_PASSWORD'] = password /opt/hostedtoolcache/Python/3.9.22/x64/lib/python3.9/os.py:684: in __setitem__ value = self.encodevalue(value) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ value = None def encode(value): if not isinstance(value, str): > raise TypeError("str expected, not %s" % type(value).__name__) E TypeError: str expected, not NoneType /opt/hostedtoolcache/Python/3.9.22/x64/lib/python3.9/os.py:756: TypeError</failure>
</testcase>

Rather than bundling those changes with this PR, lets make them in a separate PR so that we can confirm all tests pass here. Added #36061 to do this

@github-actions github-actions bot removed the build label Sep 5, 2025
@mohamedawnallah mohamedawnallah force-pushed the updateDocsAndAddExamplesForCloudSQLHandler branch from fcbc37d to af8e4fe Compare September 5, 2025 13:20
@damccorm
Copy link
Contributor

damccorm commented Sep 5, 2025

@mohamedawnallah would you mind taking a look at the failures here?

apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:203: in sql_test_context
    result = EnrichmentTestHelpers.pre_sql_enrichment_test(
apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:246: in pre_sql_enrichment_test
    os.environ['GOOGLE_CLOUD_SQL_DB_PASSWORD'] = password
/opt/hostedtoolcache/Python/3.9.22/x64/lib/python3.9/os.py:684: in __setitem__
    value = self.encodevalue(value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

value = None

    def encode(value):
        if not isinstance(value, str):
>           raise TypeError("str expected, not %s" % type(value).__name__)
E           TypeError: str expected, not NoneType

@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Sep 7, 2025

@mohamedawnallah would you mind taking a look at the failures here?

apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:203: in sql_test_context
    result = EnrichmentTestHelpers.pre_sql_enrichment_test(
apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:246: in pre_sql_enrichment_test
    os.environ['GOOGLE_CLOUD_SQL_DB_PASSWORD'] = password
/opt/hostedtoolcache/Python/3.9.22/x64/lib/python3.9/os.py:684: in __setitem__
    value = self.encodevalue(value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

value = None

    def encode(value):
        if not isinstance(value, str):
>           raise TypeError("str expected, not %s" % type(value).__name__)
E           TypeError: str expected, not NoneType

The tests are currently failing because ALLOYDB_PASSWORD is being perceived as None, even though it's configured as a secret in the relevant workflows PreCommit Python Coverage and PreCommit Python Examples (#36061).

I've encountered this exact issue before and couldn't really figure it out (similar problem when tried to access on beam PostCommit Python workflow in #34398 (comment)). This secret specifically seems having issues when attempting to use in other workflows.

The failure occurs at this specific line where the code attempts to read the password environment variable:

password = os.getenv("ALLOYDB_PASSWORD")

https://github.com/mohamedawnallah/beam/blob/af8e4fe198b89a0abfd9a88fb2c1ab0303338772/sdks/python/apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py#L242

When the environment variable doesn't exist, os.getenv() returns None, causing the test to fail.

This appears to be an infrastructure issue related to how the ALLOYDB_PASSWORD secret is accessed across different
workflows. Perhaps @Amar3tto, @A1K28 can help

Probably the least resistant path would be to skip these tests when ALLOYDB_PASSWORD is not accessible. However, this approach doesn't address the root cause, perhaps not a clean solution, and at the same time allow us to make progress on this PR and that secret access issue can be resolved later

What do you think, @damccorm?

@damccorm
Copy link
Contributor

damccorm commented Sep 8, 2025

@mohamedawnallah would you mind taking a look at the failures here?

apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:203: in sql_test_context
    result = EnrichmentTestHelpers.pre_sql_enrichment_test(
apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py:246: in pre_sql_enrichment_test
    os.environ['GOOGLE_CLOUD_SQL_DB_PASSWORD'] = password
/opt/hostedtoolcache/Python/3.9.22/x64/lib/python3.9/os.py:684: in __setitem__
    value = self.encodevalue(value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

value = None

    def encode(value):
        if not isinstance(value, str):
>           raise TypeError("str expected, not %s" % type(value).__name__)
E           TypeError: str expected, not NoneType

The tests are currently failing because ALLOYDB_PASSWORD is being perceived as None, even though it's configured as a secret in the relevant workflows PreCommit Python Coverage and PreCommit Python Examples (#36061).

I've encountered this exact issue before and couldn't really figure it out (similar problem when tried to access on beam PostCommit Python workflow in #34398 (comment)). This secret specifically seems having issues when attempting to use in other workflows.

The failure occurs at this specific line where the code attempts to read the password environment variable:

password = os.getenv("ALLOYDB_PASSWORD")

https://github.com/mohamedawnallah/beam/blob/af8e4fe198b89a0abfd9a88fb2c1ab0303338772/sdks/python/apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py#L242

When the environment variable doesn't exist, os.getenv() returns None, causing the test to fail.

This appears to be an infrastructure issue related to how the ALLOYDB_PASSWORD secret is accessed across different workflows. Perhaps @Amar3tto, @A1K28 can help

Probably the least resistant path would be to skip these tests when ALLOYDB_PASSWORD is not accessible. However, this approach doesn't address the root cause, perhaps not a clean solution, and at the same time allow us to make progress on this PR and that secret access issue can be resolved later

What do you think, @damccorm?

Have you tried running the gradle task that is failing locally? One idea is that it may be that we're invoking these tests with tox and need to allow list this variable to be passed through to the tox environment -

passenv=TERM,CLOUDSDK_CONFIG,DOCKER_*,TESTCONTAINERS_*,TC_*

@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Sep 9, 2025

The tests are now passing for this changeset 👍

@mohamedawnallah
Copy link
Contributor Author

We can enforce running CloudSQL enrichment handler on Precommit Python Transforms by removing is_integration_test=True from TestPipeline and removing using uses_testcontainer pytest marker so it not collected to run in the postcommit suite. That way we can limit the scope of running those test cases as well as avoiding such flaky runs on postcommit suite #35816 (comment):

task testcontainersTest {
dependsOn 'installGcpTest'
dependsOn ':sdks:python:sdist'
doLast {
def testOpts = basicTestOpts
def argMap = [
"test_opts": testOpts,
"suite": "postCommitIT-direct-py${pythonVersionSuffix}",
"collect": "uses_testcontainer",
"runner": "TestDirectRunner",
"region": "us-central1",
]
def cmdArgs = mapToArgString(argMap)
exec {
executable 'sh'
args '-c', ". ${envdir}/bin/activate && ${runScriptsDir}/run_integration_test.sh $cmdArgs"
}
}
}

That require us to add ALLOYDB_PASSWORD env variable to beam_PreCommit_Python_Transforms workflow so Google CloudSQL PostgreSQL instance can run there without being skipped

env:
DEVELOCITY_ACCESS_KEY: ${{ secrets.DEVELOCITY_ACCESS_KEY }}
GRADLE_ENTERPRISE_CACHE_USERNAME: ${{ secrets.GE_CACHE_USERNAME }}
GRADLE_ENTERPRISE_CACHE_PASSWORD: ${{ secrets.GE_CACHE_PASSWORD }}

@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Sep 9, 2025

Submitted this PR #36096 to add ALLOYDB_PASSWORD to Precommit Python Transforms workflow. Follow-up on #35473 (comment)

@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Sep 9, 2025

After enforcing CloudSQL tests to run only on precommit python transforms they are running and passing except the Google CloudSQL Enrichment tests due to lack of ALLOYDB_PASSWORD in PreCommit Python Transforms workflow. With this PR #36096 we can enable them to run on this workflow

[gw3] [ 91%] SKIPPED apache_beam/transforms/enrichment_handlers/cloudsql_it_test.py::TestCloudSQLPostgresEnrichment::test_sql_enrichment 
apache_beam/transforms/enrichment_handlers/cloudsql_it_test.py::TestCloudSQLPostgresEnrichment::test_sql_enrichment_batched 
[gw3] [ 91%] SKIPPED apache_beam/transforms/enrichment_handlers/cloudsql_it_test.py::TestCloudSQLPostgresEnrichment::test_sql_enrichment_batched 
apache_beam/transforms/enrichment_handlers/cloudsql_it_test.py::TestCloudSQLPostgresEnrichment::test_sql_enrichment_batched_multiple_fields 
[gw3] [ 91%] SKIPPED apache_beam/transforms/enrichment_handlers/cloudsql_it_test.py::TestCloudSQLPostgresEnrichment::test_sql_enrichment_batched_multiple_fields 
apache_beam/transforms/enrichment_handlers/cloudsql_it_test.py::TestCloudSQLPostgresEnrichment::test_sql_enrichment_on_non_existent_table 
[gw3] [ 91%] SKIPPED apache_beam/transforms/enrichment_handlers/cloudsql_it_test.py::TestCloudSQLPostgresEnrichment::test_sql_enrichment_on_non_existent_table 
apache_beam/transforms/enrichment_handlers/cloudsql_it_test.py::TestCloudSQLPostgresEnrichment::test_sql_enrichment_with_condition_value_fn 
[gw3] [ 91%] SKIPPED apache_beam/transforms/enrichment_handlers/cloudsql_it_test.py::TestCloudSQLPostgresEnrichment::test_sql_enrichment_with_condition_value_fn 
apache_beam/transforms/enrichment_handlers/cloudsql_it_test.py::TestCloudSQLPostgresEnrichment::test_sql_enrichment_with_query_fn 
[gw3] [ 91%] SKIPPED 
...

@mohamedawnallah
Copy link
Contributor Author

Merged now the changes in this PR after the merge of #36096

@damccorm
Copy link
Contributor

damccorm commented Sep 9, 2025

It looks like the coverage workflow was failing with the same issue as #35473 (comment)

Since this test is running in some suites, could we just skip it when GOOGLE_CLOUD_SQL_DB_PASSWORD is not present?

@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Sep 9, 2025

It looks like the coverage workflow was failing with the same issue as #35473 (comment)

Since this test is running in some suites, could we just skip it when GOOGLE_CLOUD_SQL_DB_PASSWORD is not present?

Added that conditional skip 👍. Also removed ALLOYDB_PASSWORD environment variable in precommit python coverage since it is no longer used now

@github-actions github-actions bot added the build label Sep 9, 2025
@damccorm
Copy link
Contributor

damccorm commented Sep 9, 2025

Hm, looks like the infra is right now, but the tests are failing - https://github.com/apache/beam/actions/runs/17588014826/attempts/1?pr=35473 is an example. I tried kicking it off again, but it looks like a real problem with duplicate data

@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Sep 9, 2025

Hm, looks like the infra is right now, but the tests are failing - https://github.com/apache/beam/actions/runs/17588014826/attempts/1?pr=35473 is an example. I tried kicking it off again, but it looks like a real problem with duplicate data

Yeah looked at it debugging it now 👍

EDIT:
The issue turned out that SQLAlchemy cls._metadata.drop_all(cls._engine) doesn't drop the residual table at the end of the test in case of Google CloudSQL PostgreSQL managed database other unmanaged ones works fine (that's why we get duplicate data for every test run).

Submitted a commit patch to try use raw SQL first as table removal approach and fallback to this SQLAlchemy. Also made the table id for CloudSQL to be unique using uuid similar to others (with this even table removal failed the tests still passing since unique table id almost for every run)

@mohamedawnallah
Copy link
Contributor Author

@damccorm - the relevant CI tests are green now

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - this LGTM!

@damccorm damccorm merged commit 1da37bc into apache:master Sep 10, 2025
100 of 101 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request]: add CloudSQL Examples and update the Beam website [Feature Request]: CloudSQL Enrichment Handler

2 participants