Adds helper functions for computing and using code object identifiers to facilitate more efficient pickling of lambdas. (#35656)
Merged
tvalentyn
reviewed
Aug 6, 2025
tvalentyn
reviewed
Aug 6, 2025
tvalentyn
reviewed
Aug 11, 2025
tvalentyn
reviewed
Aug 13, 2025
tvalentyn
reviewed
Aug 14, 2025
sdks/python/apache_beam/internal/test_cases/after_module_add_function.py
…y into code_object_pickler_test.py, rename files in test_data, rename tests in code_object_pickler_test.py
tvalentyn
approved these changes
Aug 14, 2025
Contributor
tvalentyn
left a comment
Thanks!
    return path


def get_code_path(callable: types.FunctionType):
Contributor
(totally optional) wdyt about naming this method get_code_object_identifier?
tvalentyn
approved these changes
Aug 15, 2025
DKER2
pushed a commit
to DKER2/beam
that referenced
this pull request
Aug 20, 2025
…to facilitate more efficient pickling of lambdas. (apache#35656) * Update code_object_pickler.py to include consistent pickling for lambdas * Create code_object_pickler_test.py * Update code_object_pickler_test.py * Create after_module_add_function.py * Create after_module_add_lambda_variable.py * Create after_module_add_variable.py * Create after_module_remove_lambda_variable.py * Create after_module_remove_variable.py * Create after_module_with_classes.py * Create after_module_with_global_variable.py * Create after_module_with_nested_function.py * Create after_module_with_nested_function_2.py * Create after_module_with_single_class.py * Create before_module_with_classes.py * Create before_module_with_functions.py * Create before_module_with_lambdas.py * Create module_with_default_argument.py * Create pickle_code_path_test.py * Update code_object_pickler.py * Update pickle_code_path_test.py * Update code_object_pickler_test.py * Update pickle_code_path_test.py * Update after_module_add_function.py * Update after_module_add_lambda_variable.py * Update after_module_add_variable.py * Update after_module_remove_lambda_variable.py * Update after_module_remove_variable.py * Update after_module_with_classes.py * Update after_module_with_global_variable.py * Update after_module_with_nested_function.py * Update after_module_with_nested_function_2.py * Update after_module_with_single_class.py * Update before_module_with_classes.py * Update before_module_with_functions.py * Update before_module_with_lambdas.py * Update module_with_default_argument.py * Update after_module_add_lambda_variable.py * Update after_module_with_nested_function.py * Update code_object_pickler.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler.py * Update pickle_code_path_test.py * Update after_module_with_nested_function.py * Update after_module_with_classes.py * Update after_module_with_nested_function_2.py 
* Update code_object_pickler_test.py * Update code_object_pickler.py * Update pickle_code_path_test.py * Update pickle_code_path_test.py * Update code_object_pickler.py * Update after_module_with_classes.py * Update after_module_with_nested_function.py * Update after_module_with_nested_function_2.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update pickle_code_path_test.py * Update pickle_code_path_test.py * Create __init__.py * Update before_module_with_classes.py * Update after_module_with_single_class.py * Update after_module_with_classes.py * Update pickle_code_path_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update pickle_code_path_test.py * Update code_object_pickler_test.py * Update pickle_code_path_test.py * Update code_object_pickler_test.py * Update pickle_code_path_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update code_object_pickler_test.py * Update pickle_code_path_test.py * Update after_module_with_nested_function.py * Update after_module_with_classes.py * Update after_module_with_single_class.py * Update before_module_with_functions.py * Update after_module_add_lambda_variable.py * Update after_module_with_global_variable.py * Update after_module_with_nested_function_2.py * Update after_module_add_function.py * Update before_module_with_lambdas.py * Update before_module_with_classes.py * Update after_module_add_variable.py * Update code_object_pickler.py * Update pickle_code_path_test.py * Update after_module_add_function.py * Update 
after_module_add_lambda_variable.py * Update after_module_add_function.py * Update after_module_add_function.py * Update after_module_add_variable.py * Update after_module_remove_lambda_variable.py * Update after_module_remove_variable.py * Update after_module_with_classes.py * Update after_module_with_global_variable.py * Update after_module_with_nested_function.py * Update after_module_with_nested_function_2.py * Update after_module_with_single_class.py * Update before_module_with_classes.py * Update before_module_with_functions.py * Update before_module_with_lambdas.py * Update module_with_default_argument.py * Update after_module_with_classes.py * Update before_module_with_classes.py * Update code_object_pickler_test.py * Update after_module_add_function.py * Update pickle_code_path_test.py * Update code_object_pickler_test.py * Update code_object_pickler.py * Update code_object_pickler_test.py * Update pickle_code_path_test.py * fix formatting and lint * formatting and lint changes * Update code_object_pickler.py * change type hints * fix typo that caused error * fix formatting and lint errors * fix typehints * fix formatting and lint * Update code_object_pickler.py * Update code_object_pickler.py * fix union error * import union * update code_object_pickler.py docstring, move pickle_code_path_test.py into code_object_pickler_test.py, rename files in test_data, rename tests in code_object_pickler_test.py * fix error * rename functions in code_object_pickler.py * fix errors * fix formatting and lint errors * Update sdks/python/apache_beam/internal/code_object_pickler.py --------- Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
3 tasks
tvalentyn
pushed a commit
that referenced
this pull request
Aug 27, 2025
* implement lambda name pickling in cloudpickle * add enable_lambda_name to __init__ * fix formatting and lint * fix typo * fix code paths in test * fix tests * fix lint * fix formatting and failing test * fix formatting again * remove cloudpickle implementation to leave only typo fixes and fixing test structure. * fix _make_function typo * revert regex * fix failing tests * fix formatting * update prefix to not hardcode
damccorm
added a commit
that referenced
this pull request
Aug 27, 2025
* sdks/python: properly make milvus as extra dependency * sdks/python: update image requirements * .github: trigger postcommit python * sdks/python: fix linting issues * sdks/python: fix formatting issues * .github: trigger beam postcommit python * sdks/python: revert milvus version in itests * sdks/python: update image requirements * trigger_files: trigger postcommit python * Bump github.com/docker/go-connections from 0.5.0 to 0.6.0 in /sdks (#35906) Bumps [github.com/docker/go-connections](https://github.com/docker/go-connections) from 0.5.0 to 0.6.0. - [Commits](docker/go-connections@v0.5.0...v0.6.0) --- updated-dependencies: - dependency-name: github.com/docker/go-connections dependency-version: 0.6.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add the readme link to new YAML examples (#35941) * Bump google.golang.org/api from 0.247.0 to 0.248.0 in /sdks (#35969) * Remove mysql-connector-python dependency (#35932) * Fix typos and update test implementation from #35656 (#35958) * implement lambda name pickling in cloudpickle * add enable_lambda_name to __init__ * fix formatting and lint * fix typo * fix code paths in test * fix tests * fix lint * fix formatting and failing test * fix formatting again * remove cloudpickle implementation to leave only typo fixes and fixing test structure. 
* fix _make_function typo * revert regex * fix failing tests * fix formatting * update prefix to not hardcode * feat(mongodb): upgrade MongoDB Java driver to version 5.5.0 (#35946) * feat(mongodb): upgrade MongoDB Java driver to version 5.5.0 Update MongoDB Java driver from 3.12.11 to 5.5.0 and refactor code to use new API Add mongo-bson dependency required by new driver version Replace deprecated MongoClient with MongoClients and update GridFS implementation * refactor(mongodb): update MongoDB client usage to modern API Replace deprecated MongoClient with MongoClients.create() and update database drop method * build(dependencies): add mongodb driver core dependency Add mongodb-driver-core to support MongoDB Java driver functionality. Also mark mongo_java_driver as permitUnusedDeclared and add testImplementation. * fix(mongodb): update embedded mongo version and fix split key filtering Update embedded MongoDB test dependency to version 3.5.4 and simplify split key filtering logic by using BsonObjectId for range queries. This ensures proper type handling when filtering MongoDB documents by _id field. * build: add mongodb-driver-core dependency Add mongodb-driver-core version 5.5.0 to support MongoDB Java driver functionality * use version * refactor: simplify mongo client creation logic Remove redundant null check and consolidate uri handling in MongoDbGridFSIO * Bump github.com/aws/aws-sdk-go-v2/credentials in /sdks (#35974) Bumps [github.com/aws/aws-sdk-go-v2/credentials](https://github.com/aws/aws-sdk-go-v2) from 1.18.6 to 1.18.7. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/config/v1.18.7/CHANGELOG.md) - [Commits](aws/aws-sdk-go-v2@config/v1.18.6...config/v1.18.7) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/credentials dependency-version: 1.18.7 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump google.golang.org/grpc from 1.74.2 to 1.75.0 in /sdks (#35971) Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.74.2 to 1.75.0. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](grpc/grpc-go@v1.74.2...v1.75.0) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-version: 1.75.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Override localhost endpoint when a worker is running in docker on mac (#35964) * fix(parquetio): handle missing nullable fields in row conversion (#35948) * fix(parquetio): handle missing nullable fields in row conversion Add null value handling when converting rows to Arrow tables for nullable fields that are missing from input data. This fixes KeyError when writing to Parquet with missing nullable fields, addressing issue #35791. * fix lint * Bump cloud.google.com/go/storage from 1.56.0 to 1.56.1 in /sdks (#35980) Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.56.0 to 1.56.1. - [Release notes](https://github.com/googleapis/google-cloud-go/releases) - [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md) - [Commits](googleapis/google-cloud-go@spanner/v1.56.0...storage/v1.56.1) --- updated-dependencies: - dependency-name: cloud.google.com/go/storage dependency-version: 1.56.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [Prism] Fix segv when docker container self-terminated. 
(#35977) * Fix segv when docker container is self-terminated * Add some debug logging for docker and process env. * add a jinja % include/import pipeline example to docs (#35931) * add a jinja include pipeline example * update yaml doc with import example * address gemini and other comments * fix table of contents for readme * add link to jinja pipeline examples * Bump github.com/aws/aws-sdk-go-v2/config from 1.31.2 to 1.31.3 in /sdks (#35983) * Add a security GCP log analyzer (#35922) * Add the base log_analyzer * Add github action for security logging * Enhance LogAnalyzer to filter logs by time range and include file names in event summary * Add dry-run option for weekly email report generation in LogAnalyzer * Better error handling for timezones and missing details * Refactor LogAnalyzer to use SinkCls for type consistency and enhance bucket permission management for log sinks * update py containers (#35982) * [YAML]: add import jinja pipeline example (#35945) * add import jinja pipeline example * revert name change * update overall examples readme * fix lint issue * fix gemini small issue * Update sdks/python/apache_beam/yaml/examples/transforms/jinja/import/README.md --------- Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com> * workflows: capture DinD tests in PreCommit Py Coverage workflow * workflows: temporarily removing `ubuntu-latest` till resolving deps * workflows: add `matrix.os` label to `beam_PreCommit_Python_Coverage` --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Mohamed Awnallah <mohamedmohey2352@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chamikara Jayalath <chamikaramj@gmail.com> Co-authored-by: Yi Hu <yathu@google.com> Co-authored-by: kristynsmith <kristynsmith@google.com> Co-authored-by: liferoad <huxiangqian@gmail.com> Co-authored-by: Shunping Huang <shunping@google.com> Co-authored-by: Derrick Williams <derrickaw@google.com> 
Co-authored-by: Enrique Calderon <71863693+ksobrenat32@users.noreply.github.com> Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com> Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
Makes pickling of lambdas consistent by giving each lambda a unique name that can be used as a stable reference.
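To illustrate the idea of a stable name for a lambda, here is a minimal sketch. It is not the code from this PR: the helper names `code_identifier` and `resolve_nested_code` and the `module:qualname` identifier format are assumptions made for the example. It derives an identifier from the lambda's `__qualname__` and then finds the matching code object again by walking `co_consts` of the enclosing function.

```python
import types


def code_identifier(fn):
    """Hypothetical identifier: '<module>:<qualname>' of the function."""
    return f"{fn.__module__}:{fn.__qualname__}"


def resolve_nested_code(top_level_fn, identifier):
    """Find the code object named by `identifier` inside `top_level_fn`.

    Assumption for this sketch: the identifier was produced by
    code_identifier() and points at something nested in top_level_fn.
    """
    _, qualname = identifier.split(":", 1)
    # '<locals>' segments only mark function scopes; they are not names.
    parts = [p for p in qualname.split(".") if p != "<locals>"]
    if parts[0] != top_level_fn.__name__:
        raise ValueError(f"{identifier!r} is not nested in {top_level_fn.__name__}")
    code = top_level_fn.__code__
    for name in parts[1:]:
        # Nested functions and lambdas are stored as constants of the parent.
        candidates = [
            const for const in code.co_consts
            if isinstance(const, types.CodeType) and const.co_name == name
        ]
        if len(candidates) != 1:
            # Two lambdas in one scope share the name '<lambda>'; this
            # sketch does not disambiguate them.
            raise ValueError(f"cannot resolve {name!r} unambiguously")
        code = candidates[0]
    return code


def make_adder():
    return lambda x: x + 1


add_one = make_adder()
identifier = code_identifier(add_one)
resolved = resolve_nested_code(make_adder, identifier)
# The lambda has no closure, so it can be rebuilt from its code object alone.
rebuilt = types.FunctionType(resolved, add_one.__globals__)
```

The identifier is a short string rather than serialized bytecode, which is what makes referencing by name cheaper than pickling the code object by value. The sketch deliberately fails when a scope contains several lambdas, since they all share the name `<lambda>`; a real implementation needs some way to tell them apart.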
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
CHANGES.md with noteworthy changes.
See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.