Description
What happened?
I would like to report that Python 3.12 support in Apache Beam is partly broken because the Python SDK depends on an old version of dill (and of cloudpickle as well, though that is unlikely to be a blocker).
Currently in Apache Airflow, the Beam provider is disabled for Python 3.12, because adding Apache Beam with its dependencies made it impossible to resolve a non-conflicting set of dependencies. After the last release of Apache Beam (2.59.0), I was hoping all the problems with Python 3.12 were solved, and I attempted to rebase the PR bringing the Beam provider back to Python 3.12. Unfortunately, our tests showed that there is one more conflict left.
You can see a failing build here: https://github.com/apache/airflow/actions/runs/11121136124/job/30899938977?pr=41990
The PR to bring Beam back is apache/airflow#42505.
The failing tests are not Beam tests. They are tests of "dill" serialization for the Airflow Python Virtualenv Operator, and the error is this:
INFO airflow.utils.process_utils:process_utils.py:190 Output:
INFO airflow.utils.process_utils:process_utils.py:194 Traceback (most recent call last):
INFO airflow.utils.process_utils:process_utils.py:194 File "/tmp/venv-callsdqfisel/script.py", line 72, in <module>
INFO airflow.utils.process_utils:process_utils.py:194 arg_dict = dill.load(file)
INFO airflow.utils.process_utils:process_utils.py:194 ^^^^^^^^^^^^^^^
INFO airflow.utils.process_utils:process_utils.py:194 File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 270, in load
INFO airflow.utils.process_utils:process_utils.py:194 return Unpickler(file, ignore=ignore, **kwds).load()
INFO airflow.utils.process_utils:process_utils.py:194 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO airflow.utils.process_utils:process_utils.py:194 File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 472, in load
INFO airflow.utils.process_utils:process_utils.py:194 obj = StockUnpickler.load(self)
INFO airflow.utils.process_utils:process_utils.py:194 ^^^^^^^^^^^^^^^^^^^^^^^^^
INFO airflow.utils.process_utils:process_utils.py:194 TypeError: code() argument 13 must be str, not int
Analysis showed that the dill version Apache Beam requires is not compatible with Python 3.12 and produces this error. Before re-enabling Beam for Python 3.12, the tests were passing on Python 3.12 with dill 0.3.9, but Apache Beam has a very strict requirement on the dill version.
This is what happens when we add Apache Beam to a Python 3.12 environment:
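To confirm which versions end up in a given environment, a small diagnostic along these lines may help. It is only a sketch: the helper names `installed_version` and `report` are made up for illustration, and it relies solely on the standard library's `importlib.metadata`.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("dill", "cloudpickle", "apache-beam")):
    """Build pip-freeze-style lines for the interpreter and the given packages."""
    lines = [f"python=={sys.version_info.major}.{sys.version_info.minor}"]
    for name in packages:
        version = installed_version(name)
        if version is None:
            lines.append(f"{name} (not installed)")
        else:
            lines.append(f"{name}=={version}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Running it before and after installing Apache Beam should make a dill or cloudpickle downgrade visible without diffing full `pip freeze` output.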
> apache-beam==2.59.0
145c146
< cloudpickle==3.0.0
---
> cloudpickle==2.2.1
167c168
< dill==0.3.9
---
> dill==0.3.1.1
And it's caused by this limitation:
# Dill doesn't have forwards-compatibility guarantees within minor
# version. Pickles created with a new version of dill may not unpickle
# using older version of dill. It is best to use the same version of
# dill on client and server, therefore list of allowed versions is
# very narrow. See: https://github.com/uqfoundation/dill/issues/341.
'dill>=0.3.1.1,<0.3.2',
Also, cloudpickle is downgraded to 2.2.1 due to this limitation:
# It is prudent to use the same version of pickler at job submission
# and at runtime, therefore bounds need to be tight.
# To avoid depending on an old dependency, update the minor version on
# every Beam release, see: https://github.com/apache/beam/issues/23119
'cloudpickle~=2.2.1',
However, cloudpickle is less problematic than dill in this case: it is the old version of dill that does not properly support Python 3.12.
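To illustrate why the resolver is forced into these downgrades, here is a rough sketch of how the two specifiers quoted above evaluate against the versions in the diff. It is a simplified stand-in for real specifier matching (a real tool would use a proper PEP 440 implementation), it only handles plain dotted numeric versions, and the helper names are hypothetical.

```python
def parse(version, width=4):
    """Turn '0.3.1.1' into (0, 3, 1, 1), zero-padded so '0.3.2' == '0.3.2.0'."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def dill_allowed(version):
    """Check a version against 'dill>=0.3.1.1,<0.3.2'."""
    return parse("0.3.1.1") <= parse(version) < parse("0.3.2")


def cloudpickle_allowed(version):
    """Check a version against 'cloudpickle~=2.2.1' (i.e. >=2.2.1 and ==2.2.*)."""
    parsed = parse(version)
    return parsed >= parse("2.2.1") and parsed[:2] == (2, 2)


if __name__ == "__main__":
    for candidate in ("0.3.1.1", "0.3.9"):
        print("dill", candidate, dill_allowed(candidate))
    for candidate in ("2.2.1", "3.0.0"):
        print("cloudpickle", candidate, cloudpickle_allowed(candidate))
```

Under these assumptions, dill 0.3.9 and cloudpickle 3.0.0 both fall outside the allowed ranges, which matches the downgrades to 0.3.1.1 and 2.2.1 shown in the diff.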
It would be great if the next release of Apache Beam bumped at least dill to the latest version (and possibly cloudpickle too), as this would finally allow the Apache Beam provider in Airflow to support Python 3.12.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner