
[Bug]: Python 3.12 incompatibility of Apache Beam #32617

@potiuk

Description


What happened?

I would like to report that Python 3.12 support in Apache Beam is partially broken because the Python SDK depends on an old version of dill (and of cloudpickle as well, but that is not likely a blocker).

Currently in Apache Airflow, the Beam provider is disabled for Python 3.12, because adding Apache Beam with its dependencies made it impossible to resolve a non-conflicting set of dependencies. After the latest release of Apache Beam (2.59.0), I was hoping all the problems with Python 3.12 were solved and attempted to rebase the PR that brings the Beam provider back to Python 3.12, but unfortunately our tests showed that one more conflict is left.

You can see a failing build here: https://github.com/apache/airflow/actions/runs/11121136124/job/30899938977?pr=41990
The PR to bring Beam back is apache/airflow#42505.

The failing tests are not Beam tests; they test "dill" serialization for Airflow's PythonVirtualenvOperator, and the error is:

INFO     airflow.utils.process_utils:process_utils.py:190 Output:
INFO     airflow.utils.process_utils:process_utils.py:194 Traceback (most recent call last):
INFO     airflow.utils.process_utils:process_utils.py:194   File "/tmp/venv-callsdqfisel/script.py", line 72, in <module>
INFO     airflow.utils.process_utils:process_utils.py:194     arg_dict = dill.load(file)
INFO     airflow.utils.process_utils:process_utils.py:194                ^^^^^^^^^^^^^^^
INFO     airflow.utils.process_utils:process_utils.py:194   File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 270, in load
INFO     airflow.utils.process_utils:process_utils.py:194     return Unpickler(file, ignore=ignore, **kwds).load()
INFO     airflow.utils.process_utils:process_utils.py:194            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO     airflow.utils.process_utils:process_utils.py:194   File "/usr/local/lib/python3.12/site-packages/dill/_dill.py", line 472, in load
INFO     airflow.utils.process_utils:process_utils.py:194     obj = StockUnpickler.load(self)
INFO     airflow.utils.process_utils:process_utils.py:194           ^^^^^^^^^^^^^^^^^^^^^^^^^
INFO     airflow.utils.process_utils:process_utils.py:194 TypeError: code() argument 13 must be str, not int

Analysis of the issue showed that the dill version Apache Beam requires is not compatible with Python 3.12 and produces this error. Before Beam was re-enabled for Python 3.12, the tests passed on Python 3.12 with dill 0.3.9, but Apache Beam has a very strict requirement on the dill version.
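The error message points at the `code` constructor. A plausible mechanism (an assumption, not confirmed by the analysis above): dill 0.3.1.1 predates Python 3.11, which inserted a `qualname` parameter at position 13 of `types.CodeType`, so a dill release that rebuilds code objects positionally with the older argument order ends up passing `co_firstlineno` (an int) where Python 3.12 expects a `str`. The sketch below uses only the standard library and a hypothetical helper, `rebuild_with_py38_layout`, to reproduce the same `TypeError` on Python 3.11+. It assumes Python 3.8 or newer, where `co_posonlyargcount` exists.

```python
import types
import warnings


def sample(x):
    return x + 1


def rebuild_with_py38_layout(co: types.CodeType):
    """Hypothetical helper: rebuild `co` by calling types.CodeType
    positionally with the 16-argument order used by Python 3.8-3.10
    (no `qualname`, no `exceptiontable`).

    Returns the new code object, or the TypeError message if the running
    interpreter rejects that argument order.
    """
    # co_lnotab is deprecated since Python 3.12; silence the warning and
    # fall back to empty bytes if the attribute is missing.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        line_info = getattr(co, "co_lnotab", b"")

    args = (
        co.co_argcount,
        co.co_posonlyargcount,
        co.co_kwonlyargcount,
        co.co_nlocals,
        co.co_stacksize,
        co.co_flags,
        co.co_code,
        co.co_consts,
        co.co_names,
        co.co_varnames,
        co.co_filename,
        co.co_name,
        co.co_firstlineno,  # argument 13: an int here, but `qualname` (str) on 3.11+
        line_info,
        co.co_freevars,
        co.co_cellvars,
    )
    try:
        return types.CodeType(*args)
    except TypeError as exc:
        return str(exc)


print(rebuild_with_py38_layout(sample.__code__))
```

On Python 3.11 and later this prints `code() argument 13 must be str, not int`, the same message as in the log; on 3.8-3.10 the call succeeds because the argument order matches.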

This is what happens when we add Apache Beam to the Python 3.12 environment:

> apache-beam==2.59.0
145c146
< cloudpickle==3.0.0
---
> cloudpickle==2.2.1
167c168
< dill==0.3.9
---
> dill==0.3.1.1

This is caused by the following pin in Apache Beam's requirements:

          # Dill doesn't have forwards-compatibility guarantees within minor
          # version. Pickles created with a new version of dill may not unpickle
          # using older version of dill. It is best to use the same version of
          # dill on client and server, therefore list of allowed versions is
          # very narrow. See: https://github.com/uqfoundation/dill/issues/341.
          'dill>=0.3.1.1,<0.3.2',
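To make the conflict concrete, the sketch below checks a few dill versions against the range `>=0.3.1.1,<0.3.2` from the pin above. The helper names are hypothetical, and the version parsing is deliberately simplified to plain dotted integers (real installers use full PEP 440 parsing, for example via the `packaging` library). It shows why the 0.3.9 that Airflow's Python 3.12 tests passed with cannot coexist with Beam: it falls outside the allowed range, so the resolver downgrades to 0.3.1.1.

```python
def _release(version: str, width: int = 4) -> tuple:
    """Turn a plain numeric version like "0.3.1.1" into a zero-padded tuple.

    Simplified: handles only dotted integers (no pre/post/dev/local parts),
    unlike full PEP 440 parsing.
    """
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies_dill_pin(installed: str) -> bool:
    """Check `installed` against Beam's pin 'dill>=0.3.1.1,<0.3.2'."""
    return _release("0.3.1.1") <= _release(installed) < _release("0.3.2")


for candidate in ("0.3.1.1", "0.3.9"):
    print(candidate, satisfies_dill_pin(candidate))
# 0.3.1.1 True
# 0.3.9 False
```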

cloudpickle is also downgraded to 2.2.1 because of this pin:

          # It is prudent to use the same version of pickler at job submission
          # and at runtime, therefore bounds need to be tight.
          # To avoid depending on an old dependency, update the minor version on
          # every Beam release, see: https://github.com/apache/beam/issues/23119
          'cloudpickle~=2.2.1',
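The cloudpickle pin uses PEP 440's compatible-release operator: `~=2.2.1` means `>=2.2.1, ==2.2.*`, which is why cloudpickle 3.0.0 is replaced by 2.2.1. The sketch below evaluates that operator for plain numeric versions; the function names are hypothetical and the parsing is simplified (not full PEP 440).

```python
def _release(version: str) -> tuple:
    # Simplified parser: plain dotted integers only (not full PEP 440).
    return tuple(int(p) for p in version.split("."))


def matches_compatible_release(spec: str, candidate: str) -> bool:
    """Evaluate 'candidate ~= spec' for plain numeric versions.

    Per PEP 440, '~=2.2.1' means '>=2.2.1, ==2.2.*': drop the last
    release component of the spec to get the prefix that must match.
    """
    spec_parts = _release(spec)
    cand_parts = _release(candidate)
    prefix = spec_parts[:-1]
    return cand_parts >= spec_parts and cand_parts[: len(prefix)] == prefix


for candidate in ("2.2.1", "3.0.0"):
    print(candidate, matches_compatible_release("2.2.1", candidate))
# 2.2.1 True
# 3.0.0 False
```

Because the allowed prefix is `2.2`, any later cloudpickle minor or major release is excluded until Beam bumps the pin, which is what the comment above refers to when it says the minor version is updated on every Beam release.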

cloudpickle is less of a problem than dill here, simply because it is the old version of dill that does not properly support Python 3.12.

It would be great if the next Apache Beam release bumped at least dill to the latest version (and possibly cloudpickle), as this would finally allow the Apache Beam provider in Airflow to support Python 3.12.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
