Description
What happened?
A bug introduced in the orjson dependency (ijl/orjson#415) might cause Beam Python pipelines to crash with a segmentation fault or get stuck. Beam uses orjson in BigQuery IO, so users of this IO might be affected.
Mitigation
Until Beam 2.51.0 is released, consider any of the following workarounds:
- Use apache-beam==2.49.0 or below. To avoid running into another known issue, consider apache-beam==2.46.0.
- Install orjson==3.9.1 or below in the runtime environment. For example, you can use the --requirements_file pipeline option with a file that includes the line orjson==3.9.1. We recommend orjson==3.9.1 since it was previously tested with the Beam 2.49.0 SDK. For more information, see: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
- Install an [updated version of the orjson dependency](https://pypi.org/project/orjson/#history) once the threading issue ijl/orjson#415 is fixed.
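The orjson pin workaround can be sketched in Python as follows. This is only a minimal sketch: the file name requirements.txt and the helper names write_requirements and pipeline_args are hypothetical, chosen for illustration. Only the pin orjson==3.9.1 and the --requirements_file option come from the text above.

```python
import tempfile
from pathlib import Path

# Pin recommended in the mitigation above.
ORJSON_PIN = "orjson==3.9.1"


def write_requirements(directory, pin=ORJSON_PIN):
    """Write a requirements file containing the orjson pin; return its path.

    The file name "requirements.txt" is an assumption for this sketch.
    """
    path = Path(directory) / "requirements.txt"
    path.write_text(pin + "\n")
    return path


def pipeline_args(requirements_path):
    """Build the command-line flag that points the pipeline at the file."""
    return [f"--requirements_file={requirements_path}"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        req = write_requirements(tmp)
        # These flags would be appended to the pipeline's other arguments.
        print(pipeline_args(req))
```

In a real pipeline, the resulting flags would typically be passed along with the other pipeline arguments, for example to apache_beam.options.pipeline_options.PipelineOptions.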
Original report
In the latest deployment of our Apache Beam pipeline, orjson (a dependency of the Apache Beam Python SDK) was upgraded from 3.9.2 to 3.9.4.
The Apache Beam SDK declares a dependency on orjson < 4.0 here:
https://github.com/apache/beam/blob/master/sdks/python/setup.py#L233
Since this upgrade of orjson from 3.9.2 to 3.9.4, we periodically see the Apache Beam SDK hang or the workers crash with segmentation fault errors. We believe this is related to this issue in the orjson project: ijl/orjson#415
Reverting from orjson 3.9.4 to 3.9.2 appears to resolve the issues.
The Apache Beam Python SDK may want to limit orjson to 3.9.2 or below until orjson issue 415 is resolved.
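To see whether a given environment falls within the cap proposed here, a startup check along these lines may help. It is a rough sketch, not part of Beam: the helper names are hypothetical, the 3.9.2 threshold is simply the last version this report observed to work, and the parser assumes plain numeric X.Y.Z version strings.

```python
from importlib import metadata

# Last orjson version this report observed to work; an assumption of the
# sketch, not a statement about which later releases are affected.
LAST_KNOWN_GOOD = (3, 9, 2)


def parse_version(text):
    """Parse 'X.Y.Z' into a tuple of ints (plain numeric versions only)."""
    return tuple(int(part) for part in text.split(".")[:3])


def is_within_cap(version_text, cap=LAST_KNOWN_GOOD):
    """Return True if the version is at or below the proposed cap."""
    return parse_version(version_text) <= cap


def installed_orjson_version():
    """Return the installed orjson version string, or None if absent."""
    try:
        return metadata.version("orjson")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_orjson_version()
    if version is None:
        print("orjson is not installed")
    elif is_within_cap(version):
        print(f"orjson {version} is within the proposed cap")
    else:
        print(f"orjson {version} is above the proposed cap; check ijl/orjson#415")
```

A version above the cap is not necessarily affected once a fixed orjson release exists, so the check should be treated as a prompt to verify, not a verdict.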
Issue Priority
Priority: 2
Issue Components
- Component: Python SDK