[Python] 21.0.0 Removes Attribute PyExtensionType #47155

@bellabaiyunyu

Describe the bug, including details regarding any error messages, version, and platform.

Hi team,

We are seeing the following error after upgrading to pyarrow 21.0.0. Downgrading to 20.0.0 resolves the issue.

I believe this is a backward-incompatible change. Will the team release a patch that restores backward compatibility in the latest pyarrow?

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "<REDACTED_LOCAL_ENV>/build/workloads/nemo-dgxc-benchmarking-g_v25.04/NeMo/scripts/performance/llm/pretrain_deepseek_v2_lite.py", line 21, in <module>
    from nemo.collections.llm.recipes.deepseek_v2_lite import pretrain_recipe
  File "<REDACTED_LOCAL_ENV>/build/workloads/nemo-dgxc-benchmarking-g_v25.04/NeMo/nemo/collections/llm/__init__.py", line 21, in <module>
    from nemo.collections.llm.bert.data import BERTMockDataModule, BERTPreTrainingDataModule, SpecterDataModule
  File "<REDACTED_LOCAL_ENV>/build/workloads/nemo-dgxc-benchmarking-g_v25.04/NeMo/nemo/collections/llm/bert/data/__init__.py", line 3, in <module>
    from nemo.collections.llm.bert.data.specter import SpecterDataModule
  File "<REDACTED_LOCAL_ENV>/build/workloads/nemo-dgxc-benchmarking-g_v25.04/NeMo/nemo/collections/llm/bert/data/specter.py", line 18, in <module>
    from datasets import DatasetDict, load_dataset
  File "<REDACTED_LOCAL_ENV>/env/lib/python3.12/site-packages/datasets/__init__.py", line 22, in <module>
    from .arrow_dataset import Dataset
  File "<REDACTED_LOCAL_ENV>/env/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 67, in <module>
    from .arrow_writer import ArrowWriter, OptimizedTypedSequence
  File "<REDACTED_LOCAL_ENV>/env/lib/python3.12/site-packages/datasets/arrow_writer.py", line 27, in <module>
    from .features import Features, Image, Value
  File "<REDACTED_LOCAL_ENV>/env/lib/python3.12/site-packages/datasets/features/__init__.py", line 18, in <module>
    from .features import Array2D, Array3D, Array4D, Array5D, ClassLabel, Features, Sequence, Value
  File "<REDACTED_LOCAL_ENV>/env/lib/python3.12/site-packages/datasets/features/features.py", line 634, in <module>
    class _ArrayXDExtensionType(pa.PyExtensionType):
                                ^^^^^^^^^^^^^^^^^^
AttributeError: module 'pyarrow' has no attribute 'PyExtensionType'. Did you mean: 'ExtensionType'?
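
For triage, the failure can probably be checked without NeMo or datasets in the picture. The following is a minimal sketch, not code from either project: `check_py_extension_type` is a hypothetical helper name, and it only inspects whichever module object it is given for the two attribute names that appear in the error message above.

```python
from types import SimpleNamespace


def check_py_extension_type(pa_module):
    """Report which extension-type base classes a pyarrow-like module exposes.

    Hypothetical helper for triage; it is not part of pyarrow or datasets.
    """
    return {
        "version": getattr(pa_module, "__version__", "unknown"),
        # The attribute that datasets/features/features.py subclasses in the traceback.
        "PyExtensionType": hasattr(pa_module, "PyExtensionType"),
        # The alternative name suggested by the AttributeError message.
        "ExtensionType": hasattr(pa_module, "ExtensionType"),
    }


if __name__ == "__main__":
    try:
        import pyarrow as pa
    except ImportError:
        # Stand-in object so the sketch still runs where pyarrow is not installed.
        pa = SimpleNamespace(__version__="not installed")
    print(check_py_extension_type(pa))
```

Running this in both the 20.0.0 and 21.0.0 environments should show whether `PyExtensionType` is the only difference relevant to the import failure.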

Component(s)

Python
