
[C++][Python] value_counts extremely slow for chunked DictionaryArray #37055

@randolf-scholz

Description

Describe the bug, including details regarding any error messages, version, and platform.

I have a large dataset (more than 100M rows) with a `dictionary[int32,string]` column (int32 indices, string values) stored as a ChunkedArray. I noticed that `pyarrow.compute.value_counts` is extremely slow on this column compared to the other columns.

`table[col].value_counts()` is 10x to 100x slower than `table[col].combine_chunks().value_counts()` in this case.

Component(s)

C++, Python
