GH-43728: [Python] ChunkedArray fails gracefully on non-cpu devices #43795
Conversation
cpp/src/arrow/chunked_array.cc (outdated)
I think we should consider not caching this piece of information as state in the ChunkedArray instance and instead derive it from the chunks when we need it.
Additionally, one advantage of chunking is the flexibility that it brings regarding allocation of buffers (they don't have to be contiguous), so now requiring that all chunks be allocated on the same device seems too rigid.
I proposed a solution to this: chunked arrays produce a DeviceAllocationTypeSet containing all the allocation types of the chunks. This set can be represented by a single 64-bit word in memory (I used C++ <bitset>), so it can be copied and matched very efficiently.
Here is the draft PR: https://github.com/apache/arrow/pull/43542/files#diff-b4ffb36b29cfaa2cf9be4fab774921b8344efdc595a358b02c3187ba04141f7eR89
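For illustration only, here is a minimal Python sketch of the same idea, not the C++ code from the draft PR: the set of device allocation types is derived from the chunks on demand and packed into a single integer bitmask, so copying and comparing it stays cheap. The device-type codes and the Chunk stand-in are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical device allocation type codes, standing in for Arrow's enum values.
CPU, CUDA, CUDA_HOST = 1, 2, 3


@dataclass
class Chunk:
    """Stand-in for an array chunk that knows where its buffers are allocated."""
    device_type: int


def device_allocation_type_set(chunks):
    """Derive the set of device types present in the chunks as a bitmask.

    Nothing is cached on the chunked array itself; the set is recomputed from
    the chunks whenever it is needed, and mixed-device chunked arrays remain
    representable (several bits can be set at once).
    """
    mask = 0
    for chunk in chunks:
        mask |= 1 << chunk.device_type
    return mask


mixed = [Chunk(CPU), Chunk(CUDA)]
print(bin(device_allocation_type_set(mixed)))          # 0b110: CPU and CUDA present
print(bin(device_allocation_type_set([Chunk(CPU)])))   # 0b10: CPU only
```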
I like your PR! I think this would work great for PyArrow.
We could move this "caching" to the Python side for now (if we don't want to do this in C++, which I think is certainly fine), or otherwise wait on your PR #43542 to land.
(and we should maybe still consider caching the DeviceAllocationTypeSet result? It might be cheap to calculate in C++, but we still call this before every call of many methods on the Python object)
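A minimal sketch of what caching on the Python side could look like (class and attribute names are hypothetical, not the actual pyarrow wrapper): the result is computed once from the chunks and reused, since it is consulted before many method calls.

```python
class ChunkedArrayWrapper:
    """Illustrative wrapper that memoizes the CPU-only check.

    Each chunk is assumed to expose a boolean is_cpu attribute; the names
    here only show the memoization pattern.
    """

    def __init__(self, chunks):
        self._chunks = chunks
        self._is_cpu = None  # computed lazily on first use, then reused

    @property
    def is_cpu(self):
        if self._is_cpu is None:
            self._is_cpu = all(chunk.is_cpu for chunk in self._chunks)
        return self._is_cpu
```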
Sorry, didn't see that the PR was already updated in the meantime :)
python/pyarrow/table.pxi (outdated)
An is_cpu predicate on chunked arrays can be defined without us forcing a single device type for chunked arrays. This would unblock the Python checks without ruling out the possibility of arrays with mixed device allocations.
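As a rough sketch of that predicate (assuming each chunk can report whether it lives in CPU-accessible memory; the names are illustrative, not pyarrow's actual internals): is_cpu simply means "every chunk is on the CPU", so chunks on different devices stay legal and merely make the predicate false.

```python
from types import SimpleNamespace


def chunked_array_is_cpu(chunks):
    """True iff every chunk is CPU-allocated.

    No single device type is forced on the chunked array: a mix of CPU and
    GPU chunks is still a valid chunked array, it just is not CPU-only.
    """
    return all(chunk.is_cpu for chunk in chunks)


cpu_only = [SimpleNamespace(is_cpu=True), SimpleNamespace(is_cpu=True)]
mixed = [SimpleNamespace(is_cpu=True), SimpleNamespace(is_cpu=False)]

print(chunked_array_is_cpu(cpu_only))  # True
print(chunked_array_is_cpu(mixed))     # False
```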
felipecrv left a comment
C++ part looks good to me.
@github-actions crossbow submit test-cuda-python
Revision: 06dfe493466961225babc34d90e48ce17eadf970 Submitted crossbow builds: ursacomputing/crossbow @ actions-291bb70866
@github-actions crossbow submit test-cuda-python
Revision: e5bf77396ee1d63a1c88ed143caed8550d75093f Submitted crossbow builds: ursacomputing/crossbow @ actions-92f2f949c3
@github-actions crossbow submit test-cuda-python
Revision: 739f2d70a40e1956ceac7fb496d6b313e612bab7 Submitted crossbow builds: ursacomputing/crossbow @ actions-b7a5f6c953
@github-actions crossbow submit test-cuda-python
Revision: 0ac2ca4548d3484e3fdee44a18475c938cc8aa50 Submitted crossbow builds: ursacomputing/crossbow @ actions-41d0c41240
jorisvandenbossche left a comment
Looking good!
python/pyarrow/table.pxi (outdated)
Suggested change:
- if self._init_is_cpu == False:
+ if not self._init_is_cpu:
Need to fix some conflicts now that I merged the other one
This reverts commit d91cfabbcc374b3fd30e263284a2168c7c7cbf71.
…cpu devices" This reverts commit 1fcdb1f790f9d34b4d63e33f8a162b0346bc2ab5.
Force-pushed from 8a0f28c to a1d857a
@github-actions crossbow submit test-cuda-python
Revision: a1d857a Submitted crossbow builds: ursacomputing/crossbow @ actions-c503d748a8
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 50219ef. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 60 possible false positives for unstable benchmarks that are known to sometimes produce them.
GH-43728: [Python] ChunkedArray fails gracefully on non-cpu devices (apache#43795)

Rationale for this change

ChunkedArrays that are backed by non-cpu memory should not segfault when the user invokes an incompatible API.

What changes are included in this PR?

* Add IsCpu() to ChunkedArray
* Throw a python exception for known incompatible APIs on non-cpu device

Are these changes tested?

Unit tests

Are there any user-facing changes?

The user should no longer see segfaults for certain APIs, just python exceptions.

* GitHub Issue: apache#43728

Authored-by: Dane Pitkin <dpitkin@apache.org>
Signed-off-by: Dane Pitkin <dpitkin@apache.org>
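To show the user-facing behavior described above, here is a minimal Python sketch of the guard pattern with a hypothetical _assert_cpu helper (the real checks live in python/pyarrow/table.pxi; this is not that code): CPU-only operations first verify that every chunk is CPU-allocated and raise a Python exception instead of dereferencing device memory.

```python
class ChunkedArraySketch:
    """Hypothetical stand-in showing the raise-instead-of-segfault guard."""

    def __init__(self, chunks):
        self._chunks = chunks

    def _assert_cpu(self):
        # Guard used at the top of CPU-only methods: fail with a clear
        # Python exception rather than touching non-CPU memory.
        if not all(chunk.is_cpu for chunk in self._chunks):
            raise NotImplementedError(
                "This operation is implemented only for data on CPU device(s)."
            )

    def to_pylist(self):
        self._assert_cpu()
        # ... the real conversion would read the chunk buffers here ...
        return [value for chunk in self._chunks for value in chunk.values]
```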