ARROW-12983: [C++][Python][R] Properly overflow to chunked array in Python-to-Arrow conversion #10556

kszucs · 2021-06-18T13:47:33Z

Still need to port the R changes from ARROW-12983: [C++][Python][R] Properly overflow to chunked array in Python-to-Arrow conversion #10470.

Tested locally using:

```
PYARROW_TEST_SLOW=ON PYARROW_TEST_LARGE_MEMORY=ON ./run_test.sh -sv pyarrow/tests/
```
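To make the "overflow to chunked array" idea concrete, here is a minimal pure-Python sketch. It is not pyarrow's C++ converter. The function name `chunk_binary_values` is hypothetical, and the default limit of 2**31 - 1 bytes assumes a binary/string chunk addressed by 32-bit offsets. When appending a value would overflow the current chunk, the chunk is finished and the same value is retried on a fresh chunk. A value that can never fit is rejected instead of being retried forever.

```python
from typing import List, Sequence

# Assumption: a binary/string chunk with 32-bit offsets holds at most 2**31 - 1 bytes.
MAX_CHUNK_BYTES = 2**31 - 1


def chunk_binary_values(values: Sequence[bytes],
                        max_bytes: int = MAX_CHUNK_BYTES) -> List[List[bytes]]:
    """Hypothetical helper: split values into chunks of at most max_bytes total."""
    chunks: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    i = 0
    while i < len(values):
        value = values[i]
        if len(value) > max_bytes:
            # Even an empty chunk cannot hold this value, so retrying would loop forever.
            raise ValueError(
                f"value at index {i} ({len(value)} bytes) exceeds chunk capacity")
        if current_size + len(value) > max_bytes:
            # Overflow: finish the current chunk, then retry the same value
            # on a fresh chunk (i is not advanced).
            chunks.append(current)
            current, current_size = [], 0
            continue
        current.append(value)
        current_size += len(value)
        i += 1
    # Emit the last partial chunk (or a single empty chunk for empty input).
    if current or not chunks:
        chunks.append(current)
    return chunks
```

For example, with a tiny limit of 4 bytes, `chunk_binary_values([b"ab", b"cd", b"efg"], max_bytes=4)` produces `[[b"ab", b"cd"], [b"efg"]]`. Each inner list stands in for one chunk of the resulting chunked array.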

github-actions · 2021-06-18T13:47:54Z

https://issues.apache.org/jira/browse/ARROW-12983

kszucs · 2021-06-18T13:49:33Z

cpp/src/arrow/python/python_to_arrow.cc

This is unrelated to the fix, but it is a quality-of-life improvement for testing speed.

Created a JIRA issue for the changelog: https://issues.apache.org/jira/browse/ARROW-13142

cpp/src/arrow/util/converter.h

python/pyarrow/tests/test_convert_builtin.py

cpp/src/arrow/python/python_to_arrow.cc

kszucs · 2021-06-18T16:14:05Z

Since we haven't caught this issue from the R side either, I assume there are no large memory tests in the R bindings (or at least none that are exercised). @nealrichardson @romainfrancois could you help us out here?

kszucs · 2021-06-18T16:14:39Z

.github/workflows/python.yml

This is just an experiment to see whether GHA is able to execute these tests.

Builds get killed due to OOM.

lidavidm · 2021-06-18T16:15:12Z

IIRC, on the R side the chunker isn't used anyway (this was mentioned in the original PR).

kszucs · 2021-06-18T16:18:19Z

I thought it had been introduced via 7184c3f (I didn't look at the R code).

kszucs · 2021-06-21T12:56:03Z

With PYARROW_TEST_LARGE_MEMORY=ON, the memory profiler shows the following usage:

According to the GHA docs, the hosted macOS runners should have 14GB of RAM available. I'm going to verify that, since it would be nice if we could exercise the large memory tests somewhere.

kszucs · 2021-06-21T16:38:03Z

@lidavidm @pitrou The GHA macOS hosted agents indeed provide 14GB of RAM, which means that we can exercise some of the large_memory tests there. I went through the large memory cases locally and annotated the ones taking more than 10 seconds as slow.

After enabling the large memory tests in the macOS Python build, the build time increased from 18 to 22 minutes, which seems like a good tradeoff for actually running the large memory tests.

lidavidm

Thanks for working this out!


kszucs

+1, merging on green

kszucs · 2021-06-22T15:46:36Z

The build failures are unrelated, merging.

…ython-to-Arrow conversion

Still need to port the R changes from apache#10470

Tested locally using:

```
PYARROW_TEST_SLOW=ON PYARROW_TEST_LARGE_MEMORY=ON ./run_test.sh -sv pyarrow/tests/
```

Closes apache#10556 from kszucs/fff

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>

github-actions bot added Component: C++ Component: Python labels Jun 18, 2021

kszucs commented Jun 18, 2021

cpp/src/arrow/util/converter.h (outdated, resolved)

kszucs commented Jun 18, 2021

cpp/src/arrow/util/converter.h (outdated, resolved)

kszucs commented Jun 18, 2021

python/pyarrow/tests/test_convert_builtin.py (outdated, resolved)

lidavidm reviewed Jun 18, 2021

cpp/src/arrow/python/python_to_arrow.cc (outdated, resolved)

github-actions bot added the Component: R label Jun 18, 2021

kszucs commented Jun 18, 2021


lidavidm approved these changes Jun 18, 2021


kszucs changed the title ~~ARROW-12983: [C++][Python] Properly overflow to chunked array in Python-to-Arrow conversion~~ ARROW-12983: [C++][Python] Properly overflow to chunked array in Python-to-Arrow conversion [WIP] Jun 21, 2021

kszucs changed the title ~~ARROW-12983: [C++][Python] Properly overflow to chunked array in Python-to-Arrow conversion [WIP]~~ ARROW-12983: [C++][Python] Properly overflow to chunked array in Python-to-Arrow conversion Jun 21, 2021

lidavidm approved these changes Jun 21, 2021


kszucs changed the title ~~ARROW-12983: [C++][Python] Properly overflow to chunked array in Python-to-Arrow conversion~~ ARROW-12983: [C++][Python][R] Properly overflow to chunked array in Python-to-Arrow conversion Jun 21, 2021

kszucs mentioned this pull request Jun 21, 2021

ARROW-12983: [C++][Python][R] Properly overflow to chunked array in Python-to-Arrow conversion #10470

Closed

kszucs added 9 commits June 22, 2021 13:07

Fix auto chunking

6d05e34

Consolidate binary-like converters

4b1e771

Copy new tests cases from the original PR

7839fc7

Fix chunking for the struct types

04fd121

Temporarily enable large memory tests to see whether GHA tolerates it or not

27a7660

Better name for the rewind flag

6acc8b0

Update R bindings

f81657a

Show system memory on macOS hosted runner

5cca163

Re-annotate slow test cases

319a58f

kszucs added 5 commits June 22, 2021 13:07

Explain rewind_on_overflow logic

0a20b7d

Trigger CI

010dcae

Re-annotate more slow test cases

161aa1e

Trigger all of the builds

e45079b

Directly return status

7f8b74a

kszucs force-pushed the fff branch from 5508354 to 7f8b74a June 22, 2021 11:07

kszucs commented Jun 22, 2021


kszucs closed this in 8aeec28 Jun 22, 2021

asfimport mentioned this pull request Jul 20, 2021

[C++][Python] Converter::Extend gets stuck in infinite loop causing OOM if values don't fit in single chunk #28701

Closed
