Skip to content

Conversation

@kszucs
Copy link
Member

@kszucs kszucs commented Jun 18, 2021

Tested locally using:

 PYARROW_TEST_SLOW=ON PYARROW_TEST_LARGE_MEMORY=ON ./run_test.sh -sv pyarrow/tests/

@github-actions
Copy link

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated to the fix, but quality of life improvement regarding the testing speed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a jira for the changelog https://issues.apache.org/jira/browse/ARROW-13142

@kszucs
Copy link
Member Author

kszucs commented Jun 18, 2021

Since we haven't caught this issue from the R side either I assume there are no (or at least not exercised) large memory tests in the R bindings. @nealrichardson @romainfrancois could you help us out here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just experimental to see whether GHA is able to execute these tests.

Copy link
Member Author

@kszucs kszucs Jun 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Builds get killed due to OOM.

@lidavidm
Copy link
Member

IIRC on the R side the chunker isn't used anyways (this was mentioned in the original PR)

@kszucs
Copy link
Member Author

kszucs commented Jun 18, 2021

I thought that it has been introduced via 7184c3f (didn't look at the R code).

@kszucs kszucs changed the title ARROW-12983: [C++][Python] Properly overflow to chunked array in Python-to-Arrow conversion ARROW-12983: [C++][Python] Properly overflow to chunked array in Python-to-Arrow conversion [WIP] Jun 21, 2021
@kszucs
Copy link
Member Author

kszucs commented Jun 21, 2021

With PYARROW_TEST_LARGE_MEMORY=ON memory profiler shows the following usage:

image

According to the GHA docs the hosted macOS runners should have 14GB of RAM available. I'm going to verify that since it would be nice if we could exercise the large memory tests somewhere.

@kszucs
Copy link
Member Author

kszucs commented Jun 21, 2021

@lidavidm @pitrou The GHA macOS hosted agents indeed provide 14GB of RAM, which means that we can exercise some of the large_memory tests there. I locally went through the large memory cases and annotated the ones taking more than 10 seconds as slow.

After enabling the large memory tests in the macOS python build the build time has increased from 18 minutes to 22 minutes which seems like a nice tradeoff in exchange of actually running the large memory tests.

@kszucs kszucs changed the title ARROW-12983: [C++][Python] Properly overflow to chunked array in Python-to-Arrow conversion [WIP] ARROW-12983: [C++][Python] Properly overflow to chunked array in Python-to-Arrow conversion Jun 21, 2021
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working this out!

@kszucs kszucs changed the title ARROW-12983: [C++][Python] Properly overflow to chunked array in Python-to-Arrow conversion ARROW-12983: [C++][Python][R] Properly overflow to chunked array in Python-to-Arrow conversion Jun 21, 2021
Copy link
Member Author

@kszucs kszucs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, merging on green

@kszucs
Copy link
Member Author

kszucs commented Jun 22, 2021

The build failures are unrelated, merging.

@kszucs kszucs closed this in 8aeec28 Jun 22, 2021
sjperkins pushed a commit to sjperkins/arrow that referenced this pull request Jun 23, 2021
…ython-to-Arrow conversion

Still need to port the R changes from apache#10470

Tested locally using:

```
 PYARROW_TEST_SLOW=ON PYARROW_TEST_LARGE_MEMORY=ON ./run_test.sh -sv pyarrow/tests/
```

Closes apache#10556 from kszucs/fff

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants