ARROW-12683: [C++] Enable fine-grained I/O (coalescing) in IPC reader #11486

niyue · 2021-10-20T14:26:41Z

This PR tries to fix https://issues.apache.org/jira/browse/ARROW-12683 ([C++] Enable fine-grained I/O (coalescing) in IPC reader)

This is my first PR for Arrow, so please forgive my ignorance and let me know about any issues with code format, conventions, etc.
I probably picked the wrong issue for a first contribution: after investigating it for a while, I realized it is more difficult than I expected :(

Currently I chose an approach that re-uses as much of the existing ArrayLoader code as possible. To do that, I use a no-op random access file to record the IO and later replay only the necessary read operations. I am not certain this is the best approach for solving this issue. If it doesn't fit, feel free to reject this PR; please let me know how it should be done and I can give it another try.

Besides passing the unit tests, I verified the IO behavior manually under Linux by watching which file pages were loaded into the page cache. It works largely as I expected, and the IO savings vary depending on the specific fields accessed.

github-actions · 2021-10-20T14:27:02Z

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

niyue · 2021-10-20T14:28:08Z

cpp/src/arrow/ipc/io_recorded_random_access_file.cc

I introduce an IoRecordedRandomAccessFile class that records the read IO operations performed. It does nothing but save these read operations as <offset, length> pairs in a vector, which are replayed later to do the real IO.
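
The recording idea in this comment can be sketched with standard-library types only. `RecordingFile` and `ReadRange` are hypothetical stand-ins introduced for illustration; they are not the classes added in this PR and do not implement Arrow's `io::RandomAccessFile` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an <offset, length> pair.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Sketch of a "no-op" file: ReadAt performs no I/O and returns no data.
// It only remembers which byte range the caller asked for.
class RecordingFile {
 public:
  // Records the request and reports the requested length as "read".
  int64_t ReadAt(int64_t offset, int64_t length) {
    assert(offset >= 0 && length >= 0);
    ranges_.push_back({offset, length});
    return length;
  }

  // The recorded requests, in the order they were made.
  const std::vector<ReadRange>& ranges() const { return ranges_; }

 private:
  std::vector<ReadRange> ranges_;
};
```

A caller that walks the included fields against such an object ends up with the list of ranges it would have read, without touching the underlying file.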

niyue · 2021-10-20T14:29:15Z

cpp/src/arrow/ipc/message.cc

The recorded read IO operations are replayed here to actually read data from the file into the body.
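
A minimal sketch of the replay step, under the assumption that recorded offsets are relative to the start of the body and that the body buffer is allocated at full size, with only the recorded ranges filled in. `ReplayReads` and `ReadRange` are hypothetical names, and an in-memory string stands in for the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an <offset, length> pair.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Copies only the recorded ranges from `file` into a body of the same size.
// Bytes outside the recorded ranges are never read and stay zero.
std::string ReplayReads(const std::string& file,
                        const std::vector<ReadRange>& ranges) {
  std::string body(file.size(), '\0');
  const int64_t size = static_cast<int64_t>(file.size());
  for (const ReadRange& r : ranges) {
    assert(r.offset >= 0 && r.length >= 0 && r.offset + r.length <= size);
    // Stand-in for the real positional read into the body buffer.
    std::copy_n(file.begin() + r.offset, r.length, body.begin() + r.offset);
  }
  return body;
}
```

Keeping the body at its full length means buffer offsets from the message metadata stay valid, while the amount of data actually read is limited to the recorded ranges.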

niyue · 2021-10-20T14:30:28Z

cpp/src/arrow/ipc/message.cc

Depending on whether included_fields is set in IpcReadOptions, either fields_loader is used here to load each field's buffers, or the entire body is loaded.

niyue · 2021-10-20T14:34:16Z

cpp/src/arrow/ipc/reader.cc

ArrayLoader is re-used to load a subset of fields. The logic is similar to the code that decompresses/constructs each field's array according to the included fields, but here it only uses the no-op random access file to load buffers and does no further processing of the loaded buffer (the loaded buffer is always null for the no-op random access file).

niyue · 2021-10-20T14:35:13Z

cpp/src/arrow/ipc/reader.cc

The fields_loader is passed as a lambda to the bottom layer (message.cc) so that we don't have to duplicate the ArrayLoader code in both message.cc and reader.cc.
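
The layering described in this comment and in the earlier message.cc comment can be sketched as follows. The callback type `FieldsLoader` and the function `PlanBodyReads` are hypothetical and simplified; the actual signatures in message.cc and reader.cc are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for an <offset, length> pair.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Hypothetical callback: the upper layer (which understands arrays and
// fields) appends the byte ranges it needs.
using FieldsLoader = std::function<void(std::vector<ReadRange>*)>;

// Lower layer: knows about I/O but nothing about arrays.
// With a loader it reads only what the loader asks for; without one it
// falls back to reading the entire body.
std::vector<ReadRange> PlanBodyReads(int64_t body_length,
                                     const FieldsLoader& fields_loader) {
  assert(body_length >= 0);
  std::vector<ReadRange> ranges;
  if (fields_loader) {
    fields_loader(&ranges);
  } else {
    ranges.push_back({0, body_length});
  }
  return ranges;
}
```

Passing the field-specific logic down as a callable keeps array knowledge in the upper layer while the lower layer decides how the reads are issued.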

niyue · 2021-10-20T14:36:12Z

cpp/src/arrow/ipc/reader.cc

This line of code seems redundant, and the context value is shadowed by another local variable below, so I removed this line.

emkornfield · 2021-10-20T18:25:52Z

@niyue Thanks for the PR. It looks like the CI is likely highlighting real issues with the PR, would you mind fixing those?

github-actions · 2021-10-20T22:06:07Z

https://issues.apache.org/jira/browse/ARROW-12683

github-actions · 2021-10-20T22:06:09Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

niyue · 2021-10-20T22:07:46Z

> @niyue Thanks for the PR. It looks like the CI is likely highlighting real issues with the PR, would you mind fixing those?

Sure. Let me try it out.

niyue · 2021-10-21T01:15:18Z

@emkornfield I pushed a new commit to try to fix the issue reported by CI, but the new CI job failed because it could not download "MinIO.exe" (probably a temporary network issue in CI). How can I trigger the CI again?

niyue · 2021-10-21T11:32:03Z

@emkornfield I think I've fixed the issues reported by CI, could you please help to confirm? Thanks.

emkornfield · 2021-10-21T16:48:12Z

Yes, looks like logic issues are fixed. Still one small style/lint issue:

/arrow/cpp/src/arrow/ipc/io_recorded_random_access_file.h:40:  You don't need a ; after a }  [readability/braces] [4]
/arrow/cpp/src/arrow/ipc/io_recorded_random_access_file.h:82:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
/arrow/cpp/src/arrow/ipc/io_recorded_random_access_file.cc:18:  Include the directory when naming .h files  [build/include_subdir] [4]

niyue · 2021-10-22T10:34:58Z

@emkornfield thanks. I pushed a new commit to fix the lint issue.

UPDATE: There is still one lint issue, and I've fixed it in commit 23b8a34

westonpace

First, let me apologize for taking so long to get to this, I sort of missed it.
Second, I'm happy this is being looked into, this is definitely something I was hoping we could address as part of 7.0.0.

Generally, I am in approval of this PR. Ideally, I would like a unit test showing that a RecordBatchFileReader truly does not read the entire file (possibly the MockFileSystem could be enhanced to record a total # of bytes read).

IoRecordedRandomAccessFile is a clever way to work around the fact that message.cc does not know about arrays and reader.cc does not know about file I/O. However, it is an odd implementation of io::RandomAccessFile since it doesn't return data. It relies on the fact that the reader is going to make read calls but never look at the data (which, admittedly, should be a pretty safe assumption going forward). I will add that this confusion between reader.cc and message.cc made it difficult to implement the asynchronous version of the record batch file reader (RecordBatchFileReader::GetRecordBatchGenerator, more on that later).

That being said, I am worried about the complexity that is growing here. Reading IPC files should be fairly straightforward. I'm worried that the abstractions in message.cc and reader.cc are causing more work than necessary. Maybe the ArrayLoader can move into message.cc. I don't think this is something we need to tackle in this PR, but maybe we should look at it in the 7.0.0 timeframe still (perhaps as part of ARROW-14429).

If we do go forward with the refactoring then maybe we won't need the complexity of IoRecordedRandomAccessFile.

One last complication. This approach currently does not work for the asynchronous version of the reader. As I mentioned above the abstractions caused some difficulty and there is some duplication. The asynchronous path ends up at IpcFileRecordBatchGenerator::ReadRecordBatch and the messages are all queued up to be read in IpcFileRecordBatchGenerator::operator(). This can be handled in a follow-up. If you want to do that then please create a JIRA ticket for that work so we don't lose track.

westonpace · 2021-10-28T21:00:10Z

@lidavidm Thoughts on the above?

lidavidm · 2021-10-28T21:07:58Z

Broadly I'm in agreement. I like the approach here but it does seem we will want to 'fuse' some of the layers to get the best implementation. The duplication with the asynchronous path is one such issue.

I agree we will want a unit test to ensure the bytes read is as expected. Additionally, another candidate for a follow-up item is to include the I/O coalescer so that we don't suffer on remote filesystems. (This could also be folded into ARROW-14429; another thing we could fold into there is to make ReadRangeCache not hold on to memory for the use case of linearly scanning a file.)
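
To illustrate what an I/O coalescer does, here is a minimal sketch that merges nearby ranges so that a high-latency filesystem sees fewer, larger requests. This is not Arrow's implementation; `CoalesceRanges`, `ReadRange`, and the `hole_size_limit` parameter are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an <offset, length> pair.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Merges ranges whose gap is at most `hole_size_limit` bytes.
// Reading a small hole is assumed to be cheaper than an extra request.
std::vector<ReadRange> CoalesceRanges(std::vector<ReadRange> ranges,
                                      int64_t hole_size_limit) {
  assert(hole_size_limit >= 0);
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) {
              return a.offset < b.offset;
            });
  std::vector<ReadRange> merged;
  for (const ReadRange& r : ranges) {
    if (r.length <= 0) continue;  // Nothing to read.
    if (!merged.empty()) {
      ReadRange& last = merged.back();
      const int64_t last_end = last.offset + last.length;
      if (r.offset - last_end <= hole_size_limit) {
        // Close enough (or overlapping): extend the previous range.
        const int64_t new_end = std::max(last_end, r.offset + r.length);
        last.length = new_end - last.offset;
        continue;
      }
    }
    merged.push_back(r);
  }
  return merged;
}
```

The trade-off is between request count and wasted bytes: a larger limit means fewer round trips but more unneeded data fetched.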

Also, I think IoRecordedRandomAccessFile should be moved into reader.cc - it has a fairly specific use and I don't see a reason to expose it more broadly, especially since we aren't subjecting it to unit tests separately.

niyue · 2021-10-29T01:30:52Z

@westonpace

> I would like a unit test showing that a RecordBatchFileReader truly does not read the entire file

Sure. Let me see how I can add more tests for it.

> Maybe the ArrayLoader can move into message.cc

I considered this approach as well, but I found it would introduce more changes to the existing reader/message APIs, and I am not sure that is desirable. It seems to me the current implementation keeps all of Arrow's structures in reader.cc, while all flatbuffers structures are kept in the lower-layer message.cc. ArrayLoader involves quite a lot of Arrow structures, and I am not familiar with some of them, so for now I have tried to follow the current organization to make it work.

> This approach currently does not work for the asynchronous version of the reader... This can be handled in a follow-up.
> If you want to do that then please create a JIRA ticket for that work so we don't lose track.

I didn't realize this previously, since in my project I only use the sync version of the reader. I will look into it later. Since ARROW-12683 is not specific to the sync version of the reader, if this PR is accepted, I think we can probably close ARROW-12683, and I will create a JIRA issue to track the async version of the reader enhancement as a follow-up. What do you think?

niyue · 2021-10-29T01:33:09Z

@lidavidm

> we will want a unit test to ensure the bytes read is as expected

Sure. I will look into how more unit tests can be added.

> Additionally, another candidate for a follow-up item is to include the I/O coalescer so that we don't suffer on remote filesystems.

In my test under Linux, I found Linux will do read ahead IO. In my limited testing, depending on the read-ahead configuration in Linux, the IO may be 2x the minimum necessary if the access pattern is random and the persisted record batches are small. I haven't looked into how S3FileSystem handles this, but even on a local file system, posix_fadvise is desirable for advising the operating system of the access pattern. Currently, file.cc has some support for POSIX_FADV_WILLNEED; it would be great if other patterns could be supported there, but this is likely another independent area to improve.

> I think IoRecordedRandomAccessFile should be moved into reader.cc

No problem. I will move it into reader.cc.

westonpace · 2021-10-29T02:17:12Z

> ArrayLoader involves quite a lot of Arrow structures, and I am not familiar with some of them, so for now I have tried to follow the current organization to make it work.

Ok. That is fine. Thank you for considering.

> I think we can probably close ARROW-12683, and I will create a JIRA issue to track the async version of the reader enhancement as a follow-up. What do you think?

Sounds great.

> In my test under Linux, I found Linux will do read ahead IO...

I did some testing with POSIX_FADV_WILLNEED and didn't ever see much benefit over Linux's builtin readahead.

> I haven't looked into how S3FileSystem handles this

It does not currently handle this. We get pretty poor performance with the IPC reader on S3 because there is no readahead / batching (and there is a high latency per request). Handling this at the filesystem level is an interesting thought. The challenge will be that the filesystem is parallel so we sometimes want to allow multiple reads (instead of queuing and plugging/merging) but the filesystem doesn't know the access pattern. Maybe we can still come up with a good strategy. We have ARROW-14429 for this already so no need to solve this problem right now.

lidavidm · 2021-11-01T21:57:18Z

Sorry, so just to be clear:

I think the issues are because ARROW_FILESYSTEM needs to be ON. You shouldn't need to mess with ARROW_EXPORT or anything (I commented that before seeing the rest of what had been added). However, instead of all that, I think wrapping RandomAccessFile is easier than using the whole filesystem machinery.

niyue · 2021-11-01T23:52:15Z

cpp/src/arrow/ipc/read_write_test.cc

@lidavidm I copied the TrackedRandomAccessFile into this PR and track the read ranges using a vector. Since I think num_reads is just the length of this vector, I removed the read_ member variable in https://github.com/apache/arrow/pull/11535/files#diff-900c46995b5706697d6e4b010f610f1a1cf27d4d865afe48de0a800830ac676bL1708
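
A minimal sketch of the tracking-wrapper idea described in this comment, assuming an in-memory string as the underlying file. `TrackedFile` is a hypothetical simplification, not the TrackedRandomAccessFile class from the PR, and it does not implement Arrow's `io::RandomAccessFile` interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an <offset, length> pair.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Serves reads from an in-memory buffer and remembers every range served,
// so a test can assert on how much of the file was actually touched.
class TrackedFile {
 public:
  explicit TrackedFile(std::string data) : data_(std::move(data)) {}

  std::string ReadAt(int64_t offset, int64_t nbytes) {
    assert(offset >= 0 && nbytes >= 0);
    const int64_t start = std::min(offset, size());
    const int64_t length = std::min(nbytes, size() - start);
    read_ranges_.push_back({start, length});
    return data_.substr(static_cast<size_t>(start), static_cast<size_t>(length));
  }

  int64_t size() const { return static_cast<int64_t>(data_.size()); }

  // The number of reads is simply the number of recorded ranges.
  int64_t num_reads() const { return static_cast<int64_t>(read_ranges_.size()); }

  int64_t bytes_read() const {
    int64_t total = 0;
    for (const ReadRange& r : read_ranges_) total += r.length;
    return total;
  }

  const std::vector<ReadRange>& read_ranges() const { return read_ranges_; }

 private:
  std::string data_;
  std::vector<ReadRange> read_ranges_;
};
```

A test built on such a wrapper can read a subset of fields and then check that `bytes_read()` is smaller than `size()`, which is the property the reviewers asked to see verified.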

niyue · 2021-11-01T23:59:19Z

@westonpace @lidavidm Instead of using mockfs, I simplified the unit tests by using the TrackedRandomAccessFile suggested by David. Now there is no change to the CMake file and no change to the io::BufferReader API. Could you please help to review?

BTW, is there any documentation describing how I can run the clang-format like the CI job? I find my change sometimes breaks CI lint job, but I've no idea what the recommended approach is for running clang format locally for this project.

kou · 2021-11-02T00:03:47Z

> BTW, is there any documentation describing how I can run the clang-format like the CI job? I find my change sometimes breaks CI lint job, but I've no idea what the recommended approach is for running clang format locally for this project.

Here is the documentation:

Code Style, Linting, and CI
https://arrow.apache.org/docs/developers/cpp/development.html#code-style-linting-and-ci

westonpace

Thanks for adding the tests. This is looking good to me, minus one nit about advancing the position.

cpp/src/arrow/ipc/reader.cc

westonpace · 2021-11-02T10:49:18Z

I goofed slightly. @lidavidm contacted me externally and pointed out that only Read should advance the position_ (and not ReadAt). I've submitted a PR to your branch that should address this. Feel free to use that or make the change some other way. Sorry for the mixup.
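
The distinction raised here, that a positional `ReadAt` leaves the stream position alone while a sequential `Read` advances it, can be sketched as below. `InMemoryFile` is a hypothetical class written for illustration; it is not the class under review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical in-memory file with both sequential and positional reads.
class InMemoryFile {
 public:
  explicit InMemoryFile(std::string data) : data_(std::move(data)) {}

  // Positional read: does NOT modify position_.
  std::string ReadAt(int64_t offset, int64_t nbytes) const {
    assert(offset >= 0 && nbytes >= 0);
    if (offset > static_cast<int64_t>(data_.size())) return "";
    // substr clamps the count to the end of the buffer.
    return data_.substr(static_cast<size_t>(offset), static_cast<size_t>(nbytes));
  }

  // Sequential read: reads at position_ and then advances it by the
  // number of bytes actually returned.
  std::string Read(int64_t nbytes) {
    std::string out = ReadAt(position_, nbytes);
    position_ += static_cast<int64_t>(out.size());
    return out;
  }

  int64_t Tell() const { return position_; }

 private:
  std::string data_;
  int64_t position_ = 0;
};
```

Implementing `Read` on top of `ReadAt` keeps the position update in exactly one place, which makes this kind of mix-up harder to reintroduce.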

niyue · 2021-11-02T12:54:15Z

> I goofed slightly. @lidavidm contacted me externally and pointed out that only Read should advance the position_ (and not ReadAt). I've submitted a PR to your branch that should address this. Feel free to use that or make the change some other way. Sorry for the mixup.

I am the one who really should have done more research on this topic. Thanks so much for the PR. I've merged it and squashed the commits into a single one; please check it out.

lidavidm

Thanks for doing this. Overall this looks good, I left some feedback on style things.

cpp/src/arrow/ipc/message.cc

cpp/src/arrow/ipc/reader.h

cpp/src/arrow/ipc/message.cc

cpp/src/arrow/ipc/message.h

cpp/src/arrow/ipc/reader.cc

cpp/src/arrow/ipc/read_write_test.cc

lidavidm

Thanks for fixing things!

I left a couple more comments just based on looking at CI.

cpp/src/arrow/ipc/reader_internal.h

cpp/src/arrow/ipc/message.h

westonpace · 2021-11-03T02:41:17Z

Looks like one last CI formatting thing:

/arrow/cpp/src/arrow/ipc/reader_internal.h:84:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]

niyue · 2021-11-03T03:03:02Z

> Looks like one last CI formatting thing:
>
> /arrow/cpp/src/arrow/ipc/reader_internal.h:84:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]

Fixed. For some reason, this format issue was not reported by the lint program in the docker image; I ran it with docker-compose run ubuntu-lint.

westonpace · 2021-11-03T03:07:45Z

Hmm, I'll have to take a look and see what's up there. Some other ways to run lint are ninja lint and archery lint --cpplint. The former requires you to use the ninja generator when you create your build directory. The latter requires you to install archery (which is a pretty helpful tool for a variety of Arrow development tasks).

There is also ninja format and archery lint --cpplint --fix which will apply some formatting changes automatically.

lidavidm · 2021-11-03T12:23:10Z

@niyue do you have a JIRA account? You can register at https://issues.apache.org/jira/secure/Signup!default.jspa. Then let us know your username and we can assign you the ticket and merge.

niyue · 2021-11-03T14:09:39Z

@lidavidm my JIRA account is niyue, thanks.

ursabot · 2021-11-03T14:25:16Z

Benchmark runs are scheduled for baseline = 16af17c and contender = 09b79a1. 09b79a1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.51% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.22% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

lidavidm · 2021-11-03T14:25:43Z

Congrats on your first contribution! 🎉

niyue · 2021-11-03T14:53:19Z

@lidavidm
As discussed above, I created ARROW-14577 for tracking the IPC reader async read API on this topic, and I linked it with ARROW-12683.

github-actions bot added the Component: C++ label Oct 20, 2021

niyue changed the title ~~Support reading arrow IPC file with fine grained IO~~ ARROW-12683 [C++] Enable fine-grained I/O (coalescing) in IPC reader Oct 20, 2021

niyue force-pushed the feature/fine_grained_io branch from e6f44b2 to 2bcaeb7 Compare October 21, 2021 00:31

niyue force-pushed the feature/fine_grained_io branch from 2bcaeb7 to f21831a Compare October 21, 2021 07:34

niyue force-pushed the feature/fine_grained_io branch from f21831a to 83a2969 Compare October 22, 2021 05:01

emkornfield requested a review from westonpace October 22, 2021 23:29

niyue force-pushed the feature/fine_grained_io branch 2 times, most recently from 23b8a34 to bfe443e Compare October 23, 2021 00:39

westonpace reviewed Oct 28, 2021

niyue force-pushed the feature/fine_grained_io branch from bfe443e to 6e8c665 Compare November 1, 2021 13:13

niyue force-pushed the feature/fine_grained_io branch 2 times, most recently from 8b8f157 to 42094f9 Compare November 1, 2021 23:49

westonpace reviewed Nov 2, 2021

niyue force-pushed the feature/fine_grained_io branch 2 times, most recently from fee5df5 to 1e17dd7 Compare November 2, 2021 03:12

niyue force-pushed the feature/fine_grained_io branch from c22d68b to 49cc30c Compare November 2, 2021 12:49

lidavidm reviewed Nov 2, 2021

niyue force-pushed the feature/fine_grained_io branch from 49cc30c to 515381f Compare November 2, 2021 14:10

lidavidm reviewed Nov 2, 2021

niyue force-pushed the feature/fine_grained_io branch from 515381f to 7ab7ed5 Compare November 2, 2021 23:02

Support reading arrow IPC file with fine grained IO. Using a no-op random access file to record the read ranges and replay only the necessary read operation. (6e7bfbc)

niyue force-pushed the feature/fine_grained_io branch from 7ab7ed5 to 6e7bfbc Compare November 3, 2021 03:00

lidavidm changed the title ~~ARROW-12683 [C++] Enable fine-grained I/O (coalescing) in IPC reader~~ ARROW-12683: [C++] Enable fine-grained I/O (coalescing) in IPC reader Nov 3, 2021

lidavidm closed this in 09b79a1 Nov 3, 2021

westonpace mentioned this pull request Nov 4, 2021

ARROW-14548: [C++] Add madvise random support for memory mapped file #11588

Closed

asfimport mentioned this pull request Nov 4, 2021

[C++] Enable fine-grained I/O (coalescing) in IPC reader #28430

Closed
