@emkornfield
Contributor

@emkornfield emkornfield commented Apr 20, 2020

This change does the following:

  • Vectorizes computation for validity bitmaps with no repeated parents for all little-endian architectures
  • Vectorizes computation for validity bitmaps with repeated parents on AVX2+ architectures (strictly speaking, it requires BMI2)
  • Exposes some building blocks for vectorized computation of all validity bitmaps for Parquet level data (more needs to be handled to generate appropriate bitmaps and offset data for lists). These will be added to level_conversions.h in a future PR.
  • Replaces loops over bitmaps with SetBitsTo
  • Leaves a fallback for non BMI2/little-endian capable machines.
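A minimal scalar sketch of the idea behind the first two bullets, not the code in this PR. It assumes a leaf value is non-null when its definition level reaches `max_def_level`, and that a level below `repeated_ancestor_def_level` means the entry takes no slot in the values array. All names here (`LevelsToMask`, `ExtractBitsPortable`, `ValidityForRepeatedParent`, `ValidityWord`) are hypothetical, and the bit-extraction loop is a portable stand-in for the BMI2 PEXT instruction.

```cpp
#include <cassert>
#include <cstdint>

// Bit i of the result is set when levels[i] >= threshold.
// Handles at most 64 levels so the result fits in one word.
inline uint64_t LevelsToMask(const int16_t* levels, int num_levels,
                             int16_t threshold) {
  assert(num_levels >= 0 && num_levels <= 64);
  uint64_t mask = 0;
  for (int i = 0; i < num_levels; ++i) {
    mask |= static_cast<uint64_t>(levels[i] >= threshold) << i;
  }
  return mask;
}

// Portable stand-in for PEXT: gathers the bits of `value` selected by `mask`
// into the low bits of the result, preserving their order.
inline uint64_t ExtractBitsPortable(uint64_t value, uint64_t mask) {
  uint64_t result = 0;
  for (int out = 0; mask != 0; ++out, mask &= mask - 1) {
    const uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit
    if (value & lowest) result |= uint64_t{1} << out;
  }
  return result;
}

struct ValidityWord {
  uint64_t bits;   // validity bits, one per slot in the values array
  int num_values;  // number of slots produced
};

// Assumed semantics (see lead-in): entries below the repeated ancestor's
// level produce no slot; the remaining entries are valid only at max level.
inline ValidityWord ValidityForRepeatedParent(
    const int16_t* def_levels, int num_levels, int16_t max_def_level,
    int16_t repeated_ancestor_def_level) {
  const uint64_t defined = LevelsToMask(def_levels, num_levels, max_def_level);
  const uint64_t present =
      LevelsToMask(def_levels, num_levels, repeated_ancestor_def_level);
  int num_values = 0;
  for (uint64_t m = present; m != 0; m &= m - 1) ++num_values;
  return ValidityWord{ExtractBitsPortable(defined, present), num_values};
}
```

With no repeated parent only the first comparison is needed, which is why that case vectorizes without BMI2; the repeated-parent case needs the extra compaction step.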

With AVX2 enabled this seems to improve benchmarks by 20% for nullable columns (BM_Read..) on my box. I didn't see any impact on other benchmarks.

See the checklist below for what remains. If possible, I'd like to get early feedback on naming and my approach to checking for BMI2.

Still needed:

  • Still adding more focused unit tests for what I've added.
  • Move the changes to Bitmap::ToString to their own PR.

@emkornfield emkornfield force-pushed the ARROW-8413 branch 2 times, most recently from 39280c5 to fe2f609 Compare April 20, 2020 03:25
@wesm wesm changed the title ARROW-8413: [C++][WIP] Refactor Generating validity bitmap for values column ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column Apr 20, 2020
Member

nit: indentation

Contributor Author

done.

Comment on lines 158 to 161
Member

nit: AVx2 -> AVX2, o detect -> to detect, BIM2 -> BMI2

Contributor Author

Removed the comment as it was stale.

@emkornfield
Contributor Author

emkornfield commented Apr 23, 2020

CC @wesm @pitrou I think this is ready for review now. Will take a closer look at CI failures tomorrow.

@emkornfield emkornfield changed the title ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column Apr 23, 2020
Member

@pitrou pitrou left a comment

I can't comment on the correctness of the levels handling (someone competent will have to review). I would like to see some code cleanup and organization changes, though.

Member

This doesn't seem useful :-)

Contributor Author

should be removed now.

Member

I think it would be nice to move the low-level handling of levels to a dedicated C++ file.

Contributor Author

I've moved everything in the anonymous namespace to level_conversion.cc.

Contributor Author

Note that this prevents inlining of the Scalar version which was happening before, but I'm not sure that is a big deal.

Member

"instructions"

Contributor Author

done.

Member

"corresponding"?

Contributor Author

yes. fixed.

Member

I don't think it makes sense for this to be a helper function.

Contributor Author

I agree, now that AppendWord has moved to FirstTimeBitmapWriter.

Member

Then please only define this function if ARROW_HAVE_BMI2 is defined.

Contributor Author

done.

Member

Please, a more specific name (such as BitmapToString).

Contributor Author

done.

Member

Ditto here.

Contributor Author

done.

Member

Then just don't define the test.

Contributor Author

good point. done.

Member

typo: max_repitition_level -> max_repetition_level

Contributor Author

done.

Comment on lines 175 to 176
Member

Do we need to move these two lines before line 186?

Contributor Author

I think just 1.

Member

Since this path is executed only when #if defined(ARROW_HAVE_BMI2) is true, it would be good to add code such as:

#if defined(ARROW_HAVE_BMI2)
...
#else
  assert(false && "must not execute this without BMI2");
#endif
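A self-contained sketch of that guard pattern, for illustration only. It assumes the compiler's own `__BMI2__` macro as a stand-in for Arrow's `ARROW_HAVE_BMI2`; `EXAMPLE_HAVE_BMI2`, `kHaveBmi2` and `ExtractBitsOrDie` are hypothetical names, not identifiers from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: __BMI2__ (set by GCC/Clang when BMI2 code generation is enabled)
// stands in for ARROW_HAVE_BMI2. _pext_u64 is only available on 64-bit x86.
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#define EXAMPLE_HAVE_BMI2 1
#endif

// Lets callers check at compile time whether the BMI2 path exists.
#if defined(EXAMPLE_HAVE_BMI2)
constexpr bool kHaveBmi2 = true;
#else
constexpr bool kHaveBmi2 = false;
#endif

// Compacts the bits of `value` selected by `mask` into the low bits.
// In builds without BMI2 this must never be reached, so it asserts.
inline uint64_t ExtractBitsOrDie(uint64_t value, uint64_t mask) {
#if defined(EXAMPLE_HAVE_BMI2)
  return _pext_u64(value, mask);
#else
  (void)value;
  (void)mask;
  assert(false && "must not execute this without BMI2");
  return 0;
#endif
}
```

A caller would branch on `kHaveBmi2` (or the macro) before calling `ExtractBitsOrDie`, so the assert only fires if a dispatch bug sends a non-BMI2 build down this path.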

Contributor Author

done.

Member

@kiszk kiszk Apr 23, 2020

FYI: Most of these tests fail on a big-endian platform:

[  FAILED  ] TestAppendBitmap.TestOffsetOverwritesCorrectBitsOnExistingByte
[  FAILED  ] TestAppendBitmap.TestOffsetShiftBitsCorrectly
[  FAILED  ] TestAppendBitmap.AllBytesAreWrittenWithEnoughSpace
[  FAILED  ] TestAppendBitmap.OnlyApproriateBytesWrittenWhenLessThen8BytesAvailable
[  FAILED  ] TestAppendToValidityBitmap.BasicOperation

Contributor Author

There was a substantial refactoring. I tried to provide guards, but without a big-endian machine in CI it will be very hard to catch all of these issues.

Member

Why do we need to use a union? bytes[] is used only at line 106.
It would be simpler to explicitly extract the lowest 8 bits instead of using a union.
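A small sketch of what extracting the byte explicitly could look like (the function names are hypothetical, not from this PR). Masking and shifting work on the value rather than its memory layout, so the result is the same on little-endian and big-endian machines, whereas `bytes[0]` of a union over a 64-bit word is the lowest byte only on little-endian.

```cpp
#include <cstdint>

// Lowest 8 bits of the word, independent of byte order in memory.
inline uint8_t LowestByte(uint64_t word) {
  return static_cast<uint8_t>(word & 0xFF);
}

// Byte number `index` counted from the least significant byte (0..7).
inline uint8_t ByteAt(uint64_t word, int index) {
  return static_cast<uint8_t>((word >> (8 * index)) & 0xFF);
}
```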

Contributor Author

I originally thought I would have to do more. I removed the union.

@emkornfield
Contributor Author

@pitrou I think I addressed your comments. One of them that went stale was about the complexity of "AppendWord". I tried to remove parts that did not seem to affect performance on the Parquet column-reading benchmarks, but I spent some time trying to maximize word-level parallelism for the unaligned case, because I expect this case to be fairly common, at least for repeated fields.

As a point of comparison, the AppendWord implementation I put in place for big-endian shows much smaller improvements. Really this should use CopyBitmap, but I didn't want to start moving more code than I needed to in bit_util.h for this PR. I opened a JIRA (ARROW-8595) to track some improvements in that regard.
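To illustrate what the unaligned case involves, here is a simplified sketch, not the AppendWord in this PR. It shifts the whole word once instead of looping over bits, then writes bytes extracted by shifting, so it does not depend on byte order. It assumes the destination bits are still zero and that the buffer has room for up to nine bytes from the starting byte; `AppendWordSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Appends the low `num_bits` bits of `word` to `bitmap`, starting at bit
// position `bit_offset` (LSB-first bit numbering within each byte).
// Assumes the target bits are zero, so OR-ing is enough.
inline void AppendWordSketch(uint8_t* bitmap, int64_t bit_offset,
                             uint64_t word, int num_bits) {
  assert(bit_offset >= 0 && num_bits >= 0 && num_bits <= 64);
  if (num_bits == 0) return;
  if (num_bits < 64) word &= (uint64_t{1} << num_bits) - 1;  // drop extra bits

  uint8_t* out = bitmap + bit_offset / 8;
  const int shift = static_cast<int>(bit_offset % 8);

  // Bits that land in the first eight bytes.
  const uint64_t low = word << shift;
  // Bits pushed past the 64th position spill into a ninth byte.
  const uint8_t carry =
      shift == 0 ? 0 : static_cast<uint8_t>(word >> (64 - shift));

  const int total_bytes = (shift + num_bits + 7) / 8;  // at most 9
  for (int i = 0; i < total_bytes && i < 8; ++i) {
    out[i] |= static_cast<uint8_t>(low >> (8 * i));
  }
  if (total_bytes == 9) out[8] |= carry;
}
```

On a little-endian machine the byte loop could be replaced by a single 8-byte store of `low`, which is roughly where the word-level parallelism mentioned above would come from.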

If more comments are needed or you think there is a cleaner way of writing the code, I'm happy for input.

@wesm do you want to look at the Parquet-specific logic/comments to make sure I captured them correctly?

@wesm
Member

wesm commented Apr 27, 2020

I'll try to have a closer look tomorrow or Tuesday

@emkornfield
Contributor Author

@wesm wanted to make sure this is still on your radar

@wesm
Member

wesm commented Apr 30, 2020

Yep sorry thanks for the nudge, will look today

Member

@wesm wesm left a comment

At a high level this looks good.

One problem with introducing more SIMD code is that we don't yet have a runtime dispatching strategy. We will need to go through all of our SIMD accelerations in this library and refactor things so that we can build a "fat" binary that includes both AVX2/BMI2-accelerated versions and the non-SIMD versions. That way we can reap the benefits of this work in portable packages like Python wheels / conda packages.
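One possible shape for such runtime dispatch, as a hedged sketch rather than a proposal for Arrow's design. It assumes GCC or Clang on x86-64, where `__attribute__((target("bmi2")))` allows one function to be compiled with BMI2 inside an otherwise portable binary and `__builtin_cpu_supports("bmi2")` checks the CPU at run time. All function names and the `EXAMPLE_CAN_DISPATCH_BMI2` macro are hypothetical.

```cpp
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define EXAMPLE_CAN_DISPATCH_BMI2 1
#endif

// Non-SIMD version, always compiled in.
inline uint64_t ExtractBitsPortable(uint64_t value, uint64_t mask) {
  uint64_t result = 0;
  for (int out = 0; mask != 0; ++out, mask &= mask - 1) {
    const uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit
    if (value & lowest) result |= uint64_t{1} << out;
  }
  return result;
}

#if defined(EXAMPLE_CAN_DISPATCH_BMI2)
// Compiled with BMI2 enabled for this function only.
__attribute__((target("bmi2")))
inline uint64_t ExtractBitsBmi2(uint64_t value, uint64_t mask) {
  return _pext_u64(value, mask);
}
#endif

using ExtractBitsFn = uint64_t (*)(uint64_t, uint64_t);

// Picks an implementation based on what the running CPU supports.
inline ExtractBitsFn ChooseExtractBits() {
#if defined(EXAMPLE_CAN_DISPATCH_BMI2)
  if (__builtin_cpu_supports("bmi2")) return ExtractBitsBmi2;
#endif
  return ExtractBitsPortable;
}

// Public entry point: resolves the implementation once, on first use.
inline uint64_t ExtractBits(uint64_t value, uint64_t mask) {
  static const ExtractBitsFn fn = ChooseExtractBits();
  return fn(value, mask);
}
```

The indirect call has a cost per invocation, so in practice the dispatch would more likely sit around a whole batch of levels than around a single word.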

Member

remove this?

Contributor Author

done.

Member

This section is a bit too much bit manipulation for me to carefully scrutinize, but I trust you. There are some typos in the comments.

Contributor Author

OK, if there is anything I can add for documentation please let me know.

Member

This is cryptic. Can you rewrite it in a more understandable way? For example (untested):

if (bit_mask_ == 0x1) {
  current_byte_ = 0;
} else {
  current_byte_ = *(append_position + bytes_for_word - 1);
}

@wesm
Member

wesm commented Apr 30, 2020

I think this is fine to merge once most of the typos in the comments are fixed. A rebase will probably fix the Rust lint error.

Contributor Author

@emkornfield emkornfield left a comment

rebased.

Contributor Author

done.

@emkornfield
Contributor Author

@pitrou The C GLib build failure looks unrelated to me. I fixed the s390x build.

@emkornfield
Contributor Author

@wesm I expect your PR to get merged first, but if it doesn't, consider using the PopCount methods in this one.

@wesm
Member

wesm commented Jun 4, 2020

+1. I'm going to go ahead and merge this and then I'll rebase #7352
