Skip to content

Conversation

@emkornfield
Copy link
Contributor

@emkornfield emkornfield commented Sep 10, 2020

This adds helper methods for reconstructing all necessary metadata
for arrow types. For now this doesn't handle null_slot_usage (i.e.
children of FixedSizeList), it throws exceptions when nulls are
encountered in this case.

The unit tests demonstrate how to use the helper methods in combination
with LevelInfo (generated from parquet/arrow/schema.h) to reconstruct
the metadata.

  • Refactors necessary APIs to use LevelInfo and makes use of them in
    column_reader
  • Adds implementation for reconstructing list validity bitmaps using rep/def levels..
  • Adds implementation for reconstruction list lengths (for rep/dev levels)
  • Adds dynamic dispatch for level comparison algorithms for AVX2 and BMI2.
  • Adds a pextract alternative that uses BitRunReader that can be
    used as a fallback.

This adds helper methods for reconstructing all necessary metadata
for arrow types.  For now this doesn't handle null_slot_usage (i.e.
children of FixedSizeList), it throws exceptions when nulls are
encountered in this case.

The unit tests demonstrate how to use the helper methods in combination
with LevelInfo (generated from parquet/arrow/schema.h) to reconstruct
the metadata.

- Refactors necessary APIs to use LevelInfo and makes use of them in
  column_reader
- Adds implementations for reconstructing list validity bitmaps
  (one uses rep/def levels.  one uses greater then bitmap generated
   from rep/def levels).
- Adds implementations for reconstruction list lengths
 (one uses rep/def levels.  one uses greater then bitmap generated
   from rep/def levels).
- Adds dynamic dispatch for level comparison algorithms for AVX2
  and BMI2.
- Adds a pextract alternative that uses BitRunReader that can be
  used as a fallback.
@emkornfield emkornfield requested a review from pitrou September 10, 2020 05:23
@github-actions
Copy link

return ::arrow::BitUtil::ToLittleEndian(mask);
}

/// Builds a bitmap where each set bit indicates the corresponding level is greater
Copy link
Member

@pitrou pitrou Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you're doing this. Integer comparisons are cheap, while writing and reading bitmaps is expensive. Why go through an intermediate bitmap (and then spend some time trying to optimize it?)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rearranging code from a prior which on my box showed 20% end-to-end parquet to arrow read performance on existing benchmarks, indicating that while cheap the comparison take a considerable amount of time in decoding.

Why generalize in this way?
The generation of bitmaps can be made cheaper than it currently is. Right now parquet RLE encodes rep/def levels. We fully decode these levels into int16_t and then try to reconstruct nested metadata from them. For places where there are actually runs, the bitmaps become much less expensive to generate (its a simple mask of N-bits). For cases when max def/rep-level is one, I don't think comparison would even be necessary for non-RLE encoded data (the data is already in bitmap form).

Working at the bitmaps level allows taking advantage of bit-level parallelism at the cost of doing extra for each column (The scalar version of the algorithm I include actually also does the same extra work when compared with the current reconstruction algorithm).

It is also worth pointing out there are roughly three cases to consider:

  1. No-nested lists (in which case using this method will directly generate validity bitmaps and be a strict performance win).
  2. Single list (when BMI2 is usable I believe this approach will also be faster than any other approach, I'm a little bit less confident when BMI2 isn't usable, but I would expect at least equal performance).
  3. Nested lists. At a certain point using something similar to the batch approach will start to win versus the bitmap approach. This could be as early as 2 lists, but I would guess it is likely >= 3 lists.

@pitrou
Copy link
Member

pitrou commented Sep 10, 2020

Hmm... I don't know if this is easy to tear apart, but I'd appreciate if the PR concentrated on the functionality, and micro-optimizations (SIMD, BMI or otherwise) separated into later PRs. Would that be reasonably doable?

@emkornfield
Copy link
Contributor Author

Hmm... I don't know if this is easy to tear apart, but I'd appreciate if the PR concentrated on the functionality, and micro-optimizations (SIMD, BMI or otherwise) separated into later PRs. Would that be reasonably doable?

How about I remove the net new bitmap related code? The SIMD/BMI code was already present hidden behind compiler flags instead of runtime dispatched. The runtime dispatch is cleaning up technical debt to allow users to make use of these in prepackaged binaries (the PR that introduced them showed a 20% improvement on our end to end parquet->Arrow reading benchmarks).

@emkornfield
Copy link
Contributor Author

@pitrou I've removed the code for reconstructing list information based on bitmaps, I would appreciate keeping the rest of the in this PR.

@emkornfield
Copy link
Contributor Author

Closing in favor of: #8177

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants