GH-35498: [C++] Fix source node batch realignment #35541

rtpsw · 2023-05-11T08:06:30Z

Rationale for this change

Currently, the source node uses too high an alignment value when realigning batches, which causes a performance degradation.

What changes are included in this PR?

The source node now realigns each array in a batch to its byte width (if that width is greater than 1).

Are these changes tested?

The changes are tested for correctness by the existing tests. The performance is checked by regression jobs.

Are there any user-facing changes?

Only performance.

This PR contains a "Critical Fix".

Closes: [C++][Parquet] Parquet write_to_dataset performance regression #35498

github-actions · 2023-05-11T08:06:51Z

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}


github-actions · 2023-05-11T08:11:41Z

Closes: [C++][Parquet] Parquet write_to_dataset performance regression #35498

github-actions · 2023-05-11T08:11:44Z

⚠️ GitHub issue #35498 has been automatically assigned in GitHub to PR creator.

jorisvandenbossche · 2023-05-11T10:26:20Z

@ursabot please benchmark

ursabot · 2023-05-11T10:26:33Z

Benchmark runs are scheduled for baseline = 1a038ad and contender = 892e400. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️2.24% ⬆️0.44%] test-mac-arm
[Failed] ursa-i9-9960x
[Finished ⬇️0.87% ⬆️0.12%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 892e4002 ec2-t3-xlarge-us-east-2
[Finished] 892e4002 test-mac-arm
[Failed] 892e4002 ursa-i9-9960x
[Finished] 892e4002 ursa-thinkcentre-m75q
[Finished] 1a038ada ec2-t3-xlarge-us-east-2
[Finished] 1a038ada test-mac-arm
[Failed] 1a038ada ursa-i9-9960x
[Finished] 1a038ada ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

rtpsw · 2023-05-11T10:33:18Z

I see a bunch of CI jobs (here, here, here and here) failing due to OOM, presumably on realignment, but the allocation sizes are not large. More concerning is this failure, which shows an invalid alignment of 3. Presumably, the data type of the value being realigned is some kind of struct. @westonpace, what are your thoughts? Perhaps there is an API (or idiom) to get the byte width of the largest field of the data type, which I think would be the correct alignment in such a case?
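
An alignment of 3 suggests that a byte width which is not a power of two was passed straight through as the alignment. A minimal sketch of a guard, using only the standard library. `IsPowerOfTwo` and `RealignmentFor` are hypothetical names, not Arrow APIs, and skipping realignment for non-power-of-two widths is an assumption about the desired behavior:

```cpp
#include <cstdint>

// Hypothetical helper: true if `value` is a positive power of two.
inline bool IsPowerOfTwo(std::int64_t value) {
  return value > 0 && (value & (value - 1)) == 0;
}

// Hypothetical helper: choose the alignment (in bytes) to realign to for a
// fixed-width type of `byte_width` bytes. Returns 0 when no realignment
// should be attempted.
inline std::int64_t RealignmentFor(std::int64_t byte_width) {
  if (byte_width <= 1) return 0;            // single bytes are always aligned
  if (!IsPowerOfTwo(byte_width)) return 0;  // e.g. a width of 3 is rejected
  return byte_width;
}
```

With this guard, a width of 3 yields no realignment request instead of an invalid alignment.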

rtpsw · 2023-05-11T12:55:52Z

There is one CI job with unexpected results, which appear to be unrelated to this PR, but are worth noting. There is a second CI job which segfaults, and it's not clear whether it is related. @westonpace, have you seen these segfaults before?

westonpace · 2023-05-11T13:45:16Z

Structs do not have any buffers themselves beyond the validity buffer. The validity buffer will have a bit width of 1. I don't think we need to worry about aligning any buffers with a width of 8 or less (e.g. only worry about it once the width is 16)

One problem with this PR is that it doesn't address offsets buffers. For example, in a string / list / binary array there is a 32-bit offsets buffer which should be aligned to 32. These arrays won't show up as fixed width arrays.
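
To illustrate why the offsets buffer is the one that needs alignment, here is a simplified model of a variable-width array. `StringArrayModel` and `GetValue` are hypothetical stand-ins written for this sketch, not Arrow classes:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified model of a string array (an assumption for illustration).
struct StringArrayModel {
  std::vector<std::int32_t> offsets;  // length + 1 entries, 32 bits each
  std::string data;                   // concatenated value bytes
};

// Value i spans the half-open byte range [offsets[i], offsets[i + 1]).
inline std::string GetValue(const StringArrayModel& array, std::size_t i) {
  const std::int32_t begin = array.offsets[i];
  const std::int32_t end = array.offsets[i + 1];
  return array.data.substr(static_cast<std::size_t>(begin),
                           static_cast<std::size_t>(end - begin));
}
```

The data bytes are read one byte at a time, while every lookup reads the offsets as 32-bit integers, so only the offsets impose a 32-bit alignment requirement.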

I was working on this a bit yesterday as well. I ended up with something like this...

bool CheckMallocAlignment(const ArrayData& array) {
  auto type_id = array.type->storage_id();
  if (type_id == Type::DICTIONARY) {
    // The values array will be checked separately
    type_id = ::arrow::internal::checked_pointer_cast<DictionaryType>(array.type)
                  ->index_type()
                  ->id();
  }
  // Switch on the resolved id so dictionaries are checked by their index type
  switch (type_id) {
    case Type::NA:
    case Type::FIXED_SIZE_LIST:
    case Type::FIXED_SIZE_BINARY:
    case Type::BOOL:
    case Type::INT8:
    case Type::UINT8:
    case Type::DECIMAL128:
    case Type::DECIMAL256:
    case Type::SPARSE_UNION:
    case Type::RUN_END_ENCODED:
    case Type::STRUCT:
      // These have no buffers or all buffers need only byte alignment
      return true;
    case Type::INT16:
    case Type::UINT16:
    case Type::HALF_FLOAT:
      return CheckValuesAlignment(array, 16);
    case Type::INT32:
    case Type::UINT32:
    case Type::FLOAT:
    case Type::STRING:
    case Type::BINARY:
    case Type::DATE32:
    case Type::TIME32:
    case Type::LIST:
    case Type::MAP:
    case Type::DENSE_UNION:
    case Type::INTERVAL_MONTHS:
      return CheckValuesAlignment(array, 32);
    case Type::INT64:
    case Type::UINT64:
    case Type::DOUBLE:
    case Type::LARGE_BINARY:
    case Type::LARGE_LIST:
    case Type::LARGE_STRING:
    case Type::DATE64:
    case Type::TIME64:
    case Type::TIMESTAMP:
    case Type::DURATION:
    case Type::INTERVAL_DAY_TIME:
      return CheckValuesAlignment(array, 64);
    case Type::INTERVAL_MONTH_DAY_NANO:
      return CheckValuesAlignment(array, 128);
    default:
      return true;
  }
}

It's not pretty. I also tried using a visitor pattern, but it ended up being more complicated (though arguably a bit more future proof were we to add more fixed-width types). CheckValuesAlignment checks the buffer at index 1. Fortunately, in all cases the buffer that needs alignment seems to be at that position.
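
CheckValuesAlignment itself is not shown in this thread. A minimal sketch of what such a check could look like, assuming the second argument is a width in bits as in the switch above. `BufferModel` and `ArrayDataModel` are simplified stand-ins invented here, not `arrow::Buffer` or `arrow::ArrayData`:

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-ins (assumptions for illustration only).
struct BufferModel {
  const std::uint8_t* data;
};

struct ArrayDataModel {
  // Index 0 is the validity bitmap; index 1 holds values or offsets.
  std::vector<BufferModel> buffers;
};

// Returns true if the buffer at index 1 is absent, or if its address is a
// multiple of `bit_width / 8` bytes.
inline bool CheckValuesAlignment(const ArrayDataModel& array, int bit_width) {
  if (bit_width <= 8) return true;  // byte alignment always holds
  if (array.buffers.size() < 2 || array.buffers[1].data == nullptr) {
    return true;  // nothing to check
  }
  const auto address = reinterpret_cast<std::uintptr_t>(array.buffers[1].data);
  const auto byte_alignment = static_cast<std::uintptr_t>(bit_width / 8);
  return address % byte_alignment == 0;
}
```

The early return for widths of 8 bits or less also avoids dividing by a zero byte alignment.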

rtpsw · 2023-05-11T14:59:44Z

I was thinking along the lines of your work; you're clearly ahead of me on this.

I don't think we need to worry about aligning any buffers with a width of 8 or less

We do. In failures I observed internally, the cause was misalignment of buffer addresses (which were even numerically odd) that were expected to be 4- or 8-byte aligned.

I think the switch statement should be refactored into type_traits.h, if it ends up as part of the solution.

I think the correct idea here is that alignment applies to buffers, not arrays. That is, the format of the buffer is what determines the required alignment, if any, and I guess this format is determined by the buffer index within the array and the array's storage type.

@westonpace, should you or I take this forward?

westonpace · 2023-05-11T17:27:34Z

We do. In failures I observed internally, the cause was misalignment of buffer addresses (which were even numerically odd) that were expected to be 4- or 8-byte aligned.

Sorry, I meant 8 bits or less (i.e. we can safely assume that everything has at least 1 byte of alignment).

I think the correct idea here is that alignment applies to buffers, not arrays. That is, the format of the buffer is what determines the required alignment, if any, and I guess this format is determined by the buffer index within the array and the array's storage type.

Yes, unfortunately.

@westonpace, should you or I take this forward?

I think I should have time to make a PR today. Sorry to steal a task, I should've marked yesterday that I was working on this.

rtpsw · 2023-05-11T18:41:52Z

Sorry, I meant 8 bits or less (i.e. we can safely assume that everything has at least 1 byte of alignment).

Agreed. I included this condition in my recent commit here.

I think the correct idea here is that alignment applies to buffers, not arrays.

Yes, unfortunately.

I'd suggest adding something like DataType::GetBufferAlignment(int index), which returns the required alignment of the buffer at index: either a power of 2, or 0 when no alignment is required. This would force any future data type to provide this info.
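
A rough sketch of how that proposal could look, written against a hypothetical `DataTypeModel` base class rather than the real `arrow::DataType`. The class names and the choice of bytes as the unit are assumptions made for illustration:

```cpp
#include <cstdint>

// Hypothetical base class modelling the proposal; not arrow::DataType.
class DataTypeModel {
 public:
  virtual ~DataTypeModel() = default;
  // Required alignment in bytes of the buffer at `index`: a power of two,
  // or 0 when the buffer needs no particular alignment.
  virtual std::int64_t GetBufferAlignment(int index) const = 0;
};

// Model of a 64-bit float type: buffer 0 is validity, buffer 1 is values.
class DoubleModel : public DataTypeModel {
 public:
  std::int64_t GetBufferAlignment(int index) const override {
    return index == 1 ? 8 : 0;
  }
};

// Model of a string type: buffer 1 holds 32-bit offsets, buffer 2 raw bytes.
class StringModel : public DataTypeModel {
 public:
  std::int64_t GetBufferAlignment(int index) const override {
    return index == 1 ? 4 : 0;
  }
};
```

Because the method is pure virtual, a new type cannot be added in this model without stating its per-buffer alignment.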

Sorry to steal a task

OK, you owe me one :)

rtpsw · 2023-05-11T18:45:27Z

Also, consider using {Check/Ensure}Alignment(Buffer, int64_t) from util/align_util.cc.
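
For readers unfamiliar with the check/ensure pattern, here is a standalone model of the idea. It does not reproduce the Arrow helpers or their signatures; `OwnedBuffer`, `CheckAlignmentModel` and `EnsureAlignmentModel` are hypothetical names, and the alignment is assumed to be a power of two in bytes:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified model of an owned byte buffer whose payload may start at an
// offset inside its backing storage (an assumption for illustration).
struct OwnedBuffer {
  std::vector<std::uint8_t> storage;
  std::size_t offset = 0;  // payload starts at storage.data() + offset
  std::size_t size = 0;    // payload length in bytes
  const std::uint8_t* data() const { return storage.data() + offset; }
};

// True if the payload address is a multiple of `alignment`.
inline bool CheckAlignmentModel(const OwnedBuffer& buffer,
                                std::int64_t alignment) {
  const auto address = reinterpret_cast<std::uintptr_t>(buffer.data());
  return (address & (static_cast<std::uintptr_t>(alignment) - 1)) == 0;
}

// Returns the buffer unchanged if it is already aligned; otherwise copies
// the payload to the first suitably aligned position in fresh storage.
inline OwnedBuffer EnsureAlignmentModel(OwnedBuffer buffer,
                                        std::int64_t alignment) {
  if (CheckAlignmentModel(buffer, alignment)) return buffer;
  OwnedBuffer result;
  result.storage.resize(buffer.size + static_cast<std::size_t>(alignment));
  const auto base = reinterpret_cast<std::uintptr_t>(result.storage.data());
  const auto mask = static_cast<std::uintptr_t>(alignment) - 1;
  result.offset = static_cast<std::size_t>((mask + 1 - (base & mask)) & mask);
  result.size = buffer.size;
  if (buffer.size > 0) {
    std::memcpy(result.storage.data() + result.offset, buffer.data(),
                buffer.size);
  }
  return result;
}
```

The copy happens only on the misaligned path, which is why choosing too large an alignment, as described in the rationale, turns into extra allocations and copies.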

ursabot · 2023-05-11T19:26:05Z

['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm

pitrou · 2023-05-17T15:16:42Z

Is this obsoleted by #35565?

rtpsw · 2023-05-17T15:44:52Z

Is this obsoleted by #35565?

Yes. Closing this.

Fix source node batch realignment

bee1bb0

github-actions bot added the Component: C++ and awaiting review labels May 11, 2023

rtpsw mentioned this pull request May 11, 2023

[C++][Parquet] Parquet write_to_dataset performance regression #35498

Closed

GH-35498: [C++] Fix source node batch realignment

892e400

rtpsw changed the title ~~Fix source node batch realignment~~ GH-35498: [C++] Fix source node batch realignment May 11, 2023

realign to power-of-2 only

f46d304

rtpsw closed this May 17, 2023

rtpsw deleted the GH-35498 branch May 25, 2023 14:27
