Arrow: FIXED type support #3029

nastra · 2021-08-26T07:18:37Z

No description provided.

arrow/src/main/java/org/apache/iceberg/arrow/ArrowSchemaUtil.java

RussellSpitzer · 2021-10-20T16:48:52Z

arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java

            vectorizedColumnIterator.varWidthTypeBatchReader().nextBatch(vec, -1, nullabilityHolder);
            break;
          case FIXED_WIDTH_BINARY:
-            vectorizedColumnIterator.fixedWidthTypeBinaryBatchReader().nextBatch(vec, typeWidth, nullabilityHolder);


What is the difference between these two readers?

The difference is in how the data is read: FixedSizeBinary vs. FixedWidthBinary. For the FIXED type we should essentially be creating and using a FixedSizeBinaryVector from Arrow.

I don't know the basics here, so I'm confused about why the fixed-width binary case is read with the fixed-size binary reader. I also don't understand the difference between fixed size and fixed width.

FIXED_WIDTH_BINARY might have been misleading, so I renamed it to FIXED_SIZE_BINARY. It seems that the FixedWidthBinary code path existed as a workaround for Spark, as can be seen here. I checked TestParquetVectorizedReads, and it seems to test the FIXED type with Spark and vectorization:

iceberg/spark/v3.0/spark3/src/test/java/org/apache/iceberg/spark/data/AvroDataTest.java

Line 60 in f3e6770

required(112, "fixed", Types.FixedType.ofLength(7)),
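To make the fixed-size vs. variable-width distinction in this thread concrete, here is a minimal sketch in plain Java. It does not use Arrow or Iceberg APIs: the class `FixedVsVarLayout` and its methods are hypothetical names invented for illustration, and plain arrays stand in for the buffers behind a fixed-size binary vector and a variable-width binary vector. It assumes the usual layouts: a fixed-size column stores only a data buffer, while a variable-width column also stores an offsets array.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of Iceberg or Arrow.
public class FixedVsVarLayout {

  // Fixed-size layout: every value occupies exactly `width` bytes,
  // so value i starts at byte i * width and no offsets are needed.
  public static byte[] readFixed(byte[] data, int width, int index) {
    return Arrays.copyOfRange(data, index * width, (index + 1) * width);
  }

  // Variable-width layout: offsets[i] and offsets[i + 1] delimit value i.
  public static byte[] readVar(byte[] data, int[] offsets, int index) {
    return Arrays.copyOfRange(data, offsets[index], offsets[index + 1]);
  }

  // Storing fixed-width values in a variable-width layout (the kind of
  // workaround described in the thread) means writing redundant offsets
  // that are all multiples of the width.
  public static int[] offsetsForFixedValues(int width, int count) {
    int[] offsets = new int[count + 1];
    for (int i = 0; i <= count; i++) {
      offsets[i] = i * width;
    }
    return offsets;
  }

  public static void main(String[] args) {
    int width = 7; // same length as Types.FixedType.ofLength(7) in the test schema
    byte[] data = "abcdefgHIJKLMN".getBytes(StandardCharsets.US_ASCII);
    int[] offsets = offsetsForFixedValues(width, 2);

    String fixed = new String(readFixed(data, width, 1), StandardCharsets.US_ASCII);
    String var = new String(readVar(data, offsets, 1), StandardCharsets.US_ASCII);

    // Both layouts hold the same bytes; only the bookkeeping differs.
    System.out.println(fixed + " " + var + " " + Arrays.toString(offsets));
  }
}
```

Under these assumptions both readers return the same bytes for a FIXED column. The difference lies in which vector type holds them and whether an offsets buffer is carried along.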


rymurr · 2021-11-04T12:58:58Z

arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java

            vectorizedColumnIterator.varWidthTypeBatchReader().nextBatch(vec, -1, nullabilityHolder);
            break;
          case FIXED_WIDTH_BINARY:
-            vectorizedColumnIterator.fixedWidthTypeBinaryBatchReader().nextBatch(vec, typeWidth, nullabilityHolder);


So, for clarification, what is being added here: fixed width or fixed size? Or are they the same?

Can you also clarify your comment on Spark: was fixed width already (partially) handled, and is it now fully handled?

rymurr · 2021-11-04T13:01:46Z

arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedColumnIterator.java

    }
  }

-  public class FixedWidthTypeBinaryBatchReader extends BatchReader {


Maybe I am daft, but it looks like you removed the fixed-width readers, and I don't see where you added any new readers.

kbendick · 2021-12-10T05:25:41Z

@nastra does this still need to be reviewed?

Somebody mentioned on Slack this week (on Tuesday) that they had issues writing a fixed item as a truncated partition column, so they used Binary instead.

I’ve been out sick but I’ll gather the details into an issue.

I doubt this will directly solve that but made me think of this PR.

nastra · 2021-12-10T09:19:40Z

@kbendick yes, this still needs to be reviewed. At this point, TBH, I'm uncertain whether the approach is correct, because I'm not sure this statement still holds (https://github.com/apache/iceberg/pull/3029/files#diff-80bc724de9a4dd358c4544fcf00e00139145c6763a5d0280e0bd0793a0fd4003L366-L368):

Spark does not support fixed width binary data type. To work around this limitation, the data is read as fixed width binary from parquet and stored in a {@link VarBinaryVector} in Arrow.

FWIW the PR itself is not related to the issue that was reported on Slack.

github-actions · 2024-07-19T01:09:13Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

github-actions · 2024-07-27T00:13:39Z

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot added the arrow label Aug 26, 2021

nastra force-pushed the arrow-support-fixed branch 2 times, most recently from f4e3295 to bc1b19c Compare August 27, 2021 07:24

nastra requested a review from rymurr August 27, 2021 12:50

nastra force-pushed the arrow-support-fixed branch from bc1b19c to d07341b Compare October 14, 2021 09:46

nastra changed the title ~~Arrow: Add tests for FIXED type support~~ Arrow: FIXED type support Oct 14, 2021

nastra mentioned this pull request Oct 14, 2021

Optimized spark vectorized read parquet decimal #3249

Merged

nastra requested a review from rdblue October 14, 2021 10:45

nastra closed this Oct 14, 2021

nastra reopened this Oct 14, 2021

nastra closed this Oct 14, 2021

nastra reopened this Oct 14, 2021

nastra closed this Oct 14, 2021

nastra reopened this Oct 14, 2021

RussellSpitzer reviewed Oct 20, 2021


nastra force-pushed the arrow-support-fixed branch from d07341b to 3fbddaf Compare October 20, 2021 16:04

RussellSpitzer reviewed Oct 20, 2021

Arrow: FIXED type support

b0ff549

nastra force-pushed the arrow-support-fixed branch from 3fbddaf to b0ff549 Compare October 21, 2021 07:23

rymurr reviewed Nov 4, 2021

github-actions bot added the stale label Jul 19, 2024

github-actions bot closed this Jul 27, 2024

pvary mentioned this pull request Jul 31, 2025

Arrow: test case for fixed type #13700

Merged
