Skip to content

Conversation

@elahrvivaz
Copy link
Contributor

@elahrvivaz elahrvivaz commented Mar 29, 2017

Currently only added minor type for 2-tuples

@@ -47,6 +47,7 @@ private static void writeValue(FieldReader reader, FieldWriter writer) {
switch (mt) {

case LIST:
case TUPLE_2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way to make the particular size not a special case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I could figure out... I think this goes back to the problem with the java minor types - I believe @julienledem was planning to remove them eventually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this partially or fully addressed in #409? Can you take a look?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #409 would fix it - I'll wait until that is merged and rebase

@wesm
Copy link
Member

wesm commented Apr 3, 2017

It's safe to rebase now -- I also renamed FixedWidthBinary to FixedSizeBinary for better conformity with this

@elahrvivaz elahrvivaz force-pushed the ARROW-725 branch 2 times, most recently from f6aa6f1 to f42dac9 Compare April 4, 2017 14:46
@elahrvivaz
Copy link
Contributor Author

thanks, I've rebased

@Override
public ListWriter list(int size) {
fail("FixedSizeList");
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's hold off adding the fixed width types to the writer APIs.
One of the goals of those APIs is to be able to ingest arbitrary data (for example JSON) and infer the schema on the fly (hence the PromotableWriter promoting vectors to union if the type is not homogenous). So it won't work well with fixed width list since nothing prevents having more than one size in this api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't add it to the apis, how would we write fixed size lists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, arbitrary data should still work fine and default to regular lists - you have to specifically set up the fixed size list if you want to use it. If you know you have a fixed size list, you can use that optimization, otherwise you can just use regular lists that will work the same with slightly more overhead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • you can write to the vector directly through its mutator. The Writer api is a facility to write in a row-oriented fashion to the vectors.
  • How does this interact with the PromotableWriter? what happens if we write both fixed lists and variable size lists to the same field?

* @return
*/
abstract protected FieldWriter getWriter(MinorType type);
abstract protected FieldWriter getWriter(ArrowType type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 things:

  • this kind of change should be in its own pull request.
  • For performance reason we may not want to do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only performance penalty is comparing an ArrowType instance instead of an enum compare - seems pretty minor.
Would you like me to split out into a separate PR?
MinorType is still used extensively in the writer framework - I think we ultimately need to pull it out since it doesn't capture the full arrow type. I noticed that the writers don't really handle decimal types, for that reason.
The other alternative is to create minor types for various fixed size lists - that would work for me, as I really only need 2-tuples at the moment. But it isn't especially flexible and @wesm asked for a more generic implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we need a more flexible solution. A Minor type for every list size is very inconvenient.
Regarding perf: There are a few places were we convert between minorType and arrowType each way. We also use ArrowType.equals instead of == between MinorTypes in getWriter(ArrowType) which is called in startList() which is called for each list creation. Just saying that we need to actually measure the consequence of this. I'm not trying to make you do more work than necessary. Separating the refactoring from the new type and vector allows to get the later merged while we finalize the former.

@elahrvivaz
Copy link
Contributor Author

@julienledem do you have any more comments, or what do I need to do to get this approved?

Copy link
Member

@julienledem julienledem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good but we need to figure out the change to getWriter(ArrowType). Possibly we can have more than one method (getFixedWidthListWriter...) to avoid creating objects and calling equals. The UnionWriter can be limited to just arbitrary json writing type use case.

@Override
public ListWriter list(int size) {
fail("FixedSizeList");
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • you can write to the vector directly through its mutator. The Writer api is a facility to write in a row-oriented fashion to the vectors.
  • How does this interact with the PromotableWriter? what happens if we write both fixed lists and variable size lists to the same field?

} else if (!Objects.equals(type, this.type)) {
promoteToUnion();
((UnionWriter)writer).getWriter(type);
((UnionWriter) writer).getWriter(Types.getMinorTypeForArrowType(type));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a discrepancy here. We compare ArrowType but then we create the writer based on MinorType which the two types may have the same even if they are not equal (Decimal or FixedWidthList for example).
It looks like if we start writing fixed width lists and then we write a different width we will promote but write the different width lists to the old fixed width vector. Possibly we should fail in this case.

* @return
*/
abstract protected FieldWriter getWriter(MinorType type);
abstract protected FieldWriter getWriter(ArrowType type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we need a more flexible solution. A Minor type for every list size is very inconvenient.
Regarding perf: There are a few places were we convert between minorType and arrowType each way. We also use ArrowType.equals instead of == between MinorTypes in getWriter(ArrowType) which is called in startList() which is called for each list creation. Just saying that we need to actually measure the consequence of this. I'm not trying to make you do more work than necessary. Separating the refactoring from the new type and vector allows to get the later merged while we finalize the former.

@Override
public ListWriter list(int size) {
// prime the writer with the fixed size list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't really need to got Through ArrowType nor UnionType to call getWriter(...).
We could have a getFixedWidthList(size) in writer.

protected FieldWriter getWriter(ArrowType type) {
if (state == State.UNION) {
((UnionWriter)writer).getWriter(type);
((UnionWriter) writer).getWriter(Types.getMinorTypeForArrowType(type));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work for fixed width lists?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnionWriter doesn't work for Decimal it looks like (https://github.com/apache/arrow/blob/master/java/vector/src/main/codegen/templates/UnionWriter.java#L124)
It's fine not to support FixedWidthList in union either. (at least for now)


public FieldWriter createNewFieldWriter(ValueVector vector) {
MinorType minorType = Types.getMinorTypeForArrowType(type);
return minorType.getNewFieldWriter(vector);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be done from the type in the vector?

@elahrvivaz
Copy link
Contributor Author

@julienledem I will revert the writer changes and just add some unit tests that show how to use the mutators directly.

@elahrvivaz
Copy link
Contributor Author

reverted - ready for another review

@wesm
Copy link
Member

wesm commented Apr 13, 2017

@julienledem @elahrvivaz I'll be happy to provide a C++ implementation whenever this goes in, then we can set up integration tests. Let me know

Copy link
Member

@julienledem julienledem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
@jacques-n ?

@julienledem
Copy link
Member

Thank you @elahrvivaz !

Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there is one small spurious import reorder. Otherwise, lgtm.

import java.util.Collections;
import java.util.List;

import com.google.common.collect.ImmutableList;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reordering on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's just how my ide (intellij) sorts imports alphabetically. I'll revert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

@elahrvivaz
Copy link
Contributor Author

hold off merging - I need to add reAlloc to FixeSizeListVector now that #534 is merged

@wesm
Copy link
Member

wesm commented Apr 14, 2017

Sounds good

@elahrvivaz
Copy link
Contributor Author

updated and rebased - should be good now

@wesm
Copy link
Member

wesm commented Apr 15, 2017

+1. This is great, thanks @elahrvivaz!

@asfgit asfgit closed this in 0f9c88f Apr 15, 2017
@elahrvivaz elahrvivaz deleted the ARROW-725 branch April 21, 2017 15:38
jeffknupp pushed a commit to jeffknupp/arrow that referenced this pull request Jun 3, 2017
~Currently only added minor type for 2-tuples~

Author: Emilio Lahr-Vivaz <elahrvivaz@ccri.com>

Closes apache#452 from elahrvivaz/ARROW-725 and squashes the following commits:

b139d3d [Emilio Lahr-Vivaz] adding reAlloc to FixedSizeListVector
229e24a [Emilio Lahr-Vivaz] re-ordering imports
594c0a2 [Emilio Lahr-Vivaz] simplifying writing of list vectors through mutator
7cb2324 [Emilio Lahr-Vivaz] reverting writer changes, adding examples of writing fixed size list using vector mutators
756dc8a [Emilio Lahr-Vivaz] ARROW-725: [Formats/Java] FixedSizeList message and java implementation
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 8, 2018
Author: Deepak Majeti <deepak.majeti@hpe.com>

Closes apache#452 from majetideepak/PARQUET-1265 and squashes the following commits:

2972197 [Deepak Majeti] fix initialization error
ab07a47 [Deepak Majeti] Simplify construction of static variables
b23d3b3 [Deepak Majeti] PARQUET-1265: Segfault on static ApplicationVersion initialization

Change-Id: I012fc78dd90686309bc1f9feac446c9324e931f5
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
~Currently only added minor type for 2-tuples~

Author: Emilio Lahr-Vivaz <elahrvivaz@ccri.com>

Closes apache#452 from elahrvivaz/ARROW-725 and squashes the following commits:

b139d3d [Emilio Lahr-Vivaz] adding reAlloc to FixedSizeListVector
229e24a [Emilio Lahr-Vivaz] re-ordering imports
594c0a2 [Emilio Lahr-Vivaz] simplifying writing of list vectors through mutator
7cb2324 [Emilio Lahr-Vivaz] reverting writer changes, adding examples of writing fixed size list using vector mutators
756dc8a [Emilio Lahr-Vivaz] ARROW-725: [Formats/Java] FixedSizeList message and java implementation
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
Update latest Java version used by Github workflows to Java 23.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants