ARROW-1866: [Java] Combine MapVector classes and remove NonNullableMapVector #1371
|
This is a WIP, still need to make another pass through to clean up. Also, I'm not too sure about the purpose of |
Is there any point to have a SingleMapWriter?
could not have this as final anymore because it depends on fieldType being set in the constructor. Previously it was set under the super class.
Check on this
now need to make sure the validity buffer is allocated
what is this for?
Not sure, it was in NonNullableMapVector so I brought it over. This doesn't seem to be used anywhere and overrides from AbstractContainerVector that returns false. @siddharthteotia do you know what this was used for and if it is still needed?
|
LGTM at high level. @BryanCutler I don't know about SingleMapWriter either. Is that orthogonal or related to this change? |
With this change @siddharthteotia and @jacques-n do you have any objections to removing the |
|
I could also remove the codegen file |
I think we probably still need to codegen the MapWriter class because it generates methods for different types. |
Any reason this should be public? This class has getValueCount and setValueCount
valueCount was public in NonNullableMapVector but I don't think it should be here so I made it private
|
Removing WIP, I think this should be ok to merge (pending tests) unless we decide to remove |
Do we want to do something with the name here, i.e. MapReaderImpl and MapWriterImpl?
I'm ok with the change, but I held off on renaming classes outside of the vectors until we have a consensus
NullableMapTransferPair -> MapTransferPair?
I think this would be ok to rename since it's a protected inner class
nit: this.mapVector?
|
I looked at @BryanCutler it seems your PR make @siddharthteotia Does Dremio code use |
|
@BryanCutler , @icexelloss , yes Dremio uses SingleMapWriter/Reader. What is the concern here? |
|
Let us hold off making any changes to NullableMapVector and MapVector. I think this PR should just remove the newly added NonNullableMapVector class in the hierarchy and bring the hierarchy to the state it was before. Meanwhile we can assess the usage of both types of vectors in Dremio and then document the changes (if any) required to these 2 vectors. I don't think there is any harm in keeping these 2 vectors around for the time being. They are not breaking anything. |
There are only two vectors currently - |
|
My suggestion is to keep the old pair of NullableMapVector and MapVector as is and just remove the new NonNullable* class -- basically let's have the MapVector hierarchy back to what it was before with MapVector being the base class and NullableMapVector being the subclass. |
|
@siddharthteotia just to be clear, the end result of this PR is to flatten the 2 map vectors classes into 1 |
|
@siddharthteotia perhaps you can elaborate on whether combining the 2 MapVectors is something you would be ok with in the future, and when you might be able to sign off on that? Also, are you ok with the renaming to |
|
I thought we've reached consensus about this but maybe not. I have also put up https://docs.google.com/document/d/1n4qjO20wZyS7wSpISgYdIVuD22zstLgP-7gXxS5_7_E/edit?usp=sharing to help discuss vector naming so we can make progress. |
|
Thanks for putting that together @icexelloss , it looks good but I think the question we need to answer first is if we plan to combine |
|
I still don't see why we need to keep nullable and non-nullable versions of MapVector. I am also waiting on @siddharthteotia for his comment. |
|
My suggestion is to keep the old pair of NullableMapVector and MapVector as is and just remove the new NonNullable* class -- basically let's have the MapVector hierarchy back to what it was before, with MapVector being the base class and NullableMapVector being the subclass. Since combining them also has implications for the reader and writer, and @BryanCutler proposed removing those, I would like to assess their usage in Dremio. For example, we have a data structure writer called VectorContainerWriter, which has a MapVector inside and a SingleMapWriter to populate the data structure. I am just asking to keep these 2 vectors around as is for the time being. Later on we can combine them and decide what to do with the reader/writer. |
|
I did not remove anything outside of the vector hierarchy, I just brought up that we could combine the readers and writers as well. However, doing this is out of scope for what we have discussed so I only did minimal changes required to get things working. Does that alleviate some of your concern? |
|
It seems some code in Dremio is going to be affected by removing the non-nullable MapVector and SingleMapWriter, but not too much (hopefully this is up to date): https://github.com/dremio/dremio-oss/search?p=2&q=MapVector&type=&utf8=%E2%9C%93 https://github.com/dremio/dremio-oss/search?utf8=%E2%9C%93&q=SingleMapWriter&type= I don't know what the best way to proceed is. Maybe we can give it a couple of days for @siddharthteotia to evaluate whether removing non-nullable vectors is OK for Dremio. Any thoughts? |
|
This PR does not remove SingleMapWriter, but since it now uses the nullable
MapVector as the container, it is functionally equivalent to
NullableMapWriter.
|
|
@BryanCutler yeah I see your point but @siddharthteotia seems to be saying Dremio uses non-nullable MapVector as well so he wants to take a look. |
|
Yes, I just wanted to be clear that removing the non-nullable MapVector was the only refactoring done here. Hopefully that can help assess the impact. |
|
N.B. Dremio is open source, so others are also free to look at this code: https://github.com/dremio/dremio-oss -- I have been asking for someone to set up integration tests so that Dremio's tests can be run against Arrow master; that would be hugely helpful in situations like these. There might be some unpublished changes to Dremio trunk, though. |
|
+1. I also feel the issue is that we (Bryan and I) don't know enough about the Dremio codebase to assess the impact on Dremio of certain changes, such as removing non-nullable map vectors. I also feel maybe @siddharthteotia is too resource-bound to keep up with the refactoring PRs on master. I am willing to help make downstream changes to Dremio, just as we make downstream changes to Spark, but I don't know if https://github.com/dremio/dremio-oss is up to date. |
|
I'm also willing to make downstream changes to Dremio if that would help. I think the underlying issue here is that I (and I believe Li too) was under the assumption that removing the non-nullable MapVector was part of the agenda for the Java refactoring, while @siddharthteotia wants to keep them as they were before ARROW-1710. Here are the options we have, as I see it, to proceed:
Of course I prefer (1) but I understand if constraints prevent this, it would just be nice to get confirmation if that is the case and if there is disagreement with doing (2). |
|
My suggestion is to go with 3rd option here #1371 (comment) |
|
What do people think of?
@BryanCutler, I think the SingleMapWriter has more differences than Nullability. I think it is specifically used to treat the top of a record as a pseudo-map where needed. We do some fairly complex runtime code generation associated with this structure in Dremio. It relates to these pieces of code [1][2][3] (among others) and it will take some time to really remember & understand the impact of what you're proposing to provide good feedback. [1] https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/org/apache/arrow/vector/complex/FieldIdUtil2.java |
|
In
Structs are nullable in general, so the |
|
Thanks for the details @jacques-n, I am fine with keeping the 2 level hierarchy for now and it seems like we need some more discussion to move forward with |
This merges the two MapVector classes and removes the NonNullableMapVector class as a followup to #1341. There is only a nullable version of MapVector now.
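To make that end state concrete, here is a rough sketch of what a single, always-nullable map vector implies, based on the review comments above: the vector owns a validity buffer that has to be allocated together with its children, and valueCount stays private behind getValueCount/setValueCount. SketchMapVector and every method on it are hypothetical names made up for illustration, not Arrow's actual MapVector API, and a plain int array stands in for the real child vectors.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for a single nullable map vector; NOT Arrow's actual class. */
final class SketchMapVector {
    private byte[] validity = new byte[0]; // one bit per slot; 1 = defined, 0 = null
    private int[] child = new int[0];      // a single int child, for illustration only
    private int valueCount;                // private; accessed through getter/setter

    /** Allocates the child storage and the validity buffer together. */
    void allocateNew(int capacity) {
        child = new int[capacity];
        validity = new byte[(capacity + 7) / 8]; // all bits start at 0, so every slot is null
        valueCount = 0;
    }

    /** Grows both buffers in step so the validity buffer always covers the child. */
    private void ensureCapacity(int needed) {
        if (needed <= child.length) {
            return;
        }
        int newCapacity = Math.max(needed, child.length * 2);
        child = Arrays.copyOf(child, newCapacity);
        validity = Arrays.copyOf(validity, (newCapacity + 7) / 8);
    }

    /** Writes a value and marks the slot as defined. */
    void setSafe(int index, int value) {
        ensureCapacity(index + 1);
        child[index] = value;
        validity[index >> 3] |= (byte) (1 << (index & 7));
    }

    /** Clears the validity bit for the slot. */
    void setNull(int index) {
        ensureCapacity(index + 1);
        validity[index >> 3] &= (byte) ~(1 << (index & 7));
    }

    boolean isNull(int index) {
        return (validity[index >> 3] & (1 << (index & 7))) == 0;
    }

    /** Returns the value, or null when the validity bit is unset. */
    Integer getObject(int index) {
        return isNull(index) ? null : child[index];
    }

    int getValueCount() {
        return valueCount;
    }

    void setValueCount(int count) {
        ensureCapacity(count);
        valueCount = count;
    }

    public static void main(String[] args) {
        SketchMapVector vector = new SketchMapVector();
        vector.allocateNew(4);
        vector.setSafe(0, 7);
        vector.setNull(1);
        vector.setValueCount(2);
        for (int i = 0; i < vector.getValueCount(); i++) {
            System.out.println(i + " -> " + vector.getObject(i));
        }
    }
}
```

With only one class, a caller that never writes nulls simply marks every slot as defined, which is the sense in which a writer on top of the nullable vector can behave like the old non-nullable one.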