ARROW-8948: [Java][Integration] enable duplicate field names integration tests #7289
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -80,8 +80,12 @@ variable are set, the system property takes precedence. | |
|
|
||
| ## Java Properties | ||
|
|
||
| For java 9 or later, should set "-Dio.netty.tryReflectionSetAccessible=true". | ||
| * For java 9 or later, should set "-Dio.netty.tryReflectionSetAccessible=true". | ||
| This fixes `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available`. thrown by netty. | ||
| * To support duplicate fields in a `StructVector` enable "-Darrow.struct.conflict.policy=CONFLICT_APPEND". | ||
|
||
| Duplicate fields are ignored (`CONFLICT_REPLACE`) by default and overwritten. To support different policies for | ||
| conflicting or duplicate fields set this JVM flag or use the correct static constructor methods for `StructVector`s. | ||
|
|
||
| ## Java Code Style Guide | ||
|
|
||
| Arrow Java follows the Google style guide [here][3] with the following | ||
|
|
||
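The difference between the two policies named in the README change can be pictured with a small stand-alone sketch. It does not use Arrow at all: `Policy`, `Child`, and `addChild` are hypothetical names, and the replace-in-place behaviour is an assumption based only on the README wording ("overwritten").

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration; none of these types are Arrow classes.
final class ConflictPolicySketch {

  // Hypothetical mirror of the two policy names mentioned in the README.
  enum Policy { CONFLICT_REPLACE, CONFLICT_APPEND }

  // Hypothetical child entry: a name plus some payload.
  static final class Child {
    final String name;
    final String payload;

    Child(String name, String payload) {
      this.name = name;
      this.payload = payload;
    }
  }

  // Adds a child, resolving a name clash according to the given policy.
  static void addChild(List<Child> children, Child child, Policy policy) {
    if (policy == Policy.CONFLICT_REPLACE) {
      for (int i = 0; i < children.size(); i++) {
        if (children.get(i).name.equals(child.name)) {
          children.set(i, child); // overwrite the earlier child of the same name
          return;
        }
      }
    }
    children.add(child); // no clash, or CONFLICT_APPEND: keep every child
  }

  public static void main(String[] args) {
    List<Child> replaced = new ArrayList<>();
    addChild(replaced, new Child("a", "first"), Policy.CONFLICT_REPLACE);
    addChild(replaced, new Child("a", "second"), Policy.CONFLICT_REPLACE);
    System.out.println("replace: " + replaced.size() + " child(ren)");

    List<Child> appended = new ArrayList<>();
    addChild(appended, new Child("a", "first"), Policy.CONFLICT_APPEND);
    addChild(appended, new Child("a", "second"), Policy.CONFLICT_APPEND);
    System.out.println("append: " + appended.size() + " child(ren)");
  }
}
```

Under these assumptions, only the append policy leaves both children named `a` in the list, which is the case the duplicate field names integration tests exercise.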
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,7 +19,7 @@ | |
|
|
||
| import java.util.ArrayList; | ||
| import java.util.Arrays; | ||
| import java.util.HashMap; | ||
| import java.util.LinkedHashMap; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| import java.util.stream.Collectors; | ||
|
|
@@ -54,7 +54,7 @@ public class VectorSchemaRoot implements AutoCloseable { | |
| private Schema schema; | ||
| private int rowCount; | ||
| private final List<FieldVector> fieldVectors; | ||
| private final Map<String, FieldVector> fieldVectorsMap = new HashMap<>(); | ||
| private final Map<Field, FieldVector> fieldVectorsMap = new LinkedHashMap<>(); | ||
|
||
|
|
||
|
|
||
| /** | ||
|
|
@@ -113,7 +113,7 @@ public VectorSchemaRoot(Schema schema, List<FieldVector> fieldVectors, int rowCo | |
| for (int i = 0; i < schema.getFields().size(); ++i) { | ||
| Field field = schema.getFields().get(i); | ||
| FieldVector vector = fieldVectors.get(i); | ||
| fieldVectorsMap.put(field.getName(), vector); | ||
| fieldVectorsMap.put(field, vector); | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -163,8 +163,22 @@ public List<FieldVector> getFieldVectors() { | |
| return fieldVectors.stream().collect(Collectors.toList()); | ||
| } | ||
|
|
||
| /** | ||
| * gets a vector by name. | ||
| * | ||
| * if name occurs multiple times this returns the first inserted entry for name | ||
| */ | ||
| public FieldVector getVector(String name) { | ||
|
||
| return fieldVectorsMap.get(name); | ||
| for (Map.Entry<Field, FieldVector> entry: fieldVectorsMap.entrySet()) { | ||
| if (entry.getKey().getName().equals(name)) { | ||
| return entry.getValue(); | ||
| } | ||
| } | ||
| return null; | ||
| } | ||
|
|
||
| public FieldVector getVector(Field field) { | ||
| return fieldVectorsMap.get(field); | ||
| } | ||
|
|
||
| public FieldVector getVector(int index) { | ||
|
|
||
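The lookup change above can be tried in isolation. The following is a minimal sketch without Arrow on the classpath: `FieldKey` is a hypothetical stand-in for `Field`, and plain strings stand in for `FieldVector` instances. It shows why a `LinkedHashMap` keyed by the whole field keeps both same-named entries, and why the name-based lookup returns the first one inserted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Stand-alone illustration; FieldKey is a hypothetical stand-in for Arrow's Field.
final class DuplicateNameLookupSketch {

  static final class FieldKey {
    final String name;
    final String type;

    FieldKey(String name, String type) {
      this.name = name;
      this.type = type;
    }

    @Override
    public boolean equals(Object other) {
      if (!(other instanceof FieldKey)) {
        return false;
      }
      FieldKey that = (FieldKey) other;
      return name.equals(that.name) && type.equals(that.type);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type);
    }
  }

  // Returns the first inserted value whose key carries the given name, or null.
  static String getByName(Map<FieldKey, String> vectors, String name) {
    for (Map.Entry<FieldKey, String> entry : vectors.entrySet()) {
      if (entry.getKey().name.equals(name)) {
        return entry.getValue();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // LinkedHashMap iterates in insertion order, so "first" is well defined.
    Map<FieldKey, String> vectors = new LinkedHashMap<>();
    vectors.put(new FieldKey("x", "int"), "vector-1");
    vectors.put(new FieldKey("x", "utf8"), "vector-2");

    System.out.println(getByName(vectors, "x"));
    System.out.println(vectors.get(new FieldKey("x", "utf8")));
  }
}
```

The name-based lookup is a linear scan rather than a hash lookup, so callers that need a specific duplicate would use the field-keyed lookup instead.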
If I'm not mistaken, this means the "correct" or "compatible" behavior is opt-in via a JVM flag? Are these flags clearly documented somewhere? I think we have a few others.
Hey @lidavidm, yes the 'compatible' behaviour is opt-in. This may be controversial, but through a mix of being opinionated, ignorant and lazy I don't understand the value of duplicate names in a struct. Given the scope of implementing such a change fully and the unintended consequences downstream, I have opted to give the library user the option to be compatible with the C++ IPC or maintain backwards compatibility with Java. I am happy to hear what the community thinks, especially if this approach is seen as too heavy-handed.
This flag wasn't documented anywhere, so I have added a note to the 'Java Properties' section in the Java README.md.
I think as long as it's easily controllable in code (so a single application can work with both behaviors) and well-documented, that should be OK. I dislike only having global flags, but that's not the case here. (And having global flags can be useful to tweak the behavior of an application that's otherwise agnostic to the default behavior.)
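The combination discussed here, a JVM-wide default plus a per-call override, might look roughly like the sketch below. Only the property name `arrow.struct.conflict.policy` and the two policy names come from the README change; the `Policy` enum, `globalDefault`, and `resolve` are hypothetical, and how Arrow itself reads the property is not shown in this diff.

```java
// Stand-alone illustration; not Arrow's actual resolution logic.
final class PolicyResolutionSketch {

  // Hypothetical mirror of the two policy names mentioned in the README.
  enum Policy { CONFLICT_REPLACE, CONFLICT_APPEND }

  // Property name taken from the README change in this PR.
  static final String PROPERTY = "arrow.struct.conflict.policy";

  // Reads the JVM-wide default; falls back to CONFLICT_REPLACE when unset.
  // Policy.valueOf throws IllegalArgumentException for an unrecognised value.
  static Policy globalDefault() {
    String raw = System.getProperty(PROPERTY);
    return raw == null ? Policy.CONFLICT_REPLACE : Policy.valueOf(raw);
  }

  // An explicit per-call choice wins over the global default.
  static Policy resolve(Policy explicit) {
    return explicit != null ? explicit : globalDefault();
  }

  public static void main(String[] args) {
    // Run with -Darrow.struct.conflict.policy=CONFLICT_APPEND to change the default.
    System.out.println("default: " + resolve(null));
    System.out.println("explicit: " + resolve(Policy.CONFLICT_APPEND));
  }
}
```

With this shape, a single application can mix both behaviours by passing an explicit policy where it matters, while code that is agnostic to the policy follows whatever the flag says.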