@@ -67,7 +67,6 @@
import org.apache.arrow.vector.VarBinaryVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.complex.NullableMapVector;
import org.apache.arrow.vector.dictionary.Dictionary;
import org.apache.arrow.vector.dictionary.DictionaryProvider;
import org.apache.arrow.vector.schema.ArrowVectorType;
@@ -217,6 +216,11 @@ public VectorSchemaRoot read() throws IOException {
}
}

/*
* TODO: This method doesn't load some vectors correctly. For instance, it doesn't initialize
Contributor Author

I can open a follow-up JIRA for this.

* `lastSet` in ListVector, VarCharVector, NullableVarBinaryVector. A better way of implementing
* this function is to use `loadFieldBuffers` methods in FieldVector.
*/
private void readVector(Field field, FieldVector vector) throws JsonParseException, IOException {
List<ArrowVectorType> vectorTypes = field.getTypeLayout().getVectorTypes();
List<BufferBacked> fieldInnerVectors = vector.getFieldInnerVectors();
@@ -231,6 +235,8 @@ private void readVector(Field field, FieldVector vector) throws JsonParseExcepti
throw new IllegalArgumentException("Expected field " + field.getName() + " but got " + name);
}
int count = readNextField("count", Integer.class);
vector.allocateNew();
Contributor

I am still not very familiar with when to call the allocateNew() method. Why is it necessary to set the value count?

Contributor Author

allocateNew() allocates the buffers for the vector.
setValueCount() sets the number of values in the vector.

We need to call setValueCount() so that the vector is created correctly.

Contributor

I think we should set lastSet before setting the value count; otherwise setValueCount() will corrupt the vector.

Contributor Author

Unfortunately, that is hard to do here without some crazy instance matching, and I don't want to do that because it would be too hard to maintain.

I think the correct way is to use the proper loadFieldBuffers methods in this class, which initialize the vectors correctly and set things like lastSet (see the TODO above).

Contributor

OK, sounds good to me. I think we need lastSet only for VarChar, VarBinary and ListVector?

Contributor Author

I am not too sure, but at least those, plus their Nullable versions.
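
To make the concern in this thread concrete, here is a minimal, self-contained sketch of why the order of lastSet and setValueCount() can matter for a variable-width vector. It is a toy model, not Arrow's implementation: `ToyVarCharVector`, `loadBuffers`, and `readBack` are hypothetical names, and the assumption is that setValueCount() marks every slot after lastSet as empty by rewriting its offset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LastSetSketch {

  /** Toy variable-width vector (hypothetical): value i is data[offsets[i] .. offsets[i + 1]). */
  static final class ToyVarCharVector {
    private int[] offsets = new int[] {0};
    private byte[] data = new byte[0];
    private int lastSet = -1; // index of the last value written through the vector API

    /** Copies buffers in directly, as a reader filling inner buffers might; lastSet is NOT updated. */
    void loadBuffers(int[] newOffsets, byte[] newData) {
      offsets = Arrays.copyOf(newOffsets, newOffsets.length);
      data = Arrays.copyOf(newData, newData.length);
    }

    void setLastSet(int index) {
      lastSet = index;
    }

    /** Assumed behavior: every slot after lastSet becomes a zero-length value. */
    void setValueCount(int count) {
      if (offsets.length < count + 1) {
        offsets = Arrays.copyOf(offsets, count + 1);
      }
      for (int i = lastSet + 1; i < count; i++) {
        offsets[i + 1] = offsets[i]; // overwrites whatever offset was loaded for slot i
      }
      lastSet = count - 1;
    }

    String get(int index) {
      int start = offsets[index];
      int length = offsets[index + 1] - start;
      return new String(data, start, length, StandardCharsets.UTF_8);
    }
  }

  /** Loads "ab" and "cde" directly into the buffers, optionally fixes lastSet, then sets the count. */
  static String[] readBack(boolean setLastSetFirst) {
    ToyVarCharVector v = new ToyVarCharVector();
    v.loadBuffers(new int[] {0, 2, 5}, "abcde".getBytes(StandardCharsets.UTF_8));
    if (setLastSetFirst) {
      v.setLastSet(1);
    }
    v.setValueCount(2);
    return new String[] {v.get(0), v.get(1)};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(readBack(false))); // both values come back empty
    System.out.println(Arrays.toString(readBack(true)));  // both values survive
  }
}
```

In this model, the loaded offsets are wiped when lastSet is still at its initial value, which is the kind of corruption described above; setting lastSet first leaves the loaded data intact.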

vector.getMutator().setValueCount(count);
Member

Would it work to do the following?

  1. vector.setInitialCapacity(count)
  2. vector.allocateNew()
  3. after all inner vectors are read, then vector.getMutator().setValueCount(count)

Doing this, you should be able to clean up all those similar calls in the inner vector loop too. I'm also used to setValueCount being called after the values are populated, though I'm not sure whether that makes a difference.

Member

Oops, sorry I commented too late after the merge. I can look into this at another time.

Member

Can we create a follow-up JIRA for this? Thanks!

for (int v = 0; v < vectorTypes.size(); v++) {
ArrowVectorType vectorType = vectorTypes.get(v);
BufferBacked innerVector = fieldInnerVectors.get(v);
@@ -266,9 +272,6 @@ private void readVector(Field field, FieldVector vector) throws JsonParseExcepti
}
readToken(END_ARRAY);
}
if (vector instanceof NullableMapVector) {
((NullableMapVector) vector).valueCount = count;
}
}
readToken(END_OBJECT);
}
@@ -30,6 +30,7 @@
import org.apache.arrow.vector.dictionary.DictionaryProvider.MapDictionaryProvider;
import org.apache.arrow.vector.file.BaseFileTest;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.arrow.vector.util.Validator;
import org.junit.Assert;
import org.junit.Test;
import org.slf4j.Logger;
@@ -96,28 +97,25 @@ public void testWriteReadUnionJSON() throws IOException {
try (
BufferAllocator vectorAllocator = allocator.newChildAllocator("original vectors", 0, Integer.MAX_VALUE);
NullableMapVector parent = NullableMapVector.empty("parent", vectorAllocator)) {

writeUnionData(count, parent);

printVectors(parent.getChildrenFromFields());

VectorSchemaRoot root = new VectorSchemaRoot(parent.getChild("root"));
validateUnionData(count, root);
try (VectorSchemaRoot root = new VectorSchemaRoot(parent.getChild("root"))) {
Contributor

Why change to nested try blocks?

Contributor Author

Because I need both VectorSchemaRoot instances open at the same time to compare them for equality.
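
As a rough illustration of that point, the sketch below shows how nesting try-with-resources keeps the outer resource open while the inner one is compared against it. `Resource`, `compare`, and `run` are hypothetical stand-ins (not Arrow classes); the only language behavior relied on is that resources are closed in reverse order when their block exits.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedTrySketch {
  // Records the order of events so the resource lifetimes can be inspected.
  static final List<String> events = new ArrayList<>();

  /** Hypothetical stand-in for a closeable object such as a vector root. */
  static final class Resource implements AutoCloseable {
    private final String name;
    private boolean open = true;

    Resource(String name) {
      this.name = name;
      events.add("open " + name);
    }

    boolean isOpen() {
      return open;
    }

    @Override
    public void close() {
      open = false;
      events.add("close " + name);
    }
  }

  /** Stand-in for an equality check; it only succeeds if both resources are still open. */
  static boolean compare(Resource a, Resource b) {
    events.add("compare");
    return a.isOpen() && b.isOpen();
  }

  static boolean run() {
    events.clear();
    try (Resource original = new Resource("original")) {
      // "original" stays open for the whole inner block.
      try (Resource fromJson = new Resource("fromJson")) {
        return compare(original, fromJson);
      } // fromJson is closed here
    } // original is closed here, after fromJson
  }

  public static void main(String[] args) {
    System.out.println(run());
    System.out.println(events);
  }
}
```

With two sequential (non-nested) blocks, the first resource would already be closed by the time the second one exists, so a comparison between them would not be possible.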

validateUnionData(count, root);
writeJSON(file, root, null);

writeJSON(file, root, null);
}
// read
try (
BufferAllocator readerAllocator = allocator.newChildAllocator("reader", 0, Integer.MAX_VALUE);
BufferAllocator vectorAllocator = allocator.newChildAllocator("final vectors", 0, Integer.MAX_VALUE);
) {
JsonFileReader reader = new JsonFileReader(file, readerAllocator);
Schema schema = reader.start();
LOGGER.debug("reading schema: " + schema);
// read
try (BufferAllocator readerAllocator = allocator.newChildAllocator("reader", 0, Integer.MAX_VALUE)) {
JsonFileReader reader = new JsonFileReader(file, readerAllocator);

// initialize vectors
try (VectorSchemaRoot root = reader.read();) {
validateUnionData(count, root);
Schema schema = reader.start();
LOGGER.debug("reading schema: " + schema);

try (VectorSchemaRoot rootFromJson = reader.read();) {
validateUnionData(count, rootFromJson);
Validator.compareVectorSchemaRoot(root, rootFromJson);
Contributor Author

This would fail without the JsonFileReader change.

}
}
}
}
}