Closed
Labels
enhancement (New feature or request)
Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
Basic Information
table
{
  "version" : 2,
  "id" : 0,
  "fields" : [ {
    "id" : 0,
    "name" : "f0",
    "type" : "BIGINT NOT NULL"
  }, {
    "id" : 1,
    "name" : "f1",
    "type" : "STRING"
  }, {
    "id" : 2,
    "name" : "f2",
    "type" : "STRING"
  }, {
    "id" : 3,
    "name" : "f4",
    "type" : "FLOAT"
  }, {
    "id" : 4,
    "name" : "f5",
    "type" : "DOUBLE"
  }, {
    "id" : 5,
    "name" : "f6",
    "type" : "BOOLEAN"
  }, {
    "id" : 6,
    "name" : "f7",
    "type" : {
      "type" : "ARRAY",
      "element" : "BIGINT"
    }
  } ],
  "highestFieldId" : 6,
  "partitionKeys" : [ ],
  "primaryKeys" : [ "f0" ],
  "options" : {
    "bucket" : "1",
    "file.format" : "parquet/orc"
  },
  "timeMillis" : 1732796139692
}
- RowCount : 4,000,000
- bucket : 1
- FileNumber : 7
- Ordered Write
The data sample :
1 : [1, h2,7c17184a-46d3-4565-aa87-b31316af2144, 1.000000, 1.000000, false, ([1,2,3,]))
2 : [2, h1,919fc338-5ef6-474c-a1fa-1b2407512731, 2.000000, 2.000000, true, ([2,3,4,]))
3 : [3, h3,6126ee02-8d74-4c3a-a00e-328af5871e4a, 3.000000, 3.000000, false, ([3,4,5]))
4 : [4, h3,c2b0e039-a468-4853-aaa1-57bcde5a49b6, 4.000000, 4.000000, true, ([4,5,6]))
The Write Example
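import java.util.List;
import java.util.UUID;

import org.apache.paimon.data.BinaryArray;
import org.apache.paimon.data.BinaryString;
import org.apache.paimon.data.GenericRow;
import org.apache.paimon.table.Table;
import org.apache.paimon.table.sink.BatchTableCommit;
import org.apache.paimon.table.sink.BatchTableWrite;
import org.apache.paimon.table.sink.BatchWriteBuilder;
import org.apache.paimon.table.sink.CommitMessage;

// TableUtil is a helper that is not shown in this issue.
Table table = TableUtil.getTable(); // PrimaryKeyFileStoreTable
BatchWriteBuilder writeBuilder = table.newBatchWriteBuilder();
String[] items = new String[] {"h1", "h2", "h3"};
BatchTableWrite write = writeBuilder.newWrite(); // TableWriteImpl
long startTime = System.currentTimeMillis();
// Write 4,000,000 rows in ascending primary-key (f0) order.
for (int i = 0; i < 4000000; i++) {
    GenericRow genericRow =
            GenericRow.of(
                    (long) i,
                    BinaryString.fromString(items[i % 3]),
                    BinaryString.fromString(UUID.randomUUID().toString()),
                    (float) i,
                    (double) i,
                    (i % 2) == 0,
                    BinaryArray.fromLongArray(
                            new Long[] {(long) i, (long) i + 1, (long) i + 2}));
    write.write(genericRow);
    // Progress log every 10,000 rows.
    if ((i % 10000) == 0) {
        System.out.println("write rows : " + i);
    }
}
// Commit everything as a single batch.
List<CommitMessage> messages = write.prepareCommit();
BatchTableCommit commit = writeBuilder.newCommit();
commit.commit(messages);
long stopTime = System.currentTimeMillis();
System.out.println("time: " + (stopTime - startTime));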
The Read Example
30 random point lookups by primary key
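import java.util.List;
import java.util.Random;

import org.apache.paimon.data.InternalRow;
import org.apache.paimon.predicate.Predicate;
import org.apache.paimon.predicate.PredicateBuilder;
import org.apache.paimon.reader.RecordReader;
import org.apache.paimon.table.Table;
import org.apache.paimon.table.source.InnerTableRead;
import org.apache.paimon.table.source.ReadBuilder;
import org.apache.paimon.table.source.Split;
import org.apache.paimon.types.DataTypes;
import org.apache.paimon.types.RowType;

// TableUtil is a helper that is not shown in this issue.
Table table = TableUtil.getTable(); // PrimaryKeyFileStoreTable
PredicateBuilder builder =
        new PredicateBuilder(
                RowType.of(
                        DataTypes.BIGINT(),
                        DataTypes.STRING(),
                        DataTypes.STRING(),
                        DataTypes.FLOAT(),
                        DataTypes.DOUBLE(),
                        DataTypes.BOOLEAN(),
                        DataTypes.ARRAY(DataTypes.BIGINT())));
int[] projection = new int[] {0, 1, 2, 3, 4, 5, 6};
ReadBuilder readBuilder = table.newReadBuilder().withProjection(projection);
// The scan is planned once, outside the timed section.
List<Split> splits = readBuilder.newScan().plan().splits();
long startTime = System.currentTimeMillis();
Random random = new Random();
for (int i = 0; i < 30; i++) {
    // Look up one random primary key with an equality filter on f0.
    int value = random.nextInt(4000000);
    Predicate keyFilter = builder.equal(0, (long) value);
    InnerTableRead read = (InnerTableRead) readBuilder.newRead();
    read.withFilter(keyFilter).executeFilter();
    RecordReader<InternalRow> reader = read.createReader(splits);
    reader.forEachRemaining(
            internalRow -> {
                long f0 = internalRow.getLong(0);
                String f1 = internalRow.getString(1).toString();
                String f2 = internalRow.getString(2).toString();
                float f3 = internalRow.getFloat(3);
                double f4 = internalRow.getDouble(4);
                boolean f5 = internalRow.getBoolean(5);
                long[] f6 = internalRow.getArray(6).toLongArray();
                // toString(long[]) is a local helper that is not shown in this issue.
                System.out.println(
                        String.format(
                                "%d : [%d, %s,%s, %f, %f, %b, (%s))",
                                value, f0, f1, f2, f3, f4, f5, toString(f6)));
            });
}
long stopTime = System.currentTimeMillis();
System.out.println("time : " + (stopTime - startTime));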
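The read example calls a `toString(long[])` helper that is not included in the issue. A minimal stand-in might look like the sketch below; the class name `ArrayFormat` and the exact output format (`[1,2,3]`, no trailing comma) are assumptions, not the reporter's code.

```java
final class ArrayFormat {

    /** Joins the values with commas inside square brackets, e.g. [1,2,3]. */
    static String toString(long[] values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(toString(new long[] {1L, 2L, 3L}));
    }
}
```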
Time Consumption in ORC/Parquet
PARQUET reader : 17982 ms
ORC reader : 1096 ms
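Since the timed loop performs 30 lookups, these totals work out to roughly 599 ms per lookup with Parquet and about 37 ms with ORC, a gap of about 16x. A small sketch of that arithmetic follows; the class and method names are illustrative only and are not part of Paimon.

```java
final class TimingSummary {

    /** Average time per lookup, in milliseconds. */
    static double perLookupMillis(long totalMillis, int lookups) {
        return (double) totalMillis / lookups;
    }

    /** How many times slower the first measurement is than the second. */
    static double slowdown(long slowMillis, long fastMillis) {
        return (double) slowMillis / fastMillis;
    }

    public static void main(String[] args) {
        long parquetMillis = 17982L; // total reported in the issue
        long orcMillis = 1096L;      // total reported in the issue
        int lookups = 30;            // iterations of the read loop

        System.out.printf("parquet per lookup: %.1f ms%n", perLookupMillis(parquetMillis, lookups));
        System.out.printf("orc per lookup: %.1f ms%n", perLookupMillis(orcMillis, lookups));
        System.out.printf("slowdown: %.1fx%n", slowdown(parquetMillis, orcMillis));
    }
}
```

Note that the measured time covers reader creation as well as the scan itself, because the timer wraps the whole loop.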
Root Cause Analysis
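With the current predicate pushdown, the ORC reader can push the filter down to the column index level, whereas the Parquet reader only pushes it down to the row group level. A point lookup on Parquet therefore has to read every row group whose statistics may contain the key, rather than only the matching part of it.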
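To illustrate why the pruning granularity matters for a point lookup, here is a toy model. It is not Paimon, Parquet, or ORC code. It assumes keys 0..N-1 are stored in ascending order and that row groups and finer-grained index units each hold a fixed number of rows (1,000,000 and 10,000 below, both made-up values; real files size these units differently). Each unit keeps min/max statistics, and a lookup must decode every unit whose range may contain the key.

```java
import java.util.ArrayList;
import java.util.List;

final class PruningSketch {

    /** Min/max statistics for one contiguous run of sorted keys. */
    static final class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        boolean mayContain(long key) {
            return min <= key && key <= max;
        }

        long rows() {
            return max - min + 1;
        }
    }

    /** Splits keys 0..totalRows-1 into consecutive ranges of at most unitSize rows. */
    static List<Range> split(long totalRows, long unitSize) {
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < totalRows; start += unitSize) {
            long end = Math.min(start + unitSize, totalRows) - 1;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    /** Rows that must be decoded for "key = ?" when pruning with the given statistics. */
    static long rowsDecoded(List<Range> stats, long key) {
        long rows = 0;
        for (Range range : stats) {
            if (range.mayContain(key)) {
                rows += range.rows();
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        long totalRows = 4_000_000L;    // row count from the issue
        long rowGroupRows = 1_000_000L; // assumed row group size, illustration only
        long indexUnitRows = 10_000L;   // assumed finer-grained unit size, illustration only
        long key = 1_234_567L;

        System.out.println("row-group pruning decodes "
                + rowsDecoded(split(totalRows, rowGroupRows), key) + " rows");
        System.out.println("finer-grained pruning decodes "
                + rowsDecoded(split(totalRows, indexUnitRows), key) + " rows");
    }
}
```

Under these assumed sizes, the coarse filter decodes 100 times more rows per lookup than the fine-grained one. The model only shows the direction of the effect, not the measured 16x.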
Solution
No response
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!
ranxianglei