[Feature] In a Paimon primary key table, using ORC offers significantly higher efficiency for point lookups based on the primary key compared to Parquet. #4586

@Aiden-Dong

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Basic Information

Table schema:

{
  "version" : 2,
  "id" : 0,
  "fields" : [ {
    "id" : 0,
    "name" : "f0",
    "type" : "BIGINT NOT NULL"
  }, {
    "id" : 1,
    "name" : "f1",
    "type" : "STRING"
  }, {
    "id" : 2,
    "name" : "f2",
    "type" : "STRING"
  }, {
    "id" : 3,
    "name" : "f4",
    "type" : "FLOAT"
  }, {
    "id" : 4,
    "name" : "f5",
    "type" : "DOUBLE"
  }, {
    "id" : 5,
    "name" : "f6",
    "type" : "BOOLEAN"
  }, {
    "id" : 6,
    "name" : "f7",
    "type" : {
      "type" : "ARRAY",
      "element" : "BIGINT"
    }
  } ],
  "highestFieldId" : 6,
  "partitionKeys" : [ ],
  "primaryKeys" : [ "f0" ],
  "options" : {
    "bucket" : "1",
    "file.format" : "parquet/orc"
  },
  "timeMillis" : 1732796139692
}
  • RowCount : 4,000,000
  • bucket : 1
  • FileNumber : 7
  • Ordered Write

The data sample:

1 : [1, h2,7c17184a-46d3-4565-aa87-b31316af2144, 1.000000, 1.000000, false, ([1,2,3]))
2 : [2, h1,919fc338-5ef6-474c-a1fa-1b2407512731, 2.000000, 2.000000, true, ([2,3,4]))
3 : [3, h3,6126ee02-8d74-4c3a-a00e-328af5871e4a, 3.000000, 3.000000, false, ([3,4,5]))
4 : [4, h3,c2b0e039-a468-4853-aaa1-57bcde5a49b6, 4.000000, 4.000000, true, ([4,5,6]))

The Write Example

Table table = TableUtil.getTable(); // PrimaryKeyFileStoreTable
BatchWriteBuilder writeBuilder = table.newBatchWriteBuilder();

String[] items = new String[] {"h1", "h2", "h3"};

BatchTableWrite write = writeBuilder.newWrite(); // TableWriteImpl

long startTime = System.currentTimeMillis();

for (int i = 0; i < 4000000; i++) {

    GenericRow genericRow =
            GenericRow.of(
                    (long) i,
                    BinaryString.fromString(items[i % 3]),
                    BinaryString.fromString(UUID.randomUUID().toString()),
                    (float) i,
                    (double) i,
                    (i % 2) == 0,
                    BinaryArray.fromLongArray(
                            new Long[] {(long) i, (long) i + 1, (long) i + 2}));

    write.write(genericRow);

    if ((i % 10000) == 0) {
        System.out.println("write rows : " + i);
    }
}

List<CommitMessage> messages = write.prepareCommit();
BatchTableCommit commit = writeBuilder.newCommit();

commit.commit(messages);

long stopTime = System.currentTimeMillis();
System.out.println("time: " + (stopTime - startTime));

The Read Example

Perform 30 point lookups with random primary keys:

Table table = TableUtil.getTable(); // PrimaryKeyFileStoreTable

PredicateBuilder builder =
        new PredicateBuilder(
                RowType.of(
                        DataTypes.BIGINT(),
                        DataTypes.STRING(),
                        DataTypes.STRING(),
                        DataTypes.FLOAT(),
                        DataTypes.DOUBLE(),
                        DataTypes.BOOLEAN(),
                        DataTypes.ARRAY(DataTypes.BIGINT())));

int[] projection = new int[] {0, 1, 2, 3, 4, 5, 6};
ReadBuilder readBuilder = table.newReadBuilder().withProjection(projection);
List<Split> splits = readBuilder.newScan().plan().splits();

long startTime = System.currentTimeMillis();

Random random = new Random();

for (int i = 0; i < 30; i++) {

    int value = random.nextInt(4000000);
    Predicate keyFilter = builder.equal(0, (long) value);

    InnerTableRead read = (InnerTableRead) readBuilder.newRead();

    read.withFilter(keyFilter).executeFilter();
    RecordReader<InternalRow> reader = read.createReader(splits);

    reader.forEachRemaining(
            internalRow -> {
                long f0 = internalRow.getLong(0);
                String f1 = internalRow.getString(1).toString();
                String f2 = internalRow.getString(2).toString();
                float f3 = internalRow.getFloat(3);
                double f4 = internalRow.getDouble(4);
                boolean f5 = internalRow.getBoolean(5);

                long[] f6 = internalRow.getArray(6).toLongArray();

                System.out.println(
                        String.format(
                                "%d : [%d, %s,%s, %f, %f, %b, (%s))",
                                value, f0, f1, f2, f3, f4, f5, Arrays.toString(f6)));
            });

    // Release the file handles opened for this lookup.
    reader.close();
}
long stopTime = System.currentTimeMillis();
System.out.println("time : " + (stopTime - startTime));

Time Consumption in ORC/Parquet

  • Parquet reader : 17,982 ms
  • ORC reader : 1,096 ms

For the same 30 random point lookups, ORC is roughly 16x faster than Parquet.

Root Cause Analysis

With the current predicate pushdown, the ORC reader applies the key filter down to the column-index (row index) level inside a stripe, whereas the Parquet reader only applies it at the row-group level. A Parquet point lookup therefore decodes an entire row group to find a single key, while ORC can skip directly to the matching index stride.
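
For comparison, parquet-mr itself supports page-level filtering via the column index stored in the file footer; the measured gap suggests this is not being exercised on the Paimon read path. Below is a minimal sketch, outside Paimon, of a point lookup that uses column-index filtering with the plain Parquet API (the data file path is hypothetical; the schema is the one above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.HadoopReadOptions;
import org.apache.parquet.ParquetReadOptions;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.filter2.compat.FilterCompat;
import org.apache.parquet.filter2.predicate.FilterApi;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.io.InputFile;

public class ColumnIndexLookupSketch {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The same kind of point lookup as in the benchmark: f0 = 42.
        FilterPredicate predicate = FilterApi.eq(FilterApi.longColumn("f0"), 42L);

        ParquetReadOptions options =
                HadoopReadOptions.builder(conf)
                        .withRecordFilter(FilterCompat.get(predicate))
                        // Enables page-level pruning via the column index; without it
                        // the predicate only prunes whole row groups using footer stats.
                        .useColumnIndexFilter(true)
                        .build();

        // Hypothetical path; any Parquet data file written with the schema above.
        InputFile file =
                HadoopInputFile.fromPath(
                        new Path("/tmp/paimon-warehouse/t/bucket-0/data-0.parquet"), conf);

        try (ParquetFileReader reader = ParquetFileReader.open(file, options)) {
            PageReadStore pages;
            // readNextFilteredRowGroup() returns only the pages whose column-index
            // ranges may contain f0 = 42, instead of the full row group.
            while ((pages = reader.readNextFilteredRowGroup()) != null) {
                System.out.println("rows after page-level pruning: " + pages.getRowCount());
            }
        }
    }
}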

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
