
Core: ORC data files do not support shouldRollToNewFile #3916

@liubo1022126

Description


@openinx @rdblue

I sort data within partitions by column to improve performance, e.g. insert overwrite tableA partition(pt='20220118') select id, name, age from tableA where pt='20220118' order by id;, and the table has write.format.default=orc and 'write.target-file-size-bytes'='134217728'.
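For reference, the same properties can also be set through the Iceberg Java API (a minimal sketch; the catalog argument and the db.tableA identifier are placeholders for however you load the table):

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.TableProperties;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

class SetTargetFileSize {
  // Sketch: set the same table properties through the Java API.
  static void configure(Catalog catalog) {
    Table table = catalog.loadTable(TableIdentifier.of("db", "tableA"));
    table.updateProperties()
        .set(TableProperties.DEFAULT_FILE_FORMAT, "orc")                 // write.format.default
        .set(TableProperties.WRITE_TARGET_FILE_SIZE_BYTES, "134217728")  // write.target-file-size-bytes
        .commit();
  }
}
```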

But each partition ends up with only one data file of a very large size, and I found that the ORC writer currently cannot be checked against the target file size before it is closed, because it does not report its written length while the file is open.
Because there is only one large data file in every partition, I can't filter data files at planning time as described in https://iceberg.apache.org/#performance/#data-filtering.
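To make the problem concrete, here is a minimal sketch of the size-based rolling pattern (simplified, not the actual Iceberg code; the DataFileWriter interface and newWriter supplier are hypothetical stand-ins). The check relies on length() growing as rows are written; the ORC writer buffers rows in memory and only knows a meaningful file size after close(), so for ORC the condition never fires and a single large file is produced:

```java
import java.io.IOException;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical minimal writer interface, for illustration only.
interface DataFileWriter {
  void add(Object record) throws IOException;

  long length(); // bytes written so far -- the crux of the problem

  void close() throws IOException;
}

class RollingWriteSketch {
  // write.target-file-size-bytes from the table properties
  static final long TARGET_FILE_SIZE = 134217728L;

  static void writeAll(List<Object> records, Supplier<DataFileWriter> newWriter)
      throws IOException {
    DataFileWriter writer = newWriter.get();
    for (Object record : records) {
      writer.add(record);
      // Roll once the current file reaches the target size. This only
      // works if length() grows as rows are written; with ORC the size
      // is only known at close(), so the writer never rolls.
      if (writer.length() >= TARGET_FILE_SIZE) {
        writer.close();
        writer = newWriter.get();
      }
    }
    writer.close();
  }
}
```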

So if I want to use the ORC file format, how can I roll to a new file?
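The only workaround I can think of is to roll on an estimated size instead of the real written length, for example derived from a row count (purely a hypothetical sketch, not an existing Iceberg option; estimatedBytesPerRow is an assumption that would have to be calibrated against real files):

```java
// Hypothetical workaround sketch: roll on an estimated size computed
// from the row count, since the ORC writer cannot report its actual
// size before close(). Not a built-in Iceberg feature.
class RowCountRollingSketch {
  static final long TARGET_FILE_SIZE = 134217728L; // write.target-file-size-bytes

  // Assumption: measure average row size from existing data files.
  long estimatedBytesPerRow = 64;
  long rowsWritten = 0;

  void onRecordWritten() {
    rowsWritten++;
  }

  boolean shouldRollToNewFile() {
    return rowsWritten * estimatedBytesPerRow >= TARGET_FILE_SIZE;
  }
}
```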

By the way, a Flink streaming job rolls a new file at each checkpoint. What is the difference with a batch job? Why can't a batch job roll?
