Description
I sort the data within each partition by a column to improve read performance, for example:
insert overwrite tableA partition(pt='20220118') select id,name,age from tableA where pt='20220118' order by id;
The table is configured with write.format.default=orc and 'write.target-file-size-bytes'='134217728' (128 MiB).
However, each partition ends up with only a single, large data file. I found that the ORC writer currently does not support checking the target file size before the file is closed.
Because there is only one large data file in every partition, I can't filter data files at planning time as described in https://iceberg.apache.org/#performance/#data-filtering.
So if I want to use the ORC file format, how can I make the writer roll to a new file (RollToNewFile)?
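To make concrete what I mean by rolling, here is a minimal Python sketch. It is not Iceberg's code: `RollingWriter`, `write_sorted`, and the 8-byte record encoding are hypothetical names and choices made up for illustration. It only shows the general idea that a writer must know the size of the file that is still open in order to decide when to start a new one, which is the check I understand the ORC writer does not support.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RollingWriter:
    """Toy writer (hypothetical) that starts a new 'file' once the open one reaches target_bytes."""
    target_bytes: int
    files: List[List[bytes]] = field(default_factory=list)  # files already closed
    _current: List[bytes] = field(default_factory=list)     # records in the open file
    _current_size: int = 0                                   # bytes in the open file

    def write(self, record: bytes) -> None:
        self._current.append(record)
        self._current_size += len(record)
        # Rolling depends on knowing the size of the file while it is still open.
        if self._current_size >= self.target_bytes:
            self._roll()

    def _roll(self) -> None:
        # Close the open file and start an empty one.
        self.files.append(self._current)
        self._current = []
        self._current_size = 0

    def close(self) -> List[List[bytes]]:
        # Flush whatever is left in the open file.
        if self._current:
            self._roll()
        return self.files


def write_sorted(ids, target_bytes):
    """Write ids in sorted order as fixed-width 8-byte records (an assumption of this sketch)."""
    writer = RollingWriter(target_bytes=target_bytes)
    for i in sorted(ids):
        writer.write(f"{i:08d}".encode())
    return writer.close()
```

Because the input is sorted, each rolled file covers a disjoint range of id values, which is what would allow whole files to be skipped by their min/max statistics at planning time.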
By the way, in a Flink streaming job the writer rolls to a new file at each checkpoint. How is that different from a batch job, and why can't a batch job roll?