-
Notifications
You must be signed in to change notification settings - Fork 3k
API, Spark 4.0: Add create_file_list option to RewriteTablePathProcedure. #13837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… file-list generation.
|
@szehon-ho @dramaticlly @manuzhang @huaxingao @nastra Could you help review this pr? This is a minor improvement. Currently, when using the RewriteTablePath stored procedure, a file named file-list is generated in the target path, even though it is unrelated to the actual data. We would like to have the option to skip creating this file. This PR introduces a new option that allows skipping the generation of the file-list file. We discussed some aspects of this PR in #12844. Can we continue moving this PR forward? |
| * @param skipFileList true to skip saving the file list, false to include it | ||
| * @return this instance for method chaining | ||
| */ | ||
| default RewriteTablePath saveFileList(boolean skipFileList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name and the parameter name have opposite meanings.
saveFileList(true) is actually not to save file list.
| /** count of rewrite metadata files involved, default value is 0 */ | ||
| default int metadataFilesCount() { | ||
| return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rewrite metadata files can be confused with metadata.json files, if this is count of rewritten manifest files. let's call it rewrittenManifestFilesCount? Same wise for rewrittenDeleteFilesCount above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we expect other counts might need for path based rewrite? I am wondering how such information can be useful for programatic access in real world scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rewrite metadata files can be confused with metadata.json files, if this is count of rewritten manifest files. let's call it rewrittenManifestFilesCount? Same wise for rewrittenDeleteFilesCount above.
Your suggestions are very valuable, and I am working on making improvements based on your feedback.
| if (skipFileList) { | ||
| return builder.fileListLocation("").build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you return builder.build() instead?
| } | ||
|
|
||
| @TestTemplate | ||
| public void testRewriteTablePathWithSkipFileList() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add another test where we rewritten delete and manifest file count is no zero and assert on the procedure output?
| * </ul> | ||
| */ | ||
| private String rebuildMetadata() { | ||
| private Result rebuildMetadata() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shall add a corresponding test in TestRewriteTablePathsAction
| * @param skipFileList true to skip saving the file list, false to include it | ||
| * @return this instance for method chaining | ||
| */ | ||
| default RewriteTablePath skipFileList(boolean skipFileList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not call this createFileList that always defaults to true. Also without reading the javadoc it's not clear what skip actually refers to. Does it skip scanning that file or creating/deleting/x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your question! RewriteTablePathProcedure generates a CSV file, which is saved by default in the target directory. In #12762, I proposed a new migration approach that allows upgrading Hive tables to Iceberg tables in place. Using this method, we have already upgraded more than 2,000 tables.
Currently, after each migration, a file-list.csv file is created in the table’s metadata directory. Since this file is not actually part of the metadata, we need to clean it up manually. I suggest adding a parameter to the procedure so that when it is invoked, we can choose not to generate this file, thereby simplifying the migration process.
I think calling it createFileList is reasonable, and I will optimize this part accordingly.
| /** Name of latest metadata file version */ | ||
| String latestVersion(); | ||
|
|
||
| /** count of rewrite delete files, default value is 0 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /** count of rewrite delete files, default value is 0 */ | |
| /** count of rewritten delete files, default value is 0 */ |
| return 0; | ||
| } | ||
|
|
||
| /** count of rewrite metadata files involved, default value is 0 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /** count of rewrite metadata files involved, default value is 0 */ | |
| /** count of rewritten manifests involved, default value is 0 */ |
| File tmp = java.nio.file.Files.createTempDirectory("iceberg_warehouse").toFile(); | ||
| tmp.deleteOnExit(); | ||
| warehouseLocation = new File(tmp, "iceberg_data").getAbsolutePath(); | ||
| warehouseLocation = new File(tmp, "iceberg_data").toURI().toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what error is this causing? Can you please paste a stack trace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I provided a more detailed explanation in #13882. The issue arises because the path generated by REST uses getAbsolutePath(), while the current unit tests use targetTableDir.toFile().toURI().toString(). Since the two paths do not match, the unit tests fail.
Path file:/var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/data/deletes.parquet does not start with /var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/
java.lang.IllegalArgumentException: Path file:/var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/data/deletes.parquet does not start with /var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/
at org.apache.iceberg.RewriteTablePathUtil.relativize(RewriteTablePathUtil.java:686)
at org.apache.iceberg.RewriteTablePathUtil.newPath(RewriteTablePathUtil.java:663)
at org.apache.iceberg.RewriteTablePathUtil.writeDeleteFileEntry(RewriteTablePathUtil.java:492)
at org.apache.iceberg.RewriteTablePathUtil.lambda$rewriteDeleteManifest$5(RewriteTablePathUtil.java:438)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove the change from this patch and let's use #13882 to fix testFixture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dramaticlly Thank you for your message! I have carefully readed the relevant code and confirmed that this change is necessary. The reason is that the base path for the Hive library in RESTCatalog is inconsistent with other types of Catalogs. Currently, HIVE, HADOOP, and SPARK_SESSION all use URI format, so RESTCatalog needs to be adjusted to the same format. Additionally, since the TestRewriteTablePathProcedure test involves parameter type testing and simultaneously tests four different Catalogs, we must standardize the base path of the Hive library across all four Catalogs to URI format in order to conduct the test successfully. Otherwise, there could be issues with prefix validation failures.
Path file:/var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/data/deletes.parquet does not start with /var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/
java.lang.IllegalArgumentException: Path file:/var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/data/deletes.parquet does not start with /var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/
at org.apache.iceberg.RewriteTablePathUtil.relativize(RewriteTablePathUtil.java:686)
.....
I have provided a detailed description in #13882 and hope someone from the team can assist in reviewing this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RussellSpitzer I initially intended to make this change in PR #13837, but since this change seems inconsistent with the core modifications of #13837, I have extracted it into PR #13882 for processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slfan1989 I think this is actually because how position delete was written in unit test with File file = new File(removePrefix(table.location()) + "/data/deletes.parquet");. I suggested alternative way so that we can still get what we want without need to change this (at least for this PR).
If we want to update the behavior in REST test fixture, let's use #13882 instead with proper unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dramaticlly Thank you for your suggestion, which gave me some insights. I have added a logic in the unit tests for the REST Catalog. When encountering REST Catalog, it will use getAbsolutePath to obtain the path for deleting the file. After local testing, the solution works well, and I have added comments to explain the issue.
if (SparkCatalogConfig.REST.catalogName().equals(catalogName)) {
// We applied this special handling because the base path for
// matching the RESTCATALOG's Hive BaseLocation is represented
// in the form of an AbsolutePath.
filePath = file.getAbsolutePath().toString();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dramaticlly Can I make this modification? I would like to hear your opinion. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is just for unit tests, apply special casing on REST catalog is fine to collect path is fine. We can later focus on #13882 to see if we need to resolve the differences between catalogs.
| } | ||
|
|
||
| @TestTemplate | ||
| public void testRewriteTablePathWithSkipFileList() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public void testRewriteTablePathWithSkipFileList() { | |
| public void testRewriteTablePathWithoutFileList() { |
|
|
||
| @TestTemplate | ||
| public void testRewriteTablePathWithManifestAndDeleteCounts() throws IOException { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary newline
| sql("INSERT INTO %s VALUES (3, 'c')", tableName); | ||
|
|
||
| Table table = validationCatalog.loadTable(tableIdent); | ||
| List<Pair<CharSequence, Long>> deletes = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this says deletes but it actually refers to data files
| new StructField("file_list_location", DataTypes.StringType, true, Metadata.empty()) | ||
| new StructField("file_list_location", DataTypes.StringType, true, Metadata.empty()), | ||
| new StructField( | ||
| "rewrite_metadata_files_count", DataTypes.IntegerType, true, Metadata.empty()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "rewrite_metadata_files_count", DataTypes.IntegerType, true, Metadata.empty()), | |
| "rewritten_manifests_count", DataTypes.IntegerType, true, Metadata.empty()), |
| new StructField( | ||
| "rewrite_metadata_files_count", DataTypes.IntegerType, true, Metadata.empty()), | ||
| new StructField( | ||
| "rewrite_delete_files_count", DataTypes.IntegerType, true, Metadata.empty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "rewrite_delete_files_count", DataTypes.IntegerType, true, Metadata.empty()) | |
| "rewritten_delete_files_count", DataTypes.IntegerType, true, Metadata.empty()) |
| String latestVersion(); | ||
|
|
||
| /** count of rewrite delete files, default value is 0 */ | ||
| default int rewrittenDeleteFilesCount() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rewrittenDeleteFilesCount might be a bit misleading? Aren't we just rewriting table paths and not actually compacting files together? Other Actions/Procedures use a similar naming but there it actually means that files have been compacted together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your suggestion is reasonable. Would it be fine to name them deleteFilesPathRewrittenCount and manifestFilesPathRewrittenCount? This naming would accurately reflect that only the paths are being rewritten, rather than the files themselves being merged or compacted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense with a slight update to the naming, so maybe rewrittenDeleteFilePathsCount and rewrittenManifestFilePathsCount. @szehon-ho do you think that makes sense here?
| return saveFileList(copyPlan); | ||
| return builder | ||
| .stagingLocation(stagingDir) | ||
| .fileListLocation(fileListLocation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think another issue is actually that fileListLocation is as mandatory property in
| String fileListLocation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra I am curious, does API contract allow String typed fileListLocation to have value null? I assume line 319 once built without fileListLocation will have its value = null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the code based on nastra’s suggestion and confirmed that his analysis is correct. I also verified it locally: if fileListLocation is not set, an exception will indeed be thrown.
Cannot build Result, some of required attributes are not set [fileListLocation]
java.lang.IllegalStateException: Cannot build Result, some of required attributes are not set [fileListLocation]
at org.apache.iceberg.actions.ImmutableRewriteTablePath$Result$Builder.build(ImmutableRewriteTablePath.java:514)
I’ve been thinking—if fileListLocation is empty, can we simply treat it as N/A? My idea is to assign N/A directly during the check, without changing the original definition. Do you think this approach would work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep I think assigning a n/a string when building the result object makes sense to me, cc @szehon-ho
| .isGreaterThan(0); | ||
| assertThat(rewrittenManifestFilesCount) | ||
| .as("Should rewrite at least one manifest file") | ||
| .isGreaterThan(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we assert on equality of those rewritten file count? This is to ensure we have number correctly in procedure output
| return saveFileList(copyPlan); | ||
| return builder | ||
| .stagingLocation(stagingDir) | ||
| .fileListLocation(fileListLocation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra I am curious, does API contract allow String typed fileListLocation to have value null? I assume line 319 once built without fileListLocation will have its value = null
|
@nastra @dramaticlly Thank you for reviewing the code! Your suggestions have been very helpful to me. I have optimized the code based on both of your suggestions. Could you please take another look? Thank you so much! |
| private String endVersionName; | ||
| private String stagingDir; | ||
| private boolean skipFileList; | ||
| private boolean fileListEnabled = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call it createFileList so we can have consist naming in interface, action and procedure?
| } | ||
|
|
||
| @Test | ||
| public void testRewritePathWithCreateFileListFalse() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: how about testRewritePathWithoutCreateFileList
|
|
||
| // file list generation disabled | ||
| if (!fileListEnabled) { | ||
| return builder.fileListLocation("N/A").build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we pull this up as a package local variable to represent not applicable and use it to assert in spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion! I have extracted a variable to represent 'N/A'.
| /** Number of delete files whose paths were updated. Default value is 0. */ | ||
| default int deleteFilesPathRewrittenCount() { | ||
| return 0; | ||
| } | ||
|
|
||
| /** Number of manifest files whose paths were updated. Default value is 0. */ | ||
| default int manifestFilesPathRewrittenCount() { | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /** Number of delete files whose paths were updated. Default value is 0. */ | |
| default int deleteFilesPathRewrittenCount() { | |
| return 0; | |
| } | |
| /** Number of manifest files whose paths were updated. Default value is 0. */ | |
| default int manifestFilesPathRewrittenCount() { | |
| return 0; | |
| } | |
| /** Number of delete files with rewritten paths. */ | |
| default int deleteFilesPathRewrittenCount() { | |
| return 0; | |
| } | |
| /** Number of manifest files with rewritten paths. */ | |
| default int manifestFilesPathRewrittenCount() { | |
| return 0; | |
| } |
|
@nastra Could you please take another look at this PR? Thank you very much! |
|
@nastra Could you please help review it? Thanks a lot! dramaticlly has already given a +1 |
| String latestVersion(); | ||
|
|
||
| /** Number of delete files with rewritten paths. */ | ||
| default int deleteFilesPathRewrittenCount() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rename these two methods to what I suggested earlier in #13837 (comment)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviewing the code. I’ve made improvements based on your suggestions.
| // in the form of an AbsolutePath. | ||
| filePath = file.getAbsolutePath().toString(); | ||
| } | ||
| DeleteFile positionDeletes = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please add a newline after the }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have improved this part of the code.
...k/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java
Outdated
Show resolved
Hide resolved
...k/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java
Outdated
Show resolved
Hide resolved
| if (stagingLocation != null) { | ||
| action.stagingLocation(stagingLocation); | ||
| } | ||
| action.createFileList(createFileList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: newline after }
| new StructField("file_list_location", DataTypes.StringType, true, Metadata.empty()) | ||
| new StructField("file_list_location", DataTypes.StringType, true, Metadata.empty()), | ||
| new StructField( | ||
| "rewritten_metadata_files_count", DataTypes.IntegerType, true, Metadata.empty()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would need to update the naming here as well to indicate that these are referring to paths
| .manifestFilesPathRewrittenCount(metaFiles.size()) | ||
| .latestVersion(RewriteTablePathUtil.fileName(endVersionName)); | ||
|
|
||
| // file list generation disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment can be removed as it doesn't add much value imo
| return builder | ||
| .stagingLocation(stagingDir) | ||
| .fileListLocation(fileListLocation) | ||
| .latestVersion(RewriteTablePathUtil.fileName(endVersionName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we setting latestVersion and stagingLocation again here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the question! The code isn’t duplicated — we moved the doExecute logic into rebuildMetadata so that fileListLocation and other Builder variables can be set in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I mean is: we are setting latestVersion and stagingLocation already in L309, so why do it here again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the misunderstanding, and thank you for your feedback. You are correct that this code is redundant, as the relevant variables are already set in line 309, and we don't need to set them again.
nastra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the remaining comment has been applied. @szehon-ho can you also take a look please?
|
@slfan1989 can you please update the PR title to reflect the latest changes? |
@nastra Thanks a lot for the review! I’ve updated the title to |
@nastra Could we consider moving this PR forward? Many thanks! I’d also like to contribute a new feature on top of it — in-place upgrades from Hive tables to Iceberg tables. Over the past four months, we’ve used this internally to upgrade more than 2,800 tables, delivering significant performance and cost benefits. I hope this feature can help more users perform in-place upgrades from Hive to Iceberg. |
|
@nastra @dramaticlly @manuzhang Thanks a lot for your help throughout the process of completing this PR! |
|
Sorry for delay in reply, late looks good to me, thanks @slfan1989 |
@szehon-ho Thank you for helping with the review! |
|
|
||
| private static final Logger LOG = LoggerFactory.getLogger(RewriteTablePathSparkAction.class); | ||
| private static final String RESULT_LOCATION = "file-list"; | ||
| static final String NOT_APPLICABLE = "N/A"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry just noticed one thing, this seems a bit awkward. Was there some discussion on just having an empty string, or does the result object not allow it? In any case, I feel it can be private here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your message. After discussion, it was ultimately decided to be a package-level private member. We also used this variable in the unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of N/A has also been discussed. From a personal perspective, I believe it should be acceptable. The specific message link is as follows:
This is a minor feature improvement. The background is that we are using RewriteTablePathProcedure to convert Hive tables to Iceberg tables, as detailed in #12762. RewriteTablePathProcedure generates a file-list file, and I need to manually clean up this file after each conversion. I understand that the file-list is mainly used to check data integrity, but since it is not essential for metadata, I believe allowing users to decide whether to generate this file would offer greater flexibility.