From 996e3c3b674be56b616f2a6d33b0f4d5f9da1df7 Mon Sep 17 00:00:00 2001
From: Kyle Bendickson
Date: Sat, 29 Jan 2022 21:39:48 -0800
Subject: [PATCH 1/2] Document `max_concurrent_deletes` in remove_orphan_files and expire_snapshots spark procedures

---
 site/docs/spark-procedures.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/site/docs/spark-procedures.md b/site/docs/spark-procedures.md
index eb22163a0ded..d7b829e69113 100644
--- a/site/docs/spark-procedures.md
+++ b/site/docs/spark-procedures.md
@@ -190,6 +190,7 @@ the `expire_snapshots` procedure will never remove files which are still require
 | `table` | ✔️ | string | Name of the table to update |
 | `older_than` | ️ | timestamp | Timestamp before which snapshots will be removed (Default: 5 days ago) |
 | `retain_last` | | int | Number of ancestor snapshots to preserve regardless of `older_than` (defaults to 1) |
+| `max_concurrent_deletes` | | int | Size of the thread pool used for delete file actions (defaults to null, which deletes files serially in the current thread without instantiating a dedicated thread pool) |
 
 If `older_than` and `retain_last` are omitted, the table's [expiration properties](./configuration/#table-behavior-properties) will be used.
 
@@ -227,6 +228,7 @@ Used to remove files which are not referenced in any metadata files of an Iceber
 | `older_than` | ️ | timestamp | Remove orphan files created before this timestamp (Defaults to 3 days ago) |
 | `location` | | string | Directory to look for files in (defaults to the table's location) |
 | `dry_run` | | boolean | When true, don't actually remove files (defaults to false) |
+| `max_concurrent_deletes` | | int | Size of the thread pool used for delete file actions (defaults to null, which deletes files serially in the current thread without instantiating a dedicated thread pool) |
 
 #### Output
 

From e91b97651826bd0126346ea4826eaa039b788243 Mon Sep 17 00:00:00 2001
From: Kyle Bendickson
Date: Mon, 31 Jan 2022 12:50:43 -0800
Subject: [PATCH 2/2] Shorten description

---
 site/docs/spark-procedures.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/site/docs/spark-procedures.md b/site/docs/spark-procedures.md
index d7b829e69113..5d4c2e087232 100644
--- a/site/docs/spark-procedures.md
+++ b/site/docs/spark-procedures.md
@@ -190,7 +190,7 @@ the `expire_snapshots` procedure will never remove files which are still require
 | `table` | ✔️ | string | Name of the table to update |
 | `older_than` | ️ | timestamp | Timestamp before which snapshots will be removed (Default: 5 days ago) |
 | `retain_last` | | int | Number of ancestor snapshots to preserve regardless of `older_than` (defaults to 1) |
-| `max_concurrent_deletes` | | int | Size of the thread pool used for delete file actions (defaults to null, which deletes files serially in the current thread without instantiating a dedicated thread pool) |
+| `max_concurrent_deletes` | | int | Size of the thread pool used for delete file actions (by default, no thread pool is used) |
 
 If `older_than` and `retain_last` are omitted, the table's [expiration properties](./configuration/#table-behavior-properties) will be used.
 
@@ -228,7 +228,7 @@ Used to remove files which are not referenced in any metadata files of an Iceber
 | `older_than` | ️ | timestamp | Remove orphan files created before this timestamp (Defaults to 3 days ago) |
 | `location` | | string | Directory to look for files in (defaults to the table's location) |
 | `dry_run` | | boolean | When true, don't actually remove files (defaults to false) |
-| `max_concurrent_deletes` | | int | Size of the thread pool used for delete file actions (defaults to null, which deletes files serially in the current thread without instantiating a dedicated thread pool) |
+| `max_concurrent_deletes` | | int | Size of the thread pool used for delete file actions (by default, no thread pool is used) |
 
 #### Output
 
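
Usage note (not part of the patch): a rough sketch of how the newly documented argument might be passed, using Spark's named-argument `CALL` syntax. The catalog name `catalog_name`, the table `db.sample`, and the pool size `4` are placeholder assumptions chosen for illustration; only the procedure names and the `table`, `dry_run`, and `max_concurrent_deletes` arguments come from the tables documented above.

```sql
-- Expire snapshots, deleting files with a pool of 4 threads
-- (catalog_name, db.sample, and 4 are illustrative placeholders).
CALL catalog_name.system.expire_snapshots(
  table => 'db.sample',
  max_concurrent_deletes => 4
);

-- Remove orphan files with the same pool size. Per the docs above,
-- omitting max_concurrent_deletes means no thread pool is used.
CALL catalog_name.system.remove_orphan_files(
  table => 'db.sample',
  dry_run => false,
  max_concurrent_deletes => 4
);
```

Omitting the argument keeps the documented default behavior, so existing calls should not need to change.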