From afce136231223fa38ccadc4712186728d59d96f0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 14:44:07 +0200 Subject: [PATCH 01/18] remove mention of num_iterations (see #1212) --- auto3dseg/notebooks/auto_runner.ipynb | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/auto3dseg/notebooks/auto_runner.ipynb b/auto3dseg/notebooks/auto_runner.ipynb index 0c1bc5ba48..2d3bc1b266 100644 --- a/auto3dseg/notebooks/auto_runner.ipynb +++ b/auto3dseg/notebooks/auto_runner.ipynb @@ -273,13 +273,9 @@ "\n", "`set_training_params` in `AutoRunner` provides an interface to change all algorithms' training parameters in one line. \n", "\n", - "NOTE: \n", - "**Auto3DSeg** uses MONAI bundle templates to perform training, validation, and inference.\n", - "The number of epochs/iterations of training is specified by the config files in each template.\n", - "Users can override these these values in the bundle templates.\n", - "But users should consider that some bundle templates may use `num_iterations` and other may use `num_epochs` to iterate.\n", + "As an example, see the code block below, which specifies e.g. the number of epochs used for training. 
Note that some algorithms may treat this as a maximum number of epochs.\n", "\n", - "For demo purposes, below is a code block to convert num_epoch to iteration style and override all algorithms with the same training parameters.\n", + "NOTE: \n", "The setup works fine for a machine that has GPUs less than or equal to 8.\n", "The datalist in this example is only using a subset of the original dataset.\n", "Users need to ensure the number of GPUs is not greater than the number that the training dataset can be partitioned.\n", From ffb501dd57b5465de313cf2682068c32dd5736d6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 14:47:18 +0200 Subject: [PATCH 02/18] clarify relevance of datalist generator notebook --- ...nerator.ipynb => msd_crossval_datalist_generator.ipynb} | 7 +++++++ 1 file changed, 7 insertions(+) rename auto3dseg/notebooks/{msd_datalist_generator.ipynb => msd_crossval_datalist_generator.ipynb} (98%) diff --git a/auto3dseg/notebooks/msd_datalist_generator.ipynb b/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb similarity index 98% rename from auto3dseg/notebooks/msd_datalist_generator.ipynb rename to auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb index 3b9be29e8c..0980a50450 100644 --- a/auto3dseg/notebooks/msd_datalist_generator.ipynb +++ b/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb @@ -22,6 +22,13 @@ "# Datalist Generator" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This notebook contains an example to add cross-validation folds to an existing Medical Segmentation Decathlon datalist, in this case the one of Task09_Spleen." 
+ ] + }, { "cell_type": "markdown", "metadata": { From 5bf0ed69c3266cd309dbc95db237b700a724c3e8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 14:54:05 +0200 Subject: [PATCH 03/18] clarify relevance of datalist generator notebook --- auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb b/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb index 0980a50450..97f620714c 100644 --- a/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb +++ b/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb @@ -19,7 +19,7 @@ "See the License for the specific language governing permissions and \n", "limitations under the License. \n", "\n", - "# Datalist Generator" + "# Datalist Cross-Validation Folds Generator" ] }, { From 9e3d55752ba0e1ca629bf2ce79b33dd2e6991280 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 15:41:48 +0200 Subject: [PATCH 04/18] update readme with information on datalist format --- auto3dseg/README.md | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/auto3dseg/README.md b/auto3dseg/README.md index e13996b07d..bd93ece615 100644 --- a/auto3dseg/README.md +++ b/auto3dseg/README.md @@ -56,13 +56,40 @@ We provide [a two-minute example](notebooks/auto3dseg_hello_world.ipynb) for use To further demonstrate the capabilities of **Auto3DSeg**, [here](./tasks/instance22/README.md) is the detailed performance of the algorithm in **Auto3DSeg**, which won 2nd place in the MICCAI 2022 challenge **[INSTANCE22: The 2022 Intracranial Hemorrhage Segmentation Challenge on Non-Contrast Head CT (NCCT)](https://instance.grand-challenge.org/)** +## Running with Own Data + +To run Auto3DSeg on your own dataset, all you need to do is build a `datalist.json` file for your dataset, and run the AutoRunner on it. 
+ +The datalist format is based on the datasets released by the (Medical Segmentation Decathlon)[http://medicaldecathlon.com]. +See the function `load_decathlon_datalist` in `monai/data/decathlon_datalist.py` for a description of the format. + +For the AutoRunner, we only need the `training` data, since it will automatically create cross-validation folds. +Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend to keep track of names and versions of the dataset. +In short, your `datalist.json` file should look like this: + +``` +{ + "name": "Example datalist.json" + "training": + [ + {"image": "/path/to/image_1.nii.gz", "label": "/path/to/label_1.nii.gz"}, + {"image": "/path/to/image_2.nii.gz", "label": "/path/to/label_2.nii.gz"}, + ... + ] +} + +``` + +The AutoRunner will create a `work_dir` folder in the directory from which it is ran, with the resulting models and the copied datalist file _with_ cross-validation folds. This allows you to see which datalist file the models are trained on. +You are free to add the cross-validation folds beforehand, these should align with the number of folds set in the configuration of the AutoRunner (by default 5, see [notebook](notebooks/auto_runner.ipynb)). + ## Reference Python APIs for Auto3DSeg **Auto3DSeg** offers users different levels of APIs to run pipelines that suit their needs. ### 1. Run with Minimal Input using ```AutoRunner``` -The user needs to provide a data list (".json" file) for the new task and data root. A typical data list is as this [example](tasks/msd/Task05_Prostate/msd_task05_prostate_folds.json). A sample datalist for an existing MSD formatted dataset can be created using [this notebook](notebooks/msd_datalist_generator.ipynb). After creating the data list, the user can create a simple "task.yaml" file (shown below) as the minimum input for **Auto3DSeg**. 
+The user needs to provide a data list (".json" file) for the new task and data root. A typical data list is as this [example](tasks/msd/Task05_Prostate/msd_task05_prostate_folds.json). [This notebook](notebooks/msd_crossval_datalist_generator.ipynb) features an example to create a datalist with cross-validation folds from an existing MSD dataset. After creating the data list, the user can create a simple "task.yaml" file (shown below) as the minimum input for **Auto3DSeg**. ``` modality: CT From 79484f7e66cee00f87190f85956be3442a593639 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 15:42:22 +0200 Subject: [PATCH 05/18] update run_with_minimal_inpu with a clearer description of the data format --- auto3dseg/docs/run_with_minimal_input.md | 56 +++++++----------------- 1 file changed, 16 insertions(+), 40 deletions(-) diff --git a/auto3dseg/docs/run_with_minimal_input.md b/auto3dseg/docs/run_with_minimal_input.md index 0ec8d82872..4e187c280d 100644 --- a/auto3dseg/docs/run_with_minimal_input.md +++ b/auto3dseg/docs/run_with_minimal_input.md @@ -18,55 +18,31 @@ if os.path.exists(root): download_and_extract(resource, compressed_file, root) ``` -**Step 1.** Provide the following data list (a ".json" file) for a new task and the data root. The typical data list is shown as follows. +**Step 1.** Provide a `datalist.json` file. +See the documentation under the `load_decathlon_datalist` function in `monai.data.decathlon_datalist` for details on the file format. 
+For the AutoRunner, you only need the `training` field with its list of training files: ``` { - "training": [ - { - "fold": 0, - "image": "image_001.nii.gz", - "label": "label_001.nii.gz" - }, - { - "fold": 0, - "image": "image_002.nii.gz", - "label": "label_002.nii.gz" - }, - { - "fold": 1, - "image": "image_003.nii.gz", - "label": "label_001.nii.gz" - }, - { - "fold": 2, - "image": "image_004.nii.gz", - "label": "label_002.nii.gz" - }, - { - "fold": 3, - "image": "image_005.nii.gz", - "label": "label_003.nii.gz" - }, - { - "fold": 4, - "image": "image_006.nii.gz", - "label": "label_004.nii.gz" - } - ], - "testing": [ - { - "image": "image_010.nii.gz" - } - ] + "training": + [ + {"image": "/path/to/image_1.nii.gz", "label": "/path/to/label_1.nii.gz"}, + {"image": "/path/to/image_2.nii.gz", "label": "/path/to/label_2.nii.gz"}, + ... + ] } + ``` +In each training item, you can add a `fold` field (with an integer starting at 0) to pre-specify the cross-validation folds, otherwise the AutoRunner will generate its own folds. All trained algorithms will use the same generated or pre-specified folds, the file can be found in the `work_dir` folder that the AutoRunner generates. +It is recommended to add a `name` field and any other metadata fields that allow you to track which version of your dataset the models are trained on. + +Save the file to `./datalist.json`. **Step 2.** Prepare "task.yaml" with the necessary information as follows. 
``` -modality: CT -datalist: "./task.json" +modality: CT # or MRI +datalist: "./datalist.json" dataroot: "/workspace/data/task" ``` From c969d108da2d05ae3a657e90d42a016a6e10fe44 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 16:49:29 +0200 Subject: [PATCH 06/18] =?UTF-8?q?DCO=20Remediation=20Commit=20for=20Dani?= =?UTF-8?q?=C3=ABl=20Nobbe=20?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit I, Daniël Nobbe , hereby add my Signed-off-by to this commit: afce136231223fa38ccadc4712186728d59d96f0 I, Daniël Nobbe , hereby add my Signed-off-by to this commit: ffb501dd57b5465de313cf2682068c32dd5736d6 I, Daniël Nobbe , hereby add my Signed-off-by to this commit: 5bf0ed69c3266cd309dbc95db237b700a724c3e8 I, Daniël Nobbe , hereby add my Signed-off-by to this commit: 9e3d55752ba0e1ca629bf2ce79b33dd2e6991280 I, Daniël Nobbe , hereby add my Signed-off-by to this commit: 79484f7e66cee00f87190f85956be3442a593639 Signed-off-by: Daniël Nobbe --- auto3dseg/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/auto3dseg/README.md b/auto3dseg/README.md index bd93ece615..9cfa891945 100644 --- a/auto3dseg/README.md +++ b/auto3dseg/README.md @@ -64,6 +64,7 @@ The datalist format is based on the datasets released by the (Medical Segmentati See the function `load_decathlon_datalist` in `monai/data/decathlon_datalist.py` for a description of the format. For the AutoRunner, we only need the `training` data, since it will automatically create cross-validation folds. +You are free to add the cross-validation folds beforehand, these should align with the number of folds set in the configuration of the AutoRunner (by default 5, see [notebook](notebooks/auto_runner.ipynb)). Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend to keep track of names and versions of the dataset. 
In short, your `datalist.json` file should look like this: @@ -81,7 +82,6 @@ In short, your `datalist.json` file should look like this: ``` The AutoRunner will create a `work_dir` folder in the directory from which it is ran, with the resulting models and the copied datalist file _with_ cross-validation folds. This allows you to see which datalist file the models are trained on. -You are free to add the cross-validation folds beforehand, these should align with the number of folds set in the configuration of the AutoRunner (by default 5, see [notebook](notebooks/auto_runner.ipynb)). ## Reference Python APIs for Auto3DSeg From 5106732cb4dc671530453cdf6c6e555efab41694 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 16:53:06 +0200 Subject: [PATCH 07/18] minor language fix MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Daniël Nobbe --- auto3dseg/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/auto3dseg/README.md b/auto3dseg/README.md index 9cfa891945..846b13336c 100644 --- a/auto3dseg/README.md +++ b/auto3dseg/README.md @@ -65,7 +65,7 @@ See the function `load_decathlon_datalist` in `monai/data/decathlon_datalist.py` For the AutoRunner, we only need the `training` data, since it will automatically create cross-validation folds. You are free to add the cross-validation folds beforehand, these should align with the number of folds set in the configuration of the AutoRunner (by default 5, see [notebook](notebooks/auto_runner.ipynb)). -Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend to keep track of names and versions of the dataset. +Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend adding them, to keep track of names and versions of the dataset. 
In short, your `datalist.json` file should look like this: ``` From 4cfa3f6a40e8e79ab8082d9654a2e94c0b0d2760 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 16:57:20 +0200 Subject: [PATCH 08/18] specify where to find next steps MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Daniël Nobbe --- auto3dseg/README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/auto3dseg/README.md b/auto3dseg/README.md index 846b13336c..c15a07f896 100644 --- a/auto3dseg/README.md +++ b/auto3dseg/README.md @@ -83,6 +83,8 @@ In short, your `datalist.json` file should look like this: The AutoRunner will create a `work_dir` folder in the directory from which it is ran, with the resulting models and the copied datalist file _with_ cross-validation folds. This allows you to see which datalist file the models are trained on. +See the description below or the file [run_with_minimal_input.md](docs/run_with_minimal_input.md) how to use your datalist with the AutoRunner. + ## Reference Python APIs for Auto3DSeg **Auto3DSeg** offers users different levels of APIs to run pipelines that suit their needs. 
From 8f3405adcceb45a025521a8d4666e07d147ff5b1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 16:58:53 +0200 Subject: [PATCH 09/18] fix title MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Daniël Nobbe --- auto3dseg/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/auto3dseg/README.md b/auto3dseg/README.md index c15a07f896..0afcc42cf4 100644 --- a/auto3dseg/README.md +++ b/auto3dseg/README.md @@ -56,7 +56,7 @@ We provide [a two-minute example](notebooks/auto3dseg_hello_world.ipynb) for use To further demonstrate the capabilities of **Auto3DSeg**, [here](./tasks/instance22/README.md) is the detailed performance of the algorithm in **Auto3DSeg**, which won 2nd place in the MICCAI 2022 challenge **[INSTANCE22: The 2022 Intracranial Hemorrhage Segmentation Challenge on Non-Contrast Head CT (NCCT)](https://instance.grand-challenge.org/)** -## Running with Own Data +## Running With Your Own Data To run Auto3DSeg on your own dataset, all you need to do is build a `datalist.json` file for your dataset, and run the AutoRunner on it. From c31841a71c66d2427e1f9626e873541019a93db4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 17:07:58 +0200 Subject: [PATCH 10/18] modify description MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Daniël Nobbe --- auto3dseg/README.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/auto3dseg/README.md b/auto3dseg/README.md index 0afcc42cf4..62fdaceb3d 100644 --- a/auto3dseg/README.md +++ b/auto3dseg/README.md @@ -58,14 +58,15 @@ To further demonstrate the capabilities of **Auto3DSeg**, [here](./tasks/instanc ## Running With Your Own Data -To run Auto3DSeg on your own dataset, all you need to do is build a `datalist.json` file for your dataset, and run the AutoRunner on it. 
+To run Auto3DSeg on your own dataset, you need to build a `datalist.json` file for your dataset, and run the AutoRunner on it. -The datalist format is based on the datasets released by the (Medical Segmentation Decathlon)[http://medicaldecathlon.com]. +The datalist format is based on the datasets released by the [Medical Segmentation Decathlon](http://medicaldecathlon.com). See the function `load_decathlon_datalist` in `monai/data/decathlon_datalist.py` for a description of the format. -For the AutoRunner, we only need the `training` data, since it will automatically create cross-validation folds. -You are free to add the cross-validation folds beforehand, these should align with the number of folds set in the configuration of the AutoRunner (by default 5, see [notebook](notebooks/auto_runner.ipynb)). -Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend adding them, to keep track of names and versions of the dataset. +For the AutoRunner, we only need the `training` list in the JSON, it does not use any other fields. +The `fold` key for each image is not required, as the AutoRunner will automatically create cross-validation folds. +If you do add the cross-validation folds beforehand, these should align with the number of folds set in the configuration of the AutoRunner (by default 5, see [notebook](notebooks/auto_runner.ipynb)). +Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend using metadata fields to keep track of names and versions of your dataset. In short, your `datalist.json` file should look like this: ``` @@ -81,9 +82,9 @@ In short, your `datalist.json` file should look like this: ``` -The AutoRunner will create a `work_dir` folder in the directory from which it is ran, with the resulting models and the copied datalist file _with_ cross-validation folds. 
This allows you to see which datalist file the models are trained on. +The AutoRunner will create a `work_dir` folder in the directory from which it is ran, which will contain the resulting models and the copied datalist file _with_ cross-validation folds. This allows you to keep track of which datalist file the models are trained on. -See the description below or the file [run_with_minimal_input.md](docs/run_with_minimal_input.md) how to use your datalist with the AutoRunner. +See the description below or the file [run_with_minimal_input.md](docs/run_with_minimal_input.md) to use your datalist with the AutoRunner. ## Reference Python APIs for Auto3DSeg From 4bd67b7e92ac3cf82a864d0a67115c889aecff05 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Thu, 14 Aug 2025 17:09:50 +0200 Subject: [PATCH 11/18] formatting MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Daniël Nobbe --- auto3dseg/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/auto3dseg/README.md b/auto3dseg/README.md index 62fdaceb3d..1405022308 100644 --- a/auto3dseg/README.md +++ b/auto3dseg/README.md @@ -58,15 +58,15 @@ To further demonstrate the capabilities of **Auto3DSeg**, [here](./tasks/instanc ## Running With Your Own Data -To run Auto3DSeg on your own dataset, you need to build a `datalist.json` file for your dataset, and run the AutoRunner on it. +To run Auto3DSeg on your own dataset, you need to build a `datalist.json` file, and pass it to the AutoRunner. The datalist format is based on the datasets released by the [Medical Segmentation Decathlon](http://medicaldecathlon.com). See the function `load_decathlon_datalist` in `monai/data/decathlon_datalist.py` for a description of the format. For the AutoRunner, we only need the `training` list in the JSON, it does not use any other fields. 
-The `fold` key for each image is not required, as the AutoRunner will automatically create cross-validation folds. -If you do add the cross-validation folds beforehand, these should align with the number of folds set in the configuration of the AutoRunner (by default 5, see [notebook](notebooks/auto_runner.ipynb)). -Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend using metadata fields to keep track of names and versions of your dataset. +The `fold` key for each image is not required, as the AutoRunner will automatically create cross-validation folds. +If you do add the cross-validation folds beforehand, these should align with the number of folds set in the configuration of the AutoRunner (by default 5, see [notebook](notebooks/auto_runner.ipynb)). +Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend using metadata fields to keep track of names and versions of your dataset. In short, your `datalist.json` file should look like this: ``` From 1c739062830231199b4255211ae861aae67e9835 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Fri, 15 Aug 2025 17:18:30 +0200 Subject: [PATCH 12/18] specify validation key and fixed cross-val behavior MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Daniël Nobbe --- auto3dseg/README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/auto3dseg/README.md b/auto3dseg/README.md index 1405022308..b951160670 100644 --- a/auto3dseg/README.md +++ b/auto3dseg/README.md @@ -64,9 +64,10 @@ The datalist format is based on the datasets released by the [Medical Segmentati See the function `load_decathlon_datalist` in `monai/data/decathlon_datalist.py` for a description of the format. For the AutoRunner, we only need the `training` list in the JSON, it does not use any other fields. 
-The `fold` key for each image is not required, as the AutoRunner will automatically create cross-validation folds. -If you do add the cross-validation folds beforehand, these should align with the number of folds set in the configuration of the AutoRunner (by default 5, see [notebook](notebooks/auto_runner.ipynb)). -Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend using metadata fields to keep track of names and versions of your dataset. +The `fold` key for each image is not required, as the AutoRunner will automatically create cross-validation folds (the number of folds is hard-coded to 5). +If you do add the cross-validation folds beforehand, the AutoRunner will use these by default. +You can also choose to include a `validation` list in the JSON file, in which case the AutoRunner will disable cross-validation and use the specified validation set. +Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend using metadata fields to keep track of names and versions of your dataset. If you are using multi-modal scans, it is possible to enter lists of image paths for both the `image` and `label` keys; MONAI will stack them into channels. 
In short, your `datalist.json` file should look like this: ``` From 4c93d922a688ab5eb334d54c83afd1bb85d2679e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Fri, 15 Aug 2025 17:18:38 +0200 Subject: [PATCH 13/18] specify validation key and fixed cross-val behavior MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Daniël Nobbe --- auto3dseg/docs/run_with_minimal_input.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/auto3dseg/docs/run_with_minimal_input.md b/auto3dseg/docs/run_with_minimal_input.md index 4e187c280d..20a7655c65 100644 --- a/auto3dseg/docs/run_with_minimal_input.md +++ b/auto3dseg/docs/run_with_minimal_input.md @@ -33,7 +33,8 @@ For the AutoRunner, you only need the `training` field with its list of training } ``` -In each training item, you can add a `fold` field (with an integer starting at 0) to pre-specify the cross-validation folds, otherwise the AutoRunner will generate its own folds. All trained algorithms will use the same generated or pre-specified folds, the file can be found in the `work_dir` folder that the AutoRunner generates. +In each training item, you can add a `fold` field (with an integer starting at 0) to pre-specify the cross-validation folds, otherwise the AutoRunner will generate its own folds (always 5). All trained algorithms will use the same generated or pre-specified folds, the file can be found in the `work_dir` folder that the AutoRunner generates. +If you have a validation set, you can include it under a `validation` key with the same format as the `training` list. This will disable cross-validation. It is recommended to add a `name` field and any other metadata fields that allow you to track which version of your dataset the models are trained on. Save the file to `./datalist.json`. 
From 24493b0a74d45a73c03b110a5931e62fe08897bb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Fri, 15 Aug 2025 17:18:45 +0200 Subject: [PATCH 14/18] small comment MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Daniël Nobbe --- auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb b/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb index 97f620714c..7a76409603 100644 --- a/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb +++ b/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb @@ -26,7 +26,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This notebook contains an example to add cross-validation folds to an existing Medical Segmentation Decathlon datalist, in this case the one of Task09_Spleen." + "This notebook contains an example to add cross-validation folds to an existing Medical Segmentation Decathlon datalist, in this case the one of Task09_Spleen. \n", + "When running repeated experiments, it can be beneficial to create cross-validation folds beforehand." 
] }, { From 314df90712f8eb02a017ad878abf38f530bbbf1e Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 15 Aug 2025 15:19:51 +0000 Subject: [PATCH 15/18] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- auto3dseg/README.md | 6 +++--- auto3dseg/docs/run_with_minimal_input.md | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/auto3dseg/README.md b/auto3dseg/README.md index b951160670..64c68dac62 100644 --- a/auto3dseg/README.md +++ b/auto3dseg/README.md @@ -60,12 +60,12 @@ To further demonstrate the capabilities of **Auto3DSeg**, [here](./tasks/instanc To run Auto3DSeg on your own dataset, you need to build a `datalist.json` file, and pass it to the AutoRunner. -The datalist format is based on the datasets released by the [Medical Segmentation Decathlon](http://medicaldecathlon.com). +The datalist format is based on the datasets released by the [Medical Segmentation Decathlon](http://medicaldecathlon.com). See the function `load_decathlon_datalist` in `monai/data/decathlon_datalist.py` for a description of the format. For the AutoRunner, we only need the `training` list in the JSON, it does not use any other fields. -The `fold` key for each image is not required, as the AutoRunner will automatically create cross-validation folds (the number of folds is hard-coded to 5). -If you do add the cross-validation folds beforehand, the AutoRunner will use these by default. +The `fold` key for each image is not required, as the AutoRunner will automatically create cross-validation folds (the number of folds is hard-coded to 5). +If you do add the cross-validation folds beforehand, the AutoRunner will use these by default. You can also choose to include a `validation` list in the JSON file, in which case the AutoRunner will disable cross-validation and use the specified validation set. 
Any other metadata, such as `modality`, `numTraining`, `name`, etc. will not be used by the AutoRunner, but we do recommend using metadata fields to keep track of names and versions of your dataset. If you are using multi-modal scans, it is possible to enter lists of image paths for both the `image` and `label` keys; MONAI will stack them into channels. In short, your `datalist.json` file should look like this: diff --git a/auto3dseg/docs/run_with_minimal_input.md b/auto3dseg/docs/run_with_minimal_input.md index 20a7655c65..8a43e3fd19 100644 --- a/auto3dseg/docs/run_with_minimal_input.md +++ b/auto3dseg/docs/run_with_minimal_input.md @@ -33,8 +33,8 @@ For the AutoRunner, you only need the `training` field with its list of training } ``` -In each training item, you can add a `fold` field (with an integer starting at 0) to pre-specify the cross-validation folds, otherwise the AutoRunner will generate its own folds (always 5). All trained algorithms will use the same generated or pre-specified folds, the file can be found in the `work_dir` folder that the AutoRunner generates. -If you have a validation set, you can include it under a `validation` key with the same format as the `training` list. This will disable cross-validation. +In each training item, you can add a `fold` field (with an integer starting at 0) to pre-specify the cross-validation folds, otherwise the AutoRunner will generate its own folds (always 5). All trained algorithms will use the same generated or pre-specified folds, the file can be found in the `work_dir` folder that the AutoRunner generates. +If you have a validation set, you can include it under a `validation` key with the same format as the `training` list. This will disable cross-validation. It is recommended to add a `name` field and any other metadata fields that allow you to track which version of your dataset the models are trained on. Save the file to `./datalist.json`. 
From b8b8464812de67ba9cd9643fa90fcafe947fcdbf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?= Date: Fri, 5 Sep 2025 18:27:08 +0200 Subject: [PATCH 16/18] Update auto3dseg/README.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit grammar correction Co-authored-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com> Signed-off-by: Daniël Nobbe --- auto3dseg/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/auto3dseg/README.md b/auto3dseg/README.md index 64c68dac62..6e3d8f1882 100644 --- a/auto3dseg/README.md +++ b/auto3dseg/README.md @@ -83,7 +83,7 @@ In short, your `datalist.json` file should look like this: ``` -The AutoRunner will create a `work_dir` folder in the directory from which it is ran, which will contain the resulting models and the copied datalist file _with_ cross-validation folds. This allows you to keep track of which datalist file the models are trained on. +The AutoRunner will create a `work_dir` folder in the directory from which it is run, which will contain the resulting models and the copied datalist file _with_ cross-validation folds. This allows you to keep track of which datalist file the models are trained on. See the description below or the file [run_with_minimal_input.md](docs/run_with_minimal_input.md) to use your datalist with the AutoRunner. 
From 9a00bf42bcf868935ba30ab13c37582425b564f3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=C3=ABl=20Nobbe?=
Date: Fri, 5 Sep 2025 18:37:39 +0200
Subject: [PATCH 17/18] Update run_with_minimal_input.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add "testing" set to datalist description

Signed-off-by: Daniël Nobbe
---
 auto3dseg/docs/run_with_minimal_input.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/auto3dseg/docs/run_with_minimal_input.md b/auto3dseg/docs/run_with_minimal_input.md
index 8a43e3fd19..917079ccbd 100644
--- a/auto3dseg/docs/run_with_minimal_input.md
+++ b/auto3dseg/docs/run_with_minimal_input.md
@@ -29,12 +29,19 @@ For the AutoRunner, you only need the `training` field with its list of training
 {"image": "/path/to/image_1.nii.gz", "label": "/path/to/label_1.nii.gz"},
 {"image": "/path/to/image_2.nii.gz", "label": "/path/to/label_2.nii.gz"},
 ...
+ ],
+ "testing":
+ [
+ "/path/to/test_image_1.nii.gz",
+ "/path/to/test_image_2.nii.gz",
+ ...
 ]
 }
 ```
 In each training item, you can add a `fold` field (with an integer starting at 0) to pre-specify the cross-validation folds, otherwise the AutoRunner will generate its own folds (always 5). All trained algorithms will use the same generated or pre-specified folds, the file can be found in the `work_dir` folder that the AutoRunner generates.
-If you have a validation set, you can include it under a `validation` key with the same format as the `training` list. This will disable cross-validation.
+If you have a validation set, you can include it under a `validation` key with the same format as the `training` list. This will disable cross-validation.
+A "testing" list can also be added, which only requires the image files, not the labels. If it is included, the AutoRunner will output predictions on the testing set after training.
 It is recommended to add a `name` field and any other metadata fields that allow you to track which version of your dataset the models are trained on. Save the file to `./datalist.json`.
From e19883d3c2b30ec20f38d453477d7baf3f3db07b Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 5 Sep 2025 16:38:40 +0000
Subject: [PATCH 18/18] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 auto3dseg/docs/run_with_minimal_input.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/auto3dseg/docs/run_with_minimal_input.md b/auto3dseg/docs/run_with_minimal_input.md
index 917079ccbd..6e97ebefa9 100644
--- a/auto3dseg/docs/run_with_minimal_input.md
+++ b/auto3dseg/docs/run_with_minimal_input.md
@@ -40,8 +40,8 @@ For the AutoRunner, you only need the `training` field with its list of training
 ```
 In each training item, you can add a `fold` field (with an integer starting at 0) to pre-specify the cross-validation folds, otherwise the AutoRunner will generate its own folds (always 5). All trained algorithms will use the same generated or pre-specified folds, the file can be found in the `work_dir` folder that the AutoRunner generates.
-If you have a validation set, you can include it under a `validation` key with the same format as the `training` list. This will disable cross-validation.
-A "testing" list can also be added, which only requires the image files, not the labels. If it is included, the AutoRunner will output predictions on the testing set after training.
+If you have a validation set, you can include it under a `validation` key with the same format as the `training` list. This will disable cross-validation.
+A "testing" list can also be added, which only requires the image files, not the labels. If it is included, the AutoRunner will output predictions on the testing set after training.
It is recommended to add a `name` field and any other metadata fields that allow you to track which version of your dataset the models are trained on. Save the file to `./datalist.json`.
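
For reference, the datalist layout described in these patches could be generated with a short Python script. This is only a sketch under stated assumptions: `build_datalist` is a hypothetical helper (not part of MONAI or Auto3DSeg), the round-robin fold assignment is one arbitrary way to pre-specify folds, and the file paths and the `name` value are placeholders.

```python
import json


def build_datalist(image_label_pairs, test_images=None, num_folds=5, name="example_dataset"):
    """Hypothetical helper: build a datalist dict in the layout described above."""
    training = [
        # `fold` is an integer starting at 0; round-robin assignment is an assumption.
        {"image": image, "label": label, "fold": index % num_folds}
        for index, (image, label) in enumerate(image_label_pairs)
    ]
    # `name` is a metadata field for tracking the dataset version; the patches
    # state that such fields are not used by the AutoRunner.
    datalist = {"name": name, "training": training}
    if test_images:
        # The optional "testing" list only needs image paths, not labels.
        datalist["testing"] = list(test_images)
    return datalist


# Placeholder paths, following the pattern used in the documentation.
pairs = [(f"/path/to/image_{i}.nii.gz", f"/path/to/label_{i}.nii.gz") for i in range(1, 11)]
test_images = ["/path/to/test_image_1.nii.gz", "/path/to/test_image_2.nii.gz"]

datalist = build_datalist(pairs, test_images)
print(json.dumps(datalist, indent=4))
```

Writing the result of `json.dumps(datalist, indent=4)` to `./datalist.json` would give a file in the shape the documentation asks for. Leaving out the `fold` keys should let the AutoRunner generate its own folds, as described in the text.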