diff --git a/figures/gds_metric_time_epochs.png b/figures/gds_metric_time_epochs.png new file mode 100644 index 0000000000..c70b57b71c Binary files /dev/null and b/figures/gds_metric_time_epochs.png differ diff --git a/figures/gds_total_epoch_time_comparison.png b/figures/gds_total_epoch_time_comparison.png new file mode 100644 index 0000000000..d79a78874e Binary files /dev/null and b/figures/gds_total_epoch_time_comparison.png differ diff --git a/modules/GDS_dataset.ipynb b/modules/GDS_dataset.ipynb new file mode 100644 index 0000000000..3116567eb6 --- /dev/null +++ b/modules/GDS_dataset.ipynb @@ -0,0 +1,472 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) MONAI Consortium \n", + "Licensed under the Apache License, Version 2.0 (the \"License\"); \n", + "you may not use this file except in compliance with the License. \n", + "You may obtain a copy of the License at \n", + "    http://www.apache.org/licenses/LICENSE-2.0 \n", + "Unless required by applicable law or agreed to in writing, software \n", + "distributed under the License is distributed on an \"AS IS\" BASIS, \n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. \n", + "See the License for the specific language governing permissions and \n", + "limitations under the License." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# GDS Integration\n", + "\n", + "This notebook introduces how to integrate GDS into MONAI. It covers the following topics:\n", + "- What is GPUDirect Storage (GDS)?\n", + "\n", + " GDS is the newest addition to the GPUDirect family. Like GPUDirect peer to peer (https://developer.nvidia.com/gpudirect), which enables a direct memory access (DMA) path between the memory of two graphics processing units (GPUs), and GPUDirect RDMA, which enables a direct DMA path to a network interface card (NIC), GDS enables a direct DMA data path between GPU memory and storage, thus avoiding a bounce buffer through the CPU. This direct path can increase system bandwidth while decreasing latency and the utilization load on the CPU and GPU.\n", + "\n", + "- GDS hardware and software requirements and how to install GDS.\n", + "\n", + " 1. GDS has been tested on the following NVIDIA GPUs: T10x, T4, A10, Quadro P6000, A100, and V100. For a full list of GPUs that GDS works with, refer to the [Known Limitations](https://docs.nvidia.com/gpudirect-storage/release-notes/index.html#known-limitations) section. For additional requirements, refer to items 3 and 4 in this [section](https://docs.nvidia.com/gpudirect-storage/release-notes/index.html#mofed-fs-req).\n", + "\n", + " 2. To install GDS, follow the detailed steps provided in this [section](https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#troubleshoot-install). To verify successful GDS installation, run the following command:\n", + " \n", + " ```/usr/local/cuda-X.Y/gds/tools/gdscheck.py -p``` \n", + " \n", + " (Replace X with the major version of the CUDA toolkit, and Y with the minor version.)\n", + "\n", + "- `GDSDataset`, inherited from `PersistentDataset`.\n", + "\n", + " In this tutorial, we have implemented a `GDSDataset` that inherits from `PersistentDataset`. 
We have re-implemented the `_cachecheck` method to create and save cache using GDS.\n", + "\n", + "- A simple demo comparing the time taken with and without GDS.\n", + "\n", + " In this tutorial, we are creating a conda environment to install `kvikio`, which provides a Python API for GDS. To install `kvikio` using other methods, refer to https://github.com/rapidsai/kvikio#install.\n", + "\n", + " ```conda create -n gds_env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=11.8 kvikio```\n", + "\n", + "- An End-to-end workflow Profiling Comparison" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup environment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!python -c \"import monai\" || pip install -q \"monai-weekly[nibabel, matplotlib]\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup imports" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import time\n", + "import cupy\n", + "import torch\n", + "import shutil\n", + "import tempfile\n", + "import numpy as np\n", + "from typing import Any\n", + "from pathlib import Path\n", + "from copy import deepcopy\n", + "from collections.abc import Callable, Sequence\n", + "from kvikio.numpy import fromfile, tofile\n", + "\n", + "import monai\n", + "import monai.transforms as mt\n", + "from monai.config import print_config\n", + "from monai.data.utils import pickle_hashing, SUPPORTED_PICKLE_MOD\n", + "from monai.utils import convert_to_tensor, set_determinism, look_up_option\n", + "\n", + "print_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup data directory\n", + "\n", + "You can specify a directory with the `MONAI_DATA_DIRECTORY` environment variable. \n", + "This allows you to save results and reuse downloads. \n", + "If not specified, a temporary directory will be used." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/raid/yliu/test_tutorial\n" + ] + } + ], + "source": [ + "directory = os.environ.get(\"MONAI_DATA_DIRECTORY\")\n", + "root_dir = tempfile.mkdtemp() if directory is None else directory\n", + "print(root_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## GDSDataset" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "class GDSDataset(monai.data.PersistentDataset):\n", + " def __init__(\n", + " self,\n", + " data: Sequence,\n", + " transform: Sequence[Callable] | Callable,\n", + " cache_dir: Path | str | None,\n", + " hash_func: Callable[..., bytes] = pickle_hashing,\n", + " hash_transform: Callable[..., bytes] | None = None,\n", + " reset_ops_id: bool = True,\n", + " device: int = None,\n", + " **kwargs: Any,\n", + " ) -> None:\n", + " super().__init__(\n", + " data=data,\n", + " transform=transform,\n", + " cache_dir=cache_dir,\n", + " hash_func=hash_func,\n", + " hash_transform=hash_transform,\n", + " reset_ops_id=reset_ops_id,\n", + " **kwargs,\n", + " )\n", + " self.device = device\n", + "\n", + " def _cachecheck(self, item_transformed):\n", + " \"\"\"given the input dictionary ``item_transformed``, return a transformed version of it\"\"\"\n", + " hashfile = None\n", + " # compute a cache id\n", + " if self.cache_dir is not None:\n", + " data_item_md5 = self.hash_func(item_transformed).decode(\"utf-8\")\n", + " data_item_md5 += self.transform_hash\n", + " hashfile = self.cache_dir / f\"{data_item_md5}.pt\"\n", + "\n", + " if hashfile is not None and hashfile.is_file(): # cache hit\n", + " with cupy.cuda.Device(self.device):\n", + " item = {}\n", + " for k in item_transformed:\n", + " meta_k = torch.load(self.cache_dir / f\"{hashfile.name}-{k}-meta\")\n", + " item[k] = fromfile(f\"{hashfile}-{k}\", dtype=np.float32, like=cupy.empty(()))\n", + " item[k] = convert_to_tensor(item[k].reshape(meta_k[\"shape\"]), device=f\"cuda:{self.device}\")\n", + " item[f\"{k}_meta_dict\"] = meta_k\n", + " return item\n", + "\n", + " # create new cache\n", + " _item_transformed = self._pre_transform(deepcopy(item_transformed)) # keep the original hashed\n", + " if hashfile is None:\n", + " return _item_transformed\n", + "\n", + " for k in _item_transformed: # {'image': ..., 'label': ...}\n", + " _item_transformed_meta = _item_transformed[k].meta\n", + " _item_transformed_data = _item_transformed[k].array\n", + " _item_transformed_meta[\"shape\"] = _item_transformed_data.shape\n", + " tofile(_item_transformed_data, f\"{hashfile}-{k}\")\n", + " try:\n", + " # NOTE: Writing to a temporary directory and then using a nearly atomic rename operation\n", + " # to make the cache more robust to manual killing of parent process\n", + " # which may leave partially written cache files in an incomplete state\n", + " with tempfile.TemporaryDirectory() as tmpdirname:\n", + " meta_hash_file_name = f\"{hashfile.name}-{k}-meta\"\n", + " meta_hash_file = self.cache_dir / meta_hash_file_name\n", + " temp_hash_file = Path(tmpdirname) / meta_hash_file_name\n", + " torch.save(\n", + " obj=_item_transformed_meta,\n", + " f=temp_hash_file,\n", + " pickle_module=look_up_option(self.pickle_module, SUPPORTED_PICKLE_MOD),\n", + " pickle_protocol=self.pickle_protocol,\n", + " )\n", + " if temp_hash_file.is_file() and not meta_hash_file.is_file():\n", + " # On Unix, if target exists and is a 
file, it will be replaced silently if the\n", + " # user has permission.\n", + " # for more details: https://docs.python.org/3/library/shutil.html#shutil.move.\n", + " try:\n", + " shutil.move(str(temp_hash_file), meta_hash_file)\n", + " except FileExistsError:\n", + " pass\n", + " except PermissionError: # project-monai/monai issue #3613\n", + " pass\n", + " open(hashfile, \"a\").close() # store cacheid\n", + "\n", + " return _item_transformed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## A simple demo to show how to use the GDS" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download dataset and set dataset path" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-07-12 09:26:17,878 - INFO - Expected md5 is None, skip md5 check for file samples.zip.\n", + "2023-07-12 09:26:17,878 - INFO - File exists: samples.zip, skipped downloading.\n", + "2023-07-12 09:26:17,879 - INFO - Writing into directory: /raid/yliu/test_tutorial.\n" + ] + } + ], + "source": [ + "sample_url = \"https://github.com/Project-MONAI/MONAI-extra-test-data/releases\"\n", + "sample_url += \"/download/0.8.1/totalSegmentator_mergedLabel_samples.zip\"\n", + "monai.apps.download_and_extract(sample_url, output_dir=root_dir, filepath=\"samples.zip\")\n", + "\n", + "base_name = os.path.join(root_dir, \"totalSegmentator_mergedLabel_samples\")\n", + "input_data = []\n", + "for filename in os.listdir(os.path.join(base_name, \"imagesTr\")):\n", + " input_data.append(\n", + " {\n", + " \"image\": os.path.join(base_name, \"imagesTr\", filename),\n", + " \"label\": os.path.join(base_name, \"labelsTr\", filename),\n", + " }\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set deterministic for reproducibility" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "set_determinism(seed=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup transforms" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n", + "transform = mt.Compose(\n", + " [\n", + " mt.LoadImageD(keys=(\"image\", \"label\"), image_only=True, ensure_channel_first=True),\n", + " mt.SpacingD(keys=(\"image\", \"label\"), pixdim=1.5),\n", + " mt.EnsureTypeD(keys=(\"image\", \"label\"), device=device),\n", + " mt.RandRotateD(\n", + " keys=(\"image\", \"label\"),\n", + " prob=1.0,\n", + " range_x=0.1,\n", + " range_y=0.1,\n", + " range_z=0.3,\n", + " mode=(\"bilinear\", \"nearest\"),\n", + " ),\n", + " mt.RandZoomD(keys=(\"image\", \"label\"), prob=1.0, min_zoom=0.8, max_zoom=1.2, mode=(\"trilinear\", \"nearest\")),\n", + " mt.ResizeWithPadOrCropD(keys=(\"image\", \"label\"), spatial_size=(200, 210, 220)),\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using GDSDataset" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch0 time 19.746733903884888\n", + "epoch1 time 0.9976603984832764\n", + "epoch2 time 0.982248067855835\n", + "epoch3 time 0.9838874340057373\n", + "epoch4 time 0.9793403148651123\n", + "total time 23.69102692604065\n" + ] + } + ], + "source": [ + "cache_dir = 
os.path.join(root_dir, \"gds_cache_dir\")\n", + "dataset = GDSDataset(data=input_data, transform=transform, cache_dir=cache_dir, device=0)\n", + "\n", + "data_loader = monai.data.ThreadDataLoader(dataset, batch_size=1)\n", + "\n", + "s = time.time()\n", + "for i in range(5):\n", + " e = time.time()\n", + " for _x in data_loader:\n", + " pass\n", + " print(f\"epoch{i} time\", time.time() - e)\n", + "print(\"total time\", time.time() - s)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using PersistentDataset" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch0 time 21.206729650497437\n", + "epoch1 time 1.510526180267334\n", + "epoch2 time 1.588256597518921\n", + "epoch3 time 1.4431262016296387\n", + "epoch4 time 1.4594802856445312\n", + "total time 27.20927882194519\n" + ] + } + ], + "source": [ + "cache_dir_per = os.path.join(root_dir, \"persistent_cache_dir\")\n", + "dataset = monai.data.PersistentDataset(data=input_data, transform=transform, cache_dir=cache_dir_per)\n", + "data_loader = monai.data.ThreadDataLoader(dataset, batch_size=1)\n", + "\n", + "s = time.time()\n", + "for i in range(5):\n", + " e = time.time()\n", + " for _x in data_loader:\n", + " pass\n", + " print(f\"epoch{i} time\", time.time() - e)\n", + "print(\"total time\", time.time() - s)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## An End-to-end workflow Profiling Comparison\n", + "\n", + "We also conducted a quantitative analysis of the end-to-end workflow performance using the BraTS dataset. To learn how to implement the full pipeline, please follow this [tutorial](../acceleration/distributed_training/brats_training_ddp.py). The only step that requires modification is the dataset: replace the dataset used there with the `GDSDataset` defined above. The end-to-end pipeline was benchmarked on a V100 32GB GPU." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Total time and per-epoch time comparison\n", + "![gds_benchmark_total_epoch_time_comparison](../figures/gds_total_epoch_time_comparison.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Total time to achieve metrics comparison\n", + "![gds_benchmark_achieve_metrics_comparison](../figures/gds_metric_time_epochs.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleanup data directory\n", + "\n", + "Remove the directory if a temporary one was used." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "if directory is None:\n", + " shutil.rmtree(root_dir)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/runner.sh b/runner.sh index 89f313a1bf..9e4eb0b0e9 100755 --- a/runner.sh +++ b/runner.sh @@ -70,6 +70,7 @@ doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" lazy_resampling_com doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" TensorRT_inference_acceleration.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" lazy_resampling_benchmark.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" modular_patch_inferer.ipynb) +doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" GDS_dataset.ipynb) # Execution of the notebook in these folders / with the filename cannot be automated skip_run_papermill=() @@ -107,6 +108,7 @@ skip_run_papermill=("${skip_run_papermill[@]}" .*transform_visualization*) # ht skip_run_papermill=("${skip_run_papermill[@]}" .*TensorRT_inference_acceleration*) skip_run_papermill=("${skip_run_papermill[@]}" .*mednist_classifier_ray*) # https://github.com/Project-MONAI/tutorials/issues/1307 skip_run_papermill=("${skip_run_papermill[@]}" .*TorchIO_MONAI_PyTorch_Lightning*) # https://github.com/Project-MONAI/tutorials/issues/1324 +skip_run_papermill=("${skip_run_papermill[@]}" .*GDS_dataset*) # https://github.com/Project-MONAI/tutorials/issues/1324 # output formatting separator=""