diff --git a/figures/gds_metric_time_epochs.png b/figures/gds_metric_time_epochs.png new file mode 100644 index 0000000000..c70b57b71c Binary files /dev/null and b/figures/gds_metric_time_epochs.png differ diff --git a/figures/gds_total_epoch_time_comparison.png b/figures/gds_total_epoch_time_comparison.png new file mode 100644 index 0000000000..d79a78874e Binary files /dev/null and b/figures/gds_total_epoch_time_comparison.png differ diff --git a/modules/GDS_dataset.ipynb b/modules/GDS_dataset.ipynb new file mode 100644 index 0000000000..3116567eb6 --- /dev/null +++ b/modules/GDS_dataset.ipynb @@ -0,0 +1,472 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) MONAI Consortium \n", + "Licensed under the Apache License, Version 2.0 (the \"License\"); \n", + "you may not use this file except in compliance with the License. \n", + "You may obtain a copy of the License at \n", + "    http://www.apache.org/licenses/LICENSE-2.0 \n", + "Unless required by applicable law or agreed to in writing, software \n", + "distributed under the License is distributed on an \"AS IS\" BASIS, \n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. \n", + "See the License for the specific language governing permissions and \n", + "limitations under the License." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# GDS Integration\n", + "\n", + "This notebook introduces how to integrate GDS into MONAI. It covers the following topics:\n", + "- What is GPUDirect Storage (GDS)?\n", + "\n", + " GDS is the newest addition to the GPUDirect family. Like GPUDirect peer to peer (https://developer.nvidia.com/gpudirect), which enables a direct memory access (DMA) path between the memory of two graphics processing units (GPUs), and GPUDirect RDMA, which enables a direct DMA path to a network interface card (NIC), GDS enables a direct DMA data path between GPU memory and storage, thus avoiding a bounce buffer through the CPU. This direct path can increase system bandwidth while decreasing latency and the utilization load on the CPU and GPU.\n", + "\n", + "- GDS hardware and software requirements and how to install GDS.\n", + "\n", + " 1. GDS has been tested on the following NVIDIA GPUs: T10x, T4, A10, Quadro P6000, A100, and V100. For a full list of GPUs that GDS works with, refer to the [Known Limitations](https://docs.nvidia.com/gpudirect-storage/release-notes/index.html#known-limitations) section. For additional requirements, refer to items 3 and 4 in this [section](https://docs.nvidia.com/gpudirect-storage/release-notes/index.html#mofed-fs-req).\n", + "\n", + " 2. To install GDS, follow the detailed steps provided in this [section](https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#troubleshoot-install). To verify successful GDS installation, run the following command:\n", + " \n", + " ```/usr/local/cuda-X.Y/gds/tools/gdscheck.py -p``` \n", + " \n", + " (Replace X with the major version of the CUDA toolkit, and Y with the minor version.)\n", + "\n", + "- `GDSDataset`, inherited from `PersistentDataset`.\n", + "\n", + " In this tutorial, we have implemented a `GDSDataset` that inherits from `PersistentDataset`. 
We have re-implemented the `_cachecheck` method to create and save cache using GDS.\n", + "\n", + "- A simple demo comparing the time taken with and without GDS.\n", + "\n", + " In this tutorial, we are creating a conda environment to install `kvikio`, which provides a Python API for GDS. To install `kvikio` using other methods, refer to https://github.com/rapidsai/kvikio#install.\n", + "\n", + " ```conda create -n gds_env -c rapidsai-nightly -c conda-forge python=3.10 cuda-version=11.8 kvikio```\n", + "\n", + "- An End-to-end workflow Profiling Comparison" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup environment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!python -c \"import monai\" || pip install -q \"monai-weekly[nibabel, matplotlib]\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup imports" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import time\n", + "import cupy\n", + "import torch\n", + "import shutil\n", + "import tempfile\n", + "import numpy as np\n", + "from typing import Any\n", + "from pathlib import Path\n", + "from copy import deepcopy\n", + "from collections.abc import Callable, Sequence\n", + "from kvikio.numpy import fromfile, tofile\n", + "\n", + "import monai\n", + "import monai.transforms as mt\n", + "from monai.config import print_config\n", + "from monai.data.utils import pickle_hashing, SUPPORTED_PICKLE_MOD\n", + "from monai.utils import convert_to_tensor, set_determinism, look_up_option\n", + "\n", + "print_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup data directory\n", + "\n", + "You can specify a directory with the `MONAI_DATA_DIRECTORY` environment variable. \n", + "This allows you to save results and reuse downloads. \n", + "If not specified, a temporary directory will be used." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/raid/yliu/test_tutorial\n" + ] + } + ], + "source": [ + "directory = os.environ.get(\"MONAI_DATA_DIRECTORY\")\n", + "root_dir = tempfile.mkdtemp() if directory is None else directory\n", + "print(root_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## GDSDataset" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "class GDSDataset(monai.data.PersistentDataset):\n", + " def __init__(\n", + " self,\n", + " data: Sequence,\n", + " transform: Sequence[Callable] | Callable,\n", + " cache_dir: Path | str | None,\n", + " hash_func: Callable[..., bytes] = pickle_hashing,\n", + " hash_transform: Callable[..., bytes] | None = None,\n", + " reset_ops_id: bool = True,\n", + " device: int = None,\n", + " **kwargs: Any,\n", + " ) -> None:\n", + " super().__init__(\n", + " data=data,\n", + " transform=transform,\n", + " cache_dir=cache_dir,\n", + " hash_func=hash_func,\n", + " hash_transform=hash_transform,\n", + " reset_ops_id=reset_ops_id,\n", + " **kwargs,\n", + " )\n", + " self.device = device\n", + "\n", + " def _cachecheck(self, item_transformed):\n", + " \"\"\"given the input dictionary ``item_transformed``, return a transformed version of it\"\"\"\n", + " hashfile = None\n", + " # compute a cache id\n", + " if self.cache_dir is not None:\n", + " data_item_md5 = self.hash_func(item_transformed).decode(\"utf-8\")\n", + " data_item_md5 += self.transform_hash\n", + " hashfile = self.cache_dir / f\"{data_item_md5}.pt\"\n", + "\n", + " if hashfile is not None and hashfile.is_file(): # cache hit\n", + " with cupy.cuda.Device(self.device):\n", + " item = {}\n", + " for k in item_transformed:\n", + " meta_k = torch.load(self.cache_dir / f\"{hashfile.name}-{k}-meta\")\n", + " item[k] = fromfile(f\"{hashfile}-{k}\", dtype=np.float32, like=cupy.empty(()))\n", + " item[k] = convert_to_tensor(item[k].reshape(meta_k[\"shape\"]), device=f\"cuda:{self.device}\")\n", + " item[f\"{k}_meta_dict\"] = meta_k\n", + " return item\n", + "\n", + " # create new cache\n", + " _item_transformed = self._pre_transform(deepcopy(item_transformed)) # keep the original hashed\n", + " if hashfile is None:\n", + " return _item_transformed\n", + "\n", + " for k in _item_transformed: # {'image': ..., 'label': ...}\n", + " _item_transformed_meta = _item_transformed[k].meta\n", + " _item_transformed_data = _item_transformed[k].array\n", + " _item_transformed_meta[\"shape\"] = _item_transformed_data.shape\n", + " tofile(_item_transformed_data, f\"{hashfile}-{k}\")\n", + " try:\n", + " # NOTE: Writing to a temporary directory and then using a nearly atomic rename operation\n", + " # to make the cache more robust to manual killing of parent process\n", + " # which may leave partially written cache files in an incomplete state\n", + " with tempfile.TemporaryDirectory() as tmpdirname:\n", + " meta_hash_file_name = f\"{hashfile.name}-{k}-meta\"\n", + " meta_hash_file = self.cache_dir / meta_hash_file_name\n", + " temp_hash_file = Path(tmpdirname) / meta_hash_file_name\n", + " torch.save(\n", + " obj=_item_transformed_meta,\n", + " f=temp_hash_file,\n", + " pickle_module=look_up_option(self.pickle_module, SUPPORTED_PICKLE_MOD),\n", + " pickle_protocol=self.pickle_protocol,\n", + " )\n", + " if temp_hash_file.is_file() and not meta_hash_file.is_file():\n", + " # On Unix, if target exists and is a 
file, it will be replaced silently if the\n", + " # user has permission.\n", + " # for more details: https://docs.python.org/3/library/shutil.html#shutil.move.\n", + " try:\n", + " shutil.move(str(temp_hash_file), meta_hash_file)\n", + " except FileExistsError:\n", + " pass\n", + " except PermissionError: # project-monai/monai issue #3613\n", + " pass\n", + " open(hashfile, \"a\").close() # store cacheid\n", + "\n", + " return _item_transformed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## A simple demo to show how to use the GDS" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download dataset and set dataset path" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-07-12 09:26:17,878 - INFO - Expected md5 is None, skip md5 check for file samples.zip.\n", + "2023-07-12 09:26:17,878 - INFO - File exists: samples.zip, skipped downloading.\n", + "2023-07-12 09:26:17,879 - INFO - Writing into directory: /raid/yliu/test_tutorial.\n" + ] + } + ], + "source": [ + "sample_url = \"https://github.com/Project-MONAI/MONAI-extra-test-data/releases\"\n", + "sample_url += \"/download/0.8.1/totalSegmentator_mergedLabel_samples.zip\"\n", + "monai.apps.download_and_extract(sample_url, output_dir=root_dir, filepath=\"samples.zip\")\n", + "\n", + "base_name = os.path.join(root_dir, \"totalSegmentator_mergedLabel_samples\")\n", + "input_data = []\n", + "for filename in os.listdir(os.path.join(base_name, \"imagesTr\")):\n", + " input_data.append(\n", + " {\n", + " \"image\": os.path.join(base_name, \"imagesTr\", filename),\n", + " \"label\": os.path.join(base_name, \"labelsTr\", filename),\n", + " }\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set deterministic for reproducibility" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "set_determinism(seed=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup transforms" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n", + "transform = mt.Compose(\n", + " [\n", + " mt.LoadImageD(keys=(\"image\", \"label\"), image_only=True, ensure_channel_first=True),\n", + " mt.SpacingD(keys=(\"image\", \"label\"), pixdim=1.5),\n", + " mt.EnsureTypeD(keys=(\"image\", \"label\"), device=device),\n", + " mt.RandRotateD(\n", + " keys=(\"image\", \"label\"),\n", + " prob=1.0,\n", + " range_x=0.1,\n", + " range_y=0.1,\n", + " range_z=0.3,\n", + " mode=(\"bilinear\", \"nearest\"),\n", + " ),\n", + " mt.RandZoomD(keys=(\"image\", \"label\"), prob=1.0, min_zoom=0.8, max_zoom=1.2, mode=(\"trilinear\", \"nearest\")),\n", + " mt.ResizeWithPadOrCropD(keys=(\"image\", \"label\"), spatial_size=(200, 210, 220)),\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using GDSDataset" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch0 time 19.746733903884888\n", + "epoch1 time 0.9976603984832764\n", + "epoch2 time 0.982248067855835\n", + "epoch3 time 0.9838874340057373\n", + "epoch4 time 0.9793403148651123\n", + "total time 23.69102692604065\n" + ] + } + ], + "source": [ + "cache_dir = 
os.path.join(root_dir, \"gds_cache_dir\")\n", + "dataset = GDSDataset(data=input_data, transform=transform, cache_dir=cache_dir, device=0)\n", + "\n", + "data_loader = monai.data.ThreadDataLoader(dataset, batch_size=1)\n", + "\n", + "s = time.time()\n", + "for i in range(5):\n", + " e = time.time()\n", + " for _x in data_loader:\n", + " pass\n", + " print(f\"epoch{i} time\", time.time() - e)\n", + "print(\"total time\", time.time() - s)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using PersistentDataset" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch0 time 21.206729650497437\n", + "epoch1 time 1.510526180267334\n", + "epoch2 time 1.588256597518921\n", + "epoch3 time 1.4431262016296387\n", + "epoch4 time 1.4594802856445312\n", + "total time 27.20927882194519\n" + ] + } + ], + "source": [ + "cache_dir_per = os.path.join(root_dir, \"persistent_cache_dir\")\n", + "dataset = monai.data.PersistentDataset(data=input_data, transform=transform, cache_dir=cache_dir_per)\n", + "data_loader = monai.data.ThreadDataLoader(dataset, batch_size=1)\n", + "\n", + "s = time.time()\n", + "for i in range(5):\n", + " e = time.time()\n", + " for _x in data_loader:\n", + " pass\n", + " print(f\"epoch{i} time\", time.time() - e)\n", + "print(\"total time\", time.time() - s)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## An End-to-end workflow Profiling Comparison\n", + "\n", + "We also conducted a quantitative analysis of the end-to-end workflow performance using the BraTS dataset. To learn how to implement the full pipeline, please follow this [tutorial](../acceleration/distributed_training/brats_training_ddp.py). The only step that requires modification is the dataset: replace the dataset used there with the `GDSDataset` defined above. The end-to-end pipeline was benchmarked on a V100 32GB GPU." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Total time and per-epoch time comparison\n", + "![gds_benchmark_total_epoch_time_comparison](../figures/gds_total_epoch_time_comparison.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Total time to achieve metrics comparison\n", + "![gds_benchmark_achieve_metrics_comparison](../figures/gds_metric_time_epochs.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleanup data directory\n", + "\n", + "Remove the directory if a temporary one was used." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "if directory is None:\n", + " shutil.rmtree(root_dir)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/runner.sh b/runner.sh index 89f313a1bf..9e4eb0b0e9 100755 --- a/runner.sh +++ b/runner.sh @@ -70,6 +70,7 @@ doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" lazy_resampling_com doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" TensorRT_inference_acceleration.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" lazy_resampling_benchmark.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" modular_patch_inferer.ipynb) +doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" GDS_dataset.ipynb) # Execution of the notebook in these folders / with the filename cannot be automated skip_run_papermill=() @@ -107,6 +108,7 @@ skip_run_papermill=("${skip_run_papermill[@]}" .*transform_visualization*) # ht skip_run_papermill=("${skip_run_papermill[@]}" .*TensorRT_inference_acceleration*) skip_run_papermill=("${skip_run_papermill[@]}" .*mednist_classifier_ray*) # https://github.com/Project-MONAI/tutorials/issues/1307 skip_run_papermill=("${skip_run_papermill[@]}" .*TorchIO_MONAI_PyTorch_Lightning*) # https://github.com/Project-MONAI/tutorials/issues/1324 +skip_run_papermill=("${skip_run_papermill[@]}" .*GDS_dataset*) # https://github.com/Project-MONAI/tutorials/issues/1324 # output formatting separator=""