From e21731ac30b984145a9c20dcdb0edbc3982e236b Mon Sep 17 00:00:00 2001
From: Katya Macedo
Date: Fri, 16 Dec 2022 16:22:07 -0600
Subject: [PATCH] Add Partitioning tutorial

---
 .../partitioned-by-tutorial.ipynb | 428 ++++++++++++++++++
 1 file changed, 428 insertions(+)
 create mode 100644 examples/quickstart/jupyter-notebooks/partitioned-by-tutorial.ipynb

diff --git a/examples/quickstart/jupyter-notebooks/partitioned-by-tutorial.ipynb b/examples/quickstart/jupyter-notebooks/partitioned-by-tutorial.ipynb
new file mode 100644
index 000000000000..02028c3e0517
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/partitioned-by-tutorial.ipynb
@@ -0,0 +1,428 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Druid SQL segment sizing and partitioning\n",
+    "\n",
+    "Partitioning is a method of organizing a large datasource into independent partitions.\n",
+    "Partitioning can reduce your storage footprint and improve query performance.\n",
+    "\n",
+    "At ingestion time, Apache Druid always partitions data by time.\n",
+    "Each time chunk is then divided into one or more [segments](https://druid.apache.org/docs/latest/design/segments.html).\n",
+    "\n",
+    "This tutorial describes how to configure partitioning for the Druid SQL ingestion method. For information about partitioning configurations supported by other ingestion methods, see [How to configure partitioning](https://druid.apache.org/docs/latest/ingestion/partitioning.html#how-to-configure-partitioning)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file of the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python 3](https://www.python.org/downloads/)\n",
+    "- [The `requests` package for Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or [Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a non-default port. Druid and Jupyter both default to port `8888`, so you need to start Jupyter on a different port.\n",
+    "- An available Druid instance. This tutorial uses the `micro-quickstart` configuration described in the [Druid quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no authentication or authorization is required unless explicitly mentioned. If you haven’t already, download Druid version 24.0 or higher and start Druid services as described in the quickstart."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f8e64f0-c29a-473c-8783-a2ff8648acd7",
+   "metadata": {},
+   "source": [
+    "## Prepare your environment\n",
+    "\n",
+    "Start by running the following cell. It imports the required Python packages and defines a variable for the Druid host."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment.\n",
+    "# In a distributed environment, use the Router service as the `druid_host`.\n",
+    "druid_host = \"http://localhost:8888\"\n",
+    "# dataSourceName is the name of the table that this tutorial creates.\n",
+    "dataSourceName = \"partitioning-tutorial\"\n",
+    "print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")"
+   ]
+  },
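+  {
+   "cell_type": "markdown",
+   "id": "0f3c1b2a",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check, you can confirm that Druid is reachable before you continue. The following cell sends a `GET` request to Druid's `/status` endpoint, which returns basic information about the deployment, including the Druid version. This check is optional and is not required for the rest of the tutorial."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1a2b3c4d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Confirm that Druid is up by requesting its status.\n",
+    "# If this raises an exception, check that Druid is running\n",
+    "# and that druid_host points at your Router service.\n",
+    "endpoint = \"/status\"\n",
+    "response = requests.get(druid_host + endpoint)\n",
+    "response.raise_for_status()\n",
+    "print(f\"Druid version: {response.json()['version']}\")"
+   ]
+  },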
\n", + "druid_host = \"http://localhost:8888\"\n", + "dataSourceName = \"partitioning-tutorial\"\n", + "print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")" + ] + }, + { + "cell_type": "markdown", + "id": "e893ef7d-7136-442f-8bd9-31b5a5276518", + "metadata": {}, + "source": [ + "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` variables are updated to accomplish different tasks." + ] + }, + { + "cell_type": "markdown", + "id": "ebd8c7db-c39f-4ef7-86ec-81f405e02550", + "metadata": {}, + "source": [ + "## Segment size\n", + "\n", + "A segment is the smallest unit of storage in Druid.\n", + "It is recommended that you optimize your segment file size at ingestion time for Druid to operate well under a heavy query load.\n", + "\n", + "Consider the following to optimize your segment file size:\n", + "\n", + "- The number of rows per segment should be around five million. You can set the number of rows per segment using the `rowsPerSegment` query context parameter in the [Druid SQL API](https://druid.apache.org/docs/latest/querying/sql-api.html) or as a [JDBC connection properties object](https://druid.apache.org/docs/latest/querying/sql-jdbc.html). To specify the `rowsPerSegment` parameters in the Druid web console, navigate to the **Query** page, then click **Engine > Edit context** to bring up the **Edit query context** dialog. For more information on how to specify query context parameters, see [Setting the query context](https://druid.apache.org/docs/latest/querying/sql-query-context.html#setting-the-query-context).\n", + "- Segment file size should be within the range of 300-700 MB. The number of rows per segment takes precedence over the segment byte size. \n", + "\n", + "For more information on segment sizing, see [Segment size optimization](https://druid.apache.org/docs/latest/operations/segment-optimization.html)." + ] + }, + { + "cell_type": "markdown", + "id": "84cb68a0-beb1-47d5-9fd5-384ea0caa35d", + "metadata": {}, + "source": [ + "## PARTITIONED BY\n", + "\n", + "In Druid SQL, the granularity of a segment is defined by the granularity of the PARTITIONED BY clause.\n", + "\n", + "[INSERT](https://druid.apache.org/docs/latest/multi-stage-query/reference.html#insert) and [REPLACE](https://druid.apache.org/docs/latest/multi-stage-query/reference.html#replace) statements both require the PARTITIONED BY clause.\n", + "\n", + "PARTITIONED BY accepts the following time granularity arguments:\n", + "- `time_unit`\n", + "- `TIME_FLOOR(__time, period)` \n", + "- `FLOOR(__time TO time_unit)`\n", + "- `ALL` or `ALL TIME`\n", + "\n", + "Continue reading to learn about each of the supported arguments.\n", + "\n", + "### Time unit\n", + "\n", + "`PARTITIONED BY(time_unit)`. 
+  {
+   "cell_type": "markdown",
+   "id": "84cb68a0-beb1-47d5-9fd5-384ea0caa35d",
+   "metadata": {},
+   "source": [
+    "## PARTITIONED BY\n",
+    "\n",
+    "In Druid SQL, the PARTITIONED BY clause defines the time chunk granularity for the segments of a datasource.\n",
+    "\n",
+    "[INSERT](https://druid.apache.org/docs/latest/multi-stage-query/reference.html#insert) and [REPLACE](https://druid.apache.org/docs/latest/multi-stage-query/reference.html#replace) statements both require the PARTITIONED BY clause.\n",
+    "\n",
+    "PARTITIONED BY accepts the following time granularity arguments:\n",
+    "- `time_unit`\n",
+    "- `TIME_FLOOR(__time, period)`\n",
+    "- `FLOOR(__time TO time_unit)`\n",
+    "- `ALL` or `ALL TIME`\n",
+    "\n",
+    "Continue reading to learn about each of the supported arguments.\n",
+    "\n",
+    "### Time unit\n",
+    "\n",
+    "`PARTITIONED BY time_unit`. Partition by `SECOND`, `MINUTE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, `QUARTER`, or `YEAR`.\n",
+    "\n",
+    "For example, run the following cell to ingest data from an external source into a table named `partitioning-tutorial` and partition the datasource by `DAY`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "045f782c-74d8-4447-9487-529071812b51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql/task\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "# If you already have an existing datasource named partitioning-tutorial, use REPLACE INTO instead of INSERT INTO.\n",
+    "payload = json.dumps({\n",
+    "\"query\": \"INSERT INTO \\\"partitioning-tutorial\\\" SELECT TIME_PARSE(\\\"timestamp\\\") \\\n",
+    " AS __time, * FROM TABLE \\\n",
+    " (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": [\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": \\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+    " PARTITIONED BY DAY\",\n",
+    " \"context\": {\n",
+    "  \"maxNumTasks\": 3\n",
+    " }\n",
+    "})\n",
+    "\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "ingestion_taskId_response = response\n",
+    "ingestion_taskId = json.loads(ingestion_taskId_response.text)['taskId']\n",
+    "\n",
+    "print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
+    "print(\"\\nThe response includes the task ID and the status: \" + response.text + \".\")"
+   ]
+  },
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "df12d12c-a067-4759-bae0-0410c24b6205", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import time\n", + "\n", + "endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n", + "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n", + "http_method = \"GET\"\n", + "\n", + "payload = {}\n", + "headers = {}\n", + "\n", + "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", + "ingestion_status = json.loads(response.text)['status']['status']\n", + "# If you only want to fetch the status once and print it, \n", + "# uncomment the print statement and comment out the if and while loops\n", + "# print(json.dumps(response.json(), indent=4))\n", + "\n", + "if ingestion_status == \"RUNNING\":\n", + " print(\"The ingestion is running...\")\n", + "\n", + "while ingestion_status != \"SUCCESS\":\n", + " response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", + " ingestion_status = json.loads(response.text)['status']['status']\n", + " time.sleep(15) \n", + " \n", + "if ingestion_status == \"SUCCESS\": \n", + " print(\"The ingestion is complete:\")\n", + " print(json.dumps(response.json(), indent=4))" + ] + }, + { + "cell_type": "markdown", + "id": "240b0ad5-48f2-4737-b12b-5fd5f98da300", + "metadata": {}, + "source": [ + "### TIME_FLOOR\n", + "\n", + "`PARTITIONED BY(TIME_FLOOR(__time, period))`. Partition by a timestamp rounded to the specified period.\n", + "\n", + "`period` can be any of the following ISO 8601 periods:\n", + "- `PT1S`: one second\n", + "- `PT1M`: one minute\n", + "- `PT5M`: five minutes\n", + "- `PT10M`: ten minutes\n", + "- `PT15M`: fifteen minutes\n", + "- `PT30M`: thirty minutes\n", + "- `PT1H`: one hour\n", + "- `PT6H`: six hours\n", + "- `PT8H`: eight hours \n", + "- `P1D`: one day\n", + "- `P1W`: one week\n", + "- `P1M`: one month\n", + "- `P3M`: three months\n", + "- `P1Y`: one year\n", + "\n", + "Run the following cell to partition the `partitioning-tutorial` datasource by a timestamp rounded to thirty minutes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "91dd255a-4d55-493e-a067-4cef5c659657", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "endpoint = \"/druid/v2/sql/task\"\n", + "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n", + "http_method = \"POST\"\n", + "\n", + "payload = json.dumps({\n", + "\"query\": \"REPLACE INTO \\\"partitioning-tutorial\\\" OVERWRITE ALL SELECT TIME_PARSE(\\\"timestamp\\\") \\\n", + " AS __time, * FROM TABLE \\\n", + " (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": [\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": \\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"flags\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n", + " PARTITIONED BY TIME_FLOOR(__time, 'PT30M')\",\n", + " \"context\": {\n", + " \"maxNumTasks\": 3\n", + " }\n", + "})\n", + "\n", + "headers = {'Content-Type': 'application/json'}\n", + "\n", + "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", + "ingestion_taskId_response = response\n", + "ingestion_taskId = json.loads(response.text)['taskId']\n", + "\n", + "print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n", + "print(f\"\\nInserting data into the table named {dataSourceName}\")\n", + "print(\"\\nThe response includes the task ID and the status: \" + response.text + \".\")" + ] + }, + { + "cell_type": "markdown", + "id": "cbeb5a63", + "metadata": { + "deletable": true, + "tags": [] + }, + "source": [ + "### FLOOR\n", + "\n", + "`PARTITIONED BY(FLOOR(__time TO time_unit))`. Partition by the largest timestamp value that is less than or equal to the specified time unit, where `time_unit` can be any of the following values: `SECOND`, `MINUTE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, `QUARTER`, `YEAR`.\n", + "\n", + "Run the following cell to partition the `partitioning-tutorial` datasource by a timestamp value less than or equal to `HOUR`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b9227d6c-1d8c-4169-b13b-a08625c4011f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "endpoint = \"/druid/v2/sql/task\"\n", + "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n", + "http_method = \"POST\"\n", + "\n", + "payload = json.dumps({\n", + "\"query\": \"REPLACE INTO \\\"partitioning-tutorial\\\" OVERWRITE ALL SELECT TIME_PARSE(\\\"timestamp\\\") \\\n", + " AS __time, * FROM TABLE \\\n", + " (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": [\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": \\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isMinor\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n", + " PARTITIONED BY FLOOR(__time TO HOUR)\",\n", + " \"context\": {\n", + " \"maxNumTasks\": 3\n", + " }\n", + "})\n", + "\n", + "headers = {'Content-Type': 'application/json'}\n", + "\n", + "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", + "ingestion_taskId_response = response\n", + "ingestion_taskId = json.loads(response.text)['taskId']\n", + "\n", + "print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n", + "print(f\"\\nInserting data into the table named {dataSourceName}\")\n", + "print(\"\\nThe response includes the task ID and the status: \" + response.text + \".\")" + ] + }, + { + "cell_type": "markdown", + "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6", + "metadata": {}, + "source": [ + "### ALL and ALL TIME\n", + "\n", + "`PARTITIONED BY ALL`. Disable time partitioning by placing all data in a single time chunk.\n", + "\n", + "PARTITIONED BY ALL and PARTITIONED BY ALL TIME clauses are suitable for datasets that do not have a primary timestamp. In this case, Druid creates a `__time` column in your Druid datasource and sets all timestamps to `1970-01-01T00:00:00Z`.\n", + "\n", + "> To use LIMIT or OFFSET at the outer level of your INSERT or REPLACE query, you must set PARTITIONED BY to ALL or ALL TIME.\n", + "\n", + "Run the following cell to skip time partitioning and place all data into a single time chunk:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7e3d62a-1325-4992-8bcd-c0f1925704bc", + "metadata": {}, + "outputs": [], + "source": [ + "endpoint = \"/druid/v2/sql/task\"\n", + "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n", + "http_method = \"POST\"\n", + "\n", + "payload = json.dumps({\n", + "\"query\": \"REPLACE INTO \\\"partitioning-tutorial\\\" OVERWRITE ALL SELECT TIME_PARSE(\\\"timestamp\\\") \\\n", + " AS __time, * FROM TABLE \\\n", + " (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": [\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": \\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n", + " PARTITIONED BY ALL\",\n", + " \"context\": {\n", + " \"maxNumTasks\": 3\n", + " }\n", + "})\n", + "\n", + "headers = {'Content-Type': 'application/json'}\n", + "\n", + "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", + "ingestion_taskId_response = response\n", + "ingestion_taskId = json.loads(response.text)['taskId']\n", + "\n", + "print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n", + "print(f\"\\nInserting data into the table named {dataSourceName}\")\n", + "print(\"\\nThe response includes the task ID and the status: \" + response.text + \".\")" + ] + }, + { + "cell_type": "markdown", + "id": "8fbfa1fa-2cde-46d5-8107-60bd436fb64e", + "metadata": { + "deletable": true, + "editable": true, + "tags": [] + }, + "source": [ + "## Learn more\n", + "\n", + "To learn more about Druid segment sizing and partitioning, see the following topics:\n", + "\n", + "- [Segments](https://druid.apache.org/docs/latest/design/segments.html) for general information about segments in Druid. \n", + "- [Partitioning](https://druid.apache.org/docs/latest/ingestion/partitioning.html) to learn how to set up partitions within a single datasource.\n", + "- [Context parameters](https://druid.apache.org/docs/latest/multi-stage-query/reference.html#context-parameters) for context parameters specific to the multi-stage query task engine." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.0" + }, + "toc-autonumbering": false, + "toc-showcode": false, + "toc-showmarkdowntxt": false, + "toc-showtags": false, + "vscode": { + "interpreter": { + "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}