apache · kfaraz · Dec 12, 2022 · Dec 9, 2022
diff --git a/docs/operations/python.md b/docs/operations/python.md
@@ -0,0 +1,49 @@
+---
+id: python
+title: "Python Installation"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+The Apache Druid startup script requires a Python2 or Python3 interpreter.
+Because Python2 is deprecated, this document provides instructions for installing the Python3 interpreter.
+
+## Python3 interpreter installation instructions
+
+### Linux
+
+#### Debian or Ubuntu
+- `sudo apt update`
+- `sudo apt install -y python3-pip`
+#### RHEL
+- `sudo yum install -y epel-release`
+- `sudo yum install -y python3-pip`
+
+### macOS
+
+#### Install with Homebrew
+Refer to [Installing Python 3 on Mac OS X](https://docs.python-guide.org/starting/install3/osx/).
+
+#### Install the official Python release
+* Browse to the [Python Downloads Page](https://www.python.org/downloads/) and download the latest version (3.x.x).
+
+To verify that Python3 is installed, run `python3 --version`.
+
+
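Reviewer sketch for the verification step in `python.md`: one way to check which major version an interpreter reports. The helper name `python_major` is hypothetical and not part of Druid. The `2>&1` redirect is there because Python2 prints its version to stderr rather than stdout.

```bash
# Hypothetical helper (not part of Druid): print the major version from
# text in the form produced by `python --version`, e.g. "Python 3.10.4".
python_major() {
  version="${1#Python }"   # strip the leading "Python " prefix
  echo "${version%%.*}"    # keep only the text before the first dot
}

# Report the major version of python3 if it is on PATH.
if command -v python3 >/dev/null 2>&1; then
  echo "python3 reports major version $(python_major "$(python3 --version 2>&1)")"
else
  echo "python3 is not on PATH" >&2
fi
```

If `python3` is missing, the install steps for the relevant OS in `python.md` apply.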
diff --git a/docs/operations/single-server.md b/docs/operations/single-server.md
@@ -23,14 +23,17 @@ title: "Single server deployment"
   -->
 
 
-Druid includes a set of reference configurations and launch scripts for single-machine deployments:
-
-- `nano-quickstart`
-- `micro-quickstart`
-- `small`
-- `medium`
-- `large`
-- `xlarge`
+Druid includes a set of reference configurations and launch scripts for single-machine deployments.
+These configuration bundles are located in `conf/druid/single-server/`.
+
+The `auto` configuration sizes runtime parameters based on available processors and memory. Other configurations include hard-coded runtime parameters for various server sizes. Most users should stick with `auto`. For details, see [Druid auto start](#druid-auto-start) below.
+- `auto` (run script: `bin/start-druid`)
+- `nano-quickstart` (run script: `bin/start-nano-quickstart`)
+- `micro-quickstart` (run script: `bin/start-micro-quickstart`)
+- `small` (run script: `bin/start-single-server-small`)
+- `medium` (run script: `bin/start-single-server-medium`)
+- `large` (run script: `bin/start-single-server-large`)
+- `xlarge` (run script: `bin/start-single-server-xlarge`)
 
 The `micro-quickstart` is sized for small machines like laptops and is intended for quick evaluation use-cases.
 
@@ -44,6 +47,18 @@ The example configurations run the Druid Coordinator and Overlord together in a
 
 While example configurations are provided for very large single machines, at higher scales we recommend running Druid in a [clustered deployment](../tutorials/cluster.md), for fault-tolerance and reduced resource contention.
 
+## Druid auto start
+
+Druid includes a launch script, `bin/start-druid`, that automatically sets various memory-related parameters based on available processors and memory.
+
+`start-druid` is a generic launch script capable of starting any set of Druid services on a server.
+It accepts optional arguments, such as a list of services, total memory, and a config directory, to override default JVM arguments and service-specific runtime properties.
+Druid services will use all processors and up to 80% of the memory on the system.
+For details about possible arguments, run `bin/start-druid --help`.
+
+The launch scripts for the hard-coded configurations (e.g. `start-micro-quickstart`) are now deprecated.
+
+
 ## Single server reference configurations
 
 ### Nano-Quickstart: 1 CPU, 4GiB RAM
@@ -74,5 +89,4 @@ While example configurations are provided for very large single machines, at hig
 ### X-Large: 64 CPU, 512GiB RAM (~i3.16xlarge)
 
 - Launch command: `bin/start-xlarge`
-- Configuration directory: `conf/druid/single-server/xlarge`
-
+- Configuration directory: `conf/druid/single-server/xlarge`
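Reviewer sketch for the "Druid auto start" text in `single-server.md`, which says services use up to 80% of the memory on the system. The arithmetic below only illustrates what that ceiling means for a given machine size. `memory_budget_mib` is a hypothetical helper, and the actual sizing logic lives in `start-druid-main.py`, which is not shown in this diff and may differ.

```bash
# Illustration only: memory_budget_mib is a hypothetical helper, not Druid code.
# It prints 80% of the given total memory (in MiB), rounded down.
memory_budget_mib() {
  echo $(( $1 * 80 / 100 ))
}

# Upper bound for a machine with 16 GiB (16384 MiB) of RAM.
memory_budget_mib 16384
```

How that budget is split across individual services is not something this sketch attempts to model.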
diff --git a/docs/tutorials/cluster.md b/docs/tutorials/cluster.md
@@ -130,7 +130,10 @@ The [basic cluster tuning guide](../operations/basic-cluster-tuning.md) has info
 
 ## Select OS
 
-We recommend running your favorite Linux distribution. You will also need [Java 8 or 11](../operations/java.md).
+We recommend running your favorite Linux distribution. You will also need:
+
+* [Java 8 or 11](../operations/java.md)
+* [Python2 or Python3](../operations/python.md)
 
 > If needed, you can specify where to find Java using the environment variables
 > `DRUID_JAVA_HOME` or `JAVA_HOME`. For more details run the `bin/verify-java` script.

diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md
@@ -22,8 +22,7 @@ title: "Quickstart (local)"
   ~ under the License.
   -->
 
-
-This quickstart gets you started with Apache Druid using the [`micro-quickstart`](../operations/single-server.md#micro-quickstart-4-cpu-16gib-ram) configuration, and introduces you to Druid ingestion and query features.
+This quickstart gets you started with Apache Druid and introduces you to Druid ingestion and query features. For this tutorial, we recommend a machine with at least 6 GB of RAM.
 
 In this quickstart, you'll do the following:
 - install Druid
@@ -37,15 +36,16 @@ Druid supports a variety of ingestion options. Once you're done with this tutori
 
 You can follow these steps on a relatively modest machine, such as a workstation or virtual server with 16 GiB of RAM.
 
-Druid comes equipped with several [startup configuration profiles](../operations/single-server.md) for a
-range of machine sizes. These range from `nano` (1 CPU, 4GiB RAM) to `x-large` (64 CPU, 512GiB RAM). For more
-information, see [Single server deployment](../operations/single-server.md). For information on deploying Druid services
-across clustered machines, see [Clustered deployment](./cluster.md).
+Druid comes equipped with launch scripts that can be used to start all processes on a single server. Here, we will use [`auto`](../operations/single-server.md#druid-auto-start), which automatically sets various runtime properties based on available processors and memory.
+
+In addition, Druid includes several [bundled non-automatic profiles](../operations/single-server.md) for a range of machine sizes. These range from nano (1 CPU, 4GiB RAM) to x-large (64 CPU, 512GiB RAM). 
+We won't use those here, but for more information, see [Single server deployment](../operations/single-server.md). For additional information on deploying Druid services across clustered machines, see [Clustered deployment](./cluster.md).
 
 The software requirements for the installation machine are:
 
 * Linux, Mac OS X, or other Unix-like OS. (Windows is not supported.)
 * Java 8u92+ or Java 11.
+* [Python2 or Python3](../operations/python.md)
 
 > Druid relies on the environment variables `JAVA_HOME` or `DRUID_JAVA_HOME` to find Java on the machine. You can set
 `DRUID_JAVA_HOME` if there is more than one instance of Java. To verify Java requirements for your environment, run the 
@@ -72,38 +72,39 @@ The distribution directory contains `LICENSE` and `NOTICE` files and subdirector
 
 ## Start up Druid services
 
-Start up Druid services using the `micro-quickstart` single-machine configuration.
+Start up Druid services using the `auto` single-machine configuration.
 This configuration includes default settings that are appropriate for this tutorial, such as loading the `druid-multi-stage-query` extension by default so that you can use the MSQ task engine.
 
-You can view that setting and others in the configuration files in the `conf/druid/single-server/micro-quickstart/`. 
+You can view that setting and others in the configuration files in `conf/druid/auto`.
 
 From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
 
 ```bash
-./bin/start-micro-quickstart
+./bin/start-druid
 ```
 
 This brings up instances of ZooKeeper and the Druid services:
 
 ```bash
-$ ./bin/start-micro-quickstart
-[Thu Sep  8 18:30:00 2022] Starting Apache Druid.
-[Thu Sep  8 18:30:00 2022] Open http://localhost:8888/ in your browser to access the web console.
-[Thu Sep  8 18:30:00 2022] Or, if you have enabled TLS, use https on port 9088.
-[Thu Sep  8 18:30:00 2022] Running command[zk], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/zk.log]: bin/run-zk conf
-[Thu Sep  8 18:30:00 2022] Running command[coordinator-overlord], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/coordinator-overlord.log]: bin/run-druid coordinator-overlord conf/druid/single-server/micro-quickstart
-[Thu Sep  8 18:30:00 2022] Running command[broker], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/broker.log]: bin/run-druid broker conf/druid/single-server/micro-quickstart
-[Thu Sep  8 18:30:00 2022] Running command[router], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/router.log]: bin/run-druid router conf/druid/single-server/micro-quickstart
-[Thu Sep  8 18:30:00 2022] Running command[historical], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/historical.log]: bin/run-druid historical conf/druid/single-server/micro-quickstart
-[Thu Sep  8 18:30:00 2022] Running command[middleManager], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart
+$ ./bin/start-druid
+[Tue Nov 29 16:31:06 2022] Starting Apache Druid.
+[Tue Nov 29 16:31:06 2022] Open http://localhost:8888/ in your browser to access the web console.
+[Tue Nov 29 16:31:06 2022] Or, if you have enabled TLS, use https on port 9088.
+[Tue Nov 29 16:31:06 2022] Starting services with log directory [/apache-druid-{{DRUIDVERSION}}/log].
+[Tue Nov 29 16:31:06 2022] Running command[zk]: bin/run-zk conf
+[Tue Nov 29 16:31:06 2022] Running command[broker]: bin/run-druid broker /apache-druid-{{DRUIDVERSION}}/conf/druid/single-server/quickstart '-Xms1187m -Xmx1187m -XX:MaxDirectMemorySize=791m'
+[Tue Nov 29 16:31:06 2022] Running command[router]: bin/run-druid router /apache-druid-{{DRUIDVERSION}}/conf/druid/single-server/quickstart '-Xms128m -Xmx128m'
+[Tue Nov 29 16:31:06 2022] Running command[coordinator-overlord]: bin/run-druid coordinator-overlord /apache-druid-{{DRUIDVERSION}}/conf/druid/single-server/quickstart '-Xms1290m -Xmx1290m'
+[Tue Nov 29 16:31:06 2022] Running command[historical]: bin/run-druid historical /apache-druid-{{DRUIDVERSION}}/conf/druid/single-server/quickstart '-Xms1376m -Xmx1376m -XX:MaxDirectMemorySize=2064m'
+[Tue Nov 29 16:31:06 2022] Running command[middleManager]: bin/run-druid middleManager /apache-druid-{{DRUIDVERSION}}/conf/druid/single-server/quickstart '-Xms64m -Xmx64m' '-Ddruid.worker.capacity=2 -Ddruid.indexer.runner.javaOptsArray=["-server","-Duser.timezone=UTC","-Dfile.encoding=UTF-8","-XX:+ExitOnOutOfMemoryError","-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager","-Xms256m","-Xmx256m","-XX:MaxDirectMemorySize=256m"]'
 ```
 
 All persistent state, such as the cluster metadata store and segments for the services, are kept in the `var` directory under 
 the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`.
 
 At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance. 
 
-To stop Druid at any time, use CTRL+C in the terminal. This exits the `bin/start-micro-quickstart` script and terminates all Druid processes.
+To stop Druid at any time, use CTRL+C in the terminal. This exits the `bin/start-druid` script and terminates all Druid processes.
 
 ## Open the web console 
 
@@ -222,4 +223,4 @@ See the following topics for more information:
 * [Tutorial: Load stream data from Apache Kafka](./tutorial-kafka.md) to load streaming data from a Kafka topic.
 * [Extensions](../development/extensions.md) for details on Druid extensions.
 
-Remember that after stopping Druid services, you can start clean next time by deleting the `var` directory from the Druid root directory and running the `bin/start-micro-quickstart` script again. You may want to do this before using other data ingestion tutorials, since they use the same Wikipedia datasource.
+Remember that after stopping Druid services, you can start clean next time by deleting the `var` directory from the Druid root directory and running the `bin/start-druid` script again. You may want to do this before using other data ingestion tutorials, since they use the same Wikipedia datasource.
diff --git a/docs/tutorials/tutorial-batch-hadoop.md b/docs/tutorials/tutorial-batch-hadoop.md
@@ -28,7 +28,7 @@ This tutorial shows you how to load data files into Apache Druid using a remote
 
 For this tutorial, we'll assume that you've already completed the previous
 [batch ingestion tutorial](tutorial-batch.md) using Druid's native batch ingestion system and are using the
-`micro-quickstart` single-machine configuration as described in the [quickstart](index.md).
+`auto` single-machine configuration as described in the [quickstart](../operations/single-server.md#druid-auto-start).
 
 ## Install Docker
 
@@ -156,7 +156,7 @@ cp /tmp/shared/hadoop_xml/*.xml {PATH_TO_DRUID}/conf/druid/single-server/micro-q
 
 ### Update Druid segment and log storage
 
-In your favorite text editor, open `conf/druid/single-server/micro-quickstart/_common/common.runtime.properties`, and make the following edits:
+In your favorite text editor, open `conf/druid/auto/_common/common.runtime.properties`, and make the following edits:
 
 #### Disable local deep storage and enable HDFS deep storage
 
@@ -196,7 +196,7 @@ druid.indexer.logs.directory=/druid/indexing-logs
 
 Once the Hadoop .xml files have been copied to the Druid cluster and the segment/log storage configuration has been updated to use HDFS, the Druid cluster needs to be restarted for the new configurations to take effect.
 
-If the cluster is still running, CTRL-C to terminate the `bin/start-micro-quickstart` script, and re-run it to bring the Druid services back up.
+If the cluster is still running, press CTRL-C to terminate the `bin/start-druid` script, and re-run it to bring the Druid services back up.
 
 ## Load batch data
 
@@ -221,7 +221,7 @@ This tutorial is only meant to be used together with the [query tutorial](../tut
 
 If you wish to go through any of the other tutorials, you will need to:
 * Shut down the cluster and reset the cluster state by removing the contents of the `var` directory under the druid package.
-* Revert the deep storage and task storage config back to local types in `conf/druid/single-server/micro-quickstart/_common/common.runtime.properties`
+* Revert the deep storage and task storage config back to local types in `conf/druid/auto/_common/common.runtime.properties`
 * Restart the cluster
 
 This is necessary because the other ingestion tutorials will write to the same "wikipedia" datasource, and later tutorials expect the cluster to use local deep storage.

diff --git a/docs/tutorials/tutorial-kafka.md b/docs/tutorials/tutorial-kafka.md
@@ -30,7 +30,7 @@ The tutorial guides you through the steps to load sample nested clickstream data
 
 ## Prerequisites
 
-Before you follow the steps in this tutorial, download Druid as described in the [quickstart](index.md) using the [micro-quickstart](../operations/single-server.md#micro-quickstart-4-cpu-16gib-ram) single-machine configuration and have it running on your local machine. You don't need to have loaded any data.
+Before you follow the steps in this tutorial, download Druid as described in the [quickstart](index.md) using the [auto](../operations/single-server.md#druid-auto-start) single-machine configuration and have it running on your local machine. You don't need to have loaded any data.
 
 ## Download and start Kafka
 

diff --git a/examples/bin/run-druid b/examples/bin/run-druid
@@ -17,7 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-if [ "$#" -gt 2 ] || [ "$#" -eq 0 ]
+if [ "$#" -gt 4 ] || [ "$#" -eq 0 ]
 then
   >&2 echo "usage: $0 <service> [conf-dir]"
   exit 1
@@ -47,7 +47,45 @@ if [ ! -d "$LOG_DIR" ]; then mkdir -p $LOG_DIR; fi
 
 echo "Running [$1], logging to [$LOG_DIR/$1.log] if no changes made to log4j2.xml"
 
+if [ "$WHATAMI" = 'coordinator-overlord' ]
+then
+    SERVER_NAME=coordinator
+else
+    SERVER_NAME="$WHATAMI"
+fi
+
+
+if [ ! -f "$CONFDIR"/$WHATAMI/main.config ];
+  then
+    MAIN_CLASS="org.apache.druid.cli.Main server $SERVER_NAME"
+  else
+    MAIN_CLASS=`cat "$CONFDIR"/$WHATAMI/main.config | xargs`
+fi
+
 cd "$WHEREAMI/.."
-exec "$WHEREAMI"/run-java -Ddruid.node.type=$1 "-Ddruid.log.path=$LOG_DIR" `cat "$CONFDIR"/"$WHATAMI"/jvm.config | xargs` \
-  -cp "$CONFDIR"/"$WHATAMI":"$CONFDIR"/_common:"$CONFDIR"/_common/hadoop-xml:"$CONFDIR"/../_common:"$CONFDIR"/../_common/hadoop-xml:"$WHEREAMI/../lib/*" \
-  `cat "$CONFDIR"/$WHATAMI/main.config | xargs`
+
+CLASS_PATH="$CONFDIR"/"$WHATAMI":"$CONFDIR"/_common:"$CONFDIR"/_common/hadoop-xml:"$CONFDIR"/../_common:"$CONFDIR"/../_common/hadoop-xml:"$WHEREAMI/../lib/*"
+
+if [ "$#" -eq 3 ] || [ "$#" -eq 4 ]
+then
+  # args: <service> <conf_path> <jvm_args> or <service> <conf_path> <jvm_args> <mm_task_count mm_task_java_props>
+  JVMARGS=`cat "$CONFDIR/_common/common.jvm.config" | xargs`
+  JVMARGS+=' '
+  JVMARGS+=$3
+
+  if [ "$#" -eq 3 ]
+  then
+    # args: <service> <conf_path> <jvm_args>
+    exec "$WHEREAMI"/run-java -Ddruid.node.type=$1 "-Ddruid.log.path=$LOG_DIR" $JVMARGS \
+      -cp $CLASS_PATH $MAIN_CLASS
+  else
+    # args: <service> <conf_path> <jvm_args> <mm_task_count mm_task_java_props>
+    exec "$WHEREAMI"/run-java -Ddruid.node.type=$1 $4 "-Ddruid.log.path=$LOG_DIR"  $JVMARGS \
+      -cp $CLASS_PATH $MAIN_CLASS
+  fi
+else
+  # args: <service> <conf_path>
+  exec "$WHEREAMI"/run-java -Ddruid.node.type=$1 "-Ddruid.log.path=$LOG_DIR" \
+    `cat "$CONFDIR"/"$WHATAMI"/jvm.config | xargs` \
+    -cp  $CLASS_PATH $MAIN_CLASS
+fi
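Reviewer sketch of the argument-count branches that the `run-druid` change introduces. It classifies a call by `$#` the same way the script does, but only prints a description instead of launching anything. The function name `describe_run_druid_args` is hypothetical, and the example invocation is taken from the quickstart log output elsewhere in this PR.

```bash
# Illustration only: mirrors the branches on "$#" in run-druid.
# describe_run_druid_args is a hypothetical name, not part of Druid.
describe_run_druid_args() {
  case "$#" in
    1|2) echo "jvm.config from the service directory" ;;
    3)   echo "common.jvm.config plus supplied JVM arguments" ;;
    4)   echo "common.jvm.config plus supplied JVM arguments and task properties" ;;
    *)   echo "usage error"; return 1 ;;   # zero or more than four arguments
  esac
}

# Three arguments: <service> <conf_path> <jvm_args>
describe_run_druid_args router conf/druid/single-server/quickstart '-Xms128m -Xmx128m'
```

Since the script now accepts up to four arguments, the usage message `usage: $0 <service> [conf-dir]` may be worth extending in the same change.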
diff --git a/examples/bin/start-druid b/examples/bin/start-druid
@@ -0,0 +1,35 @@
+#!/bin/bash -eu
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+PWD="$(pwd)"
+WHEREAMI="$(dirname "$0")"
+WHEREAMI="$(cd "$WHEREAMI" && pwd)"
+
+if [ -x "$(command -v python3)" ]
+then
+  exec python3 "$WHEREAMI/start-druid-main.py" "$@"
+elif [ -x "$(command -v python2)" ]
+then
+  exec python2 "$WHEREAMI/start-druid-main.py" "$@"
+elif [ -x "$(command -v python)" ]
+then
+  exec python "$WHEREAMI/start-druid-main.py" "$@"
+else
+  echo "python interpreter not found"
+fi
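Reviewer sketch of the interpreter fallback in `start-druid`, written as a loop over candidates in the same preference order (`python3`, `python2`, `python`). As written, the script's final branch only echoes a message, so it finishes with exit status 0 when no interpreter is found. The helper `first_available` is hypothetical and not part of Druid.

```bash
# Hypothetical helper (not part of Druid): print the first command from the
# argument list that is available on PATH; return 1 if none is.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Same preference order as the launcher; this only reports the choice.
if interpreter="$(first_available python3 python2 python)"; then
  echo "would run: $interpreter start-druid-main.py"
else
  # A launcher could exit with a non-zero status here to signal the failure.
  echo "python interpreter not found" >&2
fi
```

A loop keeps the candidate list in one place, at the cost of being slightly less explicit than the `if`/`elif` chain.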