diff --git a/docs/how_to/deploy/adreno.rst b/docs/how_to/deploy/adreno.rst index 7f4616fbf797..ed016a3ff744 100644 --- a/docs/how_to/deploy/adreno.rst +++ b/docs/how_to/deploy/adreno.rst @@ -15,41 +15,54 @@ specific language governing permissions and limitations under the License. -Deploy to Adreno GPU -======================================= +Deploy to Adreno™ GPU +===================== -**Authors**: Daniil Barinov, Egor Churaev, Andrey Malyshev +**Authors**: Daniil Barinov, Egor Churaev, Andrey Malyshev, Siva Rama Krishna Introduction ------------ -Adreno is a series of graphics processing unit (GPU) semiconductor +Adreno™ is a series of graphics processing unit (GPU) semiconductor intellectual property cores developed by Qualcomm and used in many of their SoCs. -The Adreno GPU accelerates the rendering of complex geometries to +The Adreno™ GPU accelerates the rendering of complex geometries to deliver high-performance graphics and a rich user experience with low power consumption. -This guide will demonstrate :ref:`the benefits of using textures with Adreno`, -how to :ref:`build TVM with OpenCL` (needed by Adreno devices) and TVM RPC -enabled. It will also provide :ref:`example code` to better understand the differences in compiling and deploying models -for Adreno devices. +TVM supports deep learning acceleration on the Adreno™ GPU through TVM's native OpenCL backend and +also through the OpenCLML backend. The native OpenCL backend of TVM is enhanced to make it +Adreno™ friendly by incorporating texture memory usage and Adreno™ friendly layouts. +OpenCLML is an SDK released by Qualcomm that provides a kernel acceleration library +for most of the deep learning operators. -.. _advantages_of_the_textures: +This guide is organized to demonstrate various design aspects of: -Advantages of the Textures --------------------------- +- :ref:`OpenCL Backend Enhancements` +- :ref:`About OpenCLML` +- :ref:`Build and Deploy` -One of the Adreno's advantages is the clever handling of textures. 
At + +.. _opencl_enhancements: + +OpenCL Backend Enhancements +--------------------------- + +The OpenCL backend of TVM is enhanced to take advantage of Adreno™ specific features like: + +- Texture memory usage. +- Adreno™ friendly activation layouts. +- Brand new schedules to accelerate with the above features. + +One of Adreno™'s advantages is the clever handling of textures. At the moment, TVM is able to benefit from this by having texture support -for Adreno. The graph below shows the Adreno A5x architecture. +for Adreno™. The graph below shows the Adreno™ A5x architecture. -|High-level overview of the Adreno A5x architecture for OpenCL| +|High-level overview of the Adreno™ A5x architecture for OpenCL| -*Fig. 1 High-level overview of the Adreno A5x architecture for OpenCL* +*Fig. 1 High-level overview of the Adreno™ A5x architecture for OpenCL* -*source:* `OpenCL Optimization and Best Practices for Qualcomm Adreno GPUs `_ +*source:* `OpenCL Optimization and Best Practices for Qualcomm Adreno™ GPUs `_ @@ -65,142 +78,479 @@ Reasons of using textures: Overall, with textures, it is possible to achieve a significant performance boost compared to OpenCL buffer based solutions. -.. _building_tvm_for_adreno: +In general, we specify the target as ``target="opencl"`` for a regular OpenCL based target, which generates the kernels as shown below. -Building TVM for Adreno ------------------------ +.. code:: c + + __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) { + // body.. + +The above OpenCL kernel definition has ``__global float*`` pointers which are essentially OpenCL ``buffer`` objects. + +When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"`` we can see the generated +kernels using texture backed OpenCL image objects as shown below. + +.. 
code:: c + + __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) { + // body.. + +*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions. +When we use *image2d_t* we read *4 elements at one time*, and it helps to utilize hardware in a more efficient way. + +Please refer to :ref:`Advanced Usage` for more details about generation and inspection of kernel sources. + + +.. _about_openclml: + +About OpenCLML +-------------- + +OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators. +These operators are exposed as an extension ``cl_qcom_ml_ops`` to the standard OpenCL specification. +Please refer to `Accelerate your models with our OpenCL ML SDK `_ for more details. + +OpenCLML is integrated into TVM as a `BYOC `_ solution. +OpenCLML operators can use the same context and can be enqueued on the same command queue as used in native OpenCL. +We take advantage of this to avoid any context switching overheads while falling back to native OpenCL. + + +.. _build_deploy: + +TVM for Adreno™ +--------------- + +This section gives instructions about various ways of building and deploying models +to the Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection. +Deploying the compiled model here requires the use of some tools on the host as well as on the target. + +TVM has simplified, user friendly command line tools as well as a +developer centric python API interface for various steps like auto tuning, building and deploying. + + +|Adreno deployment pipeline| + +*Fig.2 Build and Deployment pipeline on Adreno devices* + +The figure above demonstrates a generalized pipeline for the various stages listed below. + +**Model import:** +At this stage we import a model from well known frameworks like Tensorflow, PyTorch, ONNX etc. +This stage converts the given model into TVM's relay module format. 
Alternatively one can build a relay module manually +by using TVM's operator inventory too. The TVM module generated here is a target independent representation of the graph. + +**Auto Tuning:** +At this stage we tune the TVM generated kernels for a specific target. The auto tuning process requires +target device availability, and in the case of a remote target like Adreno™ on an Android device we use the RPC Setup for communication. +Later sections in this guide detail the RPC Setup for Android devices. Auto tuning is not a necessary step for +compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels. + +**Compilation:** +At this stage we compile the model for a specific target. Given that we auto tuned the module in the previous stage, +TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts +containing the kernel shared lib, graph definition in json format and parameters binary file in TVM specific format. + +**Deploy (or test run) on Target:** +At this stage we run the TVM compilation output on the target. Deployment is possible from a python +environment using the RPC Setup and also using TVM's native tool, a binary cross compiled for Android. +At this stage we can run the compiled model on the Android target and unit test output correctness and performance aspects. + +**Application Integration:** +This stage is all about integrating the TVM compiled model in applications. Here we discuss +interfacing the tvm runtime from Android (cpp native environment or from JNI) for setting input and getting output. + +**Advanced Usage:** +This section covers advanced user interests like viewing generated source code, altering precision of the module etc. + + +This tutorial covers all the above aspects as part of the below sections. 
+ +- :ref:`Development environment` +- :ref:`RPC Setup` +- :ref:`Commandline tools` +- :ref:`Python interface` +- :ref:`Application Integration` +- :ref:`Advanced Usage` + +.. _development_environment: + + +Development Environment Setup : Automatic +----------------------------------------- +TVM ships a predefined docker container environment with all prerequisites to get started quickly. +You may also refer to :ref:`Manual Environment Setup` for more control over the dependencies. + +For the docker setup the only prerequisite is docker tool availability on the host. -This section gives instructions on how to build the Android part of TVM -with OpenCL and TVM RPC Server in order to deploy models on Adreno. +The below commands build a docker image for adreno. -Since the process of building TVM for Adreno is exactly the same as the -process of building TVM for Android, please refer to these instructions: -`TVM RPC -Server `_. +:: + + ./docker/build.sh ci_adreno + docker tag tvm.ci_adreno ci_adreno + + +Now we can build both host and target utils with the below command. + +:: + + ./tests/scripts/ci.py adreno -i + +To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK as shown below while building + +:: + + export ADRENO_OPENCL= + ./tests/scripts/ci.py adreno -i + +On successful compilation this leaves us in a docker shell. The build leaves two folders -Since there are many required packages for Android, you can use the official Docker Image to build TVM. -For more information refer to this guide: `Deploy the Pretrained Model on Android `_. +* build-adreno: The host side TVM compiler build. +* build-adreno-target : Contains the android target components + + * libtvm_runtime.so : TVM runtime library + * tvm_rpc : The rpc runtime environment tool + * rtvm : A native stand alone tool + +While using the docker environment the android device is shared with the host. Hence, it is required +to have adb version ``1.0.41`` on the host as the docker image uses the same version. 
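The adb version requirement mentioned here can be checked mechanically before starting the containers. The helper below is a hypothetical sketch (not part of TVM or the docker scripts) that parses the output of ``adb version`` and extracts the bridge version to compare against the ``1.0.41`` shipped in the docker image.

```python
import re

def parse_adb_version(output: str) -> str:
    """Extract the bridge version (e.g. '1.0.41') from `adb version` output."""
    match = re.search(r"Android Debug Bridge version (\d+\.\d+\.\d+)", output)
    if match is None:
        raise ValueError("unrecognized `adb version` output")
    return match.group(1)

# Typical output of `adb version` on a host install.
sample = "Android Debug Bridge version 1.0.41\nVersion 33.0.3-8952118"
print(parse_adb_version(sample))  # 1.0.41
```

In practice the sample string would come from running ``adb version`` via ``subprocess`` on the host.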
+ +We can check adb devices availability inside the docker environment too. + +:: + + user@ci-adreno-fpeqs:~$ adb devices + List of devices attached + aaaabbbb device + ccccdddd device + +.. _manual_setup: + +Development Environment Setup : Manual +-------------------------------------- + +The manual build process requires building both host and target components. + +The below commands will configure and build the host compiler + +:: -**Prerequisites**: Android NDK and Android Debug Bridge must -be installed, the desired device must have OpenCL support and Android part of TVM must be built: + mkdir -p build + cd build + cp ../cmake/config.cmake . + + # Enable RPC capability to communicate to remote device. + echo set\(USE_RPC ON\) >> config.cmake + # We use graph executor for any host(x86) side verification of the model. + echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake + # Enable backtrace if possible for more debug information on any crash. + echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake + # The target_host will be llvm. + echo set\(USE_LLVM ON\) >> config.cmake + +Additionally we can push the below config entry to compile with OpenCLML support. + +:: + + export ADRENO_OPENCL= + echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake + +Now we can build as shown below + +:: + + cmake .. + make + +Finally we can export the python path as + +:: + + export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH} + python3 -c "import tvm" # Verify tvm python package + + +Now, we can configure and build the target components with the below configuration. +The target build requires the Android NDK to be installed. - Read documentation about *Android NDK installation* here: https://developer.android.com/ndk - To get access to adb tools you can see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb -You can also build the android part of TVM locally. From the root -folder of TVM: :: - mkdir build_android - cd build_android - cmake .. 
-DUSE_OPENCL=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON - make -jN tvm_runtime tvm_rpc + mkdir -p build-adreno + cd build-adreno + cp ../cmake/config.cmake . + # Enable OpenCL backend. + echo set\(USE_OPENCL ON\) >> config.cmake + # Enable RPC functionality. + echo set\(USE_RPC ON\) >> config.cmake + # Build tvm_rpc tool that runs on target device. + echo set\(USE_CPP_RPC ON\) >> config.cmake + # Build native rtvm deploy tool. + echo set\(USE_CPP_RTVM ON\) >> config.cmake + # We use graph executor for deploying on devices like Android. + echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake + # Backtrace enablement if possible. + echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake + # Adreno supports 32bit alignment for OpenCL allocations rather than 64bit. + echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake + + # Android build related defines. + echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake + echo set\(ANDROID_PLATFORM android-28\) >> config.cmake + echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake + +Additionally we can push the below config to compile with OpenCLML support. -where **N** is the number of cores available on your *CPU*. :: -At this stage you have built TVM for Adreno. + export ADRENO_OPENCL= + echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake + echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake -.. _build_and_deploy_model_for_adreno: +For the Android target build ``ANDROID_NDK_HOME`` is a dependency and we should have the same in the environment variable. 
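Both the host and target builds above depend on environment variables (``ANDROID_NDK_HOME``, and ``ADRENO_OPENCL`` for OpenCLML). A quick sanity check before invoking cmake can save a failed configure run; the helper below is an illustrative sketch, not a TVM API, and takes the environment as a plain mapping so it is easy to test.

```python
def missing_build_vars(env, required=("ANDROID_NDK_HOME",), with_clml=False):
    """Return names of build-critical variables that are absent or empty in `env`."""
    needed = list(required)
    if with_clml:
        # OpenCLML builds additionally need the SDK location.
        needed.append("ADRENO_OPENCL")
    return [name for name in needed if not env.get(name)]

print(missing_build_vars({"ANDROID_NDK_HOME": "/opt/android-ndk"}))  # []
print(missing_build_vars({}, with_clml=True))  # ['ANDROID_NDK_HOME', 'ADRENO_OPENCL']
```

Passing ``os.environ`` instead of a literal dict checks the real shell environment.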
+Below commands will build Adreno™ target components -Build and deploy model for Adreno ---------------------------------- +:: -In this section we will focus on target, needed to compile and deploy models for Adreno, demonstrate -the differences in generated kernels with and without textures and, in addition, the -possibility of choosing a different precision for model compilation will -be considered. + cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \ + -DANDROID_ABI=arm64-v8a \ + -DANDROID_PLATFORM=android-28 \ + -DCMAKE_SYSTEM_VERSION=1 \ + -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \ + -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \ + -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \ + -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \ + -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \ + -DMACHINE_NAME="aarch64-linux-gnu" .. -For the complete step-py-step process of compiling and deploying models on -Adreno, including selection of precision, running the inference of the -model, getting the predictions, and measuring the performance please refer to this tutorial: `How To Deploy model on Adreno `_ + make tvm_runtime tvm_rpc rtvm -|Android deployment pipeline| -*Fig.2 Deployment pipeline on Adreno devices* +.. _rpc_setup: -The figure above demonstrates a generalized pipeline for deploying and running neural network models on android devices. -As can be seen from the figure, the compiled model has a set_input() and a run() methods, -which *prepare the inputs* for inference and *execute the inference* on the remote device using the Graph Executor runtime module. +RPC Setup +--------- -Adreno target -~~~~~~~~~~~~~ +RPC Setup allows remote target access over TCP/IP networking interface. 
RPC Setup is essential for the auto tuning stage as tuning +involves running auto generated kernels on a real device and optimizing them using a machine learning approach. Please refer to +`Auto-Tune with Templates and AutoTVM `_ for more details about AutoTVM. -Normally, when compiling models for Android using OpenCL, the -corresponding target is used +RPC Setup is also useful to deploy the compiled model to a remote device from the python interface or the ``tvmc`` tool from the host device. -.. code:: python +RPC Setup has multiple components as listed below. - target="opencl" +**TVM Tracker:** +TVM tracker is a host side daemon that manages remote devices and serves them to host side applications. Applications +can connect to this tracker and acquire a remote device handle to communicate. -Using Adreno, we want to get all the benefits of textures, so we have to -use the following target to generate texture leveraging kernels +**TVM RPC:** +TVM RPC is a native application that runs on the remote device (Android in our case) and registers itself to the TVM Tracker +running on the host. -.. code:: python - target="opencl -device=adreno" +Hence, for an RPC based setup we will have the above components running on the host and target device. The below sections explain how to set up the same +manually and also inside docker using automated tools. -Let's write a simple model with one convolutional (conv2d) layer and take a look at generated kernels for these -two targets +**Automated RPC Setup:** +Here we will explain how to set up RPC in the docker environment. -.. code:: python +The below command launches the tracker in the docker environment, where the tracker listens on port 9190. 
- import tvm - from tvm import relay - import numpy as np +:: - input_shape=(1, 56, 56, 32) - filter_shape=(3, 3, 32, 64) - filter = np.random.rand(*filter_shape) + ./tests/scripts/ci.py adreno -i # Launch a new shell on the adreno docker + source tests/scripts/setup-adreno-env.sh -e tracker -p 9190 - dtype="float32" - input = tvm.relay.var("input", shape=input_shape, dtype=dtype) - weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype) - D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype) +Now, the below command can run TVM RPC on a remote android device with id ``abcdefgh``. - mod = relay.Function([input, weight], D) - params = { - "weight": tvm.nd.array(filter) - } -Now compile our model with the classic OpenCL target and print its modules: +:: -.. code:: python + ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker. + source tests/scripts/setup-adreno-env.sh -e device -p 9190 -d abcdefgh - target="opencl" +Further, the below command can be used to query the RPC setup details from any other docker terminal. - with tvm.transform.PassContext(opt_level=3): - graph, lib, params = relay.build_module.build(mod, target, params=params) - print(lib.imported_modules[0].get_source()) +:: -Notice that the generated convolution kernel has pointers in -the initialization of the function. The kernels generated with the above target are buffer-based. + ./tests/scripts/ci.py adreno -i # Launch a new shell on adreno docker. + source tests/scripts/setup-adreno-env.sh -e query -p 9190 -.. code:: c - __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) { - // body.. +**Manual RPC Setup:** + +Please refer to the tutorial +`How To Deploy model on Adreno `_ +for manual RPC environment setup. 
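Host side scripts later in this guide assume the tracker address ``127.0.0.1:9190``. The manual setup tutorial linked above exports it via ``TVM_TRACKER_HOST`` and ``TVM_TRACKER_PORT``; the snippet below is a hypothetical convenience helper (not a TVM API) for reading those variables with the defaults used throughout this guide.

```python
import os

def tracker_address(env=None):
    """Return (host, port) of the RPC tracker, defaulting to 127.0.0.1:9190."""
    env = os.environ if env is None else env
    host = env.get("TVM_TRACKER_HOST", "127.0.0.1")
    port = int(env.get("TVM_TRACKER_PORT", "9190"))
    return host, port

print(tracker_address({}))                            # ('127.0.0.1', 9190)
print(tracker_address({"TVM_TRACKER_PORT": "9191"}))  # ('127.0.0.1', 9191)
```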
+ +This concludes the RPC Setup and we have the rpc-tracker available on host ``127.0.0.1`` (rpc-tracker) and port ``9190`` (rpc-port). + + +.. _commandline_interface: + +Commandline Tools +----------------- + +Here we describe the entire compilation process using command line tools. TVM has the command line utility +`tvmc `_ to perform +model import, auto tuning, compilation and deploy over rpc. +`tvmc `_ has many options to explore and try. + +**Model Import & Tuning:** +Use the below command to import a model from any framework and auto tune the same. +Here we use a model from Keras and it uses the RPC setup for tuning and finally generates the tuning log file +``keras-resnet50.log``. + +:: + + python3 -m tvm.driver.tvmc tune --target="opencl -device=adreno" \ + --target-host="llvm -mtriple=aarch64-linux-gnu" \ + resnet50.h5 -o \ + keras-resnet50.log \ + --early-stopping 0 --repeat 30 --rpc-key android \ + --rpc-tracker 127.0.0.1:9190 --trials 1024 \ + --tuning-records keras-resnet50-records.log --tuner xgb + +**Model Compilation:** + +Use the below command for compiling the model and producing TVM compiler outputs. + +:: + + python3 -m tvm.driver.tvmc compile \ + --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \ + --target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \ + --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5 + +When enabling OpenCLML offloading we need to add the target ``clml`` as shown below. The tuning log is valid for OpenCLML offloading also, +as the OpenCL path is the fallback option for any operator that didn't go through the OpenCLML path. The tuning log will be used for such operators. 
+ +:: + + python3 -m tvm.driver.tvmc compile \ + --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \ + --target="opencl, clml, llvm" --target-llvm-mtriple aarch64-linux-gnu --target-opencl-device adreno \ + --tuning-records keras-resnet50.log -o keras-resnet50.tar resnet50.h5 + +On successful compilation, the above command produces ``keras-resnet50.tar``. +It is a compressed archive with the kernel shared lib (mod.so), graph json (mod.json) and params binary (mod.params). + +**Deploy & Run on Target:** + +Running the compiled model on an Android target is possible via RPC as well as native deployment. + +We can use the below tvmc command to deploy on a remote target via the RPC based setup. + +:: + + python3 -m tvm.driver.tvmc run --device="cl" keras-resnet50.tar \ + --rpc-key android --rpc-tracker 127.0.0.1:9190 --print-time + +A `tvmc `_ based run has more options +to initialize the input in various modes like fill, random etc. + +``tvmc`` based deployment is generally a quick verification of the compiled model on the target from a remote host via the RPC setup. + +Production generally uses a native deployment environment like Android JNI or CPP native environments. +Here we need to use the cross compiled ``tvm_runtime`` interface to deploy the tvm compilation output, i.e. ``TVMPackage``. + +TVM has a standalone tool ``rtvm`` to deploy and run the model natively on an ADB shell. The build process produces this tool under build-adreno-target. +Please refer to `rtvm `_ for more details about this tool. + +While integrating inside an existing Android application TVM has multiple options. For JNI or CPP native we may use the `C Runtime API `_ +You may refer to ``rtvm``'s simplified interface `TVMRunner `_ also. + +.. 
_python_interface: + +Python Interface +---------------- + +This section explains importing, auto tuning, compiling and running a model using the python interface. +TVM has a high level interface through the ``tvmc`` abstraction as well as a low level relay api. We will discuss both of these in detail. + +**TVMC Interface:** + +While using the ``tvmc`` python interface we first load a model, which produces a ``TVMCModel``. The ``TVMCModel`` will be used for Auto Tuning to produce a tuning cache. +The compilation process uses the ``TVMCModel`` and the tuning cache (optional) to produce a ``TVMCPackage``. Now, the ``TVMCPackage`` can be saved to the file system or +used to deploy and run on the target device. + +Please refer to the tutorial for the same +`How To Deploy model on Adreno using TVMC `_ + +A saved ``TVMCPackage`` can be used for native deployment using the ``rtvm`` utility too. + +Also, please refer to the `tvmc `_ +documentation for more details about the api interface. + +**Relay Interface:** + +The relay api interface gives lower level access to the tvm compiler interface. +Similar to the ``tvmc`` interface, the relay api provides various frontend APIs to convert models to a relay ``Module``. +The relay ``Module`` will be used for all kinds of transforms like precision conversions, CLML offloading and other custom transforms if any. +The resulting Module will be used for Auto Tuning too. Finally, we use the ``relay.build`` API to generate the library module. +From this library module, we can export compilation artifacts like the module shared library (mod.so), params (mod.params) and json graph (mod.json). +This library module will be used to create the graph runtime to deploy and run on the target device. + +Please refer to the tutorial `How To Deploy model on Adreno `_ +for a step by step explanation of the same. + +Additionally, TVM also supports a Java interface through `TVM4J `_ +.. 
_application_integration: -Now take a look at “opencl -device=adreno” target: +Application Integration +----------------------- + +TVM compilation output is represented as module shared lib (mod.so), graph json(mod.json) and params (mod.params). +Archived representation of TVMPackage is also contains the same. + +In general a CPP/C based interface will be sufficient for any Android application integration. + +TVM natively expose ``c_runtime_api`` for loading a TVM compiled module and run the same. + +Alternatively one may refer to `cpp_rtvm `_ +``TVMRunner`` interface too for further simplified version of the same. + + + +.. _advanced_usage: + +Advanced Usage +-------------- + +This section details some of the advanced usage and additional information while using Adreno™ target on TVM. + +Generated Source Inspection +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Apart from standard tvm compilation artifacts kernel library (mod.so), graph (mod.json) and params (mod.params) +we can also generate opencl kernel source, clml offloaded graph ...etc from lib handle as shown below. +TVM compilation output is organized as a TVM module and many other TVM modules imported into it. + +Below snippet can dump CLML sub graphs in json format. .. code:: python - target="opencl -device=adreno" + # Look for "clml" typed module imported. + clml_modules = list(filter(lambda mod: mod.type_key == "clml", lib.get_lib().imported_modules)) + # Loop through all clml sub graphs and dump the json formatted CLML sub graphs. + for cmod in clml_modules: + print("CLML Src:", cmod.get_source()) - with tvm.transform.PassContext(opt_level=3): - graph, lib, params = relay.build_module.build(mod, target, params=params) - print(lib.imported_modules[0].get_source()) -The kernels generated this way is actually working with 2d arrays, leveraging textures +Similarly, below snippet can extract opencl kernel source from the compiled TVM module. -.. code:: c +.. 
code:: python - __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) { - // body.. + # Similarly we can dump open kernel source too as shown below + # Look for "opencl" typed module imported. + opencl_modules = list(filter(lambda mod: mod.type_key == "opencl", lib.get_lib().imported_modules)) + # Now dump kernel source for each OpenCL targetted sub graph. + for omod in opencl_modules: + print("OpenCL Src:", omod.get_source()) -*image2d_t* is a built-in OpenCL types that represents two-dimensional image object and provides several additional functions. -When we use *image2d_t* we read *4 elements at one time*, and it helps to utilize hardware in a more efficient way. Precisions ~~~~~~~~~~ @@ -214,30 +564,26 @@ We can choose from *float16*, *float16_acc32* (Mixed Precision), *float32* (stan To leverage the GPU hardware capabilities and utilize the benefits of half precision computation and memory management, we can convert an original model having floating points operation to a model operating with half precision. Choosing lower precision will positively affect the performance of the model, but it may also have a decrease in the accuracy of the model. -To do the conversion you need to write a simple conversion function and specify the *dtype* value of "float16" before calling the function: + +To do the conversion you need to call adreno specific transformation API as soon as relay module is generated through any frontend. .. 
code:: python - def convert_to_dtype(mod, dtype): - # downcast to float16 - if dtype == "float16": - global conv2d_acc = "float16" - from tvm.ir import IRModule - mod = IRModule.from_expr(mod) - seq = tvm.transform.Sequential( - [ - relay.transform.InferType(), - relay.transform.ToMixedPrecision() - ] - ) - with tvm.transform.PassContext(opt_level=3): - mod = seq(mod) - return mod - - dtype="float16" - mod = convert_to_dtype(mod["main"], dtype) - -We then can compile our model in any convinient way + from tvm.driver.tvmc.transform import apply_graph_transforms + mod = apply_graph_transforms( + mod, + { + "mixed_precision": True, + "mixed_precision_ops": ["nn.conv2d", "nn.dense"], + "mixed_precision_calculation_type": "float16", + "mixed_precision_acc_type": "float16", + }, + ) + + +``tvm.driver.tvmc.transform.apply_graph_transforms`` is a simplified API over the ``ToMixedPrecision`` pass to get the desired precision. + +We can then compile our model in any convenient way .. code:: python @@ -246,84 +592,59 @@ We then can compile our model in any convinient way mod, target_host=target_host, target=target, params=params ) +While using the ``tvmc`` python interface, the below arguments enable precision conversion to float16. + +.. code:: python + + mixed_precision = True, + mixed_precision_ops = ["nn.conv2d", "nn.dense"], + mixed_precision_calculation_type = "float16", + mixed_precision_acc_type = "float16" + +Similarly, the ``tvmc`` command line interface has the below listed options. + +.. code:: bash + + --mixed-precision + --mixed-precision-ops nn.conv2d nn.dense + --mixed-precision-calculation-type float16 + --mixed-precision-acc-type float16 + + **float16_acc32 (Mixed Precision)** -ToMixedPrecision pass traverse over the network and split network to clusters of ops dealing with float or float16 data types. +The ``ToMixedPrecision`` pass traverses the network and splits it into clusters of ops dealing with float or float16 data types. 
The clusters are defined by three types of operations: - Operations always be converted into float16 data type -- Operations which can be converted if they follow by converted cluster +- Operations which can be converted if they are followed by a converted cluster - Operations never be converted to the float16 data type This list is defined in the ToMixedPrecision implementation here `relay/transform/mixed_precision.py `_ -and can be overridden by user +and can be overridden by the user. -In some cases, we want higher precision in accumulation than the input data. -This is supported, for example, for conv2d and dense operations. To override accumulation type you need to register -function with ``@register_mixed_precision_conversion`` decorator to modify parameters of ``ToMixedPrecision`` conversion - -.. code:: python +The ``ToMixedPrecision`` method is a pass to convert an FP32 relay graph into an FP16 version (with +FP16 or FP32 accumulation dtypes). Doing this transformation is useful for reducing model size +as it halves the expected size of the weights (FP16_acc16 case). 
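The size trade-off described above can be seen with plain numpy, independently of TVM: casting FP32 weights to FP16 halves their footprint while the representation error stays small for typically scaled weights.

```python
import numpy as np

rng = np.random.default_rng(0)
w32 = rng.standard_normal((64, 64)).astype("float32")
w16 = w32.astype("float16")  # what a float16 weight conversion amounts to

print(w32.nbytes, w16.nbytes)  # 16384 8192 -> half the memory
max_err = float(np.max(np.abs(w32 - w16.astype("float32"))))
print(max_err < 1e-2)  # True: representation error is small for unit-scale weights
```

Accumulating in FP32 (the float16_acc32 mode) keeps this storage saving for weights while avoiding error growth in long reduction chains.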
- from tvm.relay.op import register_mixed_precision_conversion - - conv2d_acc = "float32" - - # Pick a priority > 10 to overwrite defaults, higher priorities take precedence - @register_mixed_precision_conversion("nn.conv2d", level=11) - def conv2d_mixed_precision_rule(call_node: "relay.Call", mixed_precision_type: str): - global conv2d_acc - return [ - # always do main calculation in mixed_precision_type - relay.transform.mixed_precision.MIXED_PRECISION_ALWAYS, - # the dtype for the accumulator - conv2d_acc, - # the output dtype for the operation (usually fp16) - mixed_precision_type, - ] - - # Same for dense - @register_mixed_precision_conversion("nn.dense", level=11) - def conv2d_mixed_precision_rule(call_node: "relay.Call", mixed_precision_type: str): - global conv2d_acc - return [ - relay.transform.mixed_precision.MIXED_PRECISION_ALWAYS, - conv2d_acc, - mixed_precision_type, - ] - -Now we need to modify the conversion function by adding some logical "forks" and ToMixedPrecision() call, -then create a Relay graph from desired model in any convinient way and obtain **mod** (which is IR representation of the model), -after which we can convert it to the required **dtype** and then assemble our model sequentialy +``ToMixedPrecision`` pass usage is simplified into a simple call as shown below for usage. .. 
code:: python - def convert_to_dtype(mod, dtype): - # downcast to float16 - if dtype == "float16" or dtype == "float16_acc32": - global conv2d_acc - conv2d_acc = "float16" if dtype == "float16" else "float32" - from tvm.ir import IRModule - mod = IRModule.from_expr(mod) - seq = tvm.transform.Sequential( - [ - relay.transform.InferType(), - relay.transform.ToMixedPrecision() - ] - ) - with tvm.transform.PassContext( - config={"relay.ToMixedPrecision.keep_orig_output_dtype": True}, - opt_level=3): - mod = seq(mod) - return mod - - dtype="float16_acc32" - mod = convert_to_dtype(mod["main"], dtype) - dtype = "float32" if dtype == "float32" else "float16" + from tvm.driver.tvmc.transform import apply_graph_transforms + mod = apply_graph_transforms( + mod, + { + "mixed_precision": True, + "mixed_precision_ops": ["nn.conv2d", "nn.dense"], + "mixed_precision_calculation_type": "float16", + "mixed_precision_acc_type": "float32", + }, + ) -The ``ToMixedPrecision`` method is a pass to convert an FP32 relay graph into an FP16 version (with -FP16 or FP32 accumulation dtypes). Doing this transformation is useful for reducing model size -as it halves the expected size of the weights (FP16_acc16 case). -From this point onwards, we can compile our model as normal +``tvm.driver.tvmc.transform.apply_graph_transforms`` is a simplified API over the ``ToMixedPrecision`` pass to get the desired precision. + +We can then compile our model in any convenient way .. code:: python @@ -332,5 +653,24 @@ From this point onwards, we can compile our model as normal mod, target_host=target_host, target=target, params=params ) -.. |High-level overview of the Adreno A5x architecture for OpenCL| image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/how-to/adreno_architecture.png -.. 
|Android deployment pipeline| image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/how-to/android_deployment_pipeline.jpg
+While using the ``tvmc`` python interface, the below arguments enable precision conversion to float16.
+
+.. code:: python
+
+    mixed_precision = True,
+    mixed_precision_ops = ["nn.conv2d", "nn.dense"],
+    mixed_precision_calculation_type = "float16",
+    mixed_precision_acc_type = "float32"
+
+Similarly, the ``tvmc`` command line interface has the below listed options.
+
+.. code:: bash
+
+    --mixed-precision
+    --mixed-precision-ops nn.conv2d nn.dense
+    --mixed-precision-calculation-type float16
+    --mixed-precision-acc-type float32
+
+
+.. |High-level overview of the Adreno™ A5x architecture for OpenCL| image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/how-to/adreno_architecture.png
+.. |Adreno deployment pipeline| image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/how-to/Adreno-Deployment-Pipeline.jpg
diff --git a/gallery/how_to/deploy_models/deploy_model_on_adreno.py b/gallery/how_to/deploy_models/deploy_model_on_adreno.py
index c120c5339b62..c2ba189a6715 100644
--- a/gallery/how_to/deploy_models/deploy_model_on_adreno.py
+++ b/gallery/how_to/deploy_models/deploy_model_on_adreno.py
@@ -18,9 +18,9 @@
 """
 .. _tutorial-deploy-model-on-adreno:

-Deploy the Pretrained Model on Adreno
-=======================================
-**Author**: Daniil Barinov
+Deploy the Pretrained Model on Adreno™
+======================================
+**Author**: Daniil Barinov, Siva Rama Krishna

 This article is a step-by-step tutorial to deploy pretrained Pytorch ResNet-18
 model on Adreno (on different precisions).
@@ -53,11 +53,17 @@
 #
 #   adb devices
 #
+# Set the android device to use, if you have several devices connected to your computer.
+#
+# .. code-block:: bash
+#
+#   export ANDROID_SERIAL=
+#
 # Then to upload these two files to the device you should use:
 #
 # ..
code-block:: bash # -# adb -s push {libtvm_runtime.so,tvm_rpc} /data/local/tmp +# adb push {libtvm_runtime.so,tvm_rpc} /data/local/tmp # # At this moment you will have «libtvm_runtime.so» and «tvm_rpc» on path /data/local/tmp on your device. # Sometimes cmake can’t find «libc++_shared.so». Use: @@ -70,7 +76,7 @@ # # .. code-block:: bash # -# adb -s push libc++_shared.so /data/local/tmp +# adb push libc++_shared.so /data/local/tmp # # We are now ready to run the TVM RPC Server. # Launch rpc_tracker with following line in 1st console: @@ -83,12 +89,12 @@ # # .. code-block:: bash # -# adb -s reverse tcp:9190 tcp:9190 -# adb -s forward tcp:9090 tcp:9090 -# adb -s forward tcp:9091 tcp:9091 -# adb -s forward tcp:9092 tcp:9092 -# adb -s forward tcp:9093 tcp:9093 -# adb -s shell LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/tvm_rpc server --host=0.0.0.0 --port=9090 --tracker=127.0.0.1:9190 --key=android --port-end=9190 +# adb reverse tcp:9190 tcp:9190 +# adb forward tcp:5000 tcp:5000 +# adb forward tcp:5002 tcp:5001 +# adb forward tcp:5003 tcp:5002 +# adb forward tcp:5004 tcp:5003 +# adb shell LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/tvm_rpc server --host=0.0.0.0 --port=5000 --tracker=127.0.0.1:9190 --key=android --port-end=5100 # # Before proceeding to compile and infer model, specify TVM_TRACKER_HOST and TVM_TRACKER_PORT # @@ -115,6 +121,73 @@ # android 1 1 0 # ---------------------------------- +################################################################# +# Configuration +# ------------- + +import os +import torch +import torchvision +import tvm +from tvm import te +from tvm import relay, rpc +from tvm.contrib import utils, ndk +from tvm.contrib import graph_executor +from tvm.relay.op.contrib import clml +from tvm import autotvm + +# Below are set of configuration that controls the behaviour of this script like +# local run or device run, target definitions, dtype setting and auto tuning enablement. +# Change these settings as needed if required. 
+
+# Adreno devices are efficient with float16 compared to float32.
+# Given that the expected output is not affected by lowering the precision,
+# it is advisable to use lower precision.
+# We have a helper API to make the precision conversion simple and
+# it supports dtype with "float16" and "float16_acc32" modes.
+# Let's choose "float16" for calculation and "float32" for accumulation.
+
+calculation_dtype = "float16"
+acc_dtype = "float32"
+
+# Specify the Adreno target before compiling to generate texture
+# leveraging kernels and get all the benefits of textures.
+# Note: This generated example runs on our x86 server for demonstration.
+# If running it on the Android device, we need to
+# specify its instruction set. Set :code:`local_demo` to False if you want
+# to run this tutorial with a real device over rpc.
+local_demo = True
+
+# By default the CPU target will execute.
+# Select 'cpu', 'opencl' or 'opencl -device=adreno'.
+test_target = "cpu"
+
+# Change target configuration.
+# Run `adb shell cat /proc/cpuinfo` to find the arch.
+arch = "arm64"
+target = tvm.target.Target("llvm -mtriple=%s-linux-android" % arch)
+
+# Auto tuning is a compute intensive and time consuming task,
+# hence it is disabled for the default run. Please enable it if required.
+is_tuning = False
+tune_log = "adreno-resnet18.log"
+
+# To enable the OpenCLML accelerated operator library.
+enable_clml = False + +################################################################# +# Get a PyTorch Model +# ------------------- +# Get resnet18 from torchvision models +model_name = "resnet18" +model = getattr(torchvision.models, model_name)(pretrained=True) +model = model.eval() + +# We grab the TorchScripted model via tracing +input_shape = [1, 3, 224, 224] +input_data = torch.randn(input_shape) +scripted_model = torch.jit.trace(model, input_data).eval() + ################################################################# # Load a test image # ----------------- @@ -146,86 +219,43 @@ img = np.expand_dims(img, 0) ################################################################# -# Load pretrained Pytorch model -# ----------------------------- -# Create a Relay graph from a Pytorch ResNet-18 model -import os -import torch -import torchvision -import tvm -from tvm import te -from tvm import relay, rpc -from tvm.contrib import utils, ndk -from tvm.contrib import graph_executor - -model_name = "resnet18" -model = getattr(torchvision.models, model_name)(pretrained=True) -model = model.eval() - -# We grab the TorchScripted model via tracing -input_shape = [1, 3, 224, 224] -input_data = torch.randn(input_shape) -scripted_model = torch.jit.trace(model, input_data).eval() - +# Convert PyTorch model to Relay module +# ------------------------------------- +# TVM has frontend api for various frameworks under relay.frontend and now +# for pytorch model import we have relay.frontend.from_pytorch api. 
# Input name can be arbitrary
 input_name = "input0"
 shape_list = [(input_name, img.shape)]
+
 mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)

 #################################################################
 # Precisions
 # ----------
-# Since TVM support Mixed Precision, we need to register mixed_precision_conversion:
-from tvm.relay.op import register_mixed_precision_conversion
-
-conv2d_acc = "float32"
-
-
-@register_mixed_precision_conversion("nn.conv2d", level=11)
-def conv2d_mixed_precision_rule(call_node: "relay.Call", mixed_precision_type: str):
-    global conv2d_acc
-    return [
-        relay.transform.mixed_precision.MIXED_PRECISION_ALWAYS,
-        conv2d_acc,
-        mixed_precision_type,
-    ]
-
-@register_mixed_precision_conversion("nn.dense", level=11)
-def conv2d_mixed_precision_rule(call_node: "relay.Call", mixed_precision_type: str):
-    global conv2d_acc
-    return [
-        relay.transform.mixed_precision.MIXED_PRECISION_ALWAYS,
-        conv2d_acc,
-        mixed_precision_type,
-    ]
+# Adreno devices are efficient with float16 compared to float32.
+# Given that the expected output is not affected by lowering the precision,
+# it is advisable to use lower precision.
+# TVM supports Mixed Precision through the ToMixedPrecision transformation pass.
+# We may need to register precision rules like the precision type, accumulation
+# datatype etc. for the required operators to override the default settings.
+# The below helper API simplifies the precision conversion across the module.
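The float16 / float16_acc32 trade-off described in the comments above can be illustrated without TVM. The sketch below (illustrative only, not part of this patch) sums 4096 ones: a float16 accumulator stalls once the running total reaches 2048, while a float32 accumulator stays exact — which is exactly why the "float16_acc32" mode keeps accumulation in float32.

```python
import numpy as np

# Illustrative only (not part of this patch): why a float32 accumulator
# matters when the calculation dtype is float16 ("float16_acc32" mode).
x = np.ones(4096, dtype=np.float16)

acc16 = np.float16(0.0)
for v in x:
    acc16 = np.float16(acc16 + v)  # accumulate in float16

acc32 = np.float32(0.0)
for v in x:
    acc32 = acc32 + np.float32(v)  # accumulate in float32

# float16 spacing above 2048 is 2 ulp, so 2048 + 1 rounds back to 2048
print(acc16)  # 2048.0
print(acc32)  # 4096.0
```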
-################################################################# -# and also define the conversion function itself -def convert_to_dtype(mod, dtype): - # downcast to float16 - if dtype == "float16" or dtype == "float16_acc32": - global conv2d_acc - conv2d_acc = "float16" if dtype == "float16" else "float32" - from tvm.ir import IRModule - - mod = IRModule.from_expr(mod) - seq = tvm.transform.Sequential( - [relay.transform.InferType(), relay.transform.ToMixedPrecision()] - ) - with tvm.transform.PassContext(opt_level=3): - mod = seq(mod) - return mod +# Calculation dtype is set to "float16" and accumulation dtype is set to "float32" +# in configuration section above. +from tvm.driver.tvmc.transform import apply_graph_transforms -################################################################# -# Let's choose "float16_acc32" for example. -dtype = "float16_acc32" -mod = convert_to_dtype(mod["main"], dtype) -dtype = "float32" if dtype == "float32" else "float16" - -print(mod) +mod = apply_graph_transforms( + mod, + { + "mixed_precision": True, + "mixed_precision_ops": ["nn.conv2d", "nn.dense"], + "mixed_precision_calculation_type": calculation_dtype, + "mixed_precision_acc_type": acc_dtype, + }, +) ################################################################# # As you can see in the IR, the architecture now contains cast operations, which are @@ -233,46 +263,109 @@ def convert_to_dtype(mod, dtype): # You can also use "float16" or "float32" precisions as other dtype options. ################################################################# -# Compile the model with relay -# ---------------------------- -# Specify Adreno target before compiling to generate texture -# leveraging kernels and get all the benefits of textures -# Note: This generated example running on our x86 server for demonstration. -# If running it on the Android device, we need to -# specify its instruction set. 
Set :code:`local_demo` to False if you want
-# to run this tutorial with a real device.
+# Prepare TVM Target
+# ------------------

-local_demo = True
+# This generated example runs on our x86 server for demonstration.

-# by default on CPU target will execute.
-# select 'cpu', 'opencl' and 'vulkan'
-test_target = "cpu"
-
-# Change target configuration.
-# Run `adb shell cat /proc/cpuinfo` to find the arch.
-arch = "arm64"
-target = tvm.target.Target("llvm -mtriple=%s-linux-android" % arch)
+# To deploy and run on a real target over RPC please set :code:`local_demo` to False in the above configuration section.
+# Also, :code:`test_target` is set to :code:`llvm` in this example to make it compatible for x86 demonstration.
+# Please change it to :code:`opencl` or :code:`opencl -device=adreno` for an RPC target in the configuration above.

 if local_demo:
     target = tvm.target.Target("llvm")
-elif test_target == "opencl":
-    target = tvm.target.Target("opencl", host=target)
-elif test_target == "vulkan":
-    target = tvm.target.Target("vulkan", host=target)
+elif "opencl" in test_target:
+    target = tvm.target.Target(test_target, host=target)

-with tvm.transform.PassContext(opt_level=3):
-    lib = relay.build(mod, target=target, params=params)
+##################################################################
+# AutoTuning
+# ----------
+# The below few instructions can auto tune the relay module with xgboost being the tuner algorithm.

-#################################################################
-# Deploy the Model Remotely by RPC
-# --------------------------------
-# Using RPC you can deploy the model from host
-# machine to the remote Adreno device
+# The Auto Tuning process involves the stages of extracting the tasks, defining a tuning configuration and
+# tuning each task for the best performing kernel configuration.
+
+# Get RPC related settings.
rpc_tracker_host = os.environ.get("TVM_TRACKER_HOST", "127.0.0.1")
 rpc_tracker_port = int(os.environ.get("TVM_TRACKER_PORT", 9190))
 key = "android"

+# Auto tuning is a compute intensive and time consuming task.
+# It is set to False in the above configuration as this script runs on x86 for demonstration.
+# Please set :code:`is_tuning` to True to enable auto tuning.
+
+if is_tuning:
+    # Auto Tuning Stage 1: Extract tunable tasks
+    tasks = autotvm.task.extract_from_program(
+        mod, target=test_target, target_host=target, params=params
+    )
+
+    # Auto Tuning Stage 2: Define tuning configuration
+    tmp_log_file = tune_log + ".tmp"
+    measure_option = autotvm.measure_option(
+        builder=autotvm.LocalBuilder(
+            build_func=ndk.create_shared, timeout=15
+        ),  # Build the test kernel locally
+        runner=autotvm.RPCRunner(  # The runner would be on a remote device.
+            key,  # RPC Key
+            host=rpc_tracker_host,  # Tracker host
+            port=int(rpc_tracker_port),  # Tracker port
+            number=3,  # Number of runs before averaging
+            timeout=600,  # RPC Timeout
+        ),
+    )
+    n_trial = 1024  # Number of training iterations before choosing the best kernel config
+    early_stopping = False  # Can be enabled to stop tuning while the loss is not minimizing.
+
+    # Auto Tuning Stage 3: Iterate through the tasks and tune.
+    from tvm.autotvm.tuner import XGBTuner
+
+    for i, tsk in enumerate(reversed(tasks[:3])):
+        print("Task:", tsk)
+        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
+        tuner_obj = XGBTuner(tsk, loss_type="rank")
+
+        tsk_trial = min(n_trial, len(tsk.config_space))
+        tuner_obj.tune(
+            n_trial=tsk_trial,
+            early_stopping=early_stopping,
+            measure_option=measure_option,
+            callbacks=[
+                autotvm.callback.progress_bar(tsk_trial, prefix=prefix),
+                autotvm.callback.log_to_file(tmp_log_file),
+            ],
+        )
+    # Auto Tuning Stage 4: Pick the best performing configurations from the overall log.
+    autotvm.record.pick_best(tmp_log_file, tune_log)
+
+#################################################################
+# Enable OpenCLML Offloading
+# --------------------------
+# OpenCLML offloading will try to accelerate supported operators
+# by using the OpenCLML proprietary operator library.
+
+# By default :code:`enable_clml` is set to False in the above configuration section.
+
+if not local_demo and enable_clml:
+    mod = clml.partition_for_clml(mod, params)
+
+#################################################################
+# Compilation
+# -----------
+# Use the tuning cache if it exists.
+if os.path.exists(tune_log):
+    with autotvm.apply_history_best(tune_log):
+        with tvm.transform.PassContext(opt_level=3):
+            lib = relay.build(mod, target=target, params=params)
+else:
+    with tvm.transform.PassContext(opt_level=3):
+        lib = relay.build(mod, target=target, params=params)
+
+#################################################################
+# Deploy the Model Remotely by RPC
+# --------------------------------
+# Using RPC you can deploy the model from the host
+# machine to the remote Adreno device
 if local_demo:
     remote = rpc.LocalSession()
 else:
@@ -282,10 +375,8 @@ def convert_to_dtype(mod, dtype):

 if local_demo:
     dev = remote.cpu(0)
-elif test_target == "opencl":
+elif "opencl" in test_target:
     dev = remote.cl(0)
-elif test_target == "vulkan":
-    dev = remote.vulkan(0)
 else:
     dev = remote.cpu(0)

diff --git a/gallery/how_to/deploy_models/deploy_model_on_adreno_tvmc.py b/gallery/how_to/deploy_models/deploy_model_on_adreno_tvmc.py
new file mode 100644
index 000000000000..b54ac1b2c6e7
--- /dev/null
+++ b/gallery/how_to/deploy_models/deploy_model_on_adreno_tvmc.py
@@ -0,0 +1,198 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.
The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+.. _tutorial-deploy-model-on-adreno-tvmc:
+
+Deploy the Pretrained Model on Adreno™ with tvmc Interface
+==========================================================
+**Author**: Siva Rama Krishna
+
+This article is a step-by-step tutorial to deploy a pretrained Keras resnet50 model on Adreno™.
+
+Besides that, you should have TVM built for Android.
+See the following instructions on how to build it and set up the RPC environment.
+
+`Deploy to Adreno GPU `_
+
+"""
+
+import os
+import tvm
+import numpy as np
+from tvm import relay
+from tvm.driver import tvmc
+from tvm.driver.tvmc.model import TVMCPackage
+from tvm.contrib import utils

+#################################################################
+# Configuration
+# -------------
+# Specify the Adreno target before compiling to generate texture
+# leveraging kernels and get all the benefits of textures.
+# Note: This generated example runs on our x86 server for demonstration.
+# If running it on the Android device, we need to
+# specify its instruction set. Set :code:`local_demo` to False if you want
+# to run this tutorial with a real device over rpc.
+local_demo = True
+
+# By default the CPU target will execute.
+# Select 'llvm', 'opencl' or 'opencl -device=adreno'.
+target = "llvm"
+
+# Change target configuration.
+# Run `adb shell cat /proc/cpuinfo` to find the arch.
+arch = "arm64"
+target_host = "llvm -mtriple=%s-linux-android" % arch
+
+# Auto tuning is a compute intensive and time consuming task, hence it is disabled for the default run. Please enable it if required.
+is_tuning = False
+tune_log = "adreno-resnet50.log"
+
+# To enable the OpenCLML accelerated operator library.
+enable_clml = False
+cross_compiler = (
+    os.getenv("ANDROID_NDK_HOME", "")
+    + "/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang"
+)
+
+#######################################################################
+# Make a Keras Resnet50 Model
+# ---------------------------
+
+from tensorflow.keras.applications.resnet50 import ResNet50
+
+tmp_path = utils.tempdir()
+model_file_name = tmp_path.relpath("resnet50.h5")
+
+model = ResNet50(include_top=True, weights="imagenet", input_shape=(224, 224, 3), classes=1000)
+model.save(model_file_name)
+
+
+#######################################################################
+# Load Model
+# ----------
+# Convert a model from any framework to a tvm relay module.
+# tvmc.load supports models from any framework (like tensorflow saved_model, onnx, tflite etc.) and auto detects the filetype.

+tvmc_model = tvmc.load(model_file_name)
+
+print(tvmc_model.mod)
+
+# tvmc_model consists of tvmc_model.mod which is the relay module and tvmc_model.params which are the params of the module.
+
+#######################################################################
+# AutoTuning
+# ----------
+# Now, the below api can be used for autotuning the model for any target.
+# Tuning requires an RPC setup, please refer to
+# `Deploy to Adreno GPU `_
+
+rpc_tracker_host = os.environ.get("TVM_TRACKER_HOST", "127.0.0.1")
+rpc_tracker_port = int(os.environ.get("TVM_TRACKER_PORT", 9190))
+rpc_key = "android"
+rpc_tracker = rpc_tracker_host + ":" + str(rpc_tracker_port)
+
+# Auto tuning is a compute intensive and time consuming task.
+# It is set to False in the above configuration as this script runs on x86 for demonstration.
+
+# Please set :code:`is_tuning` to True to enable auto tuning.
+
+# Also, :code:`target` is set to :code:`llvm` in this example to make it compatible for x86 demonstration.
+# Please change it to :code:`opencl` or :code:`opencl -device=adreno` for an RPC target in the configuration above.
+
+if is_tuning:
+    tvmc.tune(
+        tvmc_model,
+        target=target,
+        tuning_records=tune_log,
+        target_host=target_host,
+        hostname=rpc_tracker_host,
+        port=rpc_tracker_port,
+        rpc_key=rpc_key,
+        tuner="xgb",
+        repeat=30,
+        trials=3,
+        early_stopping=0,
+    )
+
+#######################################################################
+# Compilation
+# -----------
+# Compilation to produce tvm artifacts
+
+# This generated example runs on our x86 server for demonstration.
+# To deploy and run on a real target over RPC please set :code:`local_demo` to False in the above configuration section.
+
+# OpenCLML offloading will try to accelerate supported operators by using the OpenCLML proprietary operator library.
+# By default :code:`enable_clml` is set to False in the above configuration section.
+
+if not enable_clml:
+    if local_demo:
+        tvmc_package = tvmc.compile(
+            tvmc_model,
+            target=target,
+        )
+    else:
+        tvmc_package = tvmc.compile(
+            tvmc_model,
+            target=target,
+            target_host=target_host,
+            cross=cross_compiler,
+            tuning_records=tune_log,
+        )
+else:
+    # Alternatively, we can save the compilation output as a TVMCPackage.
+    # This way the compiled module can be loaded later without compiling again.
+    target = target + ", clml"
+    pkg_path = tmp_path.relpath("keras-resnet50.tar")
+    tvmc.compile(
+        tvmc_model,
+        target=target,
+        target_host=target_host,
+        cross=cross_compiler,
+        tuning_records=tune_log,
+        package_path=pkg_path,
+    )
+
+    # Load the compiled package
+    tvmc_package = TVMCPackage(package_path=pkg_path)
+
+# tvmc_package consists of tvmc_package.lib_path, tvmc_package.graph, tvmc_package.params
+# A saved TVMCPackage is nothing but a tar archive with mod.so, mod.json and mod.params.
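The comment above notes that a saved TVMCPackage is just a tar archive holding mod.so, mod.json and mod.params. A small stdlib-only sketch (illustrative only, with dummy payloads, not part of this patch) of that layout:

```python
import io
import json
import tarfile

# Illustrative only: build an in-memory tar archive with the three member
# names a saved TVMCPackage contains, then list them back.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, payload in [
        ("mod.so", b"\x7fELF"),                             # placeholder shared library bytes
        ("mod.json", json.dumps({"nodes": []}).encode()),   # placeholder graph json
        ("mod.params", b"\x00"),                            # placeholder params blob
    ]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    members = sorted(m.name for m in tar.getmembers())
print(members)  # ['mod.json', 'mod.params', 'mod.so']
```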
+ + +####################################################################### +# Deploy & Run +# ------------ +# Deploy and run the compiled model on RPC +# Let tvmc fill inputs using random + +# Run on RPC setup +if local_demo: + result = tvmc.run(tvmc_package, device="cpu", fill_mode="random") +else: + result = tvmc.run( + tvmc_package, + device="cl", + rpc_key=rpc_key, + hostname=rpc_tracker_host, + port=rpc_tracker_port, + fill_mode="random", + ) + +# result is a dictionary of outputs. +print("Result:", result) diff --git a/tests/python/relay/opencl_texture/test_network.py b/tests/python/relay/opencl_texture/test_network.py index 46ee79697ea6..1d0e996f9f97 100644 --- a/tests/python/relay/opencl_texture/test_network.py +++ b/tests/python/relay/opencl_texture/test_network.py @@ -27,26 +27,25 @@ from utils.adreno_utils import build_run_compare, get_model, gpu_preprocess -def convert_to_fp16(mod, dtype): - from tvm.ir import IRModule - - mod = IRModule.from_expr(mod) - seq = tvm.transform.Sequential( - [relay.transform.InferType(), relay.transform.ToMixedPrecision()] - ) - with tvm.transform.PassContext(opt_level=3): - mod = seq(mod) - return mod - - -def _test_mobilenet_v1(remote, target, dtype): +def _test_mobilenet_v1(remote, target, calc_dtype, acc_dtype): mod, params, inputs, dtypes = get_model( "https://github.com/mlcommons/mobile_models/raw/main/v0_7/tflite/mobilenet_edgetpu_224_1.0_float.tflite", "mobilenet_edgetpu_224_1.0_float.tflite", "tflite", ) - if dtype == "float16": - mod = convert_to_fp16(mod["main"], dtype) + if calc_dtype == "float16": + from tvm.driver.tvmc.transform import apply_graph_transforms + + mod = apply_graph_transforms( + mod, + { + "mixed_precision": True, + "mixed_precision_ops": ["nn.conv2d", "nn.dense"], + "mixed_precision_calculation_type": calc_dtype, + "mixed_precision_acc_type": acc_dtype, + }, + ) + build_run_compare(remote, mod, params, inputs, dtypes, target, []) @@ -55,14 +54,21 @@ def _test_mobilenet_v1(remote, target, 
dtype): @tvm.testing.parametrize_targets("opencl -device=adreno") @pytest.mark.skipif(tvm.testing.utils.IS_IN_CI, reason="CI doesn't support fp16(half datatypes)") def test_mobilenet_v1_fp16(remote, target): - _test_mobilenet_v1(remote, target, "float16") + _test_mobilenet_v1(remote, target, "float16", "float16") @pytest.mark.skip(reason="See https://github.com/apache/tvm/issues/13443") @tvm.testing.requires_opencl @tvm.testing.parametrize_targets("opencl -device=adreno") def test_mobilenet_v1_fp32(remote, target): - _test_mobilenet_v1(remote, target, "float32") + _test_mobilenet_v1(remote, target, "float32", "float32") + + +@pytest.mark.skip(reason="See https://github.com/apache/tvm/issues/13443") +@tvm.testing.requires_opencl +@tvm.testing.parametrize_targets("opencl -device=adreno") +def test_mobilenet_v1_fp16_acc32(remote, target): + _test_mobilenet_v1(remote, target, "float16", "float32") if __name__ == "__main__": diff --git a/tests/scripts/setup-adreno-env.sh b/tests/scripts/setup-adreno-env.sh new file mode 100755 index 000000000000..44eb7edce8bc --- /dev/null +++ b/tests/scripts/setup-adreno-env.sh @@ -0,0 +1,113 @@ +#!/usr/bin/env bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ + +ENVIRONMENT="" +RPC_PORT="" +ADB_SERIAL="" + +function usage() { + echo "Helper script to setup the environment for Tracker, RPC Device and for application" + echo "Usage (Help) : source setup-adreno-env.sh -h" + echo "Usage (Tracker): source setup-adreno-env.sh -e tracker -p " + echo "Usage (Device): source setup-adreno-env.sh -e device -p -d " + echo "Usage (Query): source setup-adreno-env.sh -e query -p " +} + +while [[ $# -gt 0 ]]; do + case $1 in + -e|--environment) + ENVIRONMENT="$2" + shift # past argument + shift # past value + ;; + -p|--rpc-port) + RPC_PORT="$2" + shift # past argument + shift # past value + ;; + -d|--android-device) + ADB_SERIAL="$2" + shift # past argument + shift # past value + ;; + -h|--help) + usage + return 0 + ;; + -*|--*) + usage + return 0 + ;; + *) + ;; + esac +done + +echo "ENVIRONMENT = ${ENVIRONMENT}" +echo "RPC_PORT = ${RPC_PORT}" +echo "ADB_SERIAL = ${ADB_SERIAL}" + + +function def_environment() { + source tests/scripts/setup-pytest-env.sh + export PYTHONPATH=${PYTHONPATH}:${TVM_PATH}/apps/extension/python + export LD_LIBRARY_PATH="${TVM_PATH}/build:${LD_LIBRARY_PATH}" + export TVM_TRACKER_HOST=0.0.0.0 + export TVM_TRACKER_PORT=$RPC_PORT + export RPC_DEVICE_KEY="android" + export RPC_TARGET="adreno" + export TVM_NDK_CC="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" +} + +def_environment + +case ${ENVIRONMENT} in + + "tracker") + echo "Starting Tracker on port :${TVM_TRACKER_PORT}" + def_environment + python3 -m tvm.exec.rpc_tracker --host "${TVM_TRACKER_HOST}" --port "${TVM_TRACKER_PORT}" + ;; + + "device") + echo "Running RPC on device : ${ADB_SERIAL} with key $RPC_DEVICE_KEY" + def_environment + export ANDROID_SERIAL=${ADB_SERIAL} + + adb shell "mkdir -p /data/local/tmp/tvm_ci" + adb push build-adreno-target/tvm_rpc /data/local/tmp/tvm_ci/tvm_rpc_ci + adb push build-adreno-target/libtvm_runtime.so /data/local/tmp/tvm_ci + + adb reverse tcp:${TVM_TRACKER_PORT} 
tcp:${TVM_TRACKER_PORT}
+    adb forward tcp:5000 tcp:5000
+    adb forward tcp:5001 tcp:5001
+    adb forward tcp:5002 tcp:5002
+    adb shell "cd /data/local/tmp/tvm_ci; killall -9 tvm_rpc_ci; sleep 2; LD_LIBRARY_PATH=/data/local/tmp/tvm_ci/ ./tvm_rpc_ci server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:${TVM_TRACKER_PORT} --key=${RPC_DEVICE_KEY}"
+    ;;
+
+  "query")
+    def_environment
+    echo "Setting dev environment with Tracker Port : ${TVM_TRACKER_PORT} and the available devices are"
+    python3 -m tvm.exec.query_rpc_tracker --port ${TVM_TRACKER_PORT}
+    ;;
+
+  *)
+    usage
+    ;;
+esac
diff --git a/tests/scripts/task_build_adreno_bins.sh b/tests/scripts/task_build_adreno_bins.sh
index f65794106ee3..87f50367440c 100755
--- a/tests/scripts/task_build_adreno_bins.sh
+++ b/tests/scripts/task_build_adreno_bins.sh
@@ -28,7 +28,6 @@ cd ${output_directory}

 cp ../cmake/config.cmake .

-echo set\(USE_MICRO OFF\) >> config.cmake
 if [ -f "${ADRENO_OPENCL}/CL/cl_qcom_ml_ops.h" ] ; then
 echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
 echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
diff --git a/tests/scripts/task_config_build_adreno.sh b/tests/scripts/task_config_build_adreno.sh
index d378b5f842b5..62e6ffecbced 100755
--- a/tests/scripts/task_config_build_adreno.sh
+++ b/tests/scripts/task_config_build_adreno.sh
@@ -23,7 +23,6 @@ mkdir -p "$BUILD_DIR"
 cd "$BUILD_DIR"

 cp ../cmake/config.cmake .

-echo set\(USE_OPENCL ON\) >> config.cmake
 if [ -f "${ADRENO_OPENCL}/CL/cl_qcom_ml_ops.h" ] ; then
 echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
 fi
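In the setup script above, the device RPC server is started with `--port=5000 --port-end=5010` while adb forwards host ports 5000-5002; the setup works only because every forwarded port falls inside the server's allowed range. A tiny illustrative check of that invariant (the `ports_covered` helper is hypothetical, not part of the patch):

```python
# Illustrative only: the forwarded ports must lie within the RPC server's
# allowed range [port, port_end) for the tracker to reach the device.
def ports_covered(forwarded, port, port_end):
    return all(port <= p < port_end for p in forwarded)

print(ports_covered([5000, 5001, 5002], 5000, 5010))  # True
print(ports_covered([5000, 5011], 5000, 5010))        # False
```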