From ce1aa1f525a15de44d022629035dd7e66bb0870d Mon Sep 17 00:00:00 2001
From: Aliaksandr Kukrash
Date: Fri, 15 Aug 2025 00:54:23 +0200
Subject: [PATCH 1/4] Cleanup

Signed-off-by: Aliaksandr Kukrash
---
 docs/INSTALL.md | 87 +------------------------------------------------
 1 file changed, 1 insertion(+), 86 deletions(-)

diff --git a/docs/INSTALL.md b/docs/INSTALL.md
index 6775b57..40a8e7a 100644
--- a/docs/INSTALL.md
+++ b/docs/INSTALL.md
@@ -1,87 +1,2 @@
-# Install the AMD ROCm accelerator in a Linux/WSL environment
-Beware: if you have integrated AMD graphics (you most likely do with an AMD CPU), you must turn it off for the ROCm accelerator to work with ONNX Runtime.
+# Install Optimum CLI for model conversion and optimization
 
-Here is how to install ROCm 6.4.2; it works with the open-source AMD driver on Ubuntu 24.04.
-```bash
-wget https://repo.radeon.com/amdgpu-install/6.4.2/ubuntu/noble/amdgpu-install_6.4.60402-1_all.deb
-sudo apt update
-sudo apt install ./amdgpu-install_6.4.60402-1_all.deb
-sudo amdgpu-install --usecase=rocm,hiplibsdk,graphics,opencl -y --vulkan=amdvlk --no-dkms
-```
-
-The same for version 6.4.3:
-```bash
-wget https://repo.radeon.com/amdgpu-install/6.4.3/ubuntu/noble/amdgpu-install_6.4.60403-1_all.deb
-sudo apt update
-sudo apt install ./amdgpu-install_6.4.60403-1_all.deb
-sudo amdgpu-install --usecase=rocm,hiplibsdk,graphics,opencl -y --vulkan=amdvlk --no-dkms
-```
-
-To check that the installation succeeded:
-```bash
-rocminfo # make note of your GPU UUID, to whitelist only the CPU and the discrete GPU in the next step
-```
-
-`rocminfo` does NOT fail if the integrated GPU is enabled, but many features may be unsupported, to the point of crashing the driver at runtime.
-Your options are: disable the iGPU in UEFI/BIOS, or export an environment variable to whitelist only the CPU and the discrete GPU.
-```bash
-export ROCR_VISIBLE_DEVICES="0,GPU-deadbeefdeadbeef" # 0 - CPU, GPU-deadbeefdeadbeef - GPU
-```
-
-These instructions were taken from the 6.4.1 documentation; the page does not exist for higher versions, but it works with pretty much all of them.
-
-## Instructions source
-https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/install/install-methods/amdgpu-installer/amdgpu-installer-ubuntu.html
-
-# Building ONNX Runtime for ROCm
-
-The build process for the ROCm target accelerator is extremely heavy: it may take 3+ hours on a Ryzen 9 9950X and peaks at ~50 GB of memory usage (with 96 GB of total RAM).
-Considering the above, choose your targets from the beginning. I recommend building all targets in one go (Python and .NET); this will save a lot of time.
-
-Clone the repo
-```bash
-git clone --recursive https://github.com/ROCm/onnxruntime.git
-cd onnxruntime
-git checkout tags/v1.22.1
-```
-
-Build for .NET only, to run models
-```bash
-./build.sh --update --build --config Release --build_nuget --parallel --use_rocm --rocm_home /opt/rocm --skip_tests
-```
-
-Build for .NET and for the Python stack, with PyTorch and any other toolset that may utilize GPU accelerators on AMD
-
-```bash
-python3 -m venv .
-source ./bin/activate
-pip install 'cmake>=3.28,<4'
-pip install -r requirements.txt
-pip install setuptools
-./build.sh --update --build --config Release --build_wheel --build_nuget --parallel --use_rocm --rocm_home /opt/rocm --skip_tests
-```
-
-Install the wheel for Python to use in the venv
-```bash
-pip install ./build/Linux/Release/dist/*.whl
-```
-Primary source for these instructions
-https://onnxruntime.ai/docs/build/eps.html#amd-rocm
-
-### Pre-built .NET packages are linked to the repo
-
-### The Optimum[onnx] CLI can use ROCm, but it will actually call the accelerator/target CUDA and only work for parts of workloads; please hold on tight and brace yourself, this may get fixed at some point in the future.
-Also, AMD has a CUDA translation layer for non-precompiled code, so it may simply work sometimes.
-```text
-  .-'---`-.
-,'          `.
-|             \
-|              \
-\           _   \
-,\     _   ,'-,/-)\
-( * \ \,'  ,'  ,'-)
- `._,)     -',-')
-   \/        ''/
-    )       / /
-   /      ,'-'
-```
\ No newline at end of file

From 902d921f88d67775fa77a82fc1fbceb9e912f9fd Mon Sep 17 00:00:00 2001
From: Aliaksandr Kukrash
Date: Fri, 15 Aug 2025 18:14:55 +0200
Subject: [PATCH 2/4] Add optimum docs

Signed-off-by: Aliaksandr Kukrash
---
 docs/INSTALL.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/docs/INSTALL.md b/docs/INSTALL.md
index 40a8e7a..9c6790e 100644
--- a/docs/INSTALL.md
+++ b/docs/INSTALL.md
@@ -1,2 +1,15 @@
 # Install Optimum CLI for model conversion and optimization
 
+```bash
+sudo apt update
+sudo apt install build-essential flex bison libssl-dev libelf-dev bc python3 pahole cpio python3.12-venv python3-pip
+mkdir optimum
+cd optimum
+python3 -m venv .
+source ./bin/activate
+pip install optimum
+pip install optimum[exporters,onnxruntime,sentence_transformers,amd]
+pip install accelerate
+```
+
+To install AMD GPU support to run models, please follow the instructions in [AMD GPU Support](INSTALL_AMD_ROCm.md)
\ No newline at end of file

From afc2c1b9e037baf5e89acb8f9f4160622ec0075e Mon Sep 17 00:00:00 2001
From: Aliaksandr Kukrash
Date: Sun, 24 Aug 2025 15:00:27 +0200
Subject: [PATCH 3/4] Update docs for ROCm model optimization

Signed-off-by: Aliaksandr Kukrash
---
 .gitignore      |  4 +++-
 docs/INSTALL.md | 16 ++++++++++++----
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/.gitignore b/.gitignore
index 54637a6..ccb6850 100644
--- a/.gitignore
+++ b/.gitignore
@@ -252,4 +252,6 @@ paket-files/
 **/reranker_m3_onnx
 **/reranker_m3_onnx_gpu
 **/bge_m3_onnx
-**/bge_m3_onnx_gpu
\ No newline at end of file
+**/bge_m3_onnx_gpu
+**/llama3.1_8b_onnx_gpu
+**/llama3.2_3b_onnx_gpu
diff --git a/docs/INSTALL.md b/docs/INSTALL.md
index 9c6790e..41bde09 100644
--- a/docs/INSTALL.md
+++ b/docs/INSTALL.md
@@ -7,9 +7,17 @@ mkdir optimum
 cd optimum
 python3 -m venv .
 source ./bin/activate
-pip install optimum
-pip install optimum[exporters,onnxruntime,sentence_transformers,amd]
-pip install accelerate
+pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.4
+pip install onnxruntime_genai onnx-ir
+# ROCm
+python3 -m onnxruntime_genai.models.builder -i . -o ./onnx_opt_i4 -p int4 -e rocm
+# CUDA
+python3 -m onnxruntime_genai.models.builder -i . -o ./onnx_opt_i4 -p int4 -e cuda
 ```
 
-To install AMD GPU support to run models, please follow the instructions in [AMD GPU Support](INSTALL_AMD_ROCm.md)
\ No newline at end of file
+For AMD GPU support in ONNX Runtime (to run and optimize models), please follow the instructions in [AMD GPU Support](INSTALL_AMD_ROCm.md).
+
+Optimize a model for inference on GPU using FP16 precision
+```bash
+optimum-cli export onnx --model . --dtype fp16 --task default --device cuda --optimize O4 ./onnx_fp16
+```
\ No newline at end of file

From 94ae745bfe457e27e6f569d13e9c96ca0de1b89a Mon Sep 17 00:00:00 2001
From: Aliaksandr Kukrash
Date: Sun, 24 Aug 2025 15:11:49 +0200
Subject: [PATCH 4/4] More docs

Signed-off-by: Aliaksandr Kukrash
---
 OrtForge.sln                |  2 ++
 docs/INSTALL.md             | 20 +++++++++++++-----
 docs/INSTALL_NVIDIA_CUDA.md | 16 ++++++++++++++++
 3 files changed, 33 insertions(+), 5 deletions(-)
 create mode 100644 docs/INSTALL_NVIDIA_CUDA.md

diff --git a/OrtForge.sln b/OrtForge.sln
index 2138d7a..3ccd7f6 100755
--- a/OrtForge.sln
+++ b/OrtForge.sln
@@ -11,6 +11,8 @@ EndProject
 Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "docs", "docs", "{63CDC6A4-3C2D-499F-B3F9-6B75D40887E1}"
 	ProjectSection(SolutionItems) = preProject
 		docs\INSTALL_AMD_ROCm.md = docs\INSTALL_AMD_ROCm.md
+		docs\INSTALL.md = docs\INSTALL.md
+		docs\INSTALL_NVIDIA_CUDA.md = docs\INSTALL_NVIDIA_CUDA.md
 	EndProjectSection
 EndProject
 Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "OrtForge.AI.Models.Astractions", "OrtForge.AI.Models.Astractions\OrtForge.AI.Models.Astractions.csproj", "{40A4313C-6826-4E8D-9A01-DA760DE4CE26}"
diff --git a/docs/INSTALL.md b/docs/INSTALL.md
index 41bde09..7308354 100644
--- a/docs/INSTALL.md
+++ b/docs/INSTALL.md
@@ -6,16 +6,26 @@ sudo apt install build-essential flex bison libssl-dev libelf-dev bc python3 pah
 mkdir optimum
 cd optimum
 python3 -m venv .
-source ./bin/activate
+source ./bin/activate
+```
+
+For AMD GPU support in ONNX Runtime (to run and optimize models), please follow the instructions in [AMD GPU Support](INSTALL_AMD_ROCm.md).
+
+## ROCm
+```bash
 pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.4
 pip install onnxruntime_genai onnx-ir
-# ROCm
 python3 -m onnxruntime_genai.models.builder -i . -o ./onnx_opt_i4 -p int4 -e rocm
-# CUDA
-python3 -m onnxruntime_genai.models.builder -i . -o ./onnx_opt_i4 -p int4 -e cuda
 ```
 
-For AMD GPU support in ONNX Runtime (to run and optimize models), please follow the instructions in [AMD GPU Support](INSTALL_AMD_ROCm.md).
+For Nvidia GPU (CUDA) support in ONNX Runtime (to run and optimize models), please follow the instructions in [CUDA GPU Support](INSTALL_NVIDIA_CUDA.md).
+
+## CUDA
+```bash
+pip install torch torchvision
+pip install onnxruntime_genai onnx-ir onnxruntime_gpu
+python3 -m onnxruntime_genai.models.builder -i . -o ./onnx_opt_i4 -p int4 -e cuda
+```
 
 Optimize a model for inference on GPU using FP16 precision
 ```bash
diff --git a/docs/INSTALL_NVIDIA_CUDA.md b/docs/INSTALL_NVIDIA_CUDA.md
new file mode 100644
index 0000000..aeda33d
--- /dev/null
+++ b/docs/INSTALL_NVIDIA_CUDA.md
@@ -0,0 +1,16 @@
+# Install the Nvidia CUDA accelerator in a Linux/WSL environment
+
+1. Update the drivers to the latest version on Windows.
+2. Install CUDA Toolkit 13.0.
+3. Install ONNX Runtime for CUDA.
+
+```bash
+sudo apt-key del 7fa2af80
+wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
+sudo dpkg -i cuda-keyring_1.1-1_all.deb
+sudo apt-get update
+sudo apt-get -y install cuda-toolkit-13-0
+```
+
+## Instructions source
+https://docs.nvidia.com/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl
\ No newline at end of file
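The patches above build and install ONNX Runtime with the ROCm or CUDA execution provider; when creating an inference session it is sensible to keep the CPU provider at the end of the list as a fallback. A minimal sketch of that selection logic, assuming the wheel built earlier is installed (the helper `pick_providers` and its preference order are illustrative, not part of the patches):

```python
def pick_providers(available):
    """Pick ONNX Runtime execution providers from the available ones,
    preferring ROCm, then CUDA, with the CPU provider as a fallback."""
    preferred = ["ROCMExecutionProvider", "CUDAExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    chosen.append("CPUExecutionProvider")  # always kept as a fallback
    return chosen


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # the wheel built/installed above
        providers = pick_providers(ort.get_available_providers())
    except ImportError:
        # onnxruntime is not installed; demonstrate with a plausible list
        providers = pick_providers(["ROCMExecutionProvider", "CPUExecutionProvider"])
    print(providers)
    # A session could then be created as, for example:
    # session = ort.InferenceSession("model.onnx", providers=providers)
```

The explicit CPU fallback matters on machines where the iGPU whitelist above is misconfigured: the session still starts instead of failing outright.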