This guide explains how to use run_orbitflow.sh to execute vLLM benchmarks for specific prefetch methods, traces, and SLO ratios.
The script automates benchmarking with orbitflow.py. It loops over:
- SLO ratios: Latency scaling factors (e.g., 1.5).
- Experiments: Named groups (e.g., TestBestAndWorst).
- Methods: Prefetch strategies (e.g., Ours, FlexGen, DistNSingle) from supported_methods.json.
- Traces: Workload inputs (e.g., test_shortshort_enough).
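The loop structure above can be sketched as follows. This is a simplified illustration, not the script itself: the real run_orbitflow.sh additionally builds per-run logging configs and invokes orbitflow.py; array and variable names mirror the CONSTANTS section, and the output layout follows the Output section.

```shell
# Simplified sketch of the nested loops in run_orbitflow.sh (illustrative only).
ROOT="${ROOT:-$HOME/vllm}"
SLO_RATIO_LIST=(1.5)
EXP_LIST=(paper_main_exp)
METHOD_LIST=(Ours)
TRACE_LIST=(both_dyn_veryhigh_bs2)

for SLO in "${SLO_RATIO_LIST[@]}"; do
  for EXP in "${EXP_LIST[@]}"; do
    for METHOD in "${METHOD_LIST[@]}"; do
      for TRACE in "${TRACE_LIST[@]}"; do
        # Each combination gets its own output directory.
        OUT_DIR="${ROOT}/outputs/benchmark/${EXP}/slo${SLO}/${METHOD}/${TRACE}"
        echo "would run: ${METHOD} on ${TRACE} (SLO=${SLO}) -> ${OUT_DIR}"
      done
    done
  done
done
```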
Environment:
- Python 3.11 with vLLM 0.6.6.
- GPU with CUDA 12.1.
Directory Structure:
- Root: $HOME/vllm (adjust ROOT if needed).
- Files:
  - benchmark/scripts/supported_methods.json: Method CLI args.
  - examples/orbitflow.py: Benchmark script.
  - configs/logging_template.json: Logging config.
  - benchmark/selected_traces/test_best_worst/*.json: Trace files.
  - benchmark/data_analysis/profiling/extract_profiled_results.py: Profiling script.
Install Dependencies:
pip install vllm
Edit the CONSTANTS section in run_orbitflow.sh:
export CUDA_VISIBLE_DEVICES=0
export VLLM_CONFIGURE_LOGGING=1
export NUM_LAYERS=32 # e.g., 32 for LLaMa3-8B, 80 for LLaMa3-70B
LOGGING_LEVEL=CRITICAL # DEBUG, INFO, WARNING, ERROR, CRITICAL
ROOT="$HOME/vllm"
MODEL_PATH="meta-llama/Meta-Llama-3.1-8B-Instruct" # Or use local path: "$HOME/models/llama-3.1-8b-instruct"
profiled_path="$HOME/vllm/benchmark/scripts/profiling_data/profiled_results_A6000.json"
FIGURE_ONLY="${1:-0}" # 0: Run benchmarks + plot, 1: Plot only
EXP_LIST=(paper_main_exp)
METHOD_LIST=(Ours)
TRACE_LIST=(both_dyn_veryhigh_bs2) # Trace basenames
TRACE_CFG_DIR="${ROOT}/benchmark/selected_traces"
SLO_RATIO_LIST=(1.5) # e.g., 1.5, 2.0
- EXP_LIST: Experiment names.
- METHOD_LIST: Public methods from supported_methods.json.
- TRACE_LIST: Trace files (without .json) in TRACE_CFG_DIR.
- SLO_RATIO_LIST: SLO ratios.
- MODEL_PATH: Path to model (e.g., LLaMa3-8B).
- NUM_LAYERS: Model layers (e.g., 32 for LLaMa3-8B).
- profiled_path: Path to profiling data (see Generating Profiled Data).
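Before launching a long run, it can help to sanity-check the paths configured above. This is a hedged sketch, not part of run_orbitflow.sh; the variable names and default paths mirror the CONSTANTS section.

```shell
# Sketch: warn about missing paths from the CONSTANTS section before a long run.
ROOT="${ROOT:-$HOME/vllm}"
TRACE_CFG_DIR="${ROOT}/benchmark/selected_traces"
profiled_path="${ROOT}/benchmark/scripts/profiling_data/profiled_results_A6000.json"

MISSING=0
for P in "$ROOT" "$TRACE_CFG_DIR" "$profiled_path"; do
  if [ ! -e "$P" ]; then
    echo "warning: missing path: $P"
    MISSING=$((MISSING + 1))
  fi
done
echo "missing paths: $MISSING"
```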
You may create your own trace files to simulate specific workloads, or reuse the existing traces we provide. Place them in TRACE_CFG_DIR (e.g., benchmark/selected_traces/test_best_worst/) and list their basenames (without .json) in TRACE_LIST.
Trace files are JSON objects with the following structure:
- batch_size: Integer, number of requests processed together.
- max_model_len: Integer, maximum sequence length the model supports.
- num_gpu_blocks_override: Integer, number of GPU memory blocks to use.
- arrival_pattern: String, defines the request arrival distribution (e.g., BimodalArrival(l1=0.16,l2=0.1,p=0.7,max=888)).
- vocab: Array of integers, vocabulary range for token generation (you may ignore this one).
- peak_batch_blocks: Integer, maximum GPU blocks needed for the batch.
- requests: Object mapping request IDs to details:
  - category: String, request type (e.g., C2_IN3680-4496_OUT4-32).
  - input_length: Integer, number of input tokens.
  - output_length: Integer, number of output tokens.
  - arrival_time: Integer, time (in arbitrary units) when the request arrives.
  - sched_time: Integer, time when the request is scheduled.
  - wait_time: Integer, time difference between arrival and scheduling.
Example Trace (example_trace.json):
{
"batch_size": 4,
"max_model_len": 32384,
"num_gpu_blocks_override": 1664,
"arrival_pattern": "BimodalArrival(l1=0.16087516087516088,l2=0.1,p=0.7,max=888)",
"vocab": [200, 30000],
"peak_batch_blocks": 4459,
"requests": {
"request_0": {
"category": "C2_IN3680-4496_OUT4-32",
"input_length": 3705,
"output_length": 27,
"arrival_time": 0,
"sched_time": 0,
"wait_time": 0
},
"request_1": {
"category": "C2_IN448-560_OUT4-32",
"input_length": 545,
"output_length": 22,
"arrival_time": 4,
"sched_time": 4,
"wait_time": 0
}
}
}
- Create traces manually or with a script to match your workload.
- Validate JSON syntax before running.
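A quick way to validate a trace before adding it to TRACE_LIST is `python3 -m json.tool`, which exits non-zero on malformed JSON. The sketch below writes a trimmed stand-in trace to /tmp so it runs anywhere; point the check at your real trace file in TRACE_CFG_DIR.

```shell
# Write a trimmed stand-in trace (mirrors the example above) for demonstration.
cat > /tmp/example_trace.json <<'EOF'
{
  "batch_size": 4,
  "max_model_len": 32384,
  "requests": {}
}
EOF

# json.tool parses the file and fails on any syntax error.
if python3 -m json.tool /tmp/example_trace.json > /dev/null 2>&1; then
  echo "trace JSON is valid"
else
  echo "trace JSON is invalid"
fi
```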
To generate the profiled_path file (e.g., profiled_results_A6000.json), use the extract_profiled_results.py script located at ${HOME}/vllm/benchmark/data_analysis/profiling/extract_profiled_results.py. This script processes two CSV files generated from benchmarks using the NoPrefetch and NextLayer methods to create profiling data tailored to your GPU and model.
- Run benchmarks for NoPrefetch and NextLayer:
  - Configure run_orbitflow.sh with METHOD_LIST=(NoPrefetch NextLayer) and TRACE_LIST=(profile_trace).
  - Ensure the trace file $HOME/vllm/benchmark/selected_traces/profile/profile_trace.json exists.
  - Execute the script to generate CSV outputs with profiling traces:
    ./run_orbitflow.sh 0
  - Find the CSV files in ${ROOT}/outputs/benchmark/${EXP}/slo${SLO}/${METHOD}/${TRACE}/outputs.csv.
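Before moving on to the profiling script, it is worth confirming that both CSVs actually exist. This is a hedged sketch; the EXP/SLO values match the example command used later in this section.

```shell
# Sketch: check that both profiling CSVs exist before extraction.
ROOT="${ROOT:-$HOME/vllm}"
MISSING_CSVS=0
for M in NoPrefetch NextLayer; do
  CSV="${ROOT}/outputs/benchmark/paper_main_exp/slo1.5/${M}/profile_trace/outputs.csv"
  if [ ! -f "$CSV" ]; then
    echo "missing: $CSV"
    MISSING_CSVS=$((MISSING_CSVS + 1))
  fi
done
echo "missing CSVs: $MISSING_CSVS"
```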
- Run the profiling script:
  - Use the CSV files from the NoPrefetch and NextLayer runs as inputs.
  - Example command:
    python $HOME/vllm/benchmark/data_analysis/profiling/extract_profiled_results.py \
      ${ROOT}/outputs/benchmark/paper_main_exp/slo1.5/NoPrefetch/profile_trace/outputs.csv \
      ${ROOT}/outputs/benchmark/paper_main_exp/slo1.5/NextLayer/profile_trace/outputs.csv \
      --out ${ROOT}/benchmark/scripts/profiling_data/profiled_results_A6000.json
  - The script outputs a JSON file (e.g., profiled_results_A6000.json) containing linear fit parameters (A, B, R2) for NoPrefetch and Communication (NextLayer).
- Update run_orbitflow.sh:
  - Set profiled_path to the generated JSON file path (e.g., $HOME/vllm/benchmark/scripts/profiling_data/profiled_results_A6000.json).
The following methods from supported_methods.json are available:
- Flexgen: FlexGen-based prefetching strategy.
- NoPrefetch: Disables prefetching entirely.
- NextLayer: Prefetches only the immediate next layer.
- Static1/2/4/8: Static prefetching with fixed distances (1, 2, 4, or 8 layers ahead).
- Ours:
  - Description: Uses a solver to dynamically determine the exact number of layers to offload to the CPU for each request, optimizing resource allocation based on workload demands.
  - Configuration:
    {
      "Ours": [
        "--prefetch-mode", "solver",
        "--prefetch-distance", "1",
        "--flattened-cache", "true",
        "--merge-prefetch-buffer", "true",
        "--pause-and-resume",
        "--enable-deposit"
      ]
    }
- OursUniformSolver:
  - Description: Employs a solver but offloads a fixed number of layers to the CPU for each request, ensuring uniform resource allocation across requests.
  - Configuration:
    {
      "OursUniformSolver": [
        "--prefetch-mode", "solver",
        "--prefetch-distance", "1",
        "--flattened-cache", "true",
        "--merge-prefetch-buffer", "true",
        "--pause-and-resume",
        "--enable-deposit",
        "--uniform-solver"
      ]
    }
- DistNSingle:
  - Description: Uses a heuristic approach (no solver) to determine the number of layers to offload to the CPU, providing a simpler, less computationally intensive method.
  - Configuration:
    {
      "DistNSingle": [
        "--prefetch-mode", "distn_single",
        "--prefetch-distance", "1",
        "--flattened-cache", "true",
        "--merge-prefetch-buffer", "true"
      ]
    }

Add or modify these methods in supported_methods.json, ensuring valid orbitflow.py arguments.
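After editing supported_methods.json, a quick structural check catches broken entries before a run. This is a hedged sketch: it validates a stand-in file written to /tmp (its entries mirror the configurations above); point the same check at benchmark/scripts/supported_methods.json for real use. It only assumes the file maps method names to lists of CLI-argument strings, as shown in this section.

```shell
# Stand-in methods file for demonstration (entries mirror the section above).
cat > /tmp/supported_methods.json <<'EOF'
{
  "NoPrefetch": [],
  "DistNSingle": ["--prefetch-mode", "distn_single", "--prefetch-distance", "1"]
}
EOF

# Validate: valid JSON, and every method maps to a list of argument strings.
python3 - <<'EOF'
import json
with open("/tmp/supported_methods.json") as f:
    methods = json.load(f)  # raises on invalid JSON
for name, args in methods.items():
    assert isinstance(args, list), f"{name}: args must be a list"
    assert all(isinstance(a, str) for a in args), f"{name}: args must be strings"
print("methods:", ", ".join(sorted(methods)))
EOF
```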
Set Permissions:
chmod +x run_orbitflow.sh
Run:
- Run benchmarks: ./run_orbitflow.sh 0
- Skip execution (plot only): ./run_orbitflow.sh 1
Output: Saved in ${ROOT}/outputs/benchmark/${EXP}/slo${SLO}/${METHOD}/${TRACE}/:
- outputs.log: Benchmark log.
- vllm_msg.log: vLLM log.
- outputs.csv: Results (if generated).
- logging_cfg.json: Per-run logging config.
Example: Run Ours with SLO=1.5 on trace test_shortshort_enough:
- Set METHOD_LIST=(Ours), SLO_RATIO_LIST=(1.5), TRACE_LIST=(test_shortshort_enough).
- Execute: ./run_orbitflow.sh 0