This project tests and compares OpenVINO speculative decoding strategies for LLM inference optimization.
Speculative decoding uses a smaller "draft" model to propose candidate tokens that a larger "target" model then verifies in a single forward pass, which can improve inference speed when most of the draft's proposals are accepted.
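To make the draft/verify loop concrete, here is a toy, greedy-only sketch. The "models" are hypothetical stand-ins that predict the next character of a fixed string; this illustrates the idea, not how openvino-genai implements it:

```python
# Toy greedy-only illustration: the "models" are stand-ins that predict the
# next character of a fixed string, so the draft mostly agrees with the target.
TARGET_TEXT = "speculative decoding accelerates autoregressive generation"
DRAFT_TEXT = "speculative decoding accelerates autoregresive generation"  # imperfect draft

def next_char(text, pos):
    return text[pos] if pos < len(text) else ""

def speculative_step(pos, k=5):
    # 1. The draft proposes k candidate tokens autoregressively (cheap).
    candidates = [next_char(DRAFT_TEXT, pos + i) for i in range(k)]
    # 2. The target verifies all k positions at once (one "forward pass").
    verified = [next_char(TARGET_TEXT, pos + i) for i in range(k)]
    # 3. Accept the longest matching prefix; on the first mismatch, emit the
    #    target's token instead, so every step yields at least one token.
    out = []
    for cand, ref in zip(candidates, verified):
        out.append(ref)
        if cand != ref:
            break
    return out

pos, generated = 0, []
while pos < len(TARGET_TEXT):
    step = speculative_step(pos)
    generated += step
    pos += len(step)

print("".join(generated) == TARGET_TEXT)  # True: output matches the target alone
```

The key property: every step emits at least one target-verified token, so correctness never depends on the draft; the draft only determines how many tokens each step can emit.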
- Target Model: Phi-3-mini-4k-instruct (4B parameters, INT4 quantized)
- Draft Model: Phi-3-mini-FastDraft (50M parameters, INT8 quantized)
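If the converted models are not already on disk, they can be fetched from the Hugging Face Hub. A minimal sketch; the repo IDs below are assumptions inferred from the model names above, so verify them against the project's actual sources:

```python
from huggingface_hub import snapshot_download

# Assumed repo IDs based on the model names above -- verify before use
target_dir = snapshot_download("OpenVINO/Phi-3-mini-4k-instruct-int4-ov")
draft_dir = snapshot_download("OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov")
print(target_dir, draft_dir)
```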
- No Speculation: Baseline inference without speculation
- Fixed Speculation: Uses `num_assistant_tokens=5` (a fixed number of speculative tokens per step)
- Dynamic Speculation: Uses `assistant_confidence_threshold=0.1` (adaptive token acceptance); both modes are illustrated in the sketch below
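For reference, a minimal sketch of how these modes map onto the openvino-genai API. The model directory names are placeholders, and `ov_test.py` may wire this up differently:

```python
import openvino_genai as ov_genai

# Placeholder model directories -- point these at the converted models
draft = ov_genai.draft_model("Phi-3-mini-FastDraft-50M-int8-ov", "CPU")
pipe = ov_genai.LLMPipeline("Phi-3-mini-4k-instruct-int4-ov", "CPU", draft_model=draft)

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100
config.num_assistant_tokens = 5                # fixed speculation
# config.assistant_confidence_threshold = 0.1  # dynamic speculation (set one OR the other)

print(pipe.generate("Artificial intelligence is", config))
```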
```bash
# Create and activate virtual environment
python -m venv venv_ov_test
source venv_ov_test/bin/activate  # On Windows: venv_ov_test\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Alternatively, install the dependencies directly:

```bash
pip install "huggingface-hub<1.0" "transformers" "tokenizers" \
    "openvino>=2024.5.0" "openvino-tokenizers>=2024.5.0" "openvino-genai>=2024.5.0" \
    requests
```

Open `OV_test.ipynb` in Jupyter or VS Code and run the cells sequentially.
Run the standalone Python script with various options:
```bash
# Activate virtual environment first
source venv_ov_test/bin/activate

# Run with defaults
python ov_test.py

# Run with custom configuration
python ov_test.py --device CPU --max-tokens 200 --warmup

# Skip baseline test and use custom confidence threshold
python ov_test.py --skip-no-spec --confidence-threshold 0.2

# Use custom prompt
python ov_test.py --prompt "Artificial intelligence is"

# See all options
python ov_test.py --help
```

The notebook measures:
- Generation time (seconds)
- Number of tokens generated
- Tokens per second (throughput)
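A minimal sketch of how such numbers can be collected, assuming a placeholder model path and the `perf_metrics` counters that openvino-genai attaches to generation results:

```python
import time
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("Phi-3-mini-4k-instruct-int4-ov", "CPU")  # placeholder path
config = ov_genai.GenerationConfig()
config.max_new_tokens = 100

start = time.perf_counter()
result = pipe.generate("Artificial intelligence is", config)
gen_time = time.perf_counter() - start                      # generation time (seconds)

n_tokens = result.perf_metrics.get_num_generated_tokens()   # tokens generated
print(f"{gen_time:.2f} s, {n_tokens} tokens, {n_tokens / gen_time:.1f} tok/s")
```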