# Benchmark Results

Back to README | All docs | Handbook

Auto-generated by `npm run bench:save`. Do not edit manually.

v1.0.0 · Generated: 2026-02-26


## Summary

| Metric | Value |
| --- | --- |
| Scenarios | 8 |
| Average compression | 2.08x |
| Best compression | 6.16x |
| Round-trip integrity | all PASS |

```mermaid
pie title "Message Outcomes"
    "Preserved" : 90
    "Compressed" : 65
```

## Compression by Scenario

8 scenarios · 2.08x avg ratio · 1.00x–6.16x range · all round-trips PASS

```mermaid
xychart-beta
    title "Compression Ratio by Scenario"
    x-axis ["Coding", "Long Q&A", "Tool-heavy", "Short", "Deep", "Technical", "Structured", "Agentic"]
    y-axis "Char Ratio"
    bar [1.68, 6.16, 1.30, 1.00, 2.12, 1.00, 1.93, 1.43]
```
| Scenario | Ratio | Reduction | Token Ratio | Messages | Compressed | Preserved |
| --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | 1.68 | 41% | 1.67 | 13 | 5 | 8 |
| Long Q&A | 6.16 | 84% | 6.11 | 10 | 4 | 6 |
| Tool-heavy | 1.30 | 23% | 1.29 | 18 | 2 | 16 |
| Short conversation | 1.00 | 0% | 1.00 | 7 | 0 | 7 |
| Deep conversation | 2.12 | 53% | 2.12 | 51 | 50 | 1 |
| Technical explanation | 1.00 | 0% | 1.00 | 11 | 0 | 11 |
| Structured content | 1.93 | 48% | 1.92 | 12 | 2 | 10 |
| Agentic coding session | 1.43 | 30% | 1.43 | 33 | 2 | 31 |
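The Ratio and Reduction columns are derived from raw character counts. A minimal sketch of the arithmetic (the `ratios` helper is hypothetical, not the library's API):

```javascript
// Hypothetical helper showing how the table's Ratio and Reduction columns
// follow from raw character counts (not the library's actual API).
const charCount = (msgs) => msgs.reduce((n, m) => n + m.content.length, 0);

function ratios(original, compressed) {
  const before = charCount(original);
  const after = charCount(compressed);
  return {
    charRatio: +(before / after).toFixed(2),                  // e.g. 6.16
    reduction: Math.round((1 - after / before) * 100) + "%",  // e.g. "84%"
  };
}

// 616 chars compressed to 100 chars -> 6.16x ratio, 84% reduction
console.log(ratios([{ content: "x".repeat(616) }], [{ content: "x".repeat(100) }]));
```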

## Deduplication Impact

```mermaid
xychart-beta
    title "Deduplication Impact (recencyWindow=0)"
    x-axis ["Long Q&A", "Agentic"]
    y-axis "Char Ratio"
    bar [5.14, 1.14]
    bar [6.16, 1.43]
```

First bar: no dedup · Second bar: with dedup

| Scenario | No Dedup (rw=0) | Dedup (rw=0) | No Dedup (rw=4) | Dedup (rw=4) | Deduped |
| --- | --- | --- | --- | --- | --- |
| Coding assistant | 1.68 | 1.68 | 1.51 | 1.51 | 0 |
| Long Q&A | 5.14 | 6.16 | 1.90 | 2.03 | 1 |
| Tool-heavy | 1.30 | 1.30 | 1.30 | 1.30 | 0 |
| Short conversation | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
| Deep conversation | 2.12 | 2.12 | 1.95 | 1.95 | 0 |
| Technical explanation | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
| Structured content | 1.93 | 1.93 | 1.37 | 1.37 | 0 |
| Agentic coding session | 1.14 | 1.43 | 1.14 | 1.43 | 4 |
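The Deduped column counts message bodies that repeat earlier content outside the recency window. A hedged sketch of the idea (all names are stand-ins, not the library's API):

```javascript
// Sketch of exact deduplication (hypothetical, not the library's API):
// a repeated message body outside the recency window is replaced with a
// short back-reference to its first occurrence.
function dedupExact(messages, recencyWindow = 0) {
  const seen = new Map();
  const cutoff = messages.length - recencyWindow; // indices >= cutoff are protected
  let deduped = 0;
  const out = messages.map((m, i) => {
    if (i >= cutoff) return m; // recent messages stay verbatim
    const first = seen.get(m.content);
    if (first !== undefined) {
      deduped += 1;
      return { ...m, content: `[dup of #${first}]` };
    }
    seen.set(m.content, i);
    return m;
  });
  return { messages: out, deduped };
}

// Two identical bodies outside the recency window -> one dedup
console.log(dedupExact([{ content: "A" }, { content: "A" }, { content: "B" }]).deduped); // 1
```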

## Fuzzy Dedup

| Scenario | Exact Deduped | Fuzzy Deduped | Ratio | vs Base |
| --- | --- | --- | --- | --- |
| Coding assistant | 0 | 0 | 1.68 | - |
| Long Q&A | 1 | 0 | 6.16 | - |
| Tool-heavy | 0 | 0 | 1.30 | - |
| Short conversation | 0 | 0 | 1.00 | - |
| Deep conversation | 0 | 0 | 2.12 | - |
| Technical explanation | 0 | 0 | 1.00 | - |
| Structured content | 0 | 0 | 1.93 | - |
| Agentic coding session | 4 | 2 | 2.23 | +56% |
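Fuzzy dedup catches near-duplicates that exact matching misses. One common way to do this is token-set Jaccard similarity; the sketch below uses that technique as an assumption, since the library's actual heuristic is not shown here:

```javascript
// Fuzzy-dedup sketch: treat two messages as near-duplicates when their
// token sets overlap heavily (Jaccard similarity). The 0.9 threshold and
// all names here are assumptions, not the library's implementation.
const tokens = (s) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));

function jaccard(a, b) {
  const ta = tokens(a);
  const tb = tokens(b);
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared += 1;
  return shared / (ta.size + tb.size - shared);
}

const isFuzzyDup = (a, b, threshold = 0.9) => jaccard(a, b) >= threshold;

console.log(isFuzzyDup("run the tests in CI", "Run the tests in CI!")); // true
console.log(isFuzzyDup("run the tests", "deploy the app"));             // false
```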

## Token Budget

Target: 2000 tokens · 1/4 fit

| Scenario | Dedup | Tokens | Fits | recencyWindow | Compressed | Preserved | Deduped |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | no | 3738 | no | 0 | 50 | 1 | 0 |
| Deep conversation | yes | 3738 | no | 0 | 50 | 1 | 0 |
| Agentic coding session | no | 2345 | no | 0 | 4 | 33 | 0 |
| Agentic coding session | yes | 1957 | yes | 9 | 1 | 32 | 4 |
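The recencyWindow column suggests a search: keep shrinking the window of verbatim recent messages until the estimate fits the budget. A toy sketch of that loop (all names and the cost model are hypothetical):

```javascript
// Sketch of budget fitting (hypothetical): shrink the recency window
// until the token estimate fits the target. `tokensAt` stands in for
// "compress with this window, then count tokens".
function fitToBudget(totalMessages, targetTokens, tokensAt) {
  for (let rw = totalMessages; rw >= 0; rw--) {
    if (tokensAt(rw) <= targetTokens) return { fits: true, recencyWindow: rw };
  }
  return { fits: false, recencyWindow: 0 };
}

// Toy cost model: verbatim messages cost 100 tokens, compressed ones 20.
const tokensAt = (rw) => rw * 100 + (33 - rw) * 20;
console.log(fitToBudget(33, 2000, tokensAt)); // { fits: true, recencyWindow: 16 }
```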

## Bundle Size

Zero-dependency ESM library — tracked per-file to catch regressions.

| File | Size | Gzip |
| --- | --- | --- |
| classify.js | 7.5 KB | 3.2 KB |
| compress.js | 33.1 KB | 8.5 KB |
| dedup.js | 10.0 KB | 2.8 KB |
| expand.js | 2.7 KB | 934 B |
| index.js | 225 B | 159 B |
| summarizer.js | 2.5 KB | 993 B |
| types.js | 11 B | 31 B |
| total | 56.2 KB | 16.6 KB |

## LLM vs Deterministic

Results are non-deterministic — LLM outputs vary between runs. Saved as reference data, not used for regression testing.

### Deterministic vs ollama/llama3.2

```text
Coding assistant        Det ████████░░░░░░░░░░░░░░░░░░░░░░ 1.68x
                        LLM ████████░░░░░░░░░░░░░░░░░░░░░░ 1.55x

Long Q&A                Det ██████████████████████████████ 6.16x
                        LLM ██████████████████████░░░░░░░░ 4.49x

Tool-heavy              Det ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.30x
                        LLM ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.28x

Deep conversation       Det ██████████░░░░░░░░░░░░░░░░░░░░ 2.12x
                        LLM ████████████████░░░░░░░░░░░░░░ 3.28x  ★

Technical explanation   Det █████░░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x
                        LLM █████░░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x

Structured content      Det █████████░░░░░░░░░░░░░░░░░░░░░ 1.93x
                        LLM ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.46x

Agentic coding session  Det ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.43x
                        LLM ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.40x
```

★ = LLM wins
### Deterministic vs openai/gpt-4.1-mini

```text
Coding assistant        Det ████████░░░░░░░░░░░░░░░░░░░░░░ 1.68x
                        LLM ████████░░░░░░░░░░░░░░░░░░░░░░ 1.64x

Long Q&A                Det ██████████████████████████████ 6.16x
                        LLM ██████████████████████████░░░░ 5.37x

Tool-heavy              Det ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.30x
                        LLM █████░░░░░░░░░░░░░░░░░░░░░░░░░ 1.12x

Deep conversation       Det ██████████░░░░░░░░░░░░░░░░░░░░ 2.12x
                        LLM ████████████░░░░░░░░░░░░░░░░░░ 2.37x  ★

Technical explanation   Det █████░░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x
                        LLM █████░░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x

Structured content      Det █████████░░░░░░░░░░░░░░░░░░░░░ 1.93x
                        LLM ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.29x

Agentic coding session  Det ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.43x
                        LLM ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.43x
```

★ = LLM wins

## Provider Summary

| Provider | Model | Avg Ratio | Avg vsDet | Round-trip | Budget Fits | Avg Time |
| --- | --- | --- | --- | --- | --- | --- |
| ollama | llama3.2 | 2.09x | 0.96 | all PASS | 1/4 | 4.2s |
| openai | gpt-4.1-mini | 2.09x | 0.92 | all PASS | 2/4 | 8.1s |

Key findings:

- LLM wins on prose-heavy scenarios: Deep conversation, Technical explanation
- Deterministic wins on structured/technical content: Coding assistant, Long Q&A, Tool-heavy, Structured content
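Reading the vsDet column (an assumption from the tables, not documented here): it is the LLM ratio divided by the deterministic ratio for the same scenario, so values above 1.00 mean the LLM method compressed better.

```javascript
// vsDet as implied by the tables: LLM ratio / deterministic ratio,
// rounded to two decimals. This is an inference, not the bench's code.
const vsDet = (llmRatio, detRatio) => +(llmRatio / detRatio).toFixed(2);

console.log(vsDet(1.55, 1.68)); // 0.92 (deterministic wins)
console.log(vsDet(4.49, 6.16)); // 0.73
```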

### ollama (llama3.2)

Generated: 2026-02-25

#### Scenario details

| Scenario | Method | Char Ratio | Token Ratio | vsDet | Compressed | Preserved | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | deterministic | 1.68 | 1.67 | - | 5 | 8 | PASS | 0ms |
| | llm-basic | 1.48 | 1.48 | 0.88 | 5 | 8 | PASS | 5.9s |
| | llm-escalate | 1.55 | 1.55 | 0.92 | 5 | 8 | PASS | 3.0s |
| Long Q&A | deterministic | 6.16 | 6.11 | - | 4 | 6 | PASS | 1ms |
| | llm-basic | 4.31 | 4.28 | 0.70 | 4 | 6 | PASS | 4.1s |
| | llm-escalate | 4.49 | 4.46 | 0.73 | 4 | 6 | PASS | 3.7s |
| Tool-heavy | deterministic | 1.30 | 1.29 | - | 2 | 16 | PASS | 2ms |
| | llm-basic | 1.12 | 1.11 | 0.86 | 2 | 16 | PASS | 2.3s |
| | llm-escalate | 1.28 | 1.28 | 0.99 | 2 | 16 | PASS | 2.8s |
| Deep conversation | deterministic | 2.12 | 2.12 | - | 50 | 1 | PASS | 3ms |
| | llm-basic | 3.12 | 3.11 | 1.47 | 50 | 1 | PASS | 22.7s |
| | llm-escalate | 3.28 | 3.26 | 1.54 | 50 | 1 | PASS | 23.3s |
| Technical explanation | deterministic | 1.00 | 1.00 | - | 0 | 11 | PASS | 1ms |
| | llm-basic | 1.00 | 1.00 | 1.00 | 0 | 11 | PASS | 3.2s |
| | llm-escalate | 1.00 | 1.00 | 1.00 | 2 | 9 | PASS | 785ms |
| Structured content | deterministic | 1.93 | 1.92 | - | 2 | 10 | PASS | 0ms |
| | llm-basic | 1.46 | 1.45 | 0.75 | 2 | 10 | PASS | 3.5s |
| | llm-escalate | 1.38 | 1.38 | 0.71 | 2 | 10 | PASS | 3.7s |
| Agentic coding session | deterministic | 1.43 | 1.43 | - | 2 | 31 | PASS | 1ms |
| | llm-basic | 1.35 | 1.34 | 0.94 | 2 | 31 | PASS | 3.3s |
| | llm-escalate | 1.40 | 1.40 | 0.98 | 2 | 31 | PASS | 5.4s |

#### Token Budget (target: 2000 tokens)

| Scenario | Method | Tokens | Fits | recencyWindow | Ratio | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | deterministic | 3738 | false | 0 | 2.12 | PASS | 12ms |
| | llm-escalate | 2593 | false | 0 | 3.08 | PASS | 132.0s |
| Agentic coding session | deterministic | 1957 | true | 9 | 1.36 | PASS | 2ms |
| | llm-escalate | 2003 | false | 9 | 1.33 | PASS | 4.1s |

### openai (gpt-4.1-mini)

Generated: 2026-02-25

#### Scenario details

| Scenario | Method | Char Ratio | Token Ratio | vsDet | Compressed | Preserved | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | deterministic | 1.68 | 1.67 | - | 5 | 8 | PASS | 0ms |
| | llm-basic | 1.64 | 1.63 | 0.98 | 5 | 8 | PASS | 5.6s |
| | llm-escalate | 1.63 | 1.63 | 0.97 | 5 | 8 | PASS | 6.0s |
| Long Q&A | deterministic | 6.16 | 6.11 | - | 4 | 6 | PASS | 1ms |
| | llm-basic | 5.37 | 5.33 | 0.87 | 4 | 6 | PASS | 5.9s |
| | llm-escalate | 5.35 | 5.31 | 0.87 | 4 | 6 | PASS | 7.0s |
| Tool-heavy | deterministic | 1.30 | 1.29 | - | 2 | 16 | PASS | 0ms |
| | llm-basic | 1.11 | 1.10 | 0.85 | 2 | 16 | PASS | 3.5s |
| | llm-escalate | 1.12 | 1.12 | 0.86 | 2 | 16 | PASS | 5.3s |
| Deep conversation | deterministic | 2.12 | 2.12 | - | 50 | 1 | PASS | 3ms |
| | llm-basic | 2.34 | 2.33 | 1.10 | 50 | 1 | PASS | 50.4s |
| | llm-escalate | 2.37 | 2.36 | 1.11 | 50 | 1 | PASS | 50.8s |
| Technical explanation | deterministic | 1.00 | 1.00 | - | 0 | 11 | PASS | 1ms |
| | llm-basic | 1.00 | 1.00 | 1.00 | 1 | 10 | PASS | 2.6s |
| | llm-escalate | 1.00 | 1.00 | 1.00 | 1 | 10 | PASS | 3.3s |
| Structured content | deterministic | 1.93 | 1.92 | - | 2 | 10 | PASS | 0ms |
| | llm-basic | 1.23 | 1.23 | 0.64 | 2 | 10 | PASS | 10.2s |
| | llm-escalate | 1.29 | 1.29 | 0.67 | 2 | 10 | PASS | 4.8s |
| Agentic coding session | deterministic | 1.43 | 1.43 | - | 2 | 31 | PASS | 1ms |
| | llm-basic | 1.43 | 1.43 | 1.00 | 2 | 31 | PASS | 5.8s |
| | llm-escalate | 1.32 | 1.32 | 0.93 | 1 | 32 | PASS | 9.5s |

#### Token Budget (target: 2000 tokens)

| Scenario | Method | Tokens | Fits | recencyWindow | Ratio | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | deterministic | 3738 | false | 0 | 2.12 | PASS | 10ms |
| | llm-escalate | 3391 | false | 0 | 2.35 | PASS | 280.5s |
| Agentic coding session | deterministic | 1957 | true | 9 | 1.36 | PASS | 2ms |
| | llm-escalate | 1915 | true | 3 | 1.39 | PASS | 28.1s |

## Methodology

- All deterministic results use the same input → same output guarantee
- Metrics: compression ratio, token ratio, message counts, dedup counts
- Timing is excluded from baselines (hardware-dependent)
- LLM benchmarks are saved as reference data, not used for regression testing
- Round-trip integrity is verified for every scenario (compress then uncompress)
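The round-trip check in the last bullet can be sketched as a structural equality test. The `compress`/`uncompress` pair here are stand-ins for the library's real functions:

```javascript
// Sketch of the round-trip check: uncompress(compress(x)) must reproduce
// the original messages. `compress`/`uncompress` are hypothetical stand-ins.
function roundTripPass(messages, compress, uncompress) {
  const restored = uncompress(compress(messages));
  return JSON.stringify(restored) === JSON.stringify(messages);
}

// A toy reversible pair (prefix each body, then strip it) round-trips:
const pack = (msgs) => msgs.map((m) => ({ ...m, content: "~" + m.content }));
const unpack = (msgs) => msgs.map((m) => ({ ...m, content: m.content.slice(1) }));
console.log(roundTripPass([{ role: "user", content: "hi" }], pack, unpack)); // true
```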