# Benchmark Results

Back to README | All docs | Handbook

Auto-generated by `npm run bench:save`. Do not edit manually.

v1.0.0 · Generated: 2026-02-26


## Summary

| Metric | Value |
| --- | --- |
| Scenarios | 8 |
| Average compression | 2.08x |
| Best compression | 6.16x |
| Round-trip integrity | all PASS |

```mermaid
pie title "Message Outcomes"
    "Preserved" : 90
    "Compressed" : 65
```

## Compression by Scenario

8 scenarios · 2.08x avg ratio · 1.00x–6.16x range · all round-trips PASS

```mermaid
xychart-beta
    title "Compression Ratio by Scenario"
    x-axis ["Coding", "Long Q&A", "Tool-heavy", "Short", "Deep", "Technical", "Structured", "Agentic"]
    y-axis "Char Ratio"
    bar [1.68, 6.16, 1.30, 1.00, 2.12, 1.00, 1.93, 1.43]
```
| Scenario | Ratio | Reduction | Token Ratio | Messages | Compressed | Preserved |
| --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | 1.68 | 41% | 1.67 | 13 | 5 | 8 |
| Long Q&A | 6.16 | 84% | 6.11 | 10 | 4 | 6 |
| Tool-heavy | 1.30 | 23% | 1.29 | 18 | 2 | 16 |
| Short conversation | 1.00 | 0% | 1.00 | 7 | 0 | 7 |
| Deep conversation | 2.12 | 53% | 2.12 | 51 | 50 | 1 |
| Technical explanation | 1.00 | 0% | 1.00 | 11 | 0 | 11 |
| Structured content | 1.93 | 48% | 1.92 | 12 | 2 | 10 |
| Agentic coding session | 1.43 | 30% | 1.43 | 33 | 2 | 31 |
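The Ratio and Reduction columns are derived from raw character counts. A minimal sketch of the arithmetic (the `ratios` helper is hypothetical, not the library's API):

```javascript
// Hypothetical helper showing how the table's Ratio and Reduction columns
// follow from raw character counts (not the library's actual API).
const charCount = (msgs) => msgs.reduce((n, m) => n + m.content.length, 0);

function ratios(original, compressed) {
  const before = charCount(original);
  const after = charCount(compressed);
  return {
    charRatio: +(before / after).toFixed(2),                  // e.g. 6.16
    reduction: Math.round((1 - after / before) * 100) + "%",  // e.g. "84%"
  };
}

// 616 chars compressed to 100 chars -> 6.16x ratio, 84% reduction
console.log(ratios([{ content: "x".repeat(616) }], [{ content: "x".repeat(100) }]));
```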

## Deduplication Impact

```mermaid
xychart-beta
    title "Deduplication Impact (recencyWindow=0)"
    x-axis ["Long Q&A", "Agentic"]
    y-axis "Char Ratio"
    bar [5.14, 1.14]
    bar [6.16, 1.43]
```

First bar: no dedup · Second bar: with dedup

| Scenario | No Dedup (rw=0) | Dedup (rw=0) | No Dedup (rw=4) | Dedup (rw=4) | Deduped |
| --- | --- | --- | --- | --- | --- |
| Coding assistant | 1.68 | 1.68 | 1.51 | 1.51 | 0 |
| Long Q&A | 5.14 | 6.16 | 1.90 | 2.03 | 1 |
| Tool-heavy | 1.30 | 1.30 | 1.30 | 1.30 | 0 |
| Short conversation | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
| Deep conversation | 2.12 | 2.12 | 1.95 | 1.95 | 0 |
| Technical explanation | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
| Structured content | 1.93 | 1.93 | 1.37 | 1.37 | 0 |
| Agentic coding session | 1.14 | 1.43 | 1.14 | 1.43 | 4 |
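The Deduped column counts message bodies that repeat earlier content outside the recency window. A hedged sketch of the idea (all names are stand-ins, not the library's API):

```javascript
// Sketch of exact deduplication (hypothetical, not the library's API):
// a repeated message body outside the recency window is replaced with a
// short back-reference to its first occurrence.
function dedupExact(messages, recencyWindow = 0) {
  const seen = new Map();
  const cutoff = messages.length - recencyWindow; // indices >= cutoff are protected
  let deduped = 0;
  const out = messages.map((m, i) => {
    if (i >= cutoff) return m; // recent messages stay verbatim
    const first = seen.get(m.content);
    if (first !== undefined) {
      deduped += 1;
      return { ...m, content: `[dup of #${first}]` };
    }
    seen.set(m.content, i);
    return m;
  });
  return { messages: out, deduped };
}

// Two identical bodies outside the recency window -> one dedup
console.log(dedupExact([{ content: "A" }, { content: "A" }, { content: "B" }]).deduped); // 1
```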

## Fuzzy Dedup

| Scenario | Exact Deduped | Fuzzy Deduped | Ratio | vs Base |
| --- | --- | --- | --- | --- |
| Coding assistant | 0 | 0 | 1.68 | - |
| Long Q&A | 1 | 0 | 6.16 | - |
| Tool-heavy | 0 | 0 | 1.30 | - |
| Short conversation | 0 | 0 | 1.00 | - |
| Deep conversation | 0 | 0 | 2.12 | - |
| Technical explanation | 0 | 0 | 1.00 | - |
| Structured content | 0 | 0 | 1.93 | - |
| Agentic coding session | 4 | 2 | 2.23 | +56% |
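Fuzzy dedup catches near-duplicates that exact matching misses. One common way to do this is token-set Jaccard similarity; the sketch below uses that technique as an assumption, since the library's actual heuristic is not shown here:

```javascript
// Fuzzy-dedup sketch: treat two messages as near-duplicates when their
// token sets overlap heavily (Jaccard similarity). The 0.9 threshold and
// all names here are assumptions, not the library's implementation.
const tokens = (s) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));

function jaccard(a, b) {
  const ta = tokens(a);
  const tb = tokens(b);
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared += 1;
  return shared / (ta.size + tb.size - shared);
}

const isFuzzyDup = (a, b, threshold = 0.9) => jaccard(a, b) >= threshold;

console.log(isFuzzyDup("run the tests in CI", "Run the tests in CI!")); // true
console.log(isFuzzyDup("run the tests", "deploy the app"));             // false
```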

## Token Budget

Target: 2000 tokens · 1/4 fit

| Scenario | Dedup | Tokens | Fits | recencyWindow | Compressed | Preserved | Deduped |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | no | 3738 | no | 0 | 50 | 1 | 0 |
| Deep conversation | yes | 3738 | no | 0 | 50 | 1 | 0 |
| Agentic coding session | no | 2345 | no | 0 | 4 | 33 | 0 |
| Agentic coding session | yes | 1957 | yes | 9 | 1 | 32 | 4 |
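The recencyWindow column suggests a search: keep shrinking the window of verbatim recent messages until the estimate fits the budget. A toy sketch of that loop (all names and the cost model are hypothetical):

```javascript
// Sketch of budget fitting (hypothetical): shrink the recency window
// until the token estimate fits the target. `tokensAt` stands in for
// "compress with this window, then count tokens".
function fitToBudget(totalMessages, targetTokens, tokensAt) {
  for (let rw = totalMessages; rw >= 0; rw--) {
    if (tokensAt(rw) <= targetTokens) return { fits: true, recencyWindow: rw };
  }
  return { fits: false, recencyWindow: 0 };
}

// Toy cost model: verbatim messages cost 100 tokens, compressed ones 20.
const tokensAt = (rw) => rw * 100 + (33 - rw) * 20;
console.log(fitToBudget(33, 2000, tokensAt)); // { fits: true, recencyWindow: 16 }
```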

## Bundle Size

Zero-dependency ESM library — tracked per-file to catch regressions.

| File | Size | Gzip |
| --- | --- | --- |
| classify.js | 7.5 KB | 3.2 KB |
| compress.js | 33.1 KB | 8.5 KB |
| dedup.js | 10.0 KB | 2.8 KB |
| expand.js | 2.7 KB | 934 B |
| index.js | 225 B | 159 B |
| summarizer.js | 2.5 KB | 993 B |
| types.js | 11 B | 31 B |
| total | 56.2 KB | 16.6 KB |

## LLM vs Deterministic

Results are non-deterministic — LLM outputs vary between runs. Saved as reference data, not used for regression testing.

### Deterministic vs ollama/llama3.2

```text
Coding assistant        Det ████████░░░░░░░░░░░░░░░░░░░░░░ 1.68x
                        LLM ████████░░░░░░░░░░░░░░░░░░░░░░ 1.55x

Long Q&A                Det ██████████████████████████████ 6.16x
                        LLM ██████████████████████░░░░░░░░ 4.49x

Tool-heavy              Det ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.30x
                        LLM ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.28x

Deep conversation       Det ██████████░░░░░░░░░░░░░░░░░░░░ 2.12x
                        LLM ████████████████░░░░░░░░░░░░░░ 3.28x  ★

Technical explanation   Det █████░░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x
                        LLM █████░░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x

Structured content      Det █████████░░░░░░░░░░░░░░░░░░░░░ 1.93x
                        LLM ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.46x

Agentic coding session  Det ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.43x
                        LLM ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.40x
```

★ = LLM wins
### Deterministic vs openai/gpt-4.1-mini

```text
Coding assistant        Det ████████░░░░░░░░░░░░░░░░░░░░░░ 1.68x
                        LLM ████████░░░░░░░░░░░░░░░░░░░░░░ 1.64x

Long Q&A                Det ██████████████████████████████ 6.16x
                        LLM ██████████████████████████░░░░ 5.37x

Tool-heavy              Det ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.30x
                        LLM █████░░░░░░░░░░░░░░░░░░░░░░░░░ 1.12x

Deep conversation       Det ██████████░░░░░░░░░░░░░░░░░░░░ 2.12x
                        LLM ████████████░░░░░░░░░░░░░░░░░░ 2.37x  ★

Technical explanation   Det █████░░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x
                        LLM █████░░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x

Structured content      Det █████████░░░░░░░░░░░░░░░░░░░░░ 1.93x
                        LLM ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.29x

Agentic coding session  Det ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.43x
                        LLM ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.43x
```

★ = LLM wins

## Provider Summary

| Provider | Model | Avg Ratio | Avg vsDet | Round-trip | Budget Fits | Avg Time |
| --- | --- | --- | --- | --- | --- | --- |
| ollama | llama3.2 | 2.09x | 0.96 | all PASS | 1/4 | 4.2s |
| openai | gpt-4.1-mini | 2.09x | 0.92 | all PASS | 2/4 | 8.1s |

Key findings:

- LLM wins on prose-heavy scenarios: Deep conversation, Technical explanation
- Deterministic wins on structured/technical content: Coding assistant, Long Q&A, Tool-heavy, Structured content
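Reading the vsDet column (an assumption from the tables, not documented here): it is the LLM ratio divided by the deterministic ratio for the same scenario, so values above 1.00 mean the LLM method compressed better.

```javascript
// vsDet as implied by the tables: LLM ratio / deterministic ratio,
// rounded to two decimals. This is an inference, not the bench's code.
const vsDet = (llmRatio, detRatio) => +(llmRatio / detRatio).toFixed(2);

console.log(vsDet(1.55, 1.68)); // 0.92 (deterministic wins)
console.log(vsDet(4.49, 6.16)); // 0.73
```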

### ollama (llama3.2)

Generated: 2026-02-25

#### Scenario details

| Scenario | Method | Char Ratio | Token Ratio | vsDet | Compressed | Preserved | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | deterministic | 1.68 | 1.67 | - | 5 | 8 | PASS | 0ms |
| | llm-basic | 1.48 | 1.48 | 0.88 | 5 | 8 | PASS | 5.9s |
| | llm-escalate | 1.55 | 1.55 | 0.92 | 5 | 8 | PASS | 3.0s |
| Long Q&A | deterministic | 6.16 | 6.11 | - | 4 | 6 | PASS | 1ms |
| | llm-basic | 4.31 | 4.28 | 0.70 | 4 | 6 | PASS | 4.1s |
| | llm-escalate | 4.49 | 4.46 | 0.73 | 4 | 6 | PASS | 3.7s |
| Tool-heavy | deterministic | 1.30 | 1.29 | - | 2 | 16 | PASS | 2ms |
| | llm-basic | 1.12 | 1.11 | 0.86 | 2 | 16 | PASS | 2.3s |
| | llm-escalate | 1.28 | 1.28 | 0.99 | 2 | 16 | PASS | 2.8s |
| Deep conversation | deterministic | 2.12 | 2.12 | - | 50 | 1 | PASS | 3ms |
| | llm-basic | 3.12 | 3.11 | 1.47 | 50 | 1 | PASS | 22.7s |
| | llm-escalate | 3.28 | 3.26 | 1.54 | 50 | 1 | PASS | 23.3s |
| Technical explanation | deterministic | 1.00 | 1.00 | - | 0 | 11 | PASS | 1ms |
| | llm-basic | 1.00 | 1.00 | 1.00 | 0 | 11 | PASS | 3.2s |
| | llm-escalate | 1.00 | 1.00 | 1.00 | 2 | 9 | PASS | 785ms |
| Structured content | deterministic | 1.93 | 1.92 | - | 2 | 10 | PASS | 0ms |
| | llm-basic | 1.46 | 1.45 | 0.75 | 2 | 10 | PASS | 3.5s |
| | llm-escalate | 1.38 | 1.38 | 0.71 | 2 | 10 | PASS | 3.7s |
| Agentic coding session | deterministic | 1.43 | 1.43 | - | 2 | 31 | PASS | 1ms |
| | llm-basic | 1.35 | 1.34 | 0.94 | 2 | 31 | PASS | 3.3s |
| | llm-escalate | 1.40 | 1.40 | 0.98 | 2 | 31 | PASS | 5.4s |

#### Token Budget (target: 2000 tokens)

| Scenario | Method | Tokens | Fits | recencyWindow | Ratio | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | deterministic | 3738 | false | 0 | 2.12 | PASS | 12ms |
| | llm-escalate | 2593 | false | 0 | 3.08 | PASS | 132.0s |
| Agentic coding session | deterministic | 1957 | true | 9 | 1.36 | PASS | 2ms |
| | llm-escalate | 2003 | false | 9 | 1.33 | PASS | 4.1s |

### openai (gpt-4.1-mini)

Generated: 2026-02-25

#### Scenario details

| Scenario | Method | Char Ratio | Token Ratio | vsDet | Compressed | Preserved | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | deterministic | 1.68 | 1.67 | - | 5 | 8 | PASS | 0ms |
| | llm-basic | 1.64 | 1.63 | 0.98 | 5 | 8 | PASS | 5.6s |
| | llm-escalate | 1.63 | 1.63 | 0.97 | 5 | 8 | PASS | 6.0s |
| Long Q&A | deterministic | 6.16 | 6.11 | - | 4 | 6 | PASS | 1ms |
| | llm-basic | 5.37 | 5.33 | 0.87 | 4 | 6 | PASS | 5.9s |
| | llm-escalate | 5.35 | 5.31 | 0.87 | 4 | 6 | PASS | 7.0s |
| Tool-heavy | deterministic | 1.30 | 1.29 | - | 2 | 16 | PASS | 0ms |
| | llm-basic | 1.11 | 1.10 | 0.85 | 2 | 16 | PASS | 3.5s |
| | llm-escalate | 1.12 | 1.12 | 0.86 | 2 | 16 | PASS | 5.3s |
| Deep conversation | deterministic | 2.12 | 2.12 | - | 50 | 1 | PASS | 3ms |
| | llm-basic | 2.34 | 2.33 | 1.10 | 50 | 1 | PASS | 50.4s |
| | llm-escalate | 2.37 | 2.36 | 1.11 | 50 | 1 | PASS | 50.8s |
| Technical explanation | deterministic | 1.00 | 1.00 | - | 0 | 11 | PASS | 1ms |
| | llm-basic | 1.00 | 1.00 | 1.00 | 1 | 10 | PASS | 2.6s |
| | llm-escalate | 1.00 | 1.00 | 1.00 | 1 | 10 | PASS | 3.3s |
| Structured content | deterministic | 1.93 | 1.92 | - | 2 | 10 | PASS | 0ms |
| | llm-basic | 1.23 | 1.23 | 0.64 | 2 | 10 | PASS | 10.2s |
| | llm-escalate | 1.29 | 1.29 | 0.67 | 2 | 10 | PASS | 4.8s |
| Agentic coding session | deterministic | 1.43 | 1.43 | - | 2 | 31 | PASS | 1ms |
| | llm-basic | 1.43 | 1.43 | 1.00 | 2 | 31 | PASS | 5.8s |
| | llm-escalate | 1.32 | 1.32 | 0.93 | 1 | 32 | PASS | 9.5s |

#### Token Budget (target: 2000 tokens)

| Scenario | Method | Tokens | Fits | recencyWindow | Ratio | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | deterministic | 3738 | false | 0 | 2.12 | PASS | 10ms |
| | llm-escalate | 3391 | false | 0 | 2.35 | PASS | 280.5s |
| Agentic coding session | deterministic | 1957 | true | 9 | 1.36 | PASS | 2ms |
| | llm-escalate | 1915 | true | 3 | 1.39 | PASS | 28.1s |

## Methodology

- All deterministic results use the same input → same output guarantee
- Metrics: compression ratio, token ratio, message counts, dedup counts
- Timing is excluded from baselines (hardware-dependent)
- LLM benchmarks are saved as reference data, not used for regression testing
- Round-trip integrity is verified for every scenario (compress then uncompress)
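The round-trip check in the last bullet can be sketched as a structural equality test. The `compress`/`uncompress` pair here are stand-ins for the library's real functions:

```javascript
// Sketch of the round-trip check: uncompress(compress(x)) must reproduce
// the original messages. `compress`/`uncompress` are hypothetical stand-ins.
function roundTripPass(messages, compress, uncompress) {
  const restored = uncompress(compress(messages));
  return JSON.stringify(restored) === JSON.stringify(messages);
}

// A toy reversible pair (prefix each body, then strip it) round-trips:
const pack = (msgs) => msgs.map((m) => ({ ...m, content: "~" + m.content }));
const unpack = (msgs) => msgs.map((m) => ({ ...m, content: m.content.slice(1) }));
console.log(roundTripPass([{ role: "user", content: "hi" }], pack, unpack)); // true
```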