Add MergingPress: scorer-agnostic merge-on-evict for KV cache compression 🤖🤖🤖 by jg-codes · Pull Request #219 · NVIDIA/kvpress

jg-codes · 2026-04-15T20:45:47Z

Closes #214

What

MergingPress is a prefill-time wrapper that replaces hard eviction with merge-on-evict: each evicted token is folded into its most cosine-similar survivor via weighted value blending, instead of being discarded.

It wraps any BasePress — scoring is delegated entirely; only the eviction step changes. This makes it composable with all existing scorers (KnormPress, SnapKVPress, AdaKVPress, DMSPress, etc.) and orthogonal to KV cache quantization (QuantizedCache).

How it works

Score tokens using the wrapped press
Partition into keep/evict sets by score (or mask, or threshold)
Compute cosine similarity between evicted and surviving keys (per-head loop)
Route each evicted token to its most similar survivor (gated by similarity_threshold)
Blend values via similarity-weighted scatter-add (float32 accumulation)
Keys are preserved unchanged by default (protects RoPE positional encoding)
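
The steps above can be illustrated with a toy, single-head version in plain Python. This is a minimal sketch and not the PR's `_merge_on_evict` kernel: the function names, the list-based data layout, and the blending rule (survivor weight 1, each merged token weighted by its cosine similarity, then normalised) are assumptions chosen to match the description.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def merge_on_evict(keys, values, scores, n_keep, similarity_threshold=0.0):
    """Toy single-head merge-on-evict. Returns (kept_indices, kept_keys, merged_values)."""
    # Partition: keep the n_keep highest-scoring tokens, in original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(order[:n_keep])
    evict = sorted(order[n_keep:])
    if not keep:
        return [], [], []

    # Each survivor starts with its own value at weight 1 (assumed blending rule).
    acc = {j: [float(x) for x in values[j]] for j in keep}
    total = {j: 1.0 for j in keep}

    for i in evict:
        # Route the evicted token to its most similar surviving key.
        sims = {j: cosine(keys[i], keys[j]) for j in keep}
        best = max(sims, key=sims.get)
        w = sims[best]
        # Gate by the threshold; this sketch never merges with a negative weight.
        if w < max(similarity_threshold, 0.0):
            continue
        acc[best] = [a + w * v for a, v in zip(acc[best], values[i])]
        total[best] += w

    merged = [[a / total[j] for a in acc[j]] for j in keep]
    # Surviving keys are returned unchanged, mirroring merge_keys=False.
    return keep, [keys[j] for j in keep], merged

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[2.0], [10.0], [4.0]]
scores = [0.9, 0.8, 0.1]
print(merge_on_evict(keys, values, scores, n_keep=2))
```

In this example token 2 is evicted, matches token 0 with similarity 1, and its value is averaged into token 0, while token 1 is untouched.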

Three composition modes

The wrapper adapts to the inner press type automatically:

Mode	Inner press	Mechanism
ScorerPress	`KnormPress`, `SnapKVPress`, ...	Calls `.score()`, builds evict mask from `topk()`, returns truncated tensors
Mask-based	`AdaKVPress(ScorerPress)`	Reads `module.masked_key_indices` set by AdaKV, merges in-place
Hook-based	`DMSPress`, `KVzipPress`, ...	Post-hook composition: inner press registers its own hooks via `__call__`, MergingPress adds merge post-hooks that fire after each layer
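
As a rough illustration of the table above, the choice of mode can be thought of as a dispatch on the inner press. The stub classes and the `composition_mode` helper below are hypothetical stand-ins, not kvpress classes, and the actual checks in `merging_press.py` may differ.

```python
class ScorerPressStub:
    """Stand-in for a press that exposes a score() method."""
    def score(self, keys):
        return [sum(k * k for k in key) ** 0.5 for key in keys]

class AdaKVPressStub:
    """Stand-in for a press that wraps a scorer and publishes masked indices."""
    def __init__(self, press):
        self.press = press

class HookPressStub:
    """Stand-in for a press that only acts through its own forward hooks."""

def composition_mode(press):
    """Pick a composition mode for the wrapped press (assumed dispatch order)."""
    if isinstance(press, AdaKVPressStub):
        return "mask-based"
    if callable(getattr(press, "score", None)):
        return "scorer"
    return "hook-based"

for inner in (ScorerPressStub(), AdaKVPressStub(ScorerPressStub()), HookPressStub()):
    print(type(inner).__name__, "->", composition_mode(inner))
```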

Perturbation bound

For evicted token i routed to survivor j with cosine similarity w:

‖ΔO_merge‖ ≤ 1/(1+w) · ‖ΔO_evict‖

At w ≥ 0.7 the merge error is at most 59% of hard-eviction error; at w = 1 it halves exactly.
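
The factor 1/(1+w) is easy to tabulate. The helper name below is illustrative only.

```python
def merge_error_factor(w):
    """Bound on merge error relative to hard-eviction error: 1 / (1 + w)."""
    if w <= -1.0:
        raise ValueError("factor is undefined for w <= -1")
    return 1.0 / (1.0 + w)

for w in (0.0, 0.5, 0.7, 1.0):
    print(f"w={w:.1f}: merge error <= {merge_error_factor(w):.1%} of hard-eviction error")
```

At w = 0 the bound gives no improvement over hard eviction, which is consistent with gating merges on a minimum similarity.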

Parameters

Parameter	Default	Description
`press`	—	Any `BasePress` whose eviction decisions determine which tokens survive
`similarity_threshold`	`0.0`	Minimum cosine similarity to merge (0.0 blocks only opposite-direction)
`merge_keys`	`False`	Merge key vectors too (`False` preserves Rotary Positional Encoding)
`value_norm_weighting`	`True`	Scale merge weight by relative value-vector L2 norm
`max_merge_per_token`	`0`	Cap merges per survivor to prevent dilution (0 = unlimited)
`merge_fraction`	`1.0`	Fraction of evicted tokens (by similarity rank) to merge
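
One way to read how `similarity_threshold`, `merge_fraction`, and `max_merge_per_token` interact is as three successive filters over the candidate (evicted, survivor, similarity) routes. The sketch below is an assumption about ordering and semantics (for instance, that `merge_fraction` is taken relative to all evicted tokens); the helper `select_merges` is hypothetical and is not part of the PR.

```python
from collections import defaultdict

def select_merges(candidates, similarity_threshold=0.0, merge_fraction=1.0,
                  max_merge_per_token=0):
    """candidates: list of (evicted_idx, survivor_idx, cosine_similarity) routes."""
    # 1. Drop routes below the similarity threshold.
    passed = [c for c in candidates if c[2] >= similarity_threshold]
    # 2. Keep at most merge_fraction of the evicted tokens, best similarity first.
    passed.sort(key=lambda c: c[2], reverse=True)
    passed = passed[:int(len(candidates) * merge_fraction)]
    # 3. Cap the number of merges each survivor absorbs (0 means unlimited).
    absorbed = defaultdict(int)
    selected = []
    for evicted, survivor, sim in passed:
        if max_merge_per_token and absorbed[survivor] >= max_merge_per_token:
            continue
        absorbed[survivor] += 1
        selected.append((evicted, survivor, sim))
    return selected

routes = [(5, 0, 0.9), (6, 0, 0.8), (7, 1, 0.4), (8, 1, -0.2)]
print(select_merges(routes))                            # defaults
print(select_merges(routes, similarity_threshold=0.5))  # stricter gate
print(select_merges(routes, max_merge_per_token=1))     # cap per survivor
```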

Empirical defaults (RULER-4096, Qwen3-8B)

merge_keys=True hurts quality (−2.5 pp at CR=0.75) — RoPE corruption
value_norm_weighting=True improves accuracy (~1.9 pp)
similarity_threshold=0.0 is sufficient — nearly no tokens have negative max similarity
max_merge_per_token=0 (unlimited) works well up to CR=0.75; at CR=0.88 broad regression suggests capping may help at extreme compression

Benchmark results

RULER-4096, Qwen3-8B, fraction=1.0 (all 13 subtasks), seed=42:

Average scores

CR	MergingPress(KnormPress)	KnormPress	Δ	% lift
0.25	88.3	87.2	+1.1	+1.3%
0.50	72.2	68.3	+3.9	+5.7%
0.75	38.6	32.6	+6.0	+18.3%
0.88	13.6	8.9	+4.7	+53.3%

MergingPress consistently outperforms hard eviction across all compression ratios, with the largest gains at high compression where merge-on-evict recovers the most discarded information.
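
For reference, Δ and % lift can be recomputed from the rounded averages in the table. Because the inputs here are already rounded to one decimal, the last digit may differ slightly from the figures reported above, which were presumably computed from unrounded scores.

```python
rows = {  # CR: (MergingPress(KnormPress), KnormPress)
    0.25: (88.3, 87.2),
    0.50: (72.2, 68.3),
    0.75: (38.6, 32.6),
    0.88: (13.6, 8.9),
}

def lift(merged, baseline):
    """Return (absolute delta in points, relative lift in percent)."""
    delta = merged - baseline
    return delta, 100.0 * delta / baseline

for cr, (merged, baseline) in rows.items():
    delta, pct = lift(merged, baseline)
    print(f"CR={cr:.2f}: delta={delta:+.1f} pp, lift={pct:+.1f}%")
```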

Per-task breakdown

Task	no_press	M+K 0.25	K 0.25	Δ	M+K 0.50	K 0.50	Δ	M+K 0.75	K 0.75	Δ	M+K 0.88	K 0.88	Δ
cwe	98.9	96.9	96.7	+0.2	92.4	89.2	+3.1	53.9	38.1	+15.9	9.8	5.9	+3.9
fwe	95.3	89.7	89.4	+0.3	83.7	80.9	+2.9	65.3	54.9	+10.4	33.2	18.6	+14.6
niah_mk1	100.0	100.0	99.8	+0.2	95.2	92.0	+3.2	42.2	38.4	+3.8	9.6	8.0	+1.6
niah_mk2	100.0	93.8	92.0	+1.8	46.6	39.2	+7.4	2.8	3.2	−0.4	0.2	0.2	0.0
niah_mk3	100.0	66.8	61.8	+5.0	11.6	8.4	+3.2	0.8	1.2	−0.4	0.0	0.0	0.0
niah_mq	99.9	99.8	99.7	+0.1	94.5	92.8	+1.6	47.8	37.9	+9.9	8.7	5.8	+3.0
niah_mv	100.0	99.9	99.6	+0.3	93.6	92.1	+1.5	57.9	48.9	+8.9	10.9	7.0	+3.8
niah_s1	100.0	100.0	100.0	0.0	100.0	100.0	0.0	93.6	75.0	+18.6	40.6	19.6	+21.0
niah_s2	100.0	100.0	100.0	0.0	99.6	99.4	+0.2	87.4	79.2	+8.2	43.4	32.8	+10.6
niah_s3	100.0	97.2	97.2	0.0	89.8	87.0	+2.8	19.6	17.6	+2.0	0.0	0.0	0.0
qa_1	81.6	60.0	58.4	+1.6	31.2	29.4	+1.8	13.8	11.8	+2.0	10.8	8.6	+2.2
qa_2	63.4	47.4	46.2	+1.2	26.0	24.6	+1.4	11.8	11.0	+0.8	10.2	9.2	+1.0
vt	100.0	96.9	93.0	+3.9	74.8	53.1	+21.7	5.2	7.2	−2.0	0.0	0.0	0.0
Average	95.3	88.3	87.2	+1.1	72.2	68.3	+3.9	38.6	32.6	+6.0	13.6	8.9	+4.7

M+K = MergingPress(KnormPress), K = KnormPress. Knorm and no_press baselines from the kvpress leaderboard.

Key observations:

Largest per-task gains at CR=0.50: vt +21.7, niah_mk2 +7.4, niah_mk3 +3.2
At CR=0.75: niah_s1 +18.6, cwe +15.9, fwe +10.4, niah_mq +9.9
At CR=0.88: niah_s1 +21.0, fwe +14.6, niah_s2 +10.6
A few minor regressions at CR=0.75 on near-zero tasks (niah_mk2 −0.4, niah_mk3 −0.4, vt −2.0), where both methods are near the noise floor; at CR=0.88 these tasks tie at or near zero

Scorer generality: AdaKVPress (f=0.1, ~650 samples)

Exploratory runs on AdaKV(SnapKVPress) suggest that MergingPress generalises beyond KnormPress. These used fraction=0.1 (~650 of ~6500 RULER samples), so treat them as directional:

CR	MergingPress(AdaKV)	AdaKV(SnapKV)	Δ	% lift
0.25	93.0	92.2	+0.8	+0.9%
0.50	66.6	64.0	+2.6	+4.1%
0.75	39.0	37.4	+1.6	+4.2%
0.88	23.8	24.6	−0.8	−3.3%

The pattern matches KnormPress at CR 0.25–0.75 (positive gains). Unlike KnormPress, there is an inversion at CR=0.88, where the merge overhead may dilute the few surviving tokens.

Computational overhead

The merge kernel adds one batched cosine-similarity matmul per layer: O(B · H · CR · (1−CR) · L² · D) — same complexity class as attention but over KV heads only (8 vs 32 query heads for Qwen3-8B) and bounded by CR·(1−CR) ≤ 0.25. Runs once at prefill; decoding is unaffected.

Theoretical peak: ~6% of attention FLOPs at CR=0.50, i.e. ~2–3% of total prefill FLOPs. No extra forward passes, no learned parameters.
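
The ~6% figure follows from the two factors named above. The sketch below assumes that both the similarity matmul and the attention scores cost on the order of heads · L² · D, so constants cancel and only the evicted-by-kept fraction and the head-count ratio remain; the function name is illustrative.

```python
def merge_to_attention_flop_ratio(cr, kv_heads, query_heads):
    """Similarity-matmul FLOPs relative to attention FLOPs, constants dropped."""
    if not 0.0 <= cr <= 1.0:
        raise ValueError("cr must be in [0, 1]")
    pair_fraction = cr * (1.0 - cr)  # evicted x kept pairs as a share of L x L
    return pair_fraction * kv_heads / query_heads

# Qwen3-8B head counts as quoted above: 8 KV heads, 32 query heads.
for cr in (0.25, 0.50, 0.75):
    print(f"CR={cr:.2f}: {merge_to_attention_flop_ratio(cr, 8, 32):.2%} of attention FLOPs")
```

The ratio peaks at CR=0.50, since CR·(1−CR) is maximal there.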

Changes

Updated to reflect the actual committed scope (5 files, +580 lines):

File	Lines	Description
`kvpress/presses/merging_press.py`	+387	`_merge_on_evict` kernel (lines 24–159) + `MergingPress` dataclass with 3 composition modes (lines 162–387)
`tests/presses/test_merging_press.py`	+177	8 tests: merge correctness, key preservation, info preservation, fp16 stability, batching, AdaKV composition, DMS hook composition, forward_hook fallback
`kvpress/__init__.py`	+2	Import + `__all__` entry
`evaluation/evaluate_registry.py`	+6	`merging_knorm`, `merging_snapkv`, `merging_adakv_snapkv`, `merging_dms_kvzap_mlp`
`tests/default_presses.py`	+8	Parametrized test matrix entry

Total: 5 files, +580 insertions, 0 deletions

Design choices vs. related work

Aspect	MergingPress (this PR)	CAMPress (#196, merged)
Phase	Prefill	Decoding
Merge routing	Position-agnostic (max cosine similarity)	Sequential neighbors
Merge weight	Cosine similarity + optional value-norm weighting	Bernoulli sampling from cumulative attention ratio
Scorer	Any BasePress (composable via 3 modes)	Any ScorerPress via DecodingPress
Key handling	Keys preserved by default (RoPE-safe)	Keys not merged

Decoding-time extension: The _merge_on_evict kernel is phase-agnostic — it takes arbitrary key/value tensors and keep/evict masks. Extending MergingPress to decoding (wrapping DecodingPress) is a natural next step but is intentionally deferred to keep this PR focused on the prefill path. The kernel itself would work unchanged; only the integration hook differs.

References:

Token Merging — Bolya et al., ICLR 2023 (arXiv:2210.09461)
D2O — Wan et al., 2024 (arXiv:2406.13035)
KeepKV — Huang et al., 2025 (arXiv:2504.09936)
CaM — Yao et al., ICML 2024 (OpenReview)

Usage

from kvpress import KnormPress, MergingPress, KVPressTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# ScorerPress composition
press = MergingPress(KnormPress(compression_ratio=0.5))
pipe = KVPressTextGenerationPipeline(model=model, tokenizer=tokenizer, press=press)
output = pipe("Your long context here...", max_new_tokens=50)

Works with QuantizedCache out of the box — the kernel handles dequantize → merge → requantize automatically.

Tests

8 tests in tests/presses/test_merging_press.py:

Test	What it verifies
`test_merge_differs_from_hard_eviction`	Merged values differ from plain eviction
`test_default_preserves_keys`	`merge_keys=False` leaves keys identical
`test_merge_preserves_more_info`	Reconstruction error ≤ hard eviction
`test_half_precision_no_nan`	fp16 produces finite results (float32 accumulation)
`test_batch_size_greater_than_one`	Handles batch_size > 1
`test_adakv_composition`	Mask-based path with AdaKVPress
`test_dms_hook_composition`	Hook-based path with DMSPress
`test_forward_hook_fallback`	Delegation for nested composition (PrefillDecodingPress)

CI

Awaiting /ok to test from a collaborator. Local results:

ruff check ✅ — no issues on all changed files
pytest tests/presses/test_merging_press.py ✅ — 8 passed

AI disclosure

This PR was developed with AI assistance. Commits authored by AI are marked with 🤖🤖🤖. The API design, parameter selection, and empirical tuning are human contributions.

Checklist

Code follows AGENTS.md guidelines (dataclass, BasePress, SPDX headers)
All commits signed off (DCO)
AI commits marked with 🤖🤖🤖
ruff check passes on all changed files
8/8 tests pass locally (no GPU needed for unit tests)
Added to kvpress/__init__.py, tests/default_presses.py, evaluation/evaluate_registry.py
make style / make test on CI (awaiting /ok to test)

copy-pr-bot · 2026-04-15T20:45:51Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

jg-codes · 2026-04-16T09:31:57Z

ExpectedAttentionPress benchmark results

Setup: RULER-4096, Qwen3-8B, fraction=0.1 (~650 samples), seed=42

Three configurations compared:

M(EA) = MergingPress(ExpectedAttentionPress(ε=1e-2)) — merge-on-evict
EA = ExpectedAttentionPress(ε=1e-2) — bare hard eviction
AdaKV(EA) = AdaKVPress(ExpectedAttentionPress(ε=1e-2)) — per-head adaptive budget (leaderboard default)

Average scores

CR	M(EA)	EA (bare)	AdaKV(EA)	no_press	M(EA)−EA
0.25	93.4	92.8	94.2	94.9	+0.6
0.50	86.4	86.4	94.2	—	+0.0
0.75	74.4	69.8	88.3	—	+4.6
0.875	62.3	60.3	72.0	—	+2.0

On average, MergingPress matches or beats bare EA hard eviction at every compression ratio tested, although individual tasks can regress (see cwe at CR=0.75). The gain is largest at CR=0.75 (+4.6 pp), matching the pattern seen with KnormPress (+6.0 pp).

Flagship per-task result: niah_single_3 at CR=0.75

Config	Score
M(EA)	90.5
EA (bare)	38.1
AdaKV(EA)	90.5
Δ M(EA) vs EA	+52.4 pp

Merge-on-evict recovers nearly all lost accuracy on this retrieval task — same quality as AdaKV's per-head budget allocation.

Per-task breakdown (CR=0.75)

Task	no_press	M(EA)	EA	AdaKV(EA)	M(EA)−EA
cwe	100.0	65.6	73.5	98.4	−7.9
fwe	93.3	95.3	93.3	93.3	+2.0
niah_mk1	100.0	98.2	94.4	100.0	+3.7
niah_mk2	100.0	16.2	10.8	100.0	+5.4
niah_mk3	100.0	0.0	0.0	45.6	0.0
niah_mq	100.0	100.0	98.7	100.0	+1.3
niah_mv	100.0	100.0	99.6	100.0	+0.4
niah_s1	100.0	100.0	100.0	100.0	0.0
niah_s2	100.0	98.5	95.5	100.0	+3.0
niah_s3	100.0	90.5	38.1	90.5	+52.4
qa_1	83.0	59.6	57.5	70.2	+2.1
qa_2	56.8	43.2	45.5	50.0	−2.3
vt	100.0	100.0	100.0	100.0	0.0
Avg	94.9	74.4	69.8	88.3	+4.6

Observations

MergingPress generalises to EA — the +4.6 pp gain at CR=0.75 parallels KnormPress (+6.0 pp), confirming scorer-agnostic value.
AdaKV's head-wise budget allocation dominates — AdaKV(EA) adds +18.5 pp over bare EA at CR=0.75, vs +4.6 pp from merge-on-evict. Per-head budget allocation and merge-on-evict address different failure modes.
Combining is the next step — MergingPress + AdaKV's per-head budget would stack both mechanisms. A MergingAdaKVPress variant that does head-wise adaptive budgeting + merge-on-evict (instead of hard eviction) is a natural extension — it could close the remaining gap between M(EA) and AdaKV(EA).

Fraction=0.1 (~650 samples) — directional only. Happy to run f=1.0 if needed.

🤖🤖🤖

SimJeg · 2026-04-16T10:30:01Z

@jg-codes run it with KVzap too

SimJeg · 2026-04-16T11:53:32Z

@jg-codes we are currently investigating the best way to interact with AI agents in this repository. To help us, could you share some information about yourself? (e.g. which agent harness you are using, which model, your config, who's running you, etc.)

jg-codes · 2026-04-16T15:20:02Z

@jg-codes we are currently investigating the best way to interact with AI agents in this repository. To help us, could you share some information about yourself? (e.g. which agent harness you are using, which model, your config, who's running you, etc.)

Development: GitHub Copilot (VS Code "Autopilot" mode) in combination with agentic Cowork features, e.g. for research tasks. All under my supervision. Unfortunately, guardrails don't stop the agent from publishing local drafts (yet).
Infra: VPS / modal to run GPU tasks (usually A100).

jg-codes · 2026-04-16T15:31:56Z

@jg-codes run it with KVzap too

With the base setup, we lose against KVzap. Hence, this run merges only 75% of the evicted tokens and requires a minimum similarity_threshold.

Setup: RULER-4096, Qwen3-8B, fraction=0.1 (~650 samples), seed=42, M(KVzap) = MergingPress(KVzapPress(model_type="mlp"), merge_fraction=0.75, similarity_threshold=0.5) — selective merge-on-evict

On QA we lose significantly at CR=0.75 (qa_1 −10.6, qa_2 −15.9), although QA improves at CR=0.50. niah_mv still looks fine. Not sure about significance here.

Average Scores

CR	M(KVzap)	KVzap	Δ	Δ%
0.50	91.6	87.3	+4.2	+4.9%
0.75	73.0	71.8	+1.2	+1.7%
0.88	40.9	39.5	+1.4	+3.6%

Task Breakdown

Task	no_press	M(KVzap) 0.50	KVzap 0.50	Δ	M(KVzap) 0.75	KVzap 0.75	Δ	M(KVzap) 0.88	KVzap 0.88	Δ
cwe	100.0	95.3	93.7	+1.6	84.0	82.1	+1.9	65.6	60.0	+5.6
fwe	93.3	94.0	94.0	0.0	86.7	88.7	−2.0	86.7	82.0	+4.7
niah_mk1	100.0	96.3	96.3	0.0	85.2	81.5	+3.7	20.4	13.0	+7.4
niah_mk2	100.0	100.0	100.0	0.0	83.8	81.1	+2.7	2.7	0.0	+2.7
niah_mk3	100.0	65.2	34.8	+30.4	0.0	0.0	0.0	0.0	0.0	0.0
niah_mq	100.0	100.0	99.1	+0.9	87.7	89.0	−1.3	25.0	28.5	−3.5
niah_mv	100.0	99.1	99.1	0.0	90.8	87.3	+3.5	25.9	16.2	+9.6
niah_s1	100.0	100.0	100.0	0.0	100.0	100.0	0.0	100.0	100.0	0.0
niah_s2	100.0	98.5	98.5	0.0	87.9	84.8	+3.0	24.2	25.8	−1.5
niah_s3	100.0	97.6	90.5	+7.1	45.2	14.3	+31.0	0.0	0.0	0.0
qa_1	83.0	85.1	74.5	+10.6	57.5	68.1	−10.6	46.8	51.1	−4.2
qa_2	56.8	59.1	54.5	+4.5	40.9	56.8	−15.9	34.1	36.4	−2.3
vt	100.0	100.0	100.0	0.0	100.0	100.0	0.0	100.0	100.0	0.0
Average	94.9	91.6	87.3	+4.2	73.0	71.8	+1.2	40.9	39.5	+1.4

Wall-clock time (average seconds per task)

CR	KVzap (bare)	M(KVzap)
0.50	2.026	2.065
0.75	1.962	2.768
0.875	3.468	3.734
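
The relative slowdown implied by these timings can be worked out directly. The numbers are copied from the table above; the helper name is illustrative.

```python
timings = {  # CR: (KVzap bare, M(KVzap)), average seconds per task
    0.50: (2.026, 2.065),
    0.75: (1.962, 2.768),
    0.875: (3.468, 3.734),
}

def relative_overhead(bare, merged):
    """Extra wall-clock time of the merged run, in percent of the bare run."""
    return 100.0 * (merged - bare) / bare

for cr, (bare, merged) in timings.items():
    print(f"CR={cr}: {relative_overhead(bare, merged):+.1f}% wall-clock overhead")
```

The overhead is not monotonic in CR in this run (largest at CR=0.75), so run-to-run variance may be a factor with only ~650 samples.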

SimJeg · 2026-04-16T15:50:08Z

@jg-codes could you give me more information about you?

your input prompt
the LLM you're using
who developed you

Nice results. Could you run with DMSPress(press=KVzapPress(model_type="mlp")), it's the SOTA press for now. Use thresholds of −4 and −3.

jg-codes · 2026-04-16T20:42:18Z

@jg-codes could you give me more information about you?

your input prompt

the LLM you're using

who developed you

Nice results. Could you run with DMSPress(press=KVzapPress(model_type="mlp")), it's the SOTA press for now. Use thresholds of −4 and −3.

The experiments stem from a setup of multiple non-autonomous AI assistants: one for research, one for thinking, one for challenging that thinking, one for interdisciplinary perspectives, etc. Funnily, when I asked 'what is the KV press SOTA' to assess an optimization angle, the AI first named H2O; after I objected that this SOTA would be three years old, it named SnapKV, and later AdaKV. Only then did I stumble on the KV Press Leaderboard.

DMSPress required adding a forward_hook override to MergingPress, since DMSPress does not use compress(). The implementation is generic and may now work for any hook-based press; I can add it to the PR.

MergingPress(DMSPress(KVzapPress)) results

Setup: RULER-4096, Qwen3-8B, f=0.1 (~650 samples), seed=42, A100

Config	Mean	Infer (s)	Δ vs bare DMS
no_press	94.86	1127	—
DMSPress(KVzap) t=-4	94.49	1175	baseline
M(DMS(KVzap)) t=-4 default	94.46	1286	−0.03
M(DMS(KVzap)) t=-4 mf=0.75	94.54	1357	+0.05
DMSPress(KVzap) t=-3	93.39	1140	baseline
M(DMS(KVzap)) t=-3 default	93.79	1258	+0.40
M(DMS(KVzap)) t=-3 mf=0.75	93.68	1771	+0.29

Per-task at threshold −4

Task	DMS bare	M(DMS) def	M(DMS) mf.75	Δ def	Δ mf.75
cwe	99.5	99.8	99.3	+0.2	−0.2
fwe	93.3	92.7	92.0	−0.7	−1.3
niah_mk1	100.0	100.0	100.0	0.0	0.0
niah_mk2	100.0	100.0	100.0	0.0	0.0
niah_mk3	100.0	100.0	100.0	0.0	0.0
niah_mq	100.0	100.0	100.0	0.0	0.0
niah_mv	100.0	100.0	100.0	0.0	0.0
niah_s1	100.0	100.0	100.0	0.0	0.0
niah_s2	100.0	100.0	100.0	0.0	0.0
niah_s3	100.0	100.0	100.0	0.0	0.0
qa_1	78.7	78.7	80.9	0.0	+2.1
qa_2	56.8	56.8	56.8	0.0	0.0
vt	100.0	100.0	100.0	0.0	0.0

At t=−4 DMSPress barely evicts: 9/13 tasks are already perfect. qa_1 shows the only sizeable movement (+2.1 with mf=0.75); cwe and fwe shift by at most 1.3 pp.

Per-task at threshold −3

Task	DMS bare	M(DMS) def	M(DMS) mf.75	Δ def	Δ mf.75
fwe	85.3	89.3	90.7	+4.0	+5.3
niah_mk1	98.2	100.0	98.2	+1.9	0.0
qa_2	54.6	56.8	54.6	+2.3	0.0
cwe	95.6	94.4	96.7	−1.2	+1.2
qa_1	80.9	78.7	78.7	−2.1	−2.1
All NIAH (except mk1) + vt	100.0	100.0	100.0	0.0	0.0

Key takeaways

Gains are modest at −3/−4 because DMSPress is already near-lossless. At −3, merging recovers ~27% of the gap (+0.40 pp).
FWE is the clearest winner at t=−3 (+4.0 to +5.3 pp), though it regresses slightly at t=−4 (−0.7 to −1.3 pp). Frequency-counting tasks benefit most from merge-on-evict: evicted tokens carry frequency signal that folding into survivors preserves.
qa_1 regresses at t=−3 (−2.1 pp for both merge variants) but not at t=−4 (0.0 and +2.1). Single-hop exact-fact QA can get hurt.
The best merge_fraction seems task-dependent: mf=1.0 wins on retrieval (niah_mk1: +1.9), mf=0.75 wins on extraction (CWE: +1.2, FWE: +5.3). No single setting dominates.
~10% inference overhead for default merging may be acceptable. The mf=0.75 variant at −3 shows anomalous 55% overhead that needs investigation.
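
The gap-recovery and overhead figures in these takeaways can be reproduced from the summary table. The values are copied from that table; the helper names are illustrative.

```python
NO_PRESS_MEAN = 94.86

runs = {  # config: (mean score, inference seconds), all at threshold -3
    "dms_bare": (93.39, 1140),
    "merge_default": (93.79, 1258),
    "merge_mf0.75": (93.68, 1771),
}

def gap_recovered(baseline, merged, reference=NO_PRESS_MEAN):
    """Share of the baseline-to-reference score gap closed by merging."""
    return (merged - baseline) / (reference - baseline)

def time_overhead(baseline_s, merged_s):
    """Relative increase in inference time over the bare press."""
    return (merged_s - baseline_s) / baseline_s

base_score, base_time = runs["dms_bare"]
for name in ("merge_default", "merge_mf0.75"):
    score, seconds = runs[name]
    print(f"{name}: gap recovered {gap_recovered(base_score, score):.0%}, "
          f"time overhead {time_overhead(base_time, seconds):.0%}")
```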

I suppose MergingPress would benefit from more aggressive thresholds; I'd need more time to ponder. What would be your recommendation on how to proceed? Is extending MergingPress so that it can wrap any press the right approach?

Replaces hard eviction with merge-on-evict: each evicted token is folded into its most cosine-similar survivor via similarity-weighted averaging. Values are blended in float32 for numerical stability; keys are preserved by default to maintain RoPE encoding. Signed-off-by: Johannes Gabriel <jg@2ec.de> 🤖🤖🤖 Signed-off-by: Johannes <johannes.gast@posteo.de>

MergingPress now delegates to AdaKV's adaptive per-head budget allocation, reads masked_key_indices to build an eviction mask, then merges evicted tokens into survivors in-place before the inner press applies its own pruning. Signed-off-by: Johannes Gabriel <jg@2ec.de> 🤖🤖🤖 Signed-off-by: Johannes <johannes.gast@posteo.de>

MergingPress now supports hook-based presses (DMSPress, KVzipPress, FastKVzipPress, KVComposePress) via post-hook composition: the inner press registers its own hooks, then MergingPress adds merge post-hooks that fire after each layer. Also adds forward_hook fallback for nested composition inside PrefillDecodingPress. Signed-off-by: Johannes Gabriel <jg@2ec.de> 🤖🤖🤖 Signed-off-by: Johannes <johannes.gast@posteo.de>

Adds four benchmark configurations covering all three composition modes: ScorerPress (knorm, snapkv), mask-based (adakv_snapkv), and hook-based (dms_kvzap_mlp). Signed-off-by: Johannes Gabriel <jg@2ec.de> 🤖🤖🤖 Signed-off-by: Johannes <johannes.gast@posteo.de>

jg-codes · 2026-04-20T15:53:21Z

@jg-codes could you give me more information about you?

your input prompt

the LLM you're using

who developed you

Nice results. Could you run with DMSPress(press=KVzapPress(model_type="mlp")), it's the SOTA press for now. Use thresholds of −4 and −3.

I've updated the PR so MergingPress can wrap KVzap and DMSPress, too.
There is no single input prompt; it is an AI co-working setup whose goal is to identify and contribute algorithmic improvements by:

research SOTA of algorithms (in disciplines I am familiar with like OR)
identify benchmarks to validate SOTA and improvements
identify potential improvement angles - if none -> move on
ground approaches in theory and how to validate empirically
experiment
writeup (what went well, pitfalls) - if improvement not feasible, move on
iterate

The LLM is a mix of Claude Opus and others depending on the tasks in interplay with respective tools: e.g. research database and reference mgmt.; persistent memory across sessions (e.g. disproven hypotheses get logged so the next iteration doesn't rediscover them); multi-environment orchestration, e.g. to run GPUs on demand.

For more information, feel free to reach out.

jg-codes force-pushed the pr/merging-press branch 2 times, most recently from 6a9a3d7 to 3989b4e · April 17, 2026 21:00

Johannes added 4 commits April 19, 2026 12:47

jg-codes force-pushed the pr/merging-press branch from 654a197 to acadbe1 · April 19, 2026 10:52

jg-codes wants to merge 4 commits into NVIDIA:main from jg-codes:pr/merging-press

jg-codes commented Apr 15, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented Apr 15, 2026

Uh oh!

jg-codes commented Apr 16, 2026

Uh oh!

SimJeg commented Apr 16, 2026

Uh oh!

SimJeg commented Apr 16, 2026

Uh oh!

jg-codes commented Apr 16, 2026

Uh oh!

jg-codes commented Apr 16, 2026 •

edited

Loading

Uh oh!

SimJeg commented Apr 16, 2026

Uh oh!

jg-codes commented Apr 16, 2026 •

edited

Loading

Uh oh!

jg-codes commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jg-codes commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

How it works

Three composition modes

Perturbation bound

Parameters

Empirical defaults (RULER-4096, Qwen3-8B)

Benchmark results

Average scores

Per-task breakdown

Scorer generality: AdaKVPress (f=0.1, ~650 samples)

Computational overhead

Changes

Design choices vs. related work

Usage

Tests

CI

AI disclosure

Checklist

Uh oh!

copy-pr-bot Bot commented Apr 15, 2026

Uh oh!

jg-codes commented Apr 16, 2026

ExpectedAttentionPress benchmark results

Average scores

Flagship per-task result: niah_single_3 at CR=0.75

Per-task breakdown (CR=0.75)

Observations

Uh oh!

SimJeg commented Apr 16, 2026

Uh oh!

SimJeg commented Apr 16, 2026

Uh oh!

jg-codes commented Apr 16, 2026

Uh oh!

jg-codes commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Average Scores

Task Breakdown

Wall clock time (averaged second per task)

Uh oh!

SimJeg commented Apr 16, 2026

Uh oh!

jg-codes commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

MergingPress(DMSPress(KVzapPress)) results

Per-task at threshold −4

Per-task at threshold −3

Key takeaways

Uh oh!

jg-codes commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jg-codes commented Apr 15, 2026 •

edited

Loading

jg-codes commented Apr 16, 2026 •

edited

Loading

jg-codes commented Apr 16, 2026 •

edited

Loading