fix: type-filtered recall surfaces recent memories first #395
Merged
CalebisGross merged 5 commits into main on Apr 11, 2026
…ysis Best eval loss: 1.2002 (PPL 3.3) at step 4800. Early stopped at step 5800 after 9.5h on RX 7800 XT. Two-phase learning: peak LR caused instability (regression steps 1200-1600), minimum LR produced steady second descent through 14 consecutive new bests. Full per-checkpoint loss table in registry. Evaluation of SC/EPR/FR/NP pending. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eval loss improved (1.68→1.20) but generation is degenerate: 2/13 valid JSON (15%), 0 SC. Base model without spokes achieves 24/25 valid. Root cause: autoregressive generation compounds spoke perturbations through NF4 dequantization noise. Teacher-forced eval loss does not predict generation quality for spoke adapters on quantized models. Production path: Gemma E2B + faithful prompt + GBNF grammar (no spokes). Spoke training requires full bf16 (MI300X). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Multiple eval runs confirm: 1/10 valid JSON (10%), 0 SC. Base model without spokes achieves 24/25 valid. The spokes generate faithful content but cannot maintain JSON structure despite training on 5,238 perfectly structured examples. Eval loss (-0.483 improvement) does not predict generation quality for NF4 spoke adapters. Teacher-forced training and autoregressive generation have fundamentally different error dynamics on quantized models. Production path: Gemma E2B + faithful prompt + GBNF grammar. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Python HF generate() produces valid faithful JSON with trained spokes. llama.cpp server produces garbage with the same GGUF. The discrepancy is an inference engine bug, not a training failure. GBNF grammar was never tested through a working path. Verdict suspended pending llama.cpp debugging and spokes + GBNF evaluation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Type-filtered queries (e.g. type:"handoff") now use stronger recency scoring: weight 0.5 with 7-day half-life (vs general 0.2/30-day). When you filter by type, you've already constrained relevance — recency should dominate. Also adds Content field to check_memory output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary

Fixes #394: `recall` with a type filter (e.g. `type:"handoff"`) now reliably surfaces the most recent memory of that type, instead of older memories with richer association graphs.

- `TypeFilterRecencyWeight` (0.5) and `TypeFilterRecencyHalfLife` (7 days) override the general recency settings (0.2 / 30 days) for type-filtered queries. When you filter by type, you've already constrained what; the system now prioritizes when.
- `check_memory` output now includes the `Content` field (it was simply omitted from the format string, not truncated).

Changes

- `internal/agent/retrieval/agent.go`
- `internal/config/config.go`
- `cmd/mnemonic/runtime.go`
- `internal/mcp/server.go`
- `internal/agent/retrieval/config_behavior_test.go`
- `internal/mcp/server_test.go`

Verified

- `recency_bonus: 0.499` (near max)
- `check_memory` shows full content

Test plan

- `TestConfigTypeFilterRecencyBoostsRecent`: recent handoff ranks above older one with more associations
- `TestConfigTypeFilterRecencyParamsUsed`: aggressive params override general ones
- `TestHandleCheckMemoryIncludesContent`: content field present in output
- … (`curl` to `/mcp`)

🤖 Generated with Claude Code