fix(lex-knowledge): gap fixes — metadata→context, embed fallback, retire tag, error logging#14

Merged
Esity merged 3 commits into main from
fix/content-hash-md5-match-apollo-schema
May 7, 2026

Conversation

Esity commented May 7, 2026

Summary

Seven correctness fixes across runners/ingest.rb, runners/query.rb, runners/maintenance.rb, and actors/maintenance_runner.rb.

  • Fix 1 — metadata: → context: in ingest_to_apollo (CRITICAL): chunk provenance (source_file, heading, section_path, chunk_index, token_count) was passed as metadata: and silently dropped by Ruby's double-splat, because handle_ingest expects context:. As a result, every lex-knowledge chunk stored '{}' in apollo_entries.source_context, which broke all maintenance queries that use source_context->>'source_file'.

  • Fix 2 — llm_embed_available? now accepts single-embed providers: previously only checked for Legion::LLM.embed_batch; any LLM that only exposes Legion::LLM.embed caused all chunks to be stored without embeddings. build_embed_map now falls back to per-chunk embed calls when embed_batch is unavailable.

  • Fix 3 — retire_file uses a valid content type and the context: key: changed content_type: 'document_retired' (unknown to lex-apollo, silently stored as :observation with no signal) to content_type: 'observation' with tags: [..., 'retired', ...] for an explicit retirement signal. Also switched metadata: to context: to match the handle_ingest signature.

  • Fix 4 — retrieve_chunks logs errors instead of swallowing them: the anonymous rescue => _e; [] pattern silently hid DB/connection errors, causing synthesize_answer to generate hallucinated answers with no context. Now logs at warn level.

  • Fix 5 — synthesize_answer returns nil on LLM error: previously returned the error message string (e.g. "Error generating answer: connection refused") as the answer: field, which callers presented to users as a real answer. Now returns nil and logs at warn level.

  • Fix 6 — MaintenanceRunner#enabled? works for monitors-only installs: enabled? previously returned false when corpus_path was nil, so the maintenance actor was never scheduled for installs that use monitors: config. Now returns true when Runners::Monitor.resolve_monitors.any?. args falls back to monitors.first[:path] in that case.

  • Fix 7 — query_count uses correct action string: ApolloAccessLog records use action: 'query' (set by lex-apollo), not 'knowledge_query'. The old value always returned 0, making quality_report totals useless.
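
For reviewers unfamiliar with the Fix 1 failure mode, here is a minimal sketch of how a double-splat can absorb a misnamed keyword without raising. The handle_ingest below is a hypothetical stand-in written for this illustration; the real lex-apollo signature is not shown in this PR, only that it expects context:.

```ruby
# Hypothetical stand-in for lex-apollo's handle_ingest (assumed signature).
# Any keyword it does not name lands in **_opts and is never read.
def handle_ingest(content:, context: {}, **_opts)
  { content: content, source_context: context }
end

# Illustrative provenance values, not taken from a real corpus.
provenance = { source_file: 'docs/guide.md', chunk_index: 0 }

# Before the fix: metadata: is swallowed by the double-splat, so the
# stored source_context falls back to the empty default.
dropped = handle_ingest(content: 'chunk text', metadata: provenance)

# After the fix: context: binds to the keyword the method actually uses.
stored = handle_ingest(content: 'chunk text', context: provenance)

puts "before: #{dropped[:source_context].empty? ? 'empty' : 'populated'}"
puts "after:  #{stored[:source_context].empty? ? 'empty' : 'populated'}"
```

No ArgumentError is raised in the first call, which is why the bug went unnoticed until queries against source_context came back empty.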

Test plan

  • Ingest a corpus file and verify source_context in apollo_entries contains source_file, heading, section_path, chunk_index, token_count
  • Configure an LLM provider that only exposes embed (not embed_batch) and confirm chunks are stored with embeddings
  • Remove a file from a corpus and confirm the retirement entry has content_type: 'observation' and tags include 'retired'
  • Simulate a retrieval failure (take Apollo offline) and confirm a warn-level log line is emitted, not a hallucinated answer
  • Simulate an LLM failure during synthesis and confirm answer: is nil in the response, not an error string
  • Configure a node with monitors: only (no corpus_path) and confirm MaintenanceRunner is enabled and scheduled
  • Confirm quality_report total_queries returns a non-zero value after running queries
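
The two failure-path checks in the plan (retrieval failure and synthesis failure) could be exercised along these lines. This is a sketch under stated assumptions: QuerySketch, its constructor arguments, and the log message wording are hypothetical and stand in for Runners::Query, whose actual code is not shown here.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the query runner. The store and llm arguments
# are plain callables so failures can be simulated without real services.
class QuerySketch
  def initialize(store:, llm:, logger:)
    @store = store
    @llm = llm
    @logger = logger
  end

  # Fix 4 behaviour: still return [] on failure, but log at warn level.
  def retrieve_chunks(question)
    @store.call(question)
  rescue StandardError => e
    @logger.warn("retrieve_chunks failed: #{e.message}")
    []
  end

  # Fix 5 behaviour: return nil on failure, never the error text.
  def synthesize_answer(question, chunks)
    @llm.call(question, chunks)
  rescue StandardError => e
    @logger.warn("synthesize_answer failed: #{e.message}")
    nil
  end
end

log_io = StringIO.new
failing = ->(*_args) { raise IOError, 'connection refused' }
runner = QuerySketch.new(store: failing, llm: failing, logger: Logger.new(log_io))

chunks = runner.retrieve_chunks('what is apollo?')
answer = runner.synthesize_answer('what is apollo?', chunks)

puts "chunks empty: #{chunks.empty?}, answer nil: #{answer.nil?}"
```

Capturing the logger output in a StringIO lets a spec assert on the warn lines directly instead of inspecting a log file.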

Esity added 3 commits May 6, 2026 22:44
Replace llm_chat shim with direct Legion::LLM.chat call matching
the rest of the codebase convention.
…ire tag, error logging

- Fix 1: ingest_to_apollo sends context: instead of metadata: so chunk
  provenance (source_file, heading, section_path, chunk_index,
  token_count) is correctly stored in apollo_entries.source_context
- Fix 2: llm_embed_available? now also accepts Legion::LLM.embed
  (single-item); build_embed_map falls back to per-chunk embed when
  embed_batch is unavailable
- Fix 3: retire_file uses content_type 'observation' + tags ['retired']
  instead of unknown type 'document_retired'; switches metadata: to context:
- Fix 4: retrieve_chunks logs retrieval errors at warn level instead of
  silently swallowing them
- Fix 5: synthesize_answer returns nil (not error string) on LLM failure
  and logs at warn level
- Fix 6: MaintenanceRunner#enabled? also returns true when monitors
  config is present (no corpus_path required); args falls back to
  monitors.first[:path] in that case
- Fix 7: query_count queries action: 'query' (lex-apollo log value)
  instead of 'knowledge_query' which always returned 0
- Add log helper to Runners::Query to satisfy Legion/HelperMigration
  rubocop cop
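
The embed fallback described in Fix 2 can be sketched roughly as follows. The provider objects and method bodies are assumptions made for illustration; only the names embed, embed_batch, llm_embed_available?, and build_embed_map come from this PR, and the real implementations may differ.

```ruby
# Assumed shape of the availability check: either method is sufficient.
def llm_embed_available?(provider)
  provider.respond_to?(:embed_batch) || provider.respond_to?(:embed)
end

# Assumed shape of the map builder: prefer one batch call, otherwise
# fall back to one embed call per chunk.
def build_embed_map(provider, chunks)
  return {} unless llm_embed_available?(provider)

  vectors =
    if provider.respond_to?(:embed_batch)
      provider.embed_batch(chunks)
    else
      chunks.map { |chunk| provider.embed(chunk) }
    end

  chunks.zip(vectors).to_h
end

# Hypothetical provider that only exposes single-item embed.
class SingleEmbedOnly
  def embed(text)
    [text.length.to_f] # toy one-dimensional "embedding"
  end
end

embed_map = build_embed_map(SingleEmbedOnly.new, %w[alpha be])
puts "embedded #{embed_map.size} chunks via per-chunk fallback"
```

The per-chunk path costs one provider call per chunk, so batch remains the preferred route whenever the provider offers it.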
Esity merged commit 02cae51 into main May 7, 2026
11 checks passed