Releases: daniloaguiarbr/sqlite-graphrag

v1.0.36

30 Apr 20:57

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.

[Unreleased]

[1.0.36] - 2026-04-30

Fixed (Linguistic policy)

  • C1 (CRITICAL): Synced --type enum in skill/sqlite-graphrag-en/SKILL.md:46 and -pt/SKILL.md:46 from 4 listed values to the full set of 9 (user, feedback, project, reference, decision, incident, skill, document, note). Agents using SKILL.md as a contract had been silently losing five memory types since v1.0.30. Source of truth: src/cli.rs:364-374 (MemoryType enum) and src/commands/remember.rs:26 long-help.
  • H1+H2+H3 (HIGH): Translated three Portuguese-without-accent strings in tracing::warn! macros that escaped the audit gate rg '[áéíóúâêôãõç]' src/ documented in v1.0.33: src/extraction.rs:1204 ("NER falhou..." → "NER failed..."), src/extraction.rs:964 ("batch NER falhou (chunk de N janelas)..." → "batch NER failed (chunk of N windows)..."), src/commands/remember.rs:345 ("auto-extraction falhou..." → "auto-extraction failed..."). Bonus: also translated src/storage/urls.rs:37 ("falha ao persistir url..." → "failed to persist url...") and the production error in src/commands/remember.rs:367 ("limite de N namespaces ativos excedido..." → "active namespace limit of N reached...").
  • M1 (MEDIUM): Added a complementary CI gate in .github/workflows/ci.yml language-check job that scans tracing::*!, #[error(...)], doc comments, and panic!/assert!/expect/bail!/ensure! macros for Portuguese words without diacritical marks (falhou, janelas, usando apenas, nao foi, ja existe, obrigatorio, memoria, etc.). Plain string literals are intentionally not scanned because they hold legitimate PT test fixtures for multilingual extraction.
  • M3 (MEDIUM): Renamed 33 Portuguese test function names to English across tests/integration.rs, tests/exit_codes_integration.rs, tests/concurrency_limit_integration.rs, tests/recall_integration.rs, tests/prd_compliance.rs, tests/loom_lock_slots.rs, tests/vacuum_integration.rs, src/commands/optimize.rs, list.rs, health.rs, debug_schema.rs, unlink.rs. Examples: test_link_idempotente_retorna_already_exists → test_link_idempotent_returns_already_exists; prd_optimize_executa_e_retorna_status_ok → prd_optimize_runs_and_returns_status_ok; optimize_response_serializa_campos_obrigatorios → optimize_response_serializes_required_fields. Plus ~80 .expect("X falhou") test helpers translated to .expect("X failed"), doc comments and assert messages cleaned in src/graph.rs, src/memory_guard.rs, src/cli.rs, src/storage/entities.rs, and several tests/*.rs files. Test fixture STRINGS that exercise PT-BR ingestion (e.g. multilingual NER inputs) remain intentionally in PT-BR.

Fixed (Code logic)

  • H5 (HIGH): Extended regex_section_marker() in src/extraction.rs:210-218 to include Camada alongside Etapa, Fase, Passo, Seção, Capítulo. Audit on a 50-file PT-BR corpus showed Camada 1 through Camada 5 leaking through to entities with degree 3 each, polluting the graph. The filter now strips them at both the regex prefilter and the BERT NER post-merge stages.
  • M7 (MEDIUM): Expanded ALL_CAPS_STOPWORDS in src/extraction.rs:60-165 with ADICIONADA, ADICIONADAS, ADICIONADO, ADICIONADOS, CLARO, CONFIRMARAM, CONFIRMEI, CONFIRMOU (alphabetically merged into the list). The earlier audit found these PT-BR adjective/verb forms being captured as concept entities by regex_all_caps() in apply_regex_prefilter.
  • L2 (LOW): Daemon spawn backoff in src/daemon.rs:record_spawn_failure now applies half jitter (base/2 + rand([0, base/2))) instead of pure exponential. Avoids retry herd if multiple CLI instances detect daemon failure simultaneously. Uses SystemTime::now().subsec_nanos() as a dependency-free entropy source — sufficient for low-frequency spawn coordination.
  • L5+L6 (LOW): src/i18n.rs::Language::from_env_or_locale now treats empty SQLITE_GRAPHRAG_LANG="" as unset (no tracing::warn! emitted), matching POSIX convention. src/i18n.rs::init short-circuits when the OnceLock is already populated, preventing the env-resolver from running a second time and emitting the warning twice.
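
For illustration, the half-jitter formula from L2 can be sketched as below. This is a minimal sketch, not the code in src/daemon.rs: the function names half_jitter, exponential_base, and clock_entropy, along with the initial delay and the cap, are assumptions made for the example. Only the formula base/2 + rand([0, base/2)) and the idea of using the clock's sub-second nanoseconds as entropy come from the entry above.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: exponential base delay, `initial * 2^failures`, clamped to `cap`.
fn exponential_base(initial: Duration, failures: u32, cap: Duration) -> Duration {
    // A shift of 32 or more would overflow u32, so saturate the factor instead.
    let factor = 1u32.checked_shl(failures).unwrap_or(u32::MAX);
    initial.checked_mul(factor).unwrap_or(cap).min(cap)
}

/// Half jitter: base/2 plus a pseudo-random offset in [0, base/2).
/// `subsec_nanos` is treated as the fraction `subsec_nanos / 1e9` in [0, 1).
fn half_jitter(base: Duration, subsec_nanos: u32) -> Duration {
    let half = base / 2;
    let fraction_nanos = u128::from(subsec_nanos % 1_000_000_000);
    let jitter = (half.as_nanos() * fraction_nanos) / 1_000_000_000;
    half + Duration::from_nanos(jitter as u64)
}

/// Dependency-free entropy: sub-second nanoseconds of the wall clock.
/// Assumption: falls back to 0 if the clock reports a time before the epoch.
fn clock_entropy() -> u32 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos())
        .unwrap_or(0)
}
```

A caller would combine the three as half_jitter(exponential_base(initial, failures, cap), clock_entropy()). The result always lies in [base/2, base), so two CLI instances that fail at the same moment are unlikely to retry in lockstep, while no retry happens sooner than half the exponential delay.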

Improved

  • M2 (MEDIUM): Added a "JSON Schemas" section to README.md, README.pt-BR.md, docs/AGENT_PROTOCOL.md, and docs/AGENT_PROTOCOL.pt-BR.md linking to the 30 canonical JSON Schema files in docs/schemas/. These contracts existed since v1.0.33 but were undiscoverable from the public docs.
  • M4 (MEDIUM): src/i18n.rs::tr no longer leaks one allocation per call. The signature now requires &'static str inputs (which all in-tree callers already pass — they are string literals) and returns one of them directly. The previous Box::leak(en.to_string().into_boxed_str()) pattern accumulated allocations in long-running pipelines.
  • L3 (LOW): Added an MSRV (Rust 1.88) callout to README.md and README.pt-BR.md Installation sections. Previously documented only as a footnote in the Mac Intel notes.
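
The M4 change can be pictured with the sketch below. It is an assumption-laden illustration rather than the real src/i18n.rs: the Language variants and the exact parameter list of tr are hypothetical. The only facts taken from the entry are that the inputs are &'static str and that one of them is returned directly, with no Box::leak.

```rust
/// Hypothetical language selector; the real enum in src/i18n.rs may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Language {
    English,
    Portuguese,
}

/// Returns one of the two static inputs without allocating.
/// Because both arguments already live for 'static, no leak is needed
/// to hand back a &'static str.
fn tr(lang: Language, en: &'static str, pt: &'static str) -> &'static str {
    match lang {
        Language::English => en,
        Language::Portuguese => pt,
    }
}
```

Since the function only selects between two existing references, calling it in a long-running pipeline performs no heap allocation at all, which is the property the entry describes.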

Notes

  • M6 was reclassified as a documentation/test artefact: related --json was reported to return graph_depth: null, but the field is named hop_distance (src/commands/related.rs:77 and serialised key). The audit query used .graph_depth which did not exist. The field has always been populated correctly. No code change required.
  • L1 (sys_locale) was deferred: the manual LC_ALL/LANG parsing in src/i18n.rs:34-57 works correctly across the targets used in CI. Adding sys_locale would introduce a dependency for marginal benefit (macOS CFLocale APIs and Windows GetUserDefaultLocaleName) without a confirmed reproducer.
  • L4 (BERT NER misclassifications) is out of scope: Tokio=location, Borda=person, Campos=location, and AdapterRun=organization are limitations of Davlan/bert-base-multilingual-cased-ner-hrl. Filtering would require either a different model or a curated whitelist; both deferred until they cause concrete user impact.
  • All 427 lib tests pass with the new test names and translated assertions. cargo fmt --check, cargo clippy -- -D warnings, cargo doc, cargo audit, and cargo deny check advisories licenses bans sources are clean.
  • The new language-check gate in CI now blocks any PR re-introducing PT in tracing/error/doc/assert surfaces.

v1.0.35

30 Apr 16:32

[1.0.35] - 2026-04-30

Fixed

  • WAL-AUTO-INIT (HIGH): Auto-init path (remember, ingest, recall, list, ... — every command that goes through ensure_db_ready()) now activates journal_mode=wal consistently. Before v1.0.35 only the explicit init command flipped journal mode to WAL; databases created on-demand by other commands stayed in journal_mode=delete, breaking sync-safe-copy checkpoint semantics, the documented concurrency guarantees, and the troubleshooting advice that referenced WAL. Fix moves PRAGMA journal_mode = WAL into apply_connection_pragmas (called by every open_rw) and adds a defensive re-assertion (ensure_wal_mode) after migrations to neutralise refinery's internal handle reuse. Regression coverage: tests/wal_auto_init_regression.rs.
  • JSON-SCHEMA-VERSION (MEDIUM-HIGH): init --json, stats --json and migrate --json now emit schema_version as a JSON number instead of a string, aligning with health --json (which already used number). Fixes parsing inconsistency for clients that consumed both shapes. JSON Schemas (docs/schemas/stats.schema.json, docs/schemas/migrate.schema.json, docs/schemas/debug-schema.schema.json) updated to reflect the canonical type. Breaking for clients that explicitly compared as string; clients using numeric comparisons are unaffected.
  • DAEMON-SOCKET-FALLBACK (LOW): Unix socket fallback path in to_local_socket_name() now respects XDG_RUNTIME_DIR then SQLITE_GRAPHRAG_HOME before falling back to /tmp. Reduces collision risk on multi-tenant hosts. Path is only used when abstract namespace sockets fail to bind (rare).
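
The lookup order described in DAEMON-SOCKET-FALLBACK can be sketched as follows. This is a hedged illustration, not the body of to_local_socket_name(): the function name fallback_socket_dir, the injected lookup closure, and the choice to treat empty values as unset are assumptions made for the example.

```rust
use std::path::PathBuf;

/// Hypothetical resolver for the socket fallback directory.
/// Tries XDG_RUNTIME_DIR, then SQLITE_GRAPHRAG_HOME, then falls back to /tmp.
/// `lookup` abstracts the environment so the order can be tested in isolation.
fn fallback_socket_dir<F>(lookup: F) -> PathBuf
where
    F: Fn(&str) -> Option<String>,
{
    ["XDG_RUNTIME_DIR", "SQLITE_GRAPHRAG_HOME"]
        .iter()
        .filter_map(|key| lookup(*key))
        // Assumption: an empty value is treated the same as an unset one.
        .find(|value| !value.is_empty())
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("/tmp"))
}
```

In a real process the closure would be |key| std::env::var(key).ok(). Preferring a per-user runtime directory over /tmp is what reduces the collision risk on multi-tenant hosts mentioned above.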

Added

  • CLI-LIMIT-ALIAS (UX): recall and hybrid-search now accept --limit as alias of -k/--k. Aligns with list/related which already used --limit. Non-breaking, additive.
  • CLI-RENAME-FROM-TO (UX): rename now accepts --from/--to as aliases of --name/--new-name. Non-breaking, additive.
  • JSON-RELATED-INPUT-ECHO (UX): related --json response now includes name and max_hops echo fields for input transparency. Non-breaking, additive.

Changed

  • GRAPH-NODE-KIND-DEPRECATED: graph --format json still emits both kind and type fields per node, but kind is now formally documented as deprecated (kept for pre-v1.0.35 backward compat). New consumers MUST read type. The duplicate field will be removed in a future major release.

Documentation

  • PRAGMA-USER-VERSION-49: Added doc comment in src/constants.rs explaining why SCHEMA_USER_VERSION = 49 (project signature for external diagnostic tools) versus CURRENT_SCHEMA_VERSION = 9 (application-level migration count). They are intentionally different and serve distinct purposes.
  • README: Expanded the Memory content lifecycle table with --body-file/--body-stdin/--entities-file/--relationships-file/--graph-stdin flags for remember, the new aliases for recall/rename, and a callout about kebab-case ASCII memory name validation. Added explicit rows for ingest and cache clear-models.

Notes

  • Audit findings #4 (structured truncation flags in JSON output) and #6 (progress/ETA in ingest summary) are deferred to v1.0.36 — they require schema design beyond a patch release. Truncation is currently surfaced via tracing::warn! only; pipeline consumers should monitor stderr.
  • All 427 lib tests pass. Regression test wal_auto_init_regression.rs added (uses assert_cmd + tempfile, same pattern as existing integration tests).

v1.0.34

30 Apr 11:51

[1.0.34] - 2026-04-30

Added

  • JS7 (LOW): vacuum --json response now includes reclaimed_bytes: u64 derived field, computed as size_before_bytes.saturating_sub(size_after_bytes). Callers no longer need to compute the delta themselves. Schema in src/commands/vacuum.rs:32-41. Existing fields size_before_bytes and size_after_bytes preserved unchanged.
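
As a rough illustration of the derived field, the sketch below computes reclaimed_bytes with saturating_sub. The struct name VacuumReport and its constructor are hypothetical; only the three field names and the saturating subtraction come from the entry above.

```rust
/// Hypothetical stand-in for the vacuum --json response shape.
#[derive(Debug)]
struct VacuumReport {
    size_before_bytes: u64,
    size_after_bytes: u64,
    reclaimed_bytes: u64,
}

impl VacuumReport {
    /// Derives reclaimed_bytes from the two measured sizes.
    /// saturating_sub clamps at 0, so a file that grew never underflows u64.
    fn new(size_before_bytes: u64, size_after_bytes: u64) -> Self {
        Self {
            size_before_bytes,
            size_after_bytes,
            reclaimed_bytes: size_before_bytes.saturating_sub(size_after_bytes),
        }
    }
}
```

Using saturating_sub rather than plain subtraction matters in the edge case where the file is larger after the operation: the field reports 0 instead of panicking in debug builds or wrapping in release builds.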

Documentation

  • PRD-sync (LOW): Updated docs_rules/prd.md (excluded from published crate via Cargo.toml exclude) to reflect schema reality after V008 (v1.0.25) and V009 (v1.0.30) migrations:
    • MemoryType enum: 7 → 9 (added document, note per V009 CHECK constraint and MemoryType enum in src/cli.rs).
    • EntityType enum: 10 → 13 (added organization, location, date per V008 CHECK constraint and BERT NER types).

Notes

  • Audit dimension unwrap/expect reaffirmed clean by audit-team-v1033/diagnostician: ZERO production unwraps; 12 production expects all carry English-language documented invariants (regex literal compilation, BERT NER no-NaN logits, OnceLock just-set get, const compile-time invariants). All fall under CLAUDE.md's "casos impossíveis" ("impossible cases") exception.
  • Unsafe blocks audit reaffirmed clean: all ~14 unsafe { } blocks across main.rs (4×), embedder.rs (1×), storage/connection.rs (1×), commands/optimize.rs (2×), and paths.rs (6× tests) carry SAFETY comments. The earlier finding flagging missing SAFETY comments was a false positive (the comments precede the unsafe keyword, outside -B3 grep context).
  • Bumped patch (1.0.33 → 1.0.34) because the new reclaimed_bytes field is purely additive (#[derive(Serialize)] adds the key) and PRD changes are doc-only (file is in Cargo.toml exclude). No API removed; no behavior changed.

[1.0.32] - 2026-04-30

Fixed (Critical — Audit findings from v1.0.31)

  • C1 (CRITICAL): Auto-init unified across all CRUD handlers via new ensure_db_ready helper in src/storage/connection.rs. Previously remember silently auto-created the DB while recall, list, etc. returned NotFound, breaking the implicit "if it works for one, it works for all" contract. Now every CRUD subcommand creates the database on first use with a single tracing::info!("creating database (auto-init) at <path> schema_version=9") log entry. Resolves the 23 inconsistent paths.db.exists() checks across forget, related, optimize, edit, health, hybrid_search, cleanup_orphans, rename, recall, read, vacuum, graph_export (×4), purge, list, history, unlink, link, stats, sync_safe_copy, debug_schema.
  • C2 (CRITICAL): Documented the deliberate orphan-daemon detach in src/daemon.rs:487. The Child handle is now intentionally dropped with a // SAFETY: comment explaining lifecycle ownership via spawn lock + ready file + idle-timeout shutdown, plus a tracing::debug! log capturing the daemon PID. Stdio::null() already covered the I/O detach.
  • C3 (CRITICAL): New integration test tests/readme_examples_executable.rs parses every bash fenced block from README.md and README.pt-BR.md at compile time and executes each sqlite-graphrag invocation against a real binary in an isolated TempDir. Blocks containing pipes/redirects or marked <!-- skip-test --> are skipped. 22 commands per README are now CI-validated, eliminating the drift uncovered in v1.0.31 (8+ broken examples: --query vs positional <QUERY>, --top-k vs -k, --dir vs positional <DIR>, etc.).

Fixed (High)

  • A1 (HIGH): Translated 8 Portuguese runtime strings to English in src/lock.rs:36, src/daemon.rs:113,131,154,307,419 (including the daemon.rs:307 IPC payload that leaked PT into JSON message fields). Added Message::EmptyQueryValidation and Message::EmptyBodyValidation (as validation::empty_query() / validation::empty_body()) in src/i18n.rs so user-visible validation messages remain bilingual; internal errors are EN-only. Audit gate rg '[áéíóúâêôãõç]' src/ -g '!i18n.rs' now returns ZERO matches.
  • A2 (HIGH): Refactored src/commands/ingest.rs from per-file fork-spawn (Command::new(current_exe).args(["remember", ...]).output()) to in-process pipeline. Loads the embedder once and reuses it across all files via crate::daemon::embed_passage_or_local. Measured speedup: 50 files in 21 seconds vs ~14 minutes previously (≈40× faster, well under the 60s target). Per-file NDJSON event schema unchanged ({file, name, status, memory_id, action}).
  • A3 (HIGH): Replaced .expect("OnceLock populated by set() above") in src/embedder.rs:56 with .ok_or_else(|| AppError::Embedding(...))? propagating a real error variant. Eliminates the only remaining production .expect() outside documented invariants.
  • A4 (HIGH): Added #[command(after_long_help = "EXAMPLES: ...")] with 2-4 realistic invocations to 21 subcommands previously missing it (init, daemon, read, list, forget, purge, rename, edit, history, restore, health, migrate, namespace-detect, optimize, stats, sync-safe-copy, vacuum, related, cleanup-orphans, cache, __debug_schema, plus enrichment of hybrid-search/ingest).
  • A5 (HIGH): Auto-migrate transparency. ensure_db_ready now compares PRAGMA user_version against SCHEMA_USER_VERSION and runs the remaining migrations automatically when an older DB (e.g. v1.0.27 schema 7) is opened by a newer binary. Logs tracing::warn!(from, to, path, "auto-migrating database schema") so operators are not surprised. Eliminates the silent failure mode where stale DBs caused indeterminate runtime errors.
  • A6 (HIGH): Renamed 23 Portuguese identifiers to English across tests/property_based.rs, tests/i18n_bilingual_integration.rs, tests/integration.rs, tests/vacuum_integration.rs, tests/exit_codes_integration.rs, tests/regression_v2_0_4.rs, tests/schema_contract_strict.rs, src/errors.rs, src/commands/health.rs. Plus residual PT comments and assert messages in src/storage/entities.rs, src/commands/remember.rs, src/chunking.rs, src/graph.rs, src/embedder.rs, src/output.rs, src/tz.rs, src/memory_guard.rs, src/daemon.rs, src/lock.rs translated to English.

Fixed (Medium)

  • M1 (MEDIUM): recall -k and hybrid-search -k now use value_parser = parse_k_range validating the inclusive range 1..=4096 (matches sqlite-vec's knn limit) at parse time. Out-of-range values surface a clean Clap error instead of leaking the engine's "k value in knn query too large" message. Added unit tests in src/parsers/mod.rs.
  • M2 (MEDIUM): purge UX clarified. Added alias --max-age-days for the existing --retention-days. When purged_count == 0, the JSON response now includes a message field (`"no soft-de...
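
The range check described in M1 could look roughly like the sketch below. The name parse_k_range and the bounds 1..=4096 come from the entry; the body, the error wording, and the usize return type are assumptions, and the real parser in src/parsers/mod.rs may differ.

```rust
/// Sketch of a parse-time validator for -k.
/// Accepts only integers in the inclusive range 1..=4096.
fn parse_k_range(raw: &str) -> Result<usize, String> {
    let k: usize = raw
        .trim()
        .parse()
        .map_err(|_| format!("`{raw}` is not a valid positive integer"))?;
    if (1..=4096).contains(&k) {
        Ok(k)
    } else {
        Err(format!("k must be between 1 and 4096, got {k}"))
    }
}
```

A function of this shape, taking &str and returning a Result, is the kind of callback a Clap value_parser attribute accepts, so an out-of-range value is rejected during argument parsing and never reaches the knn query.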

v1.0.33

30 Apr 11:36

[1.0.33] - 2026-04-30

Fixed (Linguistic Policy)

  • C3-residual (HIGH): Translated remaining Portuguese string in src/daemon.rs:183 (Drop impl tracing::debug! for spawn lock removal). v1.0.32 A1 covered lines 113/131/154/307/419 but missed line 183 inside impl Drop for DaemonSpawnGuard. Audit gate rg '[áéíóúâêôãõç]' src/ -g '!i18n.rs' now returns ZERO matches.
  • PT-V007 (HIGH): Translated 5-line Portuguese SQL header comment in migrations/V007__memory_urls.sql to English. The file is part of the published crate (not in Cargo.toml exclude), so docs.rs and crates.io tarball previously shipped Portuguese SQL comments.
  • AS-PT (MEDIUM): Translated 20 Portuguese assert! messages to English across src/commands/hybrid_search.rs (19 occurrences) and src/commands/list.rs (1 occurrence). All mem-* deveria existir assertion messages in src/storage/memories.rs (9 occurrences) translated to mem-* should exist. Per CLAUDE.md, "NUNCA assert! com mensagem em português" ("NEVER use assert! with a Portuguese message"); even test code is EN-only.

Fixed (Documentation)

  • D3 (MEDIUM): Synchronized --type doc-comment in src/commands/recall.rs:33, src/commands/list.rs:30, src/commands/hybrid_search.rs:35 to list all 13 graph entity types (project/tool/person/file/concept/incident/decision/memory/dashboard/issue_tracker/organization/location/date). Previously listed only 10, omitting organization/location/date added by migrations/V008__expand_entity_types.sql (BERT NER types). Aligns CLI help with PRD docs_rules/prd.md and the V008 CHECK constraint.

Notes

  • Validated against real-world ingest of 50 representative .md files (~6.6 MB corpus): 50/50 indexed in 56.9s with --skip-extraction; 5/5 indexed with full BERT NER extraction in 57.3s. All 12 functional CLI scenarios (init, ingest, recall, hybrid-search, list, related, graph, health, stats, lifecycle, vacuum, sync-safe-copy) returned exit 0 with valid JSON. Auto-create of graphrag.sqlite in CWD (without prior init) confirmed working with mode 0600.
  • Backwards-compatible duplicate fields in stats --json (memories/memories_total, entities/entities_total, relationships/relationships_total, db_size_bytes/db_bytes, edges/relationships) and list --json (id/memory_id) are intentional per existing test assertions in src/commands/stats.rs:244-248 and src/commands/list.rs:190. They are deliberately preserved for backwards compatibility with existing JSON parsers.
  • schema_version type asymmetry between stats --json (String) and health --json (u32) is documented as a known issue. Normalization to u32 everywhere would be a breaking change deferred to v2.0.
  • kill_on_drop(true) for the daemon child process remains N/A (the orphan detach is deliberate, documented in src/daemon.rs:491-499 and v1.0.32 M4 / C2). The CLI must return immediately while the daemon stays warm.


v1.0.32

30 Apr 10:36

[1.0.32] - 2026-04-30

Fixed (Critical — Audit findings from v1.0.31)

  • C1 (CRITICAL): Auto-init unified across all CRUD handlers via new ensure_db_ready helper in src/storage/connection.rs. Previously remember silently auto-created the DB while recall, list, etc. returned NotFound, breaking the implicit "if it works for one, it works for all" contract. Now every CRUD subcommand creates the database on first use with a single tracing::info!("creating database (auto-init) at <path> schema_version=9") log entry. Resolves the 23 inconsistent paths.db.exists() checks across forget, related, optimize, edit, health, hybrid_search, cleanup_orphans, rename, recall, read, vacuum, graph_export (×4), purge, list, history, unlink, link, stats, sync_safe_copy, debug_schema.
  • C2 (CRITICAL): Documented the deliberate orphan-daemon detach in src/daemon.rs:487. The Child handle is now intentionally dropped with a // SAFETY: comment explaining lifecycle ownership via spawn lock + ready file + idle-timeout shutdown, plus a tracing::debug! log capturing the daemon PID. Stdio::null() already covered the I/O detach.
  • C3 (CRITICAL): New integration test tests/readme_examples_executable.rs parses every bash fenced block from README.md and README.pt-BR.md at compile time and executes each sqlite-graphrag invocation against a real binary in an isolated TempDir. Blocks containing pipes/redirects or marked <!-- skip-test --> are skipped. 22 commands per README are now CI-validated, eliminating the drift uncovered in v1.0.31 (8+ broken examples: --query vs positional <QUERY>, --top-k vs -k, --dir vs positional <DIR>, etc.).

Fixed (High)

  • A1 (HIGH): Translated 8 Portuguese runtime strings to English in src/lock.rs:36, src/daemon.rs:113,131,154,307,419 (including the daemon.rs:307 IPC payload that leaked PT into JSON message fields). Added Message::EmptyQueryValidation and Message::EmptyBodyValidation (as validation::empty_query() / validation::empty_body()) in src/i18n.rs so user-visible validation messages remain bilingual; internal errors are EN-only. Audit gate rg '[áéíóúâêôãõç]' src/ -g '!i18n.rs' now returns ZERO matches.
  • A2 (HIGH): Refactored src/commands/ingest.rs from per-file fork-spawn (Command::new(current_exe).args(["remember", ...]).output()) to in-process pipeline. Loads the embedder once and reuses it across all files via crate::daemon::embed_passage_or_local. Measured speedup: 50 files in 21 seconds vs ~14 minutes previously (≈40× faster, well under the 60s target). Per-file NDJSON event schema unchanged ({file, name, status, memory_id, action}).
  • A3 (HIGH): Replaced .expect("OnceLock populated by set() above") in src/embedder.rs:56 with .ok_or_else(|| AppError::Embedding(...))? propagating a real error variant. Eliminates the only remaining production .expect() outside documented invariants.
  • A4 (HIGH): Added #[command(after_long_help = "EXAMPLES: ...")] with 2-4 realistic invocations to 21 subcommands previously missing it (init, daemon, read, list, forget, purge, rename, edit, history, restore, health, migrate, namespace-detect, optimize, stats, sync-safe-copy, vacuum, related, cleanup-orphans, cache, __debug_schema, plus enrichment of hybrid-search/ingest).
  • A5 (HIGH): Auto-migrate transparency. ensure_db_ready now compares PRAGMA user_version against SCHEMA_USER_VERSION and runs the remaining migrations automatically when an older DB (e.g. v1.0.27 schema 7) is opened by a newer binary. Logs tracing::warn!(from, to, path, "auto-migrating database schema") so operators are not surprised. Eliminates the silent failure mode where stale DBs caused indeterminate runtime errors.
  • A6 (HIGH): Renamed 23 Portuguese identifiers to English across tests/property_based.rs, tests/i18n_bilingual_integration.rs, tests/integration.rs, tests/vacuum_integration.rs, tests/exit_codes_integration.rs, tests/regression_v2_0_4.rs, tests/schema_contract_strict.rs, src/errors.rs, src/commands/health.rs. Plus residual PT comments and assert messages in src/storage/entities.rs, src/commands/remember.rs, src/chunking.rs, src/graph.rs, src/embedder.rs, src/output.rs, src/tz.rs, src/memory_guard.rs, src/daemon.rs, src/lock.rs translated to English.
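The auto-migrate decision described in A5 comes down to comparing the version stored in the database with the version the binary expects. Below is a minimal sketch of that comparison, assuming a hypothetical `plan_migrations` function and `MigrationPlan` enum (neither name comes from the project). The real helper reads `PRAGMA user_version` from SQLite and runs the migrations itself; the `TooNew` branch is an extra assumption that the changelog does not mention.

```rust
use std::cmp::Ordering;

/// Outcome of comparing the on-disk schema version with the binary's.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum MigrationPlan {
    /// Stored version already matches the binary.
    UpToDate,
    /// Versions that still have to be applied, in order.
    Apply(Vec<u32>),
    /// Database was written by a newer binary (assumed case, not in the changelog).
    TooNew { found: u32, supported: u32 },
}

/// `user_version` stands in for the value read from `PRAGMA user_version`;
/// `schema_user_version` stands in for the SCHEMA_USER_VERSION constant.
fn plan_migrations(user_version: u32, schema_user_version: u32) -> MigrationPlan {
    match user_version.cmp(&schema_user_version) {
        Ordering::Equal => MigrationPlan::UpToDate,
        Ordering::Less => {
            MigrationPlan::Apply(((user_version + 1)..=schema_user_version).collect())
        }
        Ordering::Greater => MigrationPlan::TooNew {
            found: user_version,
            supported: schema_user_version,
        },
    }
}
```

With the example from the entry above (a schema 7 database opened by a schema 9 binary), `plan_migrations(7, 9)` yields `Apply(vec![8, 9])`, which is the point at which a warning such as the documented "auto-migrating database schema" log line would be emitted.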

Fixed (Medium)

  • M1 (MEDIUM): recall -k and hybrid-search -k now use value_parser = parse_k_range validating the inclusive range 1..=4096 (matches sqlite-vec's knn limit) at parse time. Out-of-range values surface a clean Clap error instead of leaking the engine's "k value in knn query too large" message. Added unit tests in src/parsers/mod.rs.
  • M2 (MEDIUM): purge UX clarified. Added alias --max-age-days for the existing --retention-days. When purged_count == 0, the JSON response now includes a message field ("no soft-deleted memories older than {N} day(s); use --retention-days 0 to purge all soft-deleted memories regardless of age"). Help text on --yes rewritten to clarify it confirms intent but does NOT override --retention-days.
  • M3 (MEDIUM): Added #[arg(help = "...")] to 9 positional arguments previously bare in --help output: recall <QUERY>, hybrid-search <QUERY>, ingest <DIR>, read <NAME>, forget <NAME>, rename <NAME>, edit <NAME>, history <NAME>, related <NAME>.
  • M4 (MEDIUM): Verified daemon --stop already exists (dispatches to crate::daemon::try_shutdown) and that the autostart spawn path uses std::process::Command with intentional orphan detach (documented under C2). tokio::process::Command kill_on_drop(true) was N/A — code path uses std spawn — so no change needed; the C2 safety comment now explains the design rationale.
  • M5 (MEDIUM): Audit finding "duplicate v1.0.29 entries with date 2026-04-29" was a false positive (v1.0.29 and v1.0.30 are distinct entries that legitimately share 2026-04-29 as their release date). No CHANGELOG change required.
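The range check from M1 can be sketched with the standard library alone. The function name `parse_k_range` and the range 1..=4096 come from the entry above; the signature, the `K_MAX` constant, and the error wording are assumptions, and the project's actual parser may differ.

```rust
/// Upper bound quoted in the changelog entry (sqlite-vec's knn limit).
const K_MAX: usize = 4096;

/// Parses the value of `-k` and rejects anything outside 1..=K_MAX.
/// Assumed signature: a plain `fn(&str) -> Result<T, String>`, which is
/// the shape clap accepts for a custom `value_parser`.
fn parse_k_range(raw: &str) -> Result<usize, String> {
    let k: usize = raw
        .trim()
        .parse()
        .map_err(|_| format!("'{raw}' is not a non-negative integer"))?;
    if (1..=K_MAX).contains(&k) {
        Ok(k)
    } else {
        Err(format!("k must be between 1 and {K_MAX}, got {k}"))
    }
}
```

Rejecting the value at parse time means zero, negative, non-integer, and above-limit inputs all fail before any query reaches the vector engine.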

Fixed (Low)

  • B_1 (LOW): README structure (split README.md + README.pt-BR.md) preserved; the bilingual policy is documented elsewhere. ADR not required since the split is a deliberate product decision predating the audit.
  • B_2 (LOW): Added GitHub Actions CI badge ([![CI](...)](...)) to both README.md and README.pt-BR.md. Final badge order: crates.io → docs.rs → CI → license → Contributor Covenant.
  • B_3 (LOW): Added bash example blocks for 16 subcommands previously without one in either README: daemon, ingest, rename, edit, restore, migrate, namespace-detect, optimize, vacuum, link, unlink, related, graph (with stats/traverse/entities subcommands), cleanup-orphans, cache, history. All 16 examples are validated by the new tests/readme_examples_executable.rs.
  • B_4 (LOW): remember JSON output now includes name_was_normalized: bool and original_name: Option<String> (the latter elided via #[serde(skip_serializing_if = "Option::is_none")] when normalization was a no-op). Closes the UX gap where users passing --name "Hello World" saw only "name": "hello-world" with no indication that normalization had happened.
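The two fields added in B_4 can be illustrated with a small sketch. The field names `name_was_normalized` and `original_name` come from the entry above; the `NormalizedName` struct, the `normalize_name` function, and the normalization rules themselves (lowercase ASCII, runs of other characters collapsed to a single hyphen) are assumptions chosen only to reproduce the "Hello World" example.

```rust
/// Result of normalizing a user-supplied memory name (hypothetical type).
#[derive(Debug, PartialEq)]
struct NormalizedName {
    name: String,
    name_was_normalized: bool,
    /// `None` when normalization was a no-op, mirroring the elided JSON field.
    original_name: Option<String>,
}

/// Assumed rules: ASCII letters and digits are lowercased, every run of
/// other characters becomes one '-', and leading/trailing '-' are dropped.
fn normalize_name(input: &str) -> NormalizedName {
    let mut name = String::with_capacity(input.len());
    for ch in input.chars() {
        if ch.is_ascii_alphanumeric() {
            name.push(ch.to_ascii_lowercase());
        } else if !name.is_empty() && !name.ends_with('-') {
            name.push('-');
        }
    }
    while name.ends_with('-') {
        name.pop();
    }
    let changed = name != input;
    NormalizedName {
        original_name: changed.then(|| input.to_string()),
        name_was_normalized: changed,
        name,
    }
}
```

Under these assumed rules, `normalize_name("Hello World")` reports `name_was_normalized = true` with the original preserved, while `normalize_name("hello-world")` leaves `original_name` as `None`.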

Added

  • tests/readme_examples_executable.rs — 442-line integration test (8 unit + 2 integration tests) validating every README bash example.
  • parse_k_range value parser in src/parsers/mod.rs with full unit-test coverage of edge cases (zero, above-limit, non-integer, negative).
  • validation::empty_query() and validation::empty_body() bilingual messages in src/i18n.rs.
  • ensure_db_ready(&AppPaths) helper in src/storage/connection.rs (also makes register_vec_extension idempotent via OnceLock).
  • insert_default_schema_meta helper extracted to ensure auto-init populates schema_version, model, dim, created_at, sqlite-graphrag_version consistently with explicit init.

Changed

  • src/commands/ingest.rs grew from 565 to ~959 lines as the in-process pipeline replicates remember::run's validation + chunking + embedding + persistence transaction. The previous version offloaded that work to a child process per file.
  • register_vec_extension is now idempotent (guarded by OnceLock); safe to invoke from both main.rs and library helpers (unblocks unit tests touching CRUD handlers).
  • Optimize test optimize_returns_not_found_when_db_missing renamed to optimize_auto_inits_when_db_missing and inverted to assert success (the new auto-init contract).
  • CI-aligned clippy::uninlined_format_args cleanup on the new ensure_db_ready log line.
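The idempotent registration mentioned above follows a common `OnceLock` pattern. A minimal sketch, assuming a hypothetical `do_register` stand-in for the real extension registration and a counter that exists only to make the "runs once" behaviour observable:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the registration body actually ran (demo only).
static REGISTRATIONS: AtomicUsize = AtomicUsize::new(0);

/// Guard cell: initialized by the first caller, observed by all later ones.
static VEC_EXTENSION: OnceLock<()> = OnceLock::new();

/// Hypothetical stand-in for the real, non-repeatable registration work.
fn do_register() {
    REGISTRATIONS.fetch_add(1, Ordering::SeqCst);
}

/// Safe to call from any number of call sites or threads;
/// `get_or_init` runs `do_register` at most once.
fn register_vec_extension() {
    VEC_EXTENSION.get_or_init(do_register);
}
```

Calling `register_vec_extension()` from both a binary entry point and a library helper leaves the counter at 1, which is what lets unit tests invoke CRUD handlers without coordinating who registers first.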

Notes

  • Validation pipeline summary: cargo fmt --check ✓, cargo clippy -- -D warnings ✓, cargo test --lib 427/427 ✓, cargo doc --no-deps ✓ zero warnings, cargo audit ✓ (2 pre-allowed advisories per deny.toml), cargo deny check advisories licenses bans sources ✓.
  • Language gate audit: rg '[áéíóúâêôãõç]' src/ -g '!i18n.rs' returns ZERO matches.
  • Performance baseline: 50 files ingest in 21s wall-clock (≈40× faster than v1.0.31).


v1.0.31

30 Apr 04:56

[1.0.31] - 2026-04-30

Fixed

  • A2 (P1-CRITICAL): ingest subcommand now emits proper NDJSON (one JSON object per line). Previously emitted pretty-printed multiline JSON, breaking line-by-line consumers. Switched 5 calls in src/commands/ingest.rs from output::emit_json to output::emit_json_compact.
  • A3 (P1-MEDIUM): stats --json now reports correct schema_version value (e.g., "9") read from refinery_schema_history table. Previously returned "unknown" because empty schema_meta table was queried.
  • A4 (P1-MEDIUM): forget command now populates action and deleted_at fields in JSON output. Three explicit states: soft_deleted, already_deleted, not_found. Race-safe via re-SELECT after soft-delete.
  • A1 (P0-CRITICAL): Extraction pipeline no longer hangs on documents larger than ~50 KB. Added EXTRACTION_MAX_TOKENS=5000 cap (env override SQLITE_GRAPHRAG_EXTRACTION_MAX_TOKENS). Body exceeding cap is truncated for NER but full body still goes through regex. Empirical impact: 68 KB document went from >5 minutes to ~37 seconds (88% reduction) while preserving extraction_method=bert+regex-batch.
  • A9 (P2-MEDIUM): Relationship fan-out reduced: edges are now generated only between entities that co-occur in the same sentence or paragraph. Previously, C(N,2) "mentions" edges were generated between all entities in a memory.
  • A10 (P2-MEDIUM): Name truncation at 60 chars now logs tracing::warn and handles collisions with numeric suffix (-1, -2, ...).
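The truncation and collision handling from A10 can be sketched as follows. The 60-character limit and the `-1`, `-2`, ... suffixes come from the entry above; the function name `unique_truncated_name`, the `taken` set, and the choice to let the suffix extend past 60 characters are assumptions, since the changelog does not say how the suffix interacts with the limit.

```rust
use std::collections::HashSet;

/// Limit quoted in the changelog entry.
const MAX_NAME_LEN: usize = 60;

/// Truncates `name` to MAX_NAME_LEN characters and, if that result is
/// already taken, appends -1, -2, ... until a free name is found.
/// Assumption: the suffix is allowed to push the name past the limit.
fn unique_truncated_name(name: &str, taken: &mut HashSet<String>) -> String {
    let base: String = name.chars().take(MAX_NAME_LEN).collect();
    let mut candidate = base.clone();
    let mut suffix = 0u32;
    while taken.contains(&candidate) {
        suffix += 1;
        candidate = format!("{base}-{suffix}");
    }
    taken.insert(candidate.clone());
    candidate
}
```

Two 70-character names that share their first 60 characters would come back as the truncated base and the same base with `-1` appended, so neither memory silently overwrites the other.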

Added

  • A6: New integration test suite tests/ingest_integration.rs covering NDJSON contract, fail-fast, max-files, name truncation, --skip-extraction, --pattern variants, recursive walk.
  • A7: V009 end-to-end migration tests in tests/schema_migration_integration.rs: v009_document_type_lifecycle_e2e, v009_note_type_lifecycle_e2e, v009_invalid_type_rejected.
  • A11: PT-BR uppercase stoplist for NER false-positive filter (ADAPTER, PROJETO, PASSIVA, SOMENTE, LEITURA, etc.). Improves entity extraction quality for Portuguese-language corpora.

Improved

  • A5 (P1-MEDIUM): Renamed 210 test functions in src/* across 35 files from Portuguese to English identifiers (also covered helper functions like nova_memoria → new_memory, cria_node → make_node, resposta_vazia → empty_response). Brings codebase into full compliance with project's English-exclusive language policy for identifiers.
  • A8 (P1-MEDIUM): Refined production-only .unwrap()/.expect() calls. Original audit count of 167 was inflated — most matches were inside #[cfg(test)] mod tests blocks (acceptable per CLAUDE.md). The actual production-path inventory was 13 occurrences. Improvements: 1 .expect() in src/embedder.rs got a more precise invariant message; 10 Regex::new(LITERAL).unwrap() in src/extraction.rs static OnceLock initializers replaced with .expect("compile-time validated <kind> regex literal"); 2 .max_by(...).unwrap() over BERT NER logits replaced with .expect("BERT NER logits invariant: no NaN in classifier output"); 1 .expect() in src/chunking.rs translated from PT to EN. The 4 .unwrap() calls in src/graph.rs, 3 in src/namespace.rs, and 2 in src/output.rs are inside /// doctests (idiomatic per Rust API Guidelines C-EXAMPLE).
  • A12+A13: Translated ~38 PT comments in tests/signal_handling_integration.rs, tests/lock_integration.rs, and deny.toml. Removed 2 obsolete [advisories.ignore] entries (RUSTSEC-2024-0436, RUSTSEC-2025-0119) — cargo deny check now reports zero advisory-not-detected warnings.
  • A14: Translated ~150 additional PT comments in tests/prd_compliance.rs, tests/integration.rs, tests/concurrency_hardened.rs, tests/security_hardening.rs, and other test files.

Audit Methodology

  • 13 gaps identified empirically via plan-mode audit on installed v1.0.30 binary against real-file corpus (20 markdown PT-BR docs).
  • All fixes validated via PDCA + Agent Teams orchestration: 11 tasks, 9 teammates spawned in parallel, each with Rule Zero compliance and per-task validation.
  • Validation passed: cargo fmt, cargo clippy --all-targets -- -D warnings, cargo audit, cargo deny check, cargo doc -D warnings, cargo nextest run.


v1.0.30

29 Apr 23:18

[1.0.30] - 2026-04-29

Added (New Subcommand — Bulk Ingestion)

  • sqlite-graphrag ingest <DIR> --type <TYPE> subcommand for bulk-indexing every file in a directory as a separate memory. Supports --pattern (default *.md), --recursive, --skip-extraction, --fail-fast, --max-files (safety cap default 10000), --namespace, --db. Output is line-delimited JSON: one event per file ({file, name, status, memory_id, action}) followed by a final summary ({summary: true, files_total, files_succeeded, files_failed, files_skipped, elapsed_ms}). Names are derived from file basenames in kebab-case. Each file is processed by spawning a child remember --body-file invocation, so concurrency slots, lock semantics, and error semantics match standalone remember. Resolves the long-standing UX gap where users had to shell-script over for f in *.md; do remember ...; done to ingest a corpus.

Changed (Help Text Clarity — link / unlink)

  • link --help and unlink --help now make explicit that --from and --to accept ENTITY names (graph nodes auto-extracted by BERT NER, or created implicitly by prior link calls), NOT memory names. Includes an EXAMPLES: block and a NOTES: block in after_long_help. Previously the bare doc-comment "Source entity" was easily misread as "memory name" by new users; the resulting Erro: entidade '<name>' não existe was confusing because the user thought they were passing a valid memory name. Field doc comments now mention graph --format json | jaq '.nodes[].name' as the canonical way to list eligible entity names.

Changed (Dependencies — rusqlite/refinery upgrade)

  • rusqlite bumped from 0.32 to 0.37 and refinery bumped from 0.8 to 0.9. Cargo.lock now resolves rusqlite v0.37.0, refinery v0.9.1, refinery-core v0.9.1, refinery-macros v0.9.1, and libsqlite3-sys v0.35.0. No source code changes were required, because both crates kept the public APIs we use stable across these versions. Moving to rusqlite 0.39 is blocked by refinery-core 0.9.0, which caps rusqlite = ">=0.23, <=0.37"; revisit when refinery raises that ceiling.

Fixed (Critical — Schema/CLI Contract Mismatch)

  • migrations/V009__expand_memory_types.sql — new migration that recreates the memories table (and its FK children: memory_versions, memory_chunks, memory_entities, memory_relationships, memory_urls) to expand the type CHECK constraint from 7 to 9 values, adding 'document' and 'note'. Without this migration, --type document and --type note (added to the CLI enum in v1.0.29) were always rejected at runtime with exit 10 and the error CHECK constraint failed: type IN ('user','feedback','project','reference','decision','incident','skill'). The CLI Clap layer accepted nine values while the database enforced seven, breaking every README example that used --type document.
  • tests/schema_migration_integration.rs updated to assert exactly 9 migrations applied (previously expected 6) and schema_version = "9".
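The mismatch fixed by V009 is the kind of drift a simple set comparison can surface. The sketch below uses the nine CLI values and the seven values from the quoted CHECK constraint; the constant names and the `missing_from_schema` helper are hypothetical and are not part of the project.

```rust
/// Values accepted by the CLI layer, per the changelog.
const CLI_TYPES: [&str; 9] = [
    "user", "feedback", "project", "reference", "decision",
    "incident", "skill", "document", "note",
];

/// Values allowed by the CHECK constraint before V009, per the quoted error.
const PRE_V009_CHECK_TYPES: [&str; 7] = [
    "user", "feedback", "project", "reference", "decision", "incident", "skill",
];

/// Returns every CLI value the schema would reject, in CLI order.
fn missing_from_schema<'a>(cli: &[&'a str], schema: &[&str]) -> Vec<&'a str> {
    cli.iter()
        .copied()
        .filter(|t| !schema.iter().any(|s| s == t))
        .collect()
}
```

Run against the two lists above, the helper returns `document` and `note`, the two types that the database rejected until the migration landed. A contract test of this shape would fail as soon as either list changes without the other.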

Fixed (Critical — Language Policy Violations Missed by v1.0.28 Audit)

The v1.0.28 audit used a single-line regex (rg "tracing::(info|warn|error|debug)!.*[áéíóúâêôãõç]") and reported zero violations. Multi-line macro invocations and identifiers without diacritics escaped detection. Fixed in this release:

  • src/extraction.rs:749 — Portuguese tracing::warn!("relacionamentos truncados em {max_rels} (com {n} entidades, máx teórico era ~{}× combinações)", ...) translated to "relationships truncated to {max_rels} (with {n} entities, theoretical max was ~{}x combinations)".
  • src/extraction.rs:1025 — Portuguese tracing::warn!("extração truncada em {MAX_ENTS} entidades (entrada tinha {total_input} candidatos antes da deduplicação)") translated to "extraction truncated at {MAX_ENTS} entities (input had {total_input} candidates before deduplication)".
  • src/extraction.rs — Eight .context("..."), .with_context(|| format!("...")) and anyhow::anyhow!("...") calls translated from Portuguese to English: "forward pass do BertModel" → "BertModel forward pass", "forward pass do classificador" → "classifier forward pass", "removendo dimensão batch" → "removing batch dimension", "criando tensor de ids para batch" → "creating id tensor for batch", "padding tensor de ids" → "padding id tensor", "criando tensor de máscara para batch" → "creating mask tensor for batch", "criando token_type_ids batch" → "creating token_type_ids tensor for batch", "forward pass batch BertModel" → "BertModel batch forward pass", "criando diretório do modelo" → "creating model directory", "carregando tokenizer NER" → "loading NER tokenizer", "encoding NER" → "encoding NER input".
  • src/daemon.rs — Two tracing::*! strings translated: "falha ao remover lock file de spawn ao encerrar daemon" → "failed to remove spawn lock file while shutting down daemon"; "daemon encerrado graciosamente; socket será limpo pelo OS ou pelo próximo daemon via try_overwrite" → "daemon shut down gracefully; socket will be cleaned up by OS or by the next daemon via try_overwrite".
  • src/commands/restore.rs: tracing::info!("restore --version omitido; usando última versão não-restore: {}", v) translated to "restore --version omitted; using latest non-restore version: {}".
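The reason the single-line regex missed wrapped macro calls can be shown with a small scanner. This is a sketch under simplifying assumptions: the helper names are hypothetical, a macro invocation is taken to end at the first `);`, and string contents are not parsed, so it illustrates the gap and is not the project's audit gate.

```rust
/// Accented characters from the audit regex quoted above.
const DIACRITICS: &str = "áéíóúâêôãõç";

fn has_diacritic(text: &str) -> bool {
    text.chars().any(|c| DIACRITICS.contains(c))
}

/// Approximates the single-line regex: the macro name and the accented
/// character must appear on the same line to count as a hit.
fn single_line_hits(src: &str) -> usize {
    src.lines()
        .filter(|line| line.contains("tracing::") && has_diacritic(line))
        .count()
}

/// Scans from each `tracing::` to the next `);`, so invocations that
/// wrap across several lines are inspected as one unit.
fn macro_span_hits(src: &str) -> usize {
    let mut hits = 0;
    let mut rest = src;
    while let Some(start) = rest.find("tracing::") {
        let tail = &rest[start..];
        let end = tail.find(");").map_or(tail.len(), |i| i + 2);
        if has_diacritic(&tail[..end]) {
            hits += 1;
        }
        rest = &tail[end..];
    }
    hits
}
```

For a `tracing::warn!(` call whose message sits on the following line, `single_line_hits` reports 0 while `macro_span_hits` reports 1. Neither function catches Portuguese words written without diacritics, which is the second gap noted above.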

Fixed (Test Identifiers — English-only Policy)

~80 test identifiers (function names, helper names, mod names, type aliases) renamed from Portuguese to English. Phase 1 audit only flagged the diacritic subset (*ção, ); identifiers without accents (*_aceita_, *_rejeita, *_funciona, *_retorna, etc.) were missed. Touched files:

  • src/cli.rs: mod testes_concorrencia_pesada → mod heavy_concurrency_tests; mod testes_formato_json_only → mod json_only_format_tests; 3 inner test fns renamed.
  • src/paths.rs: limpar_env_paths helper + 5 test fns renamed (home_env_resolve_db_em_subdir, home_env_traversal_rejeitado, db_path_vence_home, flag_vence_home, home_env_vazio_cai_para_cwd, parent_or_err_aceita/rejeita_*).
  • src/errors.rs — 11 test fns renamed (the _em_portugues suffix family + 3 others).
  • src/commands/init.rs — 5 test fns renamed (init_response_serializa_*, latest_schema_version_retorna_*, init_response_dim/namespace_alinhado_*).
  • src/commands/migrate.rs — 5 test fns + 2 helper fns renamed.
  • src/extraction.rs — 11 internal test fns renamed (the iob_mapeia_*, regex_*_aceita_*, build_relationships_sem_duplicatas, etc.).
  • src/output.rs, src/memory_guard.rs, src/commands/{sync_safe_copy, cleanup_orphans, list, vacuum}.rs — 7 test fns renamed.
  • src/storage/{urls, memories, entities}.rs: type Resultado → type TestResult (3 modules, ~70 occurrences).
  • tests/security_hardening.rs — 16 test fns renamed (test_path_traversal_rejeitado_*, test_chmod_*_apos_init_*, test_blake3_*_diferente_*, test_sql_injection_em_*, etc.).
  • tests/integration.rs — ~28 test fns renamed (the test_remember_cria/rejeita/aceita_*, test_link_cria_relacao_*, test_graph_stdin_aceita_*, etc.).
  • tests/prd_compliance.rs — ~15 test fns renamed.
  • tests/concurrency_*.rs, tests/i18n_bilingual_integration.rs, tests/signal_handling_integration.rs, tests/v2_breaking_integration.rs, tests/lock_integration.rs, tests/property_based.rs, tests/loom_lock_slots.rs, tests/regression_positional_args.rs, tests/recall_integration.rs, tests/daemon_integration.rs, tests/schema_migration_integration.rs — remaining test fns and helpers translated.

Notes

  • errors::to_string_pt() and main::emit_progress_i18n(en, pt) continue to hold legitimate Portuguese strings — these are the i18n branch invoked when --lang pt (or detected locale) is active. They are not violations.
  • Default behaviour (./graphrag.sqlite in CWD, resolved via paths.rs:35-41) was confirmed empirically against the v1.0.29 audit corpus: 29 of 30 flowaiper Markdown documents indexed end-to-end, recall p50 ~50ms, hybrid-search p50 ~52ms. The one stress-test failure was an external 60s timeout, not a tool defect.
  • Empirical evidence: the bug was reproducible with one CLI invocation: sqlite-graphrag remember --type document --name x --description y --body z returned exit 10 with the schema CHECK error message in v1.0.29.

v1.0.29

29 Apr 18:29

[1.0.29] - 2026-04-29

Fixed (Critical — Language Policy Violations in Production Code)

  • src/paths.rs:21 — Portuguese error message "não foi possível determinar o diretório home" in AppError::Io translated to "could not determine home directory". Was emitted in tracing::error! and CLI stderr regardless of --lang flag.
  • src/paths.rs:85-89 — Portuguese error message "caminho '{}' não possui componente pai válido" in AppError::Validation translated to "path '{}' has no valid parent component".
  • src/main.rs:227 — Portuguese tracing::warn!("recebido sinal de shutdown...") translated to "shutdown signal received; waiting for current command to finish gracefully". Tracing logs are required to be English regardless of locale.
  • src/commands/purge.rs:21 — Portuguese doc comment "[DEPRECATED em v2.0.0]" translated to "[DEPRECATED in v2.0.0]".
  • src/commands/purge.rs:70-71 — Portuguese warning string "--older-than-seconds está deprecado..." (emitted in JSON warnings field) translated to "--older-than-seconds is deprecated; use --retention-days in v2.0.0+". JSON output must be language-neutral.
  • src/commands/purge.rs:123 — Portuguese anyhow!("erro de relógio do sistema: {err}") translated to "system clock error: {err}".
  • src/commands/purge.rs:192-193 — Portuguese warning "falha ao limpar vec_chunks..." (in JSON warnings) translated to "failed to clean vec_chunks for memory_id {memory_id}: {err}".
  • src/commands/purge.rs:198-201 — Portuguese warning "falha ao limpar vec_memories..." (in JSON warnings) translated to "failed to clean vec_memories for memory_id {memory_id}: {err}".
  • src/main.rs:265 — Removed duplicate tracing::error!(error = %e) that emitted localized error string into structured logs (line 266 emit_error(&e.localized_message()) already handles user-visible output). Eliminates the i18n→tracing leakage where Portuguese error payloads were polluting EN-only log channels.

Fixed (Security — Path Traversal & Unsafe Audit)

  • src/paths.rs:60 — validate_path now uses Path::components().any(|c| c == Component::ParentDir) instead of substring .contains(".."), preventing both false positives on filenames containing .. (e.g., ..config) and potential bypass via non-standard path encodings.
  • src/extraction.rs:271 — Added comprehensive SAFETY: comment to unsafe { VarBuilder::from_mmaped_safetensors(...) } documenting the three soundness invariants (file not concurrently modified, mmaped region lifetime tracking, safetensors format validation).
  • src/storage/connection.rs:14-21 — Added SAFETY: comment to unsafe { rusqlite::ffi::sqlite3_auto_extension(...) } documenting FFI ABI compatibility, transmute layout invariants, and single-call invocation guarantee.
  • src/paths.rs (6 SAFETY comments in tests) — Translated from Portuguese ("SAFETY: testes marcados com #[serial] garantem ausência de concorrência.") to English ("SAFETY: tests are annotated with #[serial], guaranteeing single-threaded execution.").
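
The difference between the substring test and the component test in the validate_path entry can be illustrated with a minimal sketch. The helper names below (has_parent_dir_component, has_dotdot_substring) are hypothetical and are not the project's code; only Path::components and Component::ParentDir come from the entry itself.

```rust
use std::path::{Component, Path};

/// Component-based check: true only when some path segment is exactly `..`.
fn has_parent_dir_component(path: &Path) -> bool {
    path.components().any(|c| c == Component::ParentDir)
}

/// Substring check, shown only for comparison with the older behaviour.
fn has_dotdot_substring(path: &str) -> bool {
    path.contains("..")
}

fn main() {
    // `..config` is an ordinary file name; `data/../secret.sqlite` really climbs a directory.
    for candidate in ["data/..config", "data/../secret.sqlite"] {
        println!(
            "{candidate}: substring={} component={}",
            has_dotdot_substring(candidate),
            has_parent_dir_component(Path::new(candidate)),
        );
    }
}
```

The substring check flags both inputs, while the component check flags only the path that contains a real parent-directory segment.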

Added (UX Improvements)

  • list --include-deleted flag to surface soft-deleted memories. Without this flag, forget followed by list would create a workflow dead-end where soft-deleted entries became invisible.
  • history --no-body flag to omit version body content from the JSON response. Useful for memories with large body content where only metadata/version sequence is needed.
  • MemoryType::Document and MemoryType::Note variants added to the --type enum (remember, list, recall). Documentation-style content no longer needs to abuse the Reference type.
  • help = text added to ~10 previously bare flags (--namespace, --limit, --offset, --format, --db, --include-deleted, --no-body) across list, history, and other subcommands.
  • README Quick Start now explicitly documents that sqlite-graphrag init is the first required command and that graphrag.sqlite is created in the current working directory by default.

Changed (Schema & UX)

  • --json flag is now hidden in 21 subcommands via #[arg(long, hide = true)]. The flag was a no-op (JSON is the default output format) but appeared in --help causing confusion. The flag remains accepted for backward compatibility with tools that pass it explicitly.
  • history JSON response: metadata field type changed from String (raw JSON-encoded) to serde_json::Value (parsed object), aligning with read which already exposed it as Value. Consumers parsing metadata as a JSON string must now read it as an object directly. Empty/invalid metadata defaults to {}.
  • history JSON response: body field is now Option<String> (omitted when --no-body is set). When the field is present (default), the existing schema is unchanged.
  • Cargo.toml exclude list: /CLAUDE.md, /AGENTS.md, /MEMORY.md rewritten without leading / for idiomatic relative-path semantics matching cargo conventions.

Notes

  • This is a patch release focused on policy compliance and UX fixes detected in the v1.0.28 audit (/tmp/sqlite-graphrag-audit/reports/audit-v1.0.28.md).
  • One JSON schema change: history.metadata from string to object. Consumers that parsed metadata as a string must now read it as an object. All other JSON contracts (commands, fields, exit codes) remain unchanged.
  • Empirically validated against real Markdown documents from a 495-file corpus during the v1.0.28 audit. CRUD cycle (init → remember → recall → read → edit → forget → purge) verified end-to-end.

v1.0.28

28 Apr 11:26

[1.0.28] - 2026-04-28

Changed

  • Enforces the English-only Language Policy across the entire codebase. All /// and //! doc comments, all tracing::*! log strings, and all identifiers (functions, statics, modules, enum variants, test names) outside src/i18n.rs translation tables are now in English. PT-BR strings remain only in Language::Portuguese branches inside i18n::errors_msg, i18n::validation, and errors::to_string_pt().
  • Language::Portugues enum variant renamed to Language::Portuguese (CLI aliases pt, pt-br, pt-BR, portugues, portuguese preserved for backward compatibility).
  • IDIOMA_GLOBAL static renamed to GLOBAL_LANGUAGE (src/i18n.rs).
  • FUSO_GLOBAL static renamed to GLOBAL_TZ (src/tz.rs).
  • ~30 PT-named functions renamed to English equivalents in src/i18n.rs and src/tz.rs (e.g., formatar_iso → format_iso, epoch_para_iso → epoch_to_iso, memoria_nao_encontrada → memory_not_found, nome_kebab → name_kebab, validacao module → validation, erros module → errors_msg).
  • 32 internal mod testes test modules renamed to mod tests for consistency with Rust convention.
  • All call-sites in src/commands/*.rs and tests propagated to use the renamed identifiers.

Added

  • //! crate-level documentation in 37 modules that previously lacked it: src/cli.rs, src/main.rs, src/extraction.rs, src/embedder.rs, src/daemon.rs, src/output.rs, src/paths.rs, src/chunking.rs, src/graph.rs, src/namespace.rs, src/parsers/mod.rs, src/tokenizer.rs, src/storage/{connection,urls,chunks,versions,mod}.rs, src/pragmas.rs, and 22 handlers in src/commands/.
  • language-check CI job in .github/workflows/ci.yml that fails the build when Portuguese diacritics are detected in ///, //!, tracing::*! calls, or #[error(...)] attributes — automated guardrail against regression.
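
The idea behind the language-check gate can be sketched in a few lines of Rust. This is an illustration only: the real job lives in .github/workflows/ci.yml and its exact commands are not shown in this changelog. The character set is the one quoted in the audit pattern rg '[áéíóúâêôãõç]', and the function names (is_checked_line, violations) are hypothetical.

```rust
/// Accented characters assumed to mark Portuguese text (taken from the quoted rg pattern).
const PT_DIACRITICS: &str = "áéíóúâêôãõç";

/// True for the line kinds the gate is described as scanning:
/// `///` and `//!` doc comments, `tracing::*!` calls, and `#[error(...)]` attributes.
fn is_checked_line(line: &str) -> bool {
    let trimmed = line.trim_start();
    trimmed.starts_with("///")
        || trimmed.starts_with("//!")
        || trimmed.contains("tracing::")
        || trimmed.contains("#[error(")
}

/// Returns the 1-based numbers of checked lines that contain a Portuguese diacritic.
fn violations(source: &str) -> Vec<usize> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            is_checked_line(line)
                && line.to_lowercase().chars().any(|c| PT_DIACRITICS.contains(c))
        })
        .map(|(index, _)| index + 1)
        .collect()
}

fn main() {
    let source = "/// Retorna a memória\nlet s = \"não\";\ntracing::warn!(\"falha crítica\");\n/// Returns the memory";
    let found = violations(source);
    println!("offending lines: {found:?}");
    if !found.is_empty() {
        // A CI gate would exit non-zero here; the sketch only reports.
        println!("language check would fail");
    }
}
```

Plain string literals (line 2 in the sample) are deliberately not scanned in this sketch, so Portuguese test fixtures would pass.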

Documentation

  • Two broken intra-doc links ([Cli], [TextEmbedding]) fixed in src/lib.rs and src/embedder.rs (surfaced when cargo doc -D warnings was first run with the new doc coverage).

Notes

  • This is a non-breaking change for the CLI and JSON contracts: subcommand names, flags, env vars, exit codes, and JSON field names remain unchanged. Internal Rust identifiers were renamed but the crate is a binary, not a library consumed via pub use.
  • 65 files changed, +872/-715 lines. All 9 cargo gates pass (fmt, clippy, test, doc, audit, deny, publish dry-run, package list, llvm-cov).

v1.0.27

28 Apr 09:44

[1.0.27] - 2026-04-28

Added

  • CURRENT_SCHEMA_VERSION: u32 = 8 constant in src/constants.rs with unit test that asserts equality with the count of V*.sql migration files.
  • output::emit_error and output::emit_error_i18n functions centralizing stderr error output (Pattern 5: output.rs is the ONLY point of I/O).
  • nextest test-groups configuration in .config/nextest.toml to serialize cross-binary tests sharing the daemon socket and model cache. Eliminates contract_15_link flake observed since v1.0.24.
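
The schema-version guard described in the first entry above can be sketched as follows. The constant and its value come from the entry; the count_migrations helper and the file names in the listing are hypothetical, since the changelog only says the migration files match V*.sql.

```rust
/// Value quoted in the changelog entry.
const CURRENT_SCHEMA_VERSION: u32 = 8;

/// Counts file names shaped like `V*.sql` (hypothetical helper for this sketch).
fn count_migrations(file_names: &[&str]) -> u32 {
    file_names
        .iter()
        .filter(|name| name.starts_with('V') && name.ends_with(".sql"))
        .count() as u32
}

fn main() {
    // Invented directory listing; a real test would read the migrations directory instead.
    let listing = [
        "V001__a.sql", "V002__b.sql", "V003__c.sql", "V004__d.sql",
        "V005__e.sql", "V006__f.sql", "V007__g.sql", "V008__h.sql",
        "README.md",
    ];
    // Adding a migration without bumping the constant would trip this assertion.
    assert_eq!(count_migrations(&listing), CURRENT_SCHEMA_VERSION);
    println!("schema version matches {} migration files", CURRENT_SCHEMA_VERSION);
}
```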

Changed

  • README EN+PT (Graph Schema section) now lists entity_type as exactly 13 values (was 10) — adds organization, location, date introduced in V008 schema migration of v1.0.25.
  • init --help docstring documents path resolution precedence (--db > SQLITE_GRAPHRAG_DB_PATH > SQLITE_GRAPHRAG_HOME > cwd).
  • src/commands/recall.rs graph-distance comment clarified: it remains a hop-count proxy (1.0 - 1.0/(hop+1)); real cosine distance is reserved for v1.0.28 (forward-dated reference fixed).
  • All 6 eprintln! calls in src/main.rs migrated to output::emit_error* to enforce Pattern 5.

Documentation

  • SQLITE_GRAPHRAG_LOG_FORMAT now documented in the env-var table of README EN+PT (was implemented since v1.0.x but undocumented).
  • README unlink row corrected from the non-existent --relationship-id flag to the actual --from --to --relation flags. The previous documentation could mislead agents into rejecting valid invocations.
  • docs/MIGRATION.md and docs/MIGRATION.pt-BR.md version reference updated from v1.0.17 to v1.0.27 (3 occurrences each).
  • docs/HOW_TO_USE.md and docs/HOW_TO_USE.pt-BR.md link recipe examples corrected to use --from/--to instead of the non-existent --source/--target flags.

Fixed

  • Formatting drift in tests/doc_contract_integration.rs:669 resolved via cargo fmt --all (multi-line array → single-line as expected by rustfmt).

Notes

  • Investigation of the audit P1 finding (tokenizer.rs:101-103, std::fs::read in async path) concluded that it is a false positive: get_tokenizer and get_model_max_length are called only from src/commands/remember.rs:389-391, inside pub fn run(), which is synchronous. No spawn_blocking wrap is required; blocking I/O is appropriate for the synchronous CLI command path.
  • cargo deny emitted two advisory-not-detected warnings for the ignored advisories RUSTSEC-2024-0436 (paste) and RUSTSEC-2025-0119 (number_prefix). The ignore entries were kept in deny.toml because they protect against re-introduction via fastembed's transitive deps if upstream regresses. Cleanup is deferred to v1.0.28, after cargo tree confirms the deps are no longer present.

[1.0.26] - 2026-04-28

Added

  • SQLITE_GRAPHRAG_HOME env var for setting the base directory for graphrag.sqlite (precedence: --db > SQLITE_GRAPHRAG_DB_PATH > SQLITE_GRAPHRAG_HOME > cwd).
  • README sample JSON output for remember showing extracted_entities, extracted_relationships, and urls_persisted fields.
  • Expanded exit-code table with sub-causes for exit 1 (Validation error or runtime failure).
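
The precedence chain (--db > SQLITE_GRAPHRAG_DB_PATH > SQLITE_GRAPHRAG_HOME > cwd) can be sketched as a pure function. This is a minimal sketch, not the code in src/paths.rs: resolve_db_path and non_empty are hypothetical names, and treating an empty value as unset is an assumption of the sketch.

```rust
use std::path::{Path, PathBuf};

/// Treats an empty string as "not set" (assumption of this sketch).
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.filter(|s| !s.is_empty())
}

/// Applies the documented precedence: flag, then DB_PATH, then HOME, then cwd.
fn resolve_db_path(
    db_flag: Option<&str>,
    env_db_path: Option<&str>,
    env_home: Option<&str>,
    cwd: &Path,
) -> PathBuf {
    if let Some(path) = non_empty(db_flag) {
        return PathBuf::from(path);
    }
    if let Some(path) = non_empty(env_db_path) {
        return PathBuf::from(path);
    }
    if let Some(home) = non_empty(env_home) {
        // HOME names a base directory, so the default file name is appended.
        return Path::new(home).join("graphrag.sqlite");
    }
    cwd.join("graphrag.sqlite")
}

fn main() {
    let env_db = std::env::var("SQLITE_GRAPHRAG_DB_PATH").ok();
    let env_home = std::env::var("SQLITE_GRAPHRAG_HOME").ok();
    let cwd = std::env::current_dir().expect("current directory should be readable");
    // No --db flag is passed in this demo, so the environment and cwd decide.
    let resolved = resolve_db_path(None, env_db.as_deref(), env_home.as_deref(), &cwd);
    println!("{}", resolved.display());
}
```

Passing the inputs as arguments instead of reading the environment inside the function keeps the precedence logic testable without mutating process-wide state.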

Changed

  • README clarifies that GraphRAG entity extraction runs by default in remember (use --skip-extraction to disable per call).
  • Renamed reference to "automatic ingestion" in README to disambiguate "daemon autostart" from "automatic entity extraction".

Fixed

  • Daemon handled_embed_requests counter now correctly reports the cumulative count after init autospawn (was returning 0 since v1.0.24 due to a per-connection local counter shadowing the shared accumulator).
  • Test contract_15_link aligned with the actual link --json output keys (action, from, to, relation, weight, namespace); the obsolete expectations of source/target numeric IDs were stale since v1.0.24.

[1.0.25] - 2026-04-28

Added

  • recall --all-namespaces flag searches across all namespaces in a single query (P0-1).
  • BERT NER now emits organization (B-ORG), location (B-LOC), and date (B-DATE)
    entity types aligned with V008 schema migration. Previous releases mapped ORG→project,
    LOC→concept, and discarded DATE entirely (P0-2 + V008 alignment).
  • Schema migration V008: entities.type CHECK constraint expanded to include organization,
    location, date. Additive migration; existing rows are preserved unchanged.
  • BRAND_NAME_REGEX captures CamelCase organization names such as "OpenAI", "PostgreSQL",
    "ChatGPT" that BERT NER frequently misclassifies (P0-2).
  • Portuguese monosyllabic verb false-positive filter ("Lê", "Vê", "Cá", etc.) for BERT
    outputs below confidence threshold 0.85 (P0-2).
  • SECTION_MARKER_REGEX filters text fragments like "Etapa 3", "Fase 1", "Passo 2",
    "Seção 4", "Capítulo 1" from entity extraction (P0-4).
  • 12 new ALL_CAPS_STOPWORDS: API, CAPÍTULO, CLI, ETAPA, FASE, HTTP, HTTPS,
    JWT, LLM, PASSO, REST, UI, URL (P0-4).
  • README documents graph traverse|stats|entities subcommands with flags table (P1-A).

Changed

  • recall.graph_matches[].distance now reflects graph hop count via proxy
    1.0 - 1.0 / (hop + 1). Previous releases used 0.0 placeholder. Real cosine
    distance is reserved for v1.0.26 (P1-M).
  • merge_and_deduplicate longest-wins logic rewritten with composite key
    entity_type + name_lc and bidirectional substring containment. Resolves
    "Sonne"/"Sonnet" duplication and "Open"/"Paper" truncation issues (P0-3).
  • Cargo.toml version bumped from 1.0.24 to 1.0.25.
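
One way to read the longest-wins rule with a composite key and bidirectional containment is sketched below. It is an interpretation of the entry above, not the project's merge_and_deduplicate: the Entity struct and merge_longest_wins are hypothetical, and the real function may order or normalize entities differently.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    entity_type: String,
    name: String,
}

/// Keeps one entity per overlapping group: within the same entity_type, if one
/// lowercased name contains the other, the longer name wins.
fn merge_longest_wins(entities: Vec<Entity>) -> Vec<Entity> {
    let mut kept: Vec<Entity> = Vec::new();
    for candidate in entities {
        let candidate_lc = candidate.name.to_lowercase();
        let mut absorbed = false;
        for existing in kept.iter_mut() {
            // The type is part of the key: names only compete within one type.
            if existing.entity_type != candidate.entity_type {
                continue;
            }
            let existing_lc = existing.name.to_lowercase();
            if existing_lc.contains(&candidate_lc) {
                // Candidate is a fragment (or duplicate) of something already kept.
                absorbed = true;
                break;
            }
            if candidate_lc.contains(&existing_lc) {
                // Candidate is the longer form, so it replaces the kept fragment.
                *existing = candidate.clone();
                absorbed = true;
                break;
            }
        }
        if !absorbed {
            kept.push(candidate);
        }
    }
    kept
}

fn main() {
    let input = vec![
        Entity { entity_type: "concept".into(), name: "Sonne".into() },
        Entity { entity_type: "concept".into(), name: "Sonnet".into() },
        Entity { entity_type: "organization".into(), name: "Sonne".into() },
    ];
    for entity in merge_longest_wins(input) {
        println!("{} ({})", entity.name, entity.entity_type);
    }
}
```

In the sample, the "Sonne" fragment is absorbed by "Sonnet" within the concept type, while the organization entry survives because its type differs.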

Fixed

  • is_valid_entity_type now accepts new V008 types organization, location, date (P0-A) — without this fix, remember would reject any entity emitted by the V008-aligned IOB mapping with exit 1.
  • augment_versioned_model_names regex no longer captures Portuguese section markers like "Etapa 3" or "Fase 1" (P0-B) — defense-in-depth filter applied after augmentation and inside iob_to_entities.flush().
  • remember --name longer than 80 bytes now returns exit code 6 (LimitExceeded)
    instead of exit 1 (Validation). Restores the exit code contract used by
    orchestrating agents (P1-J).
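
The exit-code contract for over-long names can be sketched like this. The 80-byte limit and the codes 1 and 6 come from the entry above; the CliError enum, check_name, and the empty-name rule are hypothetical and stand in for the project's own error type.

```rust
/// Limit quoted in the changelog entry.
const MAX_NAME_BYTES: usize = 80;

#[derive(Debug, PartialEq)]
enum CliError {
    Validation(String),
    LimitExceeded(String),
}

impl CliError {
    /// Maps each error class to the exit code an orchestrating agent would see.
    fn exit_code(&self) -> i32 {
        match self {
            CliError::Validation(_) => 1,
            CliError::LimitExceeded(_) => 6,
        }
    }
}

fn check_name(name: &str) -> Result<(), CliError> {
    // Rejecting empty names is an assumption added for contrast with the limit case.
    if name.is_empty() {
        return Err(CliError::Validation("name must not be empty".to_string()));
    }
    // str::len counts UTF-8 bytes, so accented names reach the limit sooner.
    if name.len() > MAX_NAME_BYTES {
        return Err(CliError::LimitExceeded(format!(
            "name is {} bytes, limit is {MAX_NAME_BYTES}",
            name.len()
        )));
    }
    Ok(())
}

fn main() {
    let long_name = "ç".repeat(41); // 41 characters, 82 bytes
    match check_name(&long_name) {
        Ok(()) => println!("ok"),
        Err(err) => println!("{err:?} -> exit {}", err.exit_code()),
    }
}
```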

Notes

  • recall.graph_matches[].distance is approximate; semantic cosine distance reserved for v1.0.26.
  • Entity and relationship caps (30 and 50 respectively) remain silent in v1.0.25;
    explicit --limit-entities / --limit-relations flags planned for v1.0.26.
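
The hop-count proxy used for recall.graph_matches[].distance in this release is simple enough to check by hand. The sketch below assumes hop is a non-negative integer and uses f64; the function name hop_distance and the numeric types are assumptions, only the formula 1.0 - 1.0 / (hop + 1) comes from the changelog.

```rust
/// Hop-count proxy: 0 hops gives 0.0, and the value approaches 1.0 as hops grow.
fn hop_distance(hop: u32) -> f64 {
    1.0 - 1.0 / (f64::from(hop) + 1.0)
}

fn main() {
    for hop in [0, 1, 3] {
        println!("hop {hop} -> distance {}", hop_distance(hop));
    }
}
```

The values are 0.0 for a direct match, 0.5 at one hop, and 0.75 at three hops, so the proxy orders results by graph proximity without claiming to be a semantic distance.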

[1.0.24] - 2026-04-27

Added

  • BERT NER batch inference via predict_batch reduces per-document latency on multi-doc workloads (Phase 3 perf).
  • SQLITE_BUSY and SQLITE_LOCKED retry with exponential backoff in with_busy_retry; avoids spurious exit 10 on WAL-mode contention (Phase 3).
  • spawn_blocking warm-up for daemon BERT model init prevents blocking the async executor during startup (Phase 3).
  • Schema migration V007: memory_urls table with indexes; URLs extracted from BERT NER are now persisted separately instead of leaking into the entity graph (Phase 2).
  • src/storage/urls.rs CRUD module providing upsert_urls, get_urls_for_memory and delete_urls_for_memory (Phase 2).
  • RememberResponse.urls_persisted: usize field reporting how many URL entries landed in memory_urls (Phase 2).
  • RememberResponse.relationships_truncated: bool field indicating whether the relationships payload was capped at max_relationships_per_memory (Phase 4).
  • namespace_initial persisted in schema_meta on init; purge resolves contextually via SQLITE_GRAPHRAG_NAMESPACE (Phase 4 P1-A/P1-C).
  • Positional and flag arguments in read, forget, history, edit, rename; e.g. sqlite-graphrag read my-note is equivalent to sqlite-graphrag read --name my-note (Phase 4 P1-B).
  • Stopwords list expanded with 17 new entries: ACEITE, ACK, ACL, BORDA, CHECKLIST, COMPLETED, CONFIRME, DEVEMOS, DONE, FIXED, NEGUE, PENDING, PLAN, PODEMOS, RECUSE, TOKEN, VAMOS (Phase 2 P0-3).
  • NFKC unicode normalization in merge_and_deduplicate prevents near-duplicate entities caused by composed vs decomposed Unicode forms (Phase 2 P1-E).
  • Regression tests for graph traverse exit 4 when the database is absent (Phase 1 P0-7).
  • Regression tests for positional-plus-flag argument equivalence in read, forget, history, edit, rename (Phase 4 P1-B).
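
The retry-with-exponential-backoff behaviour described for SQLITE_BUSY and SQLITE_LOCKED can be sketched with the standard library alone. This is not the project's with_busy_retry: the DbError enum, the retry_on_busy name, the attempt limit, and the doubling delay are all assumptions chosen for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Stand-in for database errors; only Busy and Locked are treated as retryable.
#[derive(Debug, PartialEq)]
enum DbError {
    Busy,
    Locked,
    Other(String),
}

/// Runs `op`, retrying on Busy/Locked with a delay that doubles after each failure.
fn retry_on_busy<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, DbError>,
) -> Result<T, DbError> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Err(DbError::Busy | DbError::Locked) if attempt < max_attempts => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            // Success, a non-retryable error, or the last attempt: return as-is.
            other => return other,
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = retry_on_busy(5, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err(DbError::Busy) } else { Ok(calls) }
    });
    println!("succeeded after retries: {result:?}");

    // Non-retryable errors pass through on the first attempt.
    let failed: Result<(), DbError> =
        retry_on_busy(5, Duration::from_millis(1), || Err(DbError::Other("disk full".into())));
    println!("not retried: {failed:?}");
}
```

Bounding the number of attempts keeps a persistently locked database from hanging the CLI; once the limit is reached, the last error is returned to the caller.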

Changed

  • ReadResponse.metadata is now serde_json::Value instead of String; agents receive a structured object directly without a second JSON.parse call (Phase 5 P2-A).
  • LinkResponse simplified: redundant source and target fields removed; LinkArgs no longer accepts --source/--target flag aliases (Phase 4 P1-O).
  • purge no longer defaults namespace to "global"; resolves via SQLITE_GRAPHRAG_NAMESPACE or explicit --namespace (Phase 4 P1-C).
  • recall --precise behavior is now documented and internally uses effective_k = 100000 for exhaustive KNN (Phase 1 P0-6).
  • init --model now uses the typed EmbeddingModelChoice enum validated at parse time (Phase 1 P0-8).
  • main.rs RAM measurement uses Result propagation instead of expect (Phase 1 P1-G).
  • Daemon warm-up model load moved into spawn_blocking to avoid blocking the Tokio executor (Phase 3 P1-I).
  • augment_versioned_model_names regex extended to recognize GPT-4o, Claude 4 Sonnet, Llama 3 Pro, Mixtral 8x7B patterns (Phase 5 P2-D).
  • extend_with_numeric_suffix now accepts alphanumeric suffixes (e.g. v2, 3b, 7B) in addition to purely numeric ones (Phase 5 P2-E).
  • Graph entity serialization uses Vec::new() instead of Option<Vec> so the entities field is always an array, never null (Phase 5 P2-C).
  • --type argu...