fix(indexer): wikilinks are directional; record unresolved targets #37
Open
jdubdevs wants to merge 1 commit into
Closes devwhodevs#35, closes devwhodevs#36.

Two ingest-time bugs in `build_edges_for_file`:

1. (devwhodevs#35) `unresolved_links` table never populated. The indexer extracted wikilinks, but the `Some(target_id)` branch was the only branch; `None` (target didn't resolve) was a silent drop. The helper `Store::insert_unresolved_link` existed and worked but was never called from the index path.
2. (devwhodevs#36) Edges stored bidirectionally. For each `[[target]]` resolved in source A → target B, the indexer wrote TWO directed edges: `(A, B, 'wikilink')` AND `(B, A, 'wikilink')`. The reverse edge was inserted regardless of B's actual content. Effect: outgoing-edge queries returned each file's actual wikilink targets PLUS every file that linked INTO it; asymmetric-link detection via `NOT EXISTS (reverse edge)` was structurally always false.

Fix:

- Resolve `file_id → source_path` via `get_file_by_id` so we can pass `source_file` to `insert_unresolved_link`.
- Clear pre-existing `unresolved_links` entries for the source file before re-recording. This supports incremental re-index without stale entries accumulating.
- Insert exactly ONE directed edge `(file_id → target_id)` per resolved wikilink. The reverse edge is inserted (if it should exist) when `build_edges_for_file` is called on the target file with its own content, which is the natural sequence during a full index pass.
- Self-links (`target_id == file_id`) continue to be ignored.
- Unresolved targets land in `unresolved_links` via `insert_unresolved_link`.

Tests added in `src/indexer.rs::tests`:

- `test_wikilink_edges_are_directional_not_bidirectional`: A has `[[B]]`, B has no wikilinks; verifies A→B exists, B→A doesn't, and B has 1 incoming.
- `test_unresolved_wikilinks_are_recorded`: A has one resolving and one broken wikilink; verifies the broken target lands in `unresolved_links` with correct source.
- `test_unresolved_links_cleared_on_re_index`: when source is re-indexed with the broken wikilink removed, the prior unresolved entry is cleared (no stale accumulation).

The existing `test_edge_building_during_index` continues to pass because that test happens to use files where both directions DO have wikilinks in source content, so the natural a-then-b indexing sequence still produces both directed edges correctly.

All 461 library tests pass (`cargo test --lib`). Verified against a 1,646-file Obsidian vault after `engraph index --rebuild` with the patched binary: previously-empty `unresolved_links` table now populated, previously-fabricated reverse edges no longer present.
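For readers without the diff at hand, the corrected control flow can be approximated with a small in-memory model. This is only a sketch under stated assumptions, not the code in `src/indexer.rs`: `MemStore`, `add_file`, `resolve`, and `extract_wikilinks` are hypothetical stand-ins, the resolver is a toy (`[[b]]` resolves only if `b.md` is known), and the real function's signature and storage layer are not shown in this PR text.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory stand-in for the real store (illustration only).
#[derive(Default)]
struct MemStore {
    /// file path -> file id
    files: HashMap<String, u32>,
    /// directed (source_id, target_id) wikilink edges
    edges: BTreeSet<(u32, u32)>,
    /// (source_path, raw_target) pairs that failed to resolve
    unresolved: BTreeSet<(String, String)>,
}

impl MemStore {
    /// Registers a path if needed and returns its id.
    fn add_file(&mut self, path: &str) -> u32 {
        let next = self.files.len() as u32 + 1;
        *self.files.entry(path.to_string()).or_insert(next)
    }

    /// Toy resolver (assumption): a target resolves only if "<target>.md" is known.
    fn resolve(&self, target: &str) -> Option<u32> {
        self.files.get(&format!("{target}.md")).copied()
    }
}

/// Extracts raw `[[target]]` names; aliases and headings are not handled.
fn extract_wikilinks(content: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = content;
    while let Some(start) = rest.find("[[") {
        let after = &rest[start + 2..];
        match after.find("]]") {
            Some(end) => {
                out.push(after[..end].to_string());
                rest = &after[end + 2..];
            }
            None => break,
        }
    }
    out
}

/// Model of the fixed behavior: one directed edge per resolved link,
/// unresolved targets recorded, stale unresolved entries cleared first.
fn build_edges_for_file(store: &mut MemStore, source_path: &str, content: &str) {
    let file_id = store.add_file(source_path);
    // Clear unresolved entries left over from an earlier index of this file.
    store.unresolved.retain(|(src, _)| src.as_str() != source_path);
    for target in extract_wikilinks(content) {
        let resolved = store.resolve(&target);
        match resolved {
            // Self-links are ignored.
            Some(target_id) if target_id == file_id => {}
            // Exactly one directed edge; no reverse edge is written here.
            Some(target_id) => {
                store.edges.insert((file_id, target_id));
            }
            // Previously a silent drop; now recorded against the source path.
            None => {
                store.unresolved.insert((source_path.to_string(), target));
            }
        }
    }
}
```

In this model a reverse edge only appears when the target file is itself indexed and its own content contains a link back, which mirrors the "natural sequence during a full index pass" described above.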
Fixes #35 and #36, two ingest-time bugs in `build_edges_for_file` (`src/indexer.rs`).

## What was broken

### Bug #35: `unresolved_links` never populated

When `resolve_link_target` returned `None`, the code path silently dropped the unresolved target. The `Store::insert_unresolved_link` helper existed and worked but was never called from the index path.

### Bug #36: wikilinks stored as undirected edges

For each `[[B]]` in source A, the indexer wrote TWO directed edges: `(A, B, 'wikilink')` AND `(B, A, 'wikilink')`. The reverse edge was inserted regardless of B's actual content. Effect: outgoing-edge queries returned the file's actual wikilink targets PLUS every file that linked into it; asymmetric-link detection via `NOT EXISTS (reverse edge)` was structurally always false.

## The fix
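
Three changes:

1. Resolve `file_id → source_path` upfront via `get_file_by_id` so we can pass `source_file` to `insert_unresolved_link`.
2. Clear `unresolved_links` for the source file before re-recording (supports incremental re-index).
3. Insert exactly one directed edge `(file_id → target_id)` per resolved wikilink. The reverse edge, if it should exist, is inserted when `build_edges_for_file` is called on the target during the index pass.

## Tests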
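
The directional invariant at stake can be modeled in isolation. The sketch below is not the project's test code; `store_bidirectional`, `store_directional`, `asymmetric_links`, and `incoming_count` are hypothetical helpers over plain `(source_id, target_id)` pairs. It illustrates why a `NOT EXISTS (reverse edge)` style check could never match under the old storage, and what the directional storage yields instead.

```rust
use std::collections::BTreeSet;

/// (source_id, target_id); a stand-in for a 'wikilink' row in `edges`.
type Edge = (u32, u32);

/// Pre-fix behavior as described in this PR: every resolved link
/// also writes its reverse edge.
fn store_bidirectional(links: &[Edge]) -> BTreeSet<Edge> {
    links.iter().flat_map(|&(s, t)| [(s, t), (t, s)]).collect()
}

/// Post-fix behavior: one directed edge per resolved link.
fn store_directional(links: &[Edge]) -> BTreeSet<Edge> {
    links.iter().copied().collect()
}

/// In-memory analogue of a `NOT EXISTS (reverse edge)` filter:
/// keeps edges whose reverse is absent.
fn asymmetric_links(edges: &BTreeSet<Edge>) -> Vec<Edge> {
    edges
        .iter()
        .copied()
        .filter(|&(s, t)| !edges.contains(&(t, s)))
        .collect()
}

/// Number of edges pointing at `id`.
fn incoming_count(edges: &BTreeSet<Edge>, id: u32) -> usize {
    edges.iter().filter(|&&(_, t)| t == id).count()
}
```

With a single link from file 1 to file 2, the bidirectional set contains both `(1, 2)` and `(2, 1)`, so `asymmetric_links` finds nothing; the directional set contains only `(1, 2)`, which is reported as asymmetric, and file 2 has one incoming edge.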

Added 3 regression tests in `src/indexer.rs::tests`:

- `test_wikilink_edges_are_directional_not_bidirectional`: A has `[[B]]`, B has zero wikilinks. Verifies A→B exists, B→A does NOT, and B has 1 incoming from A. (Pre-fix, this test would fail: B→A would also exist.)
- `test_unresolved_wikilinks_are_recorded`: A has one resolving and one broken wikilink. Verifies the broken target lands in `unresolved_links` with the correct source path. (Pre-fix, this test would fail: `unresolved_links` would be empty.)
- `test_unresolved_links_cleared_on_re_index`: when source is re-indexed after the broken wikilink is removed from content, the stale unresolved entry is cleared. Tests the incremental-update path.

The existing `test_edge_building_during_index` continues to pass because that test happens to use files where both directions DO have wikilinks in source content, so the natural a-then-b indexing sequence still produces both directed edges correctly under the new behavior.

## Verification

Also verified against a real 1,646-file Obsidian vault (~10K wikilinks, ~19K wikilink occurrences total):

Before (engraph 1.6.1 from brew):

After (patched binary + `engraph index --rebuild`):

## Compatibility notes

The schema is unchanged. Existing consumers of `unresolved_links` (`get_unresolved_links`, etc.) start receiving rows; consumers of `edges` see roughly half as many `wikilink` rows (the spurious reverse direction is gone). Health/lint tooling that hand-rolled wikilink resolution to work around the bugs can now read from the canonical tables.

If any consumer was implicitly relying on the bidirectional storage (treating `wikilink` edges as adjacency), the recommended migration is to add a `'backlink'` or `'wikilink_undirected'` edge_type if that semantic is genuinely desired, separately from source-content wikilinks.
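
One way such a migration might look, sketched over an in-memory edge set rather than the real schema: derive explicit `Backlink` rows from the directed `Wikilink` rows, so adjacency-style consumers keep working while source-content links stay distinguishable. The `EdgeType` enum, `with_backlinks`, and `neighbors` are hypothetical names for illustration; the project's actual edge_type values and store API may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical edge kinds; `Backlink` mirrors the 'backlink' edge_type
/// suggested above and is not something this PR adds.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EdgeType {
    Wikilink,
    Backlink,
}

/// (source_id, target_id, edge_type)
type Edge = (u32, u32, EdgeType);

/// Returns the input edges plus one derived `Backlink` edge for every
/// directed `Wikilink` edge. Source-content edges are left untouched.
fn with_backlinks(edges: &BTreeSet<Edge>) -> BTreeSet<Edge> {
    let mut out = edges.clone();
    for &(source, target, ty) in edges {
        if ty == EdgeType::Wikilink {
            out.insert((target, source, EdgeType::Backlink));
        }
    }
    out
}

/// Adjacency view: every file reachable from `id` over any edge type.
fn neighbors(edges: &BTreeSet<Edge>, id: u32) -> BTreeSet<u32> {
    edges
        .iter()
        .filter(|&&(source, _, _)| source == id)
        .map(|&(_, target, _)| target)
        .collect()
}
```

Keeping the derived rows under their own type means a query filtered to `Wikilink` still reflects only what each file's content actually links to, while `neighbors` over the combined set reproduces the adjacency that older consumers may have assumed.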