fix(indexer): wikilinks are directional; record unresolved targets #37

Open
jdubdevs wants to merge 1 commit into devwhodevs:main from jdubdevs:fix/wikilink-indexer-bugs

@jdubdevs

Fixes #35 and #36 — two ingest-time bugs in build_edges_for_file (src/indexer.rs).

What was broken

Bug #35: unresolved_links never populated

When resolve_link_target returned None, the code path silently dropped the unresolved target. The Store::insert_unresolved_link helper existed and worked but was never called from the index path.

Bug #36 — wikilinks stored as undirected edges

For each [[B]] in source A, the indexer wrote TWO directed edges: (A, B, 'wikilink') AND (B, A, 'wikilink'). The reverse edge was inserted regardless of B's actual content. Effect: outgoing-edge queries returned the file's actual wikilink targets PLUS every file that linked into it; asymmetric-link detection via NOT EXISTS (reverse edge) was structurally always-false.
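
To make both effects concrete, here is a minimal in-memory sketch. It is not the engraph code: the `Edge` alias and the helpers `store_mirrored`, `store_directional`, `asymmetric_links`, and `outgoing` are hypothetical names, and a `HashSet` stands in for the edges table. It shows that mirrored storage leaves the asymmetric-link check with nothing to find and inflates outgoing-edge results.

```rust
use std::collections::HashSet;

/// A directed edge: (source file id, target file id).
type Edge = (i64, i64);

/// Pre-fix behavior: every resolved wikilink also writes its mirror image.
fn store_mirrored(links: &[Edge]) -> HashSet<Edge> {
    let mut edges = HashSet::new();
    for &(src, dst) in links {
        edges.insert((src, dst));
        edges.insert((dst, src)); // fabricated reverse edge
    }
    edges
}

/// Post-fix behavior: one directed edge per wikilink found in the source.
fn store_directional(links: &[Edge]) -> HashSet<Edge> {
    links.iter().copied().collect()
}

/// Edges with no reverse counterpart, the in-memory analogue of
/// `NOT EXISTS (reverse edge)`. Sorted so the result is deterministic.
fn asymmetric_links(edges: &HashSet<Edge>) -> Vec<Edge> {
    let mut out: Vec<Edge> = edges
        .iter()
        .copied()
        .filter(|&(src, dst)| !edges.contains(&(dst, src)))
        .collect();
    out.sort();
    out
}

/// Targets of all edges leaving `file`, sorted.
fn outgoing(edges: &HashSet<Edge>, file: i64) -> Vec<i64> {
    let mut out: Vec<i64> = edges
        .iter()
        .filter(|&&(src, _)| src == file)
        .map(|&(_, dst)| dst)
        .collect();
    out.sort();
    out
}
```

With links 1→2 and 3→1, the mirrored store reports no asymmetric links and lists both 2 and 3 as outgoing targets of file 1; the directional store reports both links as asymmetric and lists only 2.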

The fix

pub fn build_edges_for_file(store: &Store, file_id: i64, content: &str) -> Result<()> {
    let source_path = match store.get_file_by_id(file_id)? {
        Some(f) => f.path,
        None => return Ok(()),
    };

    // Clear pre-existing unresolved entries for this source — supports
    // incremental re-index without stale accumulation.
    store.clear_unresolved_links_for_file(&source_path)?;

    let targets = extract_wikilink_targets(content);
    for target in targets {
        match resolve_link_target(store, &target)? {
            Some(target_id) if target_id != file_id => {
                store.insert_edge(file_id, target_id, "wikilink")?;
                // NOTE: do NOT insert reverse. Wikilinks are directional;
                // the reverse edge is inserted when build_edges_for_file
                // is called on the target file with its own content.
            }
            Some(_) => { /* self-link, ignore */ }
            None => {
                store.insert_unresolved_link(&source_path, &target)?;
            }
        }
    }
    Ok(())
}

Three changes:

  1. Resolve file_id → source_path upfront via get_file_by_id so we can pass source_file to insert_unresolved_link.
  2. Clear pre-existing unresolved_links for the source file before re-recording (supports incremental re-index).
  3. Insert ONE directed edge per resolved wikilink. Reverse edges only exist when the target file's own content contains a wikilink back — and that gets inserted naturally when build_edges_for_file is called on the target during the index pass.
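
The three changes can be exercised without a database using a small stand-in. This is a sketch under stated assumptions: `MemStore`, its fields, and `add_file` are hypothetical, and name lookup in a `HashMap` replaces the real `resolve_link_target`. Only the control flow mirrors the patched function.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory stand-in for the real `Store`.
#[derive(Default)]
struct MemStore {
    ids_by_name: HashMap<String, i64>,      // note name -> file id
    edges: BTreeSet<(i64, i64)>,            // directed (source, target)
    unresolved: BTreeSet<(String, String)>, // (source name, raw target)
}

impl MemStore {
    /// Register a file and return its id (ids start at 1).
    fn add_file(&mut self, name: &str) -> i64 {
        let id = self.ids_by_name.len() as i64 + 1;
        self.ids_by_name.insert(name.to_string(), id);
        id
    }

    /// Same shape as the patched function: clear the source's unresolved
    /// rows, then write one directed edge or one unresolved row per target.
    fn build_edges_for_file(&mut self, source: &str, targets: &[&str]) {
        let Some(&file_id) = self.ids_by_name.get(source) else {
            return; // unknown source file: nothing to do
        };
        self.unresolved.retain(|(src, _)| src.as_str() != source);
        for target in targets {
            match self.ids_by_name.get(*target) {
                Some(&target_id) if target_id != file_id => {
                    // One directed edge only; no mirrored insert.
                    self.edges.insert((file_id, target_id));
                }
                Some(_) => { /* self-link, ignore */ }
                None => {
                    self.unresolved
                        .insert((source.to_string(), (*target).to_string()));
                }
            }
        }
    }
}
```

Indexing A with targets `["B", "Missing", "A"]` and then B with no targets leaves exactly one edge (A, B) and one unresolved row ("A", "Missing").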

Tests

Added 3 regression tests in src/indexer.rs::tests:

  • test_wikilink_edges_are_directional_not_bidirectional — A has [[B]], B has zero wikilinks. Verifies A→B exists, B→A does NOT, and B has 1 incoming from A. (Pre-fix, this test would fail: B→A would also exist.)

  • test_unresolved_wikilinks_are_recorded — A has one resolving and one broken wikilink. Verifies the broken target lands in unresolved_links with the correct source path. (Pre-fix, this test would fail: unresolved_links would be empty.)

  • test_unresolved_links_cleared_on_re_index — when source is re-indexed after the broken wikilink is removed from content, the stale unresolved entry is cleared. Tests the incremental-update path.

The existing test_edge_building_during_index continues to pass because that test happens to use files where both directions DO have wikilinks in source content — so the natural a-then-b indexing sequence still produces both directed edges correctly under the new behavior.
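
The clear-then-record behavior that the third test covers can be sketched in isolation. The `UnresolvedLinks` type below is a hypothetical stand-in for the `unresolved_links` table, and the append-only variant is included only for contrast. It illustrates why a re-index path that does not clear first would keep stale rows; it does not reproduce any earlier engraph code.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the `unresolved_links` table:
/// source path -> raw targets that failed to resolve.
#[derive(Default)]
struct UnresolvedLinks {
    by_source: BTreeMap<String, Vec<String>>,
}

impl UnresolvedLinks {
    /// Clear-then-record: drop the source's old rows, then store the new ones.
    fn reindex(&mut self, source: &str, broken_targets: &[&str]) {
        self.by_source.remove(source);
        if !broken_targets.is_empty() {
            let rows = broken_targets.iter().map(|t| t.to_string()).collect();
            self.by_source.insert(source.to_string(), rows);
        }
    }

    /// Append-only variant for contrast: rows from earlier passes linger.
    fn reindex_without_clear(&mut self, source: &str, broken_targets: &[&str]) {
        let rows = self.by_source.entry(source.to_string()).or_default();
        rows.extend(broken_targets.iter().map(|t| t.to_string()));
    }

    /// Total number of unresolved rows across all sources.
    fn count(&self) -> usize {
        self.by_source.values().map(Vec::len).sum()
    }
}
```

Re-indexing a source after its broken wikilink has been removed brings the count back to zero with `reindex`, while the append-only variant still reports the old row.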

Verification

cargo test --lib
# 461 passed; 0 failed; 0 ignored

Also verified against a real 1,646-file Obsidian vault (~10K wikilinks, ~19K wikilink occurrences total):

Before (engraph 1.6.1 from brew):

SELECT COUNT(*) FROM unresolved_links;
-- 0  (despite ~118 known broken-wikilink occurrences)

-- coaching-approach.md has ZERO wikilinks to CLAUDE.md
SELECT f1.path, f2.path FROM edges e ... 
  WHERE f1.path LIKE '%coaching-approach%' AND f2.path = 'CLAUDE.md';
-- Returns 1 row anyway (fabricated reverse edge)

After (patched binary + engraph index --rebuild):

SELECT COUNT(*) FROM unresolved_links;
-- ~118  (broken wikilinks correctly recorded)

SELECT f1.path, f2.path FROM edges e ... 
  WHERE f1.path LIKE '%coaching-approach%' AND f2.path = 'CLAUDE.md';
-- Returns 0 rows (reverse edge no longer fabricated)

Compatibility notes

The schema is unchanged. Existing consumers of unresolved_links (get_unresolved_links, etc.) start receiving rows; consumers of edges see roughly half as many wikilink rows (the spurious reverse direction is gone). Health/lint tooling that hand-rolled wikilink-resolution to work around the bugs can now read from the canonical tables.

If any consumer was implicitly relying on the bidirectional storage (treating wikilink edges as adjacency), the recommended migration is to add a 'backlink' or 'wikilink_undirected' edge_type if that semantic is genuinely desired — separately from source-content wikilinks.
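
A consumer that only needs backlinks or adjacency at read time could also derive them from the directional edges instead of storing a second edge type. The sketch below is one possible approach, not something this PR adds: the `Link` alias and the `backlinks` and `neighbors` helpers are hypothetical, and a slice stands in for the rows returned from the edges table.

```rust
use std::collections::BTreeSet;

/// A directed wikilink: (source file id, target file id).
type Link = (i64, i64);

/// Files whose content links to `file` (incoming direction).
fn backlinks(links: &[Link], file: i64) -> BTreeSet<i64> {
    links
        .iter()
        .filter(|&&(_, dst)| dst == file)
        .map(|&(src, _)| src)
        .collect()
}

/// Undirected neighbors: union of outgoing targets and incoming sources.
fn neighbors(links: &[Link], file: i64) -> BTreeSet<i64> {
    links
        .iter()
        .filter_map(|&(src, dst)| {
            if src == file {
                Some(dst)
            } else if dst == file {
                Some(src)
            } else {
                None
            }
        })
        .collect()
}
```

Deriving the reverse view on read keeps the stored edges faithful to source content; a stored 'backlink' edge_type, as suggested above, trades that for cheaper queries.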

Closes devwhodevs#35, closes devwhodevs#36.

Two ingest-time bugs in `build_edges_for_file`:

1. (devwhodevs#35) unresolved_links table never populated. Indexer extracted
   wikilinks but the `Some(target_id)` branch was the only branch —
   `None` (target didn't resolve) was a silent drop. The helper
   `Store::insert_unresolved_link` existed and worked but was never
   called from the index path.

2. (devwhodevs#36) edges stored bidirectionally. For each `[[target]]` resolved
   in source A → target B, the indexer wrote TWO directed edges:
   `(A, B, 'wikilink')` AND `(B, A, 'wikilink')`. The reverse edge
   was inserted regardless of B's actual content. Effect: outgoing-
   edge queries returned each file's actual wikilink targets PLUS
   every file that linked INTO it; asymmetric-link detection via
   `NOT EXISTS (reverse edge)` was structurally always-false.

Fix:
- Resolve `file_id → source_path` via `get_file_by_id` so we can
  pass `source_file` to `insert_unresolved_link`.
- Clear pre-existing `unresolved_links` entries for the source file
  before re-recording — supports incremental re-index without stale
  entries accumulating.
- Insert exactly ONE directed edge `(file_id → target_id)` per resolved
  wikilink. The reverse edge is inserted (if it should exist) when
  `build_edges_for_file` is called on the target file with its own
  content — which is the natural sequence during a full index pass.
- Self-links (target_id == file_id) continue to be ignored.
- Unresolved targets land in `unresolved_links` via
  `insert_unresolved_link`.

Tests added in `src/indexer.rs::tests`:
- `test_wikilink_edges_are_directional_not_bidirectional` — A has
  `[[B]]`, B has no wikilinks; verifies A→B exists, B→A doesn't, and
  B has 1 incoming.
- `test_unresolved_wikilinks_are_recorded` — A has one resolving and
  one broken wikilink; verifies the broken target lands in
  `unresolved_links` with correct source.
- `test_unresolved_links_cleared_on_re_index` — when source is
  re-indexed with the broken wikilink removed, the prior unresolved
  entry is cleared (no stale accumulation).

The existing `test_edge_building_during_index` continues to pass
because that test happens to use files where both directions DO have
wikilinks in source content — so the natural a-then-b indexing sequence
still produces both directed edges correctly.

All 461 library tests pass (`cargo test --lib`). Verified against a
1,646-file Obsidian vault after `engraph index --rebuild` with the
patched binary: previously-empty `unresolved_links` table now populated,
previously-fabricated reverse edges no longer present.
Development

Successfully merging this pull request may close these issues.

unresolved_links table is empty after engraph index despite known broken wikilinks
