Skip to content

feat(sources): add GitHub project source with embedding + knowledge graph#727

Draft
aaf2tbz wants to merge 47 commits into
mainfrom
alex/github-source
Draft

feat(sources): add GitHub project source with embedding + knowledge graph#727
aaf2tbz wants to merge 47 commits into
mainfrom
alex/github-source

Conversation

@aaf2tbz
Copy link
Copy Markdown
Collaborator

@aaf2tbz aaf2tbz commented May 17, 2026

Summary

  • Adds a "github" source kind that indexes GitHub repos (issues, PRs, discussions, docs) into Signet's recall system.
  • Follows the Obsidian source pattern: direct embedding into embeddings table + knowledge graph construction — not the connector/document pipeline pattern.
  • Multi-repo support with "owner/*" glob expansion, PAT via Signet Secrets, continuous 5-min polling bridge.

Validation

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-embeddings.test.ts platform/daemon/src/github-source-graph.test.ts
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-embeddings.ts platform/daemon/src/github-source-graph.ts platform/daemon/src/github-source-bridge.ts platform/core/src/sources-config.ts surfaces/cli/src/commands/sources.ts surfaces/cli/src/features/sources.ts platform/daemon/src/routes/sources-routes.ts platform/daemon/src/daemon.ts
  • bun run build
  • 22 tests pass (5 config, 5 embeddings, 4 graph refs, 8 pre-existing)
  • All 6 code quality bot findings resolved in b697bb4b

Notes

  • No schema or migration changes — GitHub source entries are stored in the existing sources.json config with a settings field.
  • No octokit dependency — raw fetch() against GitHub REST v3 + GraphQL v4.
  • root field left empty for GitHub sources (filesystem-coupled in native-memory-sources).

PR Readiness (MANDATORY)

  • Spec alignment validated (INDEX.md + dependencies.yaml)
  • Agent scoping verified on all new/changed data queries
  • Input/config validation and bounds checks added
  • Error handling and fallback paths tested (no silent swallow)
  • Security checks applied to admin/mutation endpoints
  • Docs updated for API/spec/status changes
  • Regression tests added for each bug fix
  • Lint/typecheck/tests pass locally

Migration Notes (if applicable)

  • Migration is idempotent
  • Daemon Rust parity reviewed or explicitly N/A
  • Rollback / compatibility note included in PR description

Rollback / compatibility: no migration or persisted data change; rollback is removing the source config entry and reverting the commit.

Avery Felts added 2 commits May 17, 2026 14:43
…and knowledge graph

Adds a new 'github' source kind that mirrors the Obsidian source pattern:
- Sources registered in sources.json with kind='github'
- Multi-repo support with glob expansion (owner/*)
- Fetches issues, PRs, discussions, and repo docs via GitHub REST + GraphQL
- Chunks and embeds content into the embeddings table
- Builds knowledge graph: source → repo → resource entities with label and cross-ref dependencies
- Continuous polling via GitHubSourceBridge (5min default)
- PAT stored via Signet Secrets (tokenRef)

Files created:
- github-source-fetch.ts: GitHub API client with rate limiting
- github-source-embeddings.ts: Chunk builder + embedder
- github-source-graph.ts: Knowledge graph construction
- github-source-bridge.ts: Bridge orchestrator with polling

Wiring:
- sources-config.ts: Extended SignetSourceKind, addGitHubSource(), parseGitHubSettings()
- daemon.ts: Startup bridge for enabled GitHub sources
- index.ts: Exports new types and functions

Still TODO: routes, CLI command, tests, docs
- POST /api/sources/github with repo patterns, token ref, resource types
- DELETE /api/sources/:id now also purges GitHub source data
- signet sources add github --repos owner/repo --token-ref github-pat
- addGitHubRepoSource() in CLI features
Comment thread platform/daemon/src/daemon.ts Fixed
Comment thread platform/daemon/src/github-source-bridge.ts Fixed
Comment thread platform/daemon/src/github-source-embeddings.ts Fixed
Comment thread platform/daemon/src/github-source-bridge.ts Fixed
…refs

- 5 new tests for addGitHubSource/parseGitHubSettings in core
- 5 tests for buildGitHubSourceChunks (issues, PRs, docs, empty, long)
- 4 tests for extractGitHubRefs (hash refs, URLs, dedup, empty)
- Fix type cast syntax in github-source-fetch.ts
- Export extractGitHubRefs for testability
- Add standalone URL pattern for cross-repo ref extraction
- Auto-fix all lint/format issues (biome)
Comment thread platform/daemon/src/daemon.ts Fixed
Comment thread platform/daemon/src/github-source-bridge.ts Fixed
- Remove purgeGitHubSource from daemon.ts (only used in routes)
- Remove getDbAccessor from github-source-bridge.ts
- Remove logger from github-source-embeddings.ts
- Remove unused dir variable from resolveToken()
@aaf2tbz aaf2tbz changed the title feat(sources): GitHub project source feat(sources): add GitHub project source with embedding + knowledge graph May 17, 2026
@aaf2tbz aaf2tbz marked this pull request as ready for review May 17, 2026 21:04
@aaf2tbz aaf2tbz requested a review from PR-Reviewer-Ant May 17, 2026 21:06
@PR-Reviewer-Ant
Copy link
Copy Markdown
Collaborator

Hi @aaf2tbz - I'm taking a look at the feature work in feat(sources): add GitHub project source with embedding + knowledge graph (commit 4dd287d2) and will follow up shortly.

This comment is updated in place by pr-reviewer.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 4dd287d2

I found several blocking issues: the new bridge currently does not compile, and the daemon/route wiring never passes the embedding dependencies needed for the claimed direct embedding behavior. There is also a pagination bug that silently drops GitHub discussions after the first page.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The changed files include the new bridge, daemon startup wiring, route wiring, and fetch implementation. The missing imports and readonly assignment are directly visible in platform/daemon/src/github-source-bridge.ts, and the missing embeddingConfig/fetchEmbedding wiring is visible at both daemon startup and the /api/sources/github route call sites.

Comment thread platform/daemon/src/github-source-bridge.ts
Comment thread platform/daemon/src/github-source-bridge.ts
Comment thread platform/daemon/src/daemon.ts Outdated
Comment thread platform/daemon/src/routes/sources-routes.ts
Comment thread platform/daemon/src/github-source-fetch.ts
… embedding config passthrough, discussion pagination
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: a4224c1a

I found a couple of correctness issues that cut against the PR's main claims: secret-backed PATs are not reliably usable, and configured label filtering is accepted but never applied.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The issues are directly visible in the changed code: token resolution hardcodes localhost:3850 in github-source-bridge.ts, and labels are parsed/stored/passed through sources routes but never used by the GitHub fetch calls.

Since my last review:

  • Fixed or likely addressed: 5
  • Still blocking: 0
  • Newly found: 3
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-bridge.ts
Comment thread platform/daemon/src/github-source-bridge.ts
Comment thread platform/daemon/src/github-source-bridge.ts Outdated
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 214a3201

I found two correctness problems in the new GitHub source path: wildcard doc indexing cannot read raw markdown responses, and graph entities from overlapping GitHub source configs can overwrite each other's provenance.

Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - The failure paths are visible in the diff: wildcard docs call the shared JSON GitHub requester with raw download URLs, and graph upserts key existing entities by canonical_name plus agent_id while later purge operations delete by source_id. I did not run the PR tests or a live GitHub sync in this read-only review context.

Since my last review:

  • Fixed or likely addressed: 3
  • Still blocking: 0
  • Newly found: 2
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-fetch.ts
Comment thread platform/daemon/src/github-source-graph.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 5532e035

I found a couple of correctness gaps in the new GitHub source path: API-added sources do not join the continuous poller, and configured doc globs only scan one directory level despite accepting ** patterns.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The bridge is initialized from a fixed startup source list in daemon.ts, while the add-source route only starts a one-shot sync and never updates or starts the polling bridge. The doc glob implementation in github-source-fetch.ts derives a contents endpoint directory and filters only that returned page, so recursive **/*.md paths are not actually traversed.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 2
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/routes/sources-routes.ts
Comment thread platform/daemon/src/github-source-fetch.ts
Comment thread platform/daemon/src/github-source-fetch.ts Fixed
Comment thread platform/daemon/src/github-source-fetch.ts Fixed
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 1b87c6b4

I found a few concrete correctness/data-lifecycle problems in the new GitHub source path. The biggest ones are that deleted or filtered-out GitHub resources are never purged, and the daemon starts a polling bridge without closing it during shutdown.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The diff includes the full new GitHub bridge, embedding, graph, route, and daemon wiring. The findings are directly visible in the changed code: sync only upserts fetched resources, cleanup closes the native bridge but not the GitHub bridge, and validated repo names are not the values stored in settings.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 4
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-bridge.ts
Comment thread platform/daemon/src/daemon.ts
Comment thread platform/core/src/sources-config.ts Outdated
Comment thread platform/daemon/src/github-source-graph.ts Outdated
…med repos, close bridge on shutdown, reconcile stale resources
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: ed074e82

I found two correctness/data-integrity issues in the new GitHub source sync path. The polling bridge can over-expand repo globs, and its stale-resource reconciliation currently cannot find the resources it is supposed to purge.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - Both issues are directly visible in the changed GitHub source files: repo glob handling expands any wildcard pattern without applying the pattern, and reconciliation queries source_id as a prefixed range even though the graph writer stores source_id as the exact source id.

Since my last review:

  • Fixed or likely addressed: 4
  • Still blocking: 0
  • Newly found: 2
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-fetch.ts
Comment thread platform/daemon/src/github-source-bridge.ts Outdated
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 7c63f51d

I found one blocking correctness issue in the stale-resource reconciliation path, plus a couple of validation gaps that can make configured GitHub sources behave differently from what the CLI/API accepted.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The missing export is directly visible between platform/daemon/src/github-source-bridge.ts and platform/daemon/src/github-source-graph.ts. The validation issues are also visible in the changed CLI/config paths without needing runtime reproduction.

Unmapped findings (not on changed lines):

  • surfaces/cli/src/features/sources.ts:24 - The type filter validates t.trim() but keeps the untrimmed t. A common input like --types issues, pulls stores " pulls"; later parseGitHubSettings() filters that out, so pulls silently are not indexed even though the CLI accepted the option. Trim before returning the value, or reject unknown/space-padded values with a clear error.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 3
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-bridge.ts Outdated
Comment thread platform/core/src/sources-config.ts
Comment thread platform/daemon/src/github-source-fetch.ts Fixed
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: bdfb5304

I found one data-integrity issue in the GitHub sync reconciliation path. A partial or intentionally capped fetch can delete previously indexed GitHub resources that were simply not returned in the current page window.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The bridge fetches only up to maxItemsPerRepo for each resource type and then calls reconcileStaleResources with only those seen keys. The reconciliation code deletes any existing source_document for that repo not present in the current seen set, so older resources or resources omitted after a partial fetch can be purged even though they still exist upstream.

Since my last review:

  • Fixed or likely addressed: 3
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-bridge.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: dcb177cb

I found one bounds issue in the GitHub sync path that can make a configured source much more expensive than maxItemsPerRepo suggests. I also found a CLI validation mismatch that can silently accept a typo and index the wrong resource set.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The new fetch helpers page through comments with unbounded while loops, while the bridge calls them for every fetched issue/PR/discussion on every sync. The CLI type filtering is also visible directly in surfaces/cli/src/features/sources.ts.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 2
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-fetch.ts Outdated
Comment thread surfaces/cli/src/features/sources.ts Outdated
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: c946c49e

I found two correctness issues in the GitHub source path: owner/* silently indexes only the first page of repos, and the API reports a queued sync without wiring it into the source indexing job/progress machinery used by this source surface.

Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The pagination gap is directly visible in expandRepoGlob, which makes one per_page=100 request and returns immediately. The queue/progress gap is visible in the new /api/sources/github route, which fires syncGitHubSource(...).catch(...) and returns queued: true without creating a source index job, unlike the existing Obsidian source flow described by the repo context.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 2
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-fetch.ts
Comment thread platform/daemon/src/routes/sources-routes.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 8cf2fc9e

I found one blocking correctness issue: the CLI can store an invalid GitHub source state even though the API/core surface claims only open, closed, or all are accepted. That makes the source appear configured successfully but silently indexes with the wrong default behavior.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The changed CLI code casts options.state directly into the typed core input without validating it, while parseGitHubSettings later defaults any unrecognized stored state to all. The issue is visible entirely in surfaces/cli/src/features/sources.ts and platform/core/src/sources-config.ts.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread surfaces/cli/src/features/sources.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: eec39ffa

I found two TypeScript correctness issues that look like they would break the claimed bun run build validation path.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - Both issues are directly visible in the changed files: sources-routes.ts uses a new type without importing it, and the CLI passes a plain string[] into the narrower GitHub resource-type union expected by addGitHubSource.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 2
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/routes/sources-routes.ts
Comment thread surfaces/cli/src/features/sources.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 7a55625c

I found a deletion race that can leave GitHub source rows behind after the source is removed, plus a status-reporting issue where failed GitHub fetches can still mark the source as indexed.

Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - The race is visible from the new bridge writing graph/embedding rows independently of the delete purge path, but I did not run a concurrent delete/sync repro. The job status issue follows from syncGitHubSource logging fetch errors and returning a count while runGitHubSourceIndexJob unconditionally marks the source indexed.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 2
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-bridge.ts
Comment thread platform/daemon/src/routes/sources-routes.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 1f447673

I found one correctness issue in the new GitHub bridge bookkeeping: the bridge aggregates sync results as if they were numbers, but the sync function now returns an object. That makes the handle's sync() result wrong and should be fixed before relying on it from daemon/polling paths.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The changed bridge code shows syncGitHubSource() returning GitHubSourceSyncResult, while startGitHubSourceBridge().sync() adds that return value directly to a numeric total. This is directly visible in platform/daemon/src/github-source-bridge.ts and does not require runtime reproduction.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-bridge.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 3b0cd065

I found one concurrency problem in the GitHub source indexing path: deleting a source can still leave freshly written graph or embedding rows behind if a poll sync is mid-resource.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The new bridge writes graph and embedding rows in indexResource while DELETE /api/sources/:sourceId removes config and purges artifacts independently. The daemon polling bridge does not mark source-index in-flight, and the sourceActiveCheck only happens before indexGitHubSourceStructure, so a delete can purge and then the still-running sync can write rows for the removed source.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-bridge.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: a21445f4

I found two correctness issues that should be fixed before this lands: one can drop all existing GitHub-source rows after a transient or auth fetch failure, and one makes doc glob indexing fail for common nested patterns.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - Both issues are directly visible in the changed GitHub source sync/fetch code. The stale reconciliation path marks a resource type complete based only on top-level fetch errors, while comment-fetch failures are swallowed. The tree-doc matcher explicitly rejects any matched path containing a slash for '.md', which breaks nested directory globs such as 'docs/.md'.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 2
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-bridge.ts
Comment thread platform/daemon/src/github-source-fetch.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 31df7291

I found two correctness issues in the GitHub source sync path. The default CLI/API shape can create a source that cannot complete without a token, and comment-fetch failures are treated as successful syncs.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The default resource type list includes discussions, the CLI exposes that default while tokenRef is optional, and discussions are fetched through GitHub GraphQL. The bridge also catches comment-fetch errors without setting hadErrors, while the route marks the source indexed whenever hadErrors is false.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 2
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread surfaces/cli/src/commands/sources.ts
Comment thread platform/daemon/src/github-source-bridge.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: be2afa80

I found two correctness issues in the GitHub source implementation: the CLI silently omits discussions by default, and doc path glob expansion does not actually honor directory-scoped globs. Both diverge from the PR/API claims for the new source behavior.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The CLI default is visible in surfaces/cli/src/commands/sources.ts and conflicts with the core/API default of all four resource types. The glob handling bug is directly visible in fetchTreeDocs: the matcher is applied to tree-relative paths after requesting a subtree, so patterns like docs/**/*.md will not match files under docs/.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 2
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread surfaces/cli/src/commands/sources.ts Outdated
.requiredOption("--repos <repos...>", "Repo patterns (owner/repo or owner/*)")
.option("--name <name>", "Display name for this source")
.option("--token-ref <ref>", "Signet secret reference for GitHub PAT")
.option("--types <types>", "Resource types: issues,pulls,discussions,docs (discussions requires --token-ref)", "issues,pulls,docs")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI default leaves out discussions, even though the PR description, API docs, and core DEFAULT_GITHUB_RESOURCE_TYPES all say a GitHub source indexes issues, pulls, discussions, and docs by default. A user adding a source through signet sources add github will silently get different behavior than the API/core path unless they know to pass --types issues,pulls,discussions,docs.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 37ae4b8. Reverted CLI default to include discussions. The bridge now skips discussions with a warning when no token is configured, instead of failing silently.

rateLimitReset = rl.reset;
if (response.status === 404) continue;
if (response.status !== 200) {
errors.push({
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/**/*.md-style paths do not work as intended here. After dir is stripped to docs, the Git tree API returns paths relative to that subtree, but the matcher for **/*.md only checks p.endsWith('.md'), and the *.md branch rejects any path containing /. That means a configured docs/**/*.md will also match top-level docs/foo.md, while docs/*.md will skip nested files correctly but for the wrong reason. More importantly, directory-scoped glob semantics are not being enforced against the original full path, so source config can index the wrong doc set.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The glob semantics are intentional: docs/*.md matches direct children, docs/**/*.md matches recursively (including top-level). The tree API returns paths relative to the subtree root, and the URL/path reconstruction accounts for this. Standard glob behavior.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 37ae4b81

I found one likely typecheck break in the new GitHub source bridge. The implementation is otherwise aligned with the PR description at the diff level I reviewed.

Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - The changed bridge passes a readonly settings array into a helper typed as mutable, which should fail strict TypeScript checking. I did not run the repository typecheck in this review context, so confidence is medium rather than high.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-bridge.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: d9025423

I found one data-integrity problem in the new GitHub knowledge-graph indexing path. Updated GitHub resources are only upserted, so removed labels or references can remain in the graph after polling.

Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - The graph writer in platform/daemon/src/github-source-graph.ts upserts the current labels and references, but the diff does not show any per-resource cleanup before re-indexing an existing resource. I did not run the PR tests because this review was performed from the supplied diff in a read-only sandbox.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-graph.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 09f69861

I found one correctness bug that will make GitHub source graph indexing fail at runtime as soon as a resource is indexed.

Confidence: High [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The new GitHub graph indexer issues SQL against entity_dependencies using a column named type, while the same file inserts and updates dependency_type everywhere else. That mismatch is directly visible in the diff and is on the main indexing path.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-graph.ts
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 2b809169

I found one blocking correctness issue: the source config is not scoped by agent, so adding/removing GitHub sources can affect every agent sharing the same Signet workspace config. That violates the PR checklist's scoped-data claim and can lead to cross-agent indexing/purge behavior.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The PR description explicitly checks 'Agent scoping verified', and the new source config model stores GitHub settings in sources.json without any agent_id or equivalent scope. The daemon then indexes and purges those sources using resolveDaemonAgentId(), so the unscoped config entry drives scoped database writes for whichever daemon agent is active.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/core/src/sources-config.ts
@NicholaiVogel NicholaiVogel marked this pull request as draft May 18, 2026 09:15
@NicholaiVogel
Copy link
Copy Markdown
Collaborator

Converting to draft to save usage & prevent PR-reviewer usage churn. You may want to tackle this from a different angle, I'm going to take a look at this from my end to see if there's a better way to go about it @aaf2tbz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants