feat: sync from remote loop prevention for self-authored commits#22
feat: sync from remote loop prevention for self-authored commits#22JohnRDOrazio wants to merge 1 commit intodevfrom
Conversation
📝 WalkthroughWalkthroughAdded centralized OntoKit committer identity constants and updated the GitHub push webhook handler to ignore OntoKit-authored commits when deciding whether to enqueue upstream remote-sync checks; signatures for created commits were refactored to use the new constants and tests for loop-prevention were added. Changes
Sequence DiagramsequenceDiagram
actor GitHub as GitHub
participant Webhook as Webhook Handler
participant DB as RemoteSyncConfig Store
participant Queue as Task Queue
participant Sync as Upstream Check Worker
GitHub->>Webhook: push event (commits[])
Webhook->>Webhook: compute external_commits = commits - ONTOKIT_COMMITTER_EMAILS
alt external_commits is empty
Webhook->>Webhook: log "all commits from OntoKit" and skip
else external_commits has items
Webhook->>DB: fetch RemoteSyncConfig for repo/branch
DB-->>Webhook: sync_config
Webhook->>Webhook: compute touched_files from external_commits
alt sync_config.file_path in touched_files
Webhook->>Webhook: set PR status "checking"
Webhook->>Queue: enqueue run_remote_check_task (external_commits)
Queue->>Sync: worker processes external_commits only
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/unit/test_upstream_loop_prevention.py (1)
26-30: Test helper duplicates production filtering logic.The
_filter_externalhelper reproduces the filtering expression frompull_requests.py. This is acceptable for unit test isolation, but consider adding a comment noting this must be kept in sync with the production code, or alternatively import and test the actual filtering function if it were extracted.def _filter_external(commits: list[dict[str, Any]]) -> list[dict[str, Any]]: - """Reproduce the filtering logic from the webhook handler.""" + """Reproduce the filtering logic from the webhook handler. + + NOTE: Keep in sync with ontokit/api/routes/pull_requests.py lines 669-673. + """ return [ c for c in commits if c.get("committer", {}).get("email") not in ONTOKIT_COMMITTER_EMAILS ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_upstream_loop_prevention.py` around lines 26 - 30, The test helper _filter_external duplicates the production filtering expression used in pull_requests.py against ONTOKIT_COMMITTER_EMAILS; either add a short inline comment above _filter_external stating "Must be kept in sync with pull_requests.py filtering logic (uses ONTOKIT_COMMITTER_EMAILS)" or, preferably, import and use the actual production filtering function from pull_requests.py (or extract that logic into a shared function) so the test doesn't diverge—update the test to reference the production symbol instead of duplicating the expression.ontokit/services/github_sync.py (1)
194-199: Consider moving import to module level.The import is placed inside the function, which is typically done to avoid circular imports. However,
ontokit.core.constantshas no dependencies that would cause a cycle here. Moving it to the top of the file (alongside other imports) would be more conventional and slightly improve readability.♻️ Suggested refactor
Add at the top of the file with other imports:
from ontokit.core.constants import ( ONTOKIT_SYNC_COMMITTER_EMAIL, ONTOKIT_SYNC_COMMITTER_NAME, )Then simplify lines 194-199:
local_commit = cast(pygit2.Commit, repo.get(local_oid)) remote_commit = cast(pygit2.Commit, repo.get(remote_oid)) - from ontokit.core.constants import ( - ONTOKIT_SYNC_COMMITTER_EMAIL, - ONTOKIT_SYNC_COMMITTER_NAME, - ) - sig = pygit2.Signature(ONTOKIT_SYNC_COMMITTER_NAME, ONTOKIT_SYNC_COMMITTER_EMAIL)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ontokit/services/github_sync.py` around lines 194 - 199, Move the local import of ONTOKIT_SYNC_COMMITTER_NAME and ONTOKIT_SYNC_COMMITTER_EMAIL out of the function and add it to the module-level imports at the top of ontokit/services/github_sync.py (alongside other imports); then remove the inner "from ontokit.core.constants import (...)" block and leave the pygit2.Signature(...) call (sig = pygit2.Signature(ONTOKIT_SYNC_COMMITTER_NAME, ONTOKIT_SYNC_COMMITTER_EMAIL)) as-is so it uses the module-level constants.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ontokit/api/routes/pull_requests.py`:
- Around line 702-708: The code sets sync_config.status = "checking" and
commits, then calls get_arq_pool() and pool.enqueue_job() without verifying pool
or handling enqueue failures; wrap the enqueue in a try/except and first check
that pool is not None (from get_arq_pool()); if pool is None, revert
sync_config.status (or set to an error state) and commit the change, and if
pool.enqueue_job() raises, catch the exception, update sync_config.status back
to the previous state or an "error" state and commit/rollback via service.db to
avoid leaving it stuck in "checking", and re-raise or log the exception as
appropriate (refer to get_arq_pool, sync_config.status, service.db.commit, and
pool.enqueue_job to locate the lines).
---
Nitpick comments:
In `@ontokit/services/github_sync.py`:
- Around line 194-199: Move the local import of ONTOKIT_SYNC_COMMITTER_NAME and
ONTOKIT_SYNC_COMMITTER_EMAIL out of the function and add it to the module-level
imports at the top of ontokit/services/github_sync.py (alongside other imports);
then remove the inner "from ontokit.core.constants import (...)" block and leave
the pygit2.Signature(...) call (sig =
pygit2.Signature(ONTOKIT_SYNC_COMMITTER_NAME, ONTOKIT_SYNC_COMMITTER_EMAIL))
as-is so it uses the module-level constants.
In `@tests/unit/test_upstream_loop_prevention.py`:
- Around line 26-30: The test helper _filter_external duplicates the production
filtering expression used in pull_requests.py against ONTOKIT_COMMITTER_EMAILS;
either add a short inline comment above _filter_external stating "Must be kept
in sync with pull_requests.py filtering logic (uses ONTOKIT_COMMITTER_EMAILS)"
or, preferably, import and use the actual production filtering function from
pull_requests.py (or extract that logic into a shared function) so the test
doesn't diverge—update the test to reference the production symbol instead of
duplicating the expression.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 873d3b8d-bfea-4ef3-9593-537eace9fbaa
📒 Files selected for processing (5)
ontokit/api/routes/pull_requests.pyontokit/core/constants.pyontokit/git/bare_repository.pyontokit/services/github_sync.pytests/unit/test_upstream_loop_prevention.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ontokit/api/routes/pull_requests.py`:
- Around line 695-700: The current check builds touched_files from
commit.get("added", []) and commit.get("modified", []) and therefore misses
deletions; update the loop that aggregates touched_files (where external_commits
is iterated and touched_files is updated) to also include commit.get("removed",
[]) so that deletions of sync_config.file_path will trigger the sync check
(ensure the same set[str] touched_files variable and the if
sync_config.file_path in touched_files condition are used).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e24f2fd3-eeda-43d3-bac5-436db8434acd
📒 Files selected for processing (3)
ontokit/api/routes/pull_requests.pyontokit/core/constants.pyontokit/git/bare_repository.py
✅ Files skipped from review due to trivial changes (1)
- ontokit/core/constants.py
| touched_files: set[str] = set() | ||
| for commit in external_commits: | ||
| touched_files.update(commit.get("added", [])) | ||
| touched_files.update(commit.get("modified", [])) | ||
|
|
||
| if sync_config.file_path in touched_files: |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and check the file
git ls-files | grep -E "pull_requests\.py"Repository: CatholicOS/ontokit-api
Length of output: 175
🏁 Script executed:
# Check the file size first
wc -l ontokit/api/routes/pull_requests.pyRepository: CatholicOS/ontokit-api
Length of output: 105
🏁 Script executed:
# Read the specific lines and surrounding context
sed -n '685,710p' ontokit/api/routes/pull_requests.pyRepository: CatholicOS/ontokit-api
Length of output: 1351
🏁 Script executed:
# Look at broader context - find where external_commits is defined and what triggers this code
sed -n '650,720p' ontokit/api/routes/pull_requests.pyRepository: CatholicOS/ontokit-api
Length of output: 3264
🏁 Script executed:
# Search for other references to "removed" in the file to understand the context better
grep -n "removed" ontokit/api/routes/pull_requests.pyRepository: CatholicOS/ontokit-api
Length of output: 48
🏁 Script executed:
# Search the entire codebase for usage of "removed" key in commits
rg "removed" --type py -B 2 -A 2Repository: CatholicOS/ontokit-api
Length of output: 16306
🏁 Script executed:
# Check GitHub webhook structure - search for any documentation or references
rg "commit.*get\(.*removed" --type pyRepository: CatholicOS/ontokit-api
Length of output: 48
🌐 Web query:
GitHub webhook push event commit object structure "removed" field
💡 Result:
In a GitHub “push” webhook delivery (the push event sent to your webhook URL), each item in the top-level commits array is a “commit summary” object that can include these file lists:
{
"id": "…",
"message": "…",
"timestamp": "…",
"url": "…",
"author": { "name": "…", "email": "…" },
"committer": { "name": "…", "email": "…" },
"added": ["path/created.ext", "..."],
"modified": ["path/changed.ext", "..."],
"removed": ["path/deleted.ext", "..."]
}What removed means: it’s an array of repository-relative file paths that were deleted by that commit (as represented in the push payload’s commit summaries). [1]
Important caveat (common source of confusion): if you’re looking at the push event payload inside GitHub Actions (github.event / event.json), GitHub explicitly does not include the added, removed, or modified arrays in the commit objects there; you must call the API to fetch file-level details. [2][3]
Sources:
[1] Push event payload example showing removed in commit objects: GitHub Events API docs mirror (PushEvent example payload) (docs2.lfe.io)
[2] GitHub Actions docs noting added/removed/modified are not present in Actions push payload: (docs.github.com)
[3] GitHub changelog describing removal of those fields from the Actions push payload: (github.blog)
Citations:
- 1: https://docs2.lfe.io/v3/activity/events/types/?utm_source=openai
- 2: https://docs.github.com/actions/reference/events-that-trigger-workflows?utm_source=openai
- 3: https://github.blog/changelog/2019-10-16-changes-in-github-actions-push-event-payload?utm_source=openai
🏁 Script executed:
# Search for tests related to webhook push handling
rg "webhook.*push\|push.*webhook" --type py -i -A 5Repository: CatholicOS/ontokit-api
Length of output: 48
🏁 Script executed:
# Look for tests of the touched_files logic
rg "touched_files\|external_commits" --type py -B 3 -A 3Repository: CatholicOS/ontokit-api
Length of output: 48
🏁 Script executed:
# Check if there are any tests in test_pull_requests or test_webhooks
fd -e py "test.*webhook\|test.*pull_request" tests/Repository: CatholicOS/ontokit-api
Length of output: 48
🏁 Script executed:
# List test files to understand test coverage
find tests -name "*.py" -type f | head -20Repository: CatholicOS/ontokit-api
Length of output: 693
🏁 Script executed:
# Search for RemoteSyncConfig model to understand the data structure
rg "class RemoteSyncConfig" --type py -A 15Repository: CatholicOS/ontokit-api
Length of output: 4149
🏁 Script executed:
# Check if there are any existing comments or logic about file deletion handling
rg "delete\|removed" ontokit/api/routes/pull_requests.py -B 2 -A 2Repository: CatholicOS/ontokit-api
Length of output: 48
🏁 Script executed:
# Search for run_remote_check_task to understand what happens after sync is triggered
rg "run_remote_check_task" --type py -B 2 -A 5Repository: CatholicOS/ontokit-api
Length of output: 8190
🏁 Script executed:
# Look at the broader context of the webhook handler to confirm understanding
sed -n '640,730p' ontokit/api/routes/pull_requests.pyRepository: CatholicOS/ontokit-api
Length of output: 3765
🏁 Script executed:
# Check test_pull_requests_routes.py for webhook push tests
grep -n "test_push\|external_commits" tests/unit/test_pull_requests_routes.pyRepository: CatholicOS/ontokit-api
Length of output: 442
🏁 Script executed:
# Look at the actual test to see what scenarios are covered
sed -n '1,100p' tests/unit/test_pull_requests_routes.pyRepository: CatholicOS/ontokit-api
Length of output: 2348
🏁 Script executed:
# Look at the specific test cases for push webhook handling
sed -n '525,650p' tests/unit/test_pull_requests_routes.pyRepository: CatholicOS/ontokit-api
Length of output: 4855
🏁 Script executed:
# Look for any other tests related to file deletion in webhooks
grep -n "removed\|delete" tests/unit/test_pull_requests_routes.pyRepository: CatholicOS/ontokit-api
Length of output: 587
Include deleted files when checking whether the tracked file changed.
touched_files only aggregates added and modified commits. If an external push deletes sync_config.file_path, the sync check is never triggered, causing webhook-based sync to miss legitimate upstream changes.
🔧 Suggested fix
touched_files: set[str] = set()
for commit in external_commits:
- touched_files.update(commit.get("added", []))
- touched_files.update(commit.get("modified", []))
+ touched_files.update(commit.get("added") or [])
+ touched_files.update(commit.get("modified") or [])
+ touched_files.update(commit.get("removed") or [])🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ontokit/api/routes/pull_requests.py` around lines 695 - 700, The current
check builds touched_files from commit.get("added", []) and
commit.get("modified", []) and therefore misses deletions; update the loop that
aggregates touched_files (where external_commits is iterated and touched_files
is updated) to also include commit.get("removed", []) so that deletions of
sync_config.file_path will trigger the sync check (ensure the same set[str]
touched_files variable and the if sync_config.file_path in touched_files
condition are used).
Filter webhook push events by committer email to prevent feedback loops when a project uses both GitHub Integration (push) and Upstream Source Tracking (pull) on the same repository. OntoKit-authored commits are detected via centralized committer identity constants and skipped before enqueuing unnecessary upstream check tasks. Closes #21 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f1607e7 to
667f6ff
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Summary
noreply@ontokit.dev,sync@ontokit.dev) inontokit/core/constants.pybare_repository.pyandgithub_sync.pyto use centralized constants instead of hardcoded stringsDesign
Follows the Weblate VCS model for loop prevention:
Cross-references
Test plan
tests/unit/test_upstream_loop_prevention.py)🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests