Skip to content

feat: sync from remote loop prevention for self-authored commits#22

Open
JohnRDOrazio wants to merge 1 commit intodevfrom
feat/upstream-sync-loop-prevention
Open

feat: sync from remote loop prevention for self-authored commits#22
JohnRDOrazio wants to merge 1 commit intodevfrom
feat/upstream-sync-loop-prevention

Conversation

@JohnRDOrazio
Copy link
Copy Markdown
Member

@JohnRDOrazio JohnRDOrazio commented Apr 1, 2026

Summary

  • Centralize OntoKit committer identity constants (noreply@ontokit.dev, sync@ontokit.dev) in ontokit/core/constants.py
  • Filter webhook push events by committer email to skip upstream sync checks when all commits are OntoKit-authored
  • Update bare_repository.py and github_sync.py to use centralized constants instead of hardcoded strings
  • Add unit tests covering all filtering edge cases (all-OntoKit, all-external, mixed, empty, malformed)

Design

Follows the Weblate VCS model for loop prevention:

  1. Webhook-level filtering (active) — Before enqueuing an upstream check task, the push webhook handler checks if all commits were made by OntoKit. If so, the check is skipped entirely with an info log.
  2. Content idempotency (passive, existing) — The upstream check task already compares file content. Even if a round-tripped commit slips through, identical content means no sync event is created.
  3. Distinctive committer identity (existing) — OntoKit uses recognizable committer emails, now centralized as constants for consistency.

Cross-references

Test plan

  • All 8 unit tests pass (tests/unit/test_upstream_loop_prevention.py)
  • Ruff lint + format pass
  • Mypy strict mode passes
  • Integration test: push from OntoKit → webhook → verify upstream check is skipped
  • Integration test: external push → webhook → verify upstream check is triggered

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Prevented upstream sync loops by excluding system-generated commits from triggering remote sync checks; remote sync decisions now consider only external commits.
  • Tests

    • Added unit tests covering loop-prevention, commit-source identification, touched-file extraction from external commits, and edge-case handling.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

Added centralized OntoKit committer identity constants and updated the GitHub push webhook handler to ignore OntoKit-authored commits when deciding whether to enqueue upstream remote-sync checks; signatures for created commits were refactored to use the new constants and tests for loop-prevention were added.

Changes

Cohort / File(s) Summary
Constants Definition
ontokit/core/constants.py
New module declaring OntoKit committer name/email constants and ONTOKIT_COMMITTER_EMAILS frozenset.
Webhook Commit Filtering
ontokit/api/routes/pull_requests.py
Push webhook now builds external_commits by excluding commits whose committer.email is in ONTOKIT_COMMITTER_EMAILS; skips remote-sync fetch/queue when none remain and computes touched files from external commits only.
Signature Usage Refactor
ontokit/git/bare_repository.py, ontokit/services/github_sync.py
Replaced hardcoded committer name/email with imports from ontokit.core.constants when constructing pygit2 signatures for commits.
Unit Tests
tests/unit/test_upstream_loop_prevention.py
New tests validating the constants, commit-filtering behavior, edge cases (empty/malformed commits), and that touched-file computation uses only external commits.

Sequence Diagram

sequenceDiagram
    actor GitHub as GitHub
    participant Webhook as Webhook Handler
    participant DB as RemoteSyncConfig Store
    participant Queue as Task Queue
    participant Sync as Upstream Check Worker

    GitHub->>Webhook: push event (commits[])
    Webhook->>Webhook: compute external_commits = commits - ONTOKIT_COMMITTER_EMAILS
    alt external_commits is empty
        Webhook->>Webhook: log "all commits from OntoKit" and skip
    else external_commits has items
        Webhook->>DB: fetch RemoteSyncConfig for repo/branch
        DB-->>Webhook: sync_config
        Webhook->>Webhook: compute touched_files from external_commits
        alt sync_config.file_path in touched_files
            Webhook->>Webhook: set PR status "checking"
            Webhook->>Queue: enqueue run_remote_check_task (external_commits)
            Queue->>Sync: worker processes external_commits only
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 I hop through commits, soft and spry,
I skip my own footprints as I fly,
Two emails kept in a tidy set,
No loops, no echoes — no double-bet,
Cheers to clean syncs and a sky so high!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed All coding requirements from issue #21 are met: centralized committer constants [ontokit/core/constants.py], webhook filtering logic for OntoKit commits [ontokit/api/routes/pull_requests.py], constant usage updates [bare_repository.py, github_sync.py], and comprehensive unit tests [test_upstream_loop_prevention.py].
Out of Scope Changes check ✅ Passed All changes are directly aligned with loop prevention objectives: centralizing committer constants, filtering webhook commits by email, updating signature usage, and adding targeted tests. No unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title 'feat: sync from remote loop prevention for self-authored commits' accurately describes the main change: implementing loop prevention for upstream sync triggered by OntoKit's own commits.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/upstream-sync-loop-prevention

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/unit/test_upstream_loop_prevention.py (1)

26-30: Test helper duplicates production filtering logic.

The _filter_external helper reproduces the filtering expression from pull_requests.py. This is acceptable for unit test isolation, but consider adding a comment noting this must be kept in sync with the production code, or alternatively import and test the actual filtering function if it were extracted.

 def _filter_external(commits: list[dict[str, Any]]) -> list[dict[str, Any]]:
-    """Reproduce the filtering logic from the webhook handler."""
+    """Reproduce the filtering logic from the webhook handler.
+    
+    NOTE: Keep in sync with ontokit/api/routes/pull_requests.py lines 669-673.
+    """
     return [
         c for c in commits if c.get("committer", {}).get("email") not in ONTOKIT_COMMITTER_EMAILS
     ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_upstream_loop_prevention.py` around lines 26 - 30, The test
helper _filter_external duplicates the production filtering expression used in
pull_requests.py against ONTOKIT_COMMITTER_EMAILS; either add a short inline
comment above _filter_external stating "Must be kept in sync with
pull_requests.py filtering logic (uses ONTOKIT_COMMITTER_EMAILS)" or,
preferably, import and use the actual production filtering function from
pull_requests.py (or extract that logic into a shared function) so the test
doesn't diverge—update the test to reference the production symbol instead of
duplicating the expression.
ontokit/services/github_sync.py (1)

194-199: Consider moving import to module level.

The import is placed inside the function, which is typically done to avoid circular imports. However, ontokit.core.constants has no dependencies that would cause a cycle here. Moving it to the top of the file (alongside other imports) would be more conventional and slightly improve readability.

♻️ Suggested refactor

Add at the top of the file with other imports:

from ontokit.core.constants import (
    ONTOKIT_SYNC_COMMITTER_EMAIL,
    ONTOKIT_SYNC_COMMITTER_NAME,
)

Then simplify lines 194-199:

         local_commit = cast(pygit2.Commit, repo.get(local_oid))
         remote_commit = cast(pygit2.Commit, repo.get(remote_oid))
-        from ontokit.core.constants import (
-            ONTOKIT_SYNC_COMMITTER_EMAIL,
-            ONTOKIT_SYNC_COMMITTER_NAME,
-        )
-
         sig = pygit2.Signature(ONTOKIT_SYNC_COMMITTER_NAME, ONTOKIT_SYNC_COMMITTER_EMAIL)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/services/github_sync.py` around lines 194 - 199, Move the local
import of ONTOKIT_SYNC_COMMITTER_NAME and ONTOKIT_SYNC_COMMITTER_EMAIL out of
the function and add it to the module-level imports at the top of
ontokit/services/github_sync.py (alongside other imports); then remove the inner
"from ontokit.core.constants import (...)" block and leave the
pygit2.Signature(...) call (sig = pygit2.Signature(ONTOKIT_SYNC_COMMITTER_NAME,
ONTOKIT_SYNC_COMMITTER_EMAIL)) as-is so it uses the module-level constants.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ontokit/api/routes/pull_requests.py`:
- Around line 702-708: The code sets sync_config.status = "checking" and
commits, then calls get_arq_pool() and pool.enqueue_job() without verifying pool
or handling enqueue failures; wrap the enqueue in a try/except and first check
that pool is not None (from get_arq_pool()); if pool is None, revert
sync_config.status (or set to an error state) and commit the change, and if
pool.enqueue_job() raises, catch the exception, update sync_config.status back
to the previous state or an "error" state and commit/rollback via service.db to
avoid leaving it stuck in "checking", and re-raise or log the exception as
appropriate (refer to get_arq_pool, sync_config.status, service.db.commit, and
pool.enqueue_job to locate the lines).

---

Nitpick comments:
In `@ontokit/services/github_sync.py`:
- Around line 194-199: Move the local import of ONTOKIT_SYNC_COMMITTER_NAME and
ONTOKIT_SYNC_COMMITTER_EMAIL out of the function and add it to the module-level
imports at the top of ontokit/services/github_sync.py (alongside other imports);
then remove the inner "from ontokit.core.constants import (...)" block and leave
the pygit2.Signature(...) call (sig =
pygit2.Signature(ONTOKIT_SYNC_COMMITTER_NAME, ONTOKIT_SYNC_COMMITTER_EMAIL))
as-is so it uses the module-level constants.

In `@tests/unit/test_upstream_loop_prevention.py`:
- Around line 26-30: The test helper _filter_external duplicates the production
filtering expression used in pull_requests.py against ONTOKIT_COMMITTER_EMAILS;
either add a short inline comment above _filter_external stating "Must be kept
in sync with pull_requests.py filtering logic (uses ONTOKIT_COMMITTER_EMAILS)"
or, preferably, import and use the actual production filtering function from
pull_requests.py (or extract that logic into a shared function) so the test
doesn't diverge—update the test to reference the production symbol instead of
duplicating the expression.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 873d3b8d-bfea-4ef3-9593-537eace9fbaa

📥 Commits

Reviewing files that changed from the base of the PR and between b722d11 and 838e310.

📒 Files selected for processing (5)
  • ontokit/api/routes/pull_requests.py
  • ontokit/core/constants.py
  • ontokit/git/bare_repository.py
  • ontokit/services/github_sync.py
  • tests/unit/test_upstream_loop_prevention.py

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ontokit/api/routes/pull_requests.py`:
- Around line 695-700: The current check builds touched_files from
commit.get("added", []) and commit.get("modified", []) and therefore misses
deletions; update the loop that aggregates touched_files (where external_commits
is iterated and touched_files is updated) to also include commit.get("removed",
[]) so that deletions of sync_config.file_path will trigger the sync check
(ensure the same set[str] touched_files variable and the if
sync_config.file_path in touched_files condition are used).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e24f2fd3-eeda-43d3-bac5-436db8434acd

📥 Commits

Reviewing files that changed from the base of the PR and between 838e310 and f1607e7.

📒 Files selected for processing (3)
  • ontokit/api/routes/pull_requests.py
  • ontokit/core/constants.py
  • ontokit/git/bare_repository.py
✅ Files skipped from review due to trivial changes (1)
  • ontokit/core/constants.py

Comment on lines +695 to +700
touched_files: set[str] = set()
for commit in external_commits:
touched_files.update(commit.get("added", []))
touched_files.update(commit.get("modified", []))

if sync_config.file_path in touched_files:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and check the file
git ls-files | grep -E "pull_requests\.py"

Repository: CatholicOS/ontokit-api

Length of output: 175


🏁 Script executed:

# Check the file size first
wc -l ontokit/api/routes/pull_requests.py

Repository: CatholicOS/ontokit-api

Length of output: 105


🏁 Script executed:

# Read the specific lines and surrounding context
sed -n '685,710p' ontokit/api/routes/pull_requests.py

Repository: CatholicOS/ontokit-api

Length of output: 1351


🏁 Script executed:

# Look at broader context - find where external_commits is defined and what triggers this code
sed -n '650,720p' ontokit/api/routes/pull_requests.py

Repository: CatholicOS/ontokit-api

Length of output: 3264


🏁 Script executed:

# Search for other references to "removed" in the file to understand the context better
grep -n "removed" ontokit/api/routes/pull_requests.py

Repository: CatholicOS/ontokit-api

Length of output: 48


🏁 Script executed:

# Search the entire codebase for usage of "removed" key in commits
rg "removed" --type py -B 2 -A 2

Repository: CatholicOS/ontokit-api

Length of output: 16306


🏁 Script executed:

# Check GitHub webhook structure - search for any documentation or references
rg "commit.*get\(.*removed" --type py

Repository: CatholicOS/ontokit-api

Length of output: 48


🌐 Web query:

GitHub webhook push event commit object structure "removed" field

💡 Result:

In a GitHub “push” webhook delivery (the push event sent to your webhook URL), each item in the top-level commits array is a “commit summary” object that can include these file lists:

{
  "id": "",
  "message": "",
  "timestamp": "",
  "url": "",
  "author": { "name": "", "email": "" },
  "committer": { "name": "", "email": "" },

  "added":    ["path/created.ext", "..."],
  "modified": ["path/changed.ext", "..."],
  "removed":  ["path/deleted.ext", "..."]
}

What removed means: it’s an array of repository-relative file paths that were deleted by that commit (as represented in the push payload’s commit summaries). [1]

Important caveat (common source of confusion): if you’re looking at the push event payload inside GitHub Actions (github.event / event.json), GitHub explicitly does not include the added, removed, or modified arrays in the commit objects there; you must call the API to fetch file-level details. [2][3]

Sources:
[1] Push event payload example showing removed in commit objects: GitHub Events API docs mirror (PushEvent example payload) (docs2.lfe.io)
[2] GitHub Actions docs noting added/removed/modified are not present in Actions push payload: (docs.github.com)
[3] GitHub changelog describing removal of those fields from the Actions push payload: (github.blog)

Citations:


🏁 Script executed:

# Search for tests related to webhook push handling
rg "webhook.*push\|push.*webhook" --type py -i -A 5

Repository: CatholicOS/ontokit-api

Length of output: 48


🏁 Script executed:

# Look for tests of the touched_files logic
rg "touched_files\|external_commits" --type py -B 3 -A 3

Repository: CatholicOS/ontokit-api

Length of output: 48


🏁 Script executed:

# Check if there are any tests in test_pull_requests or test_webhooks
fd -e py "test.*webhook\|test.*pull_request" tests/

Repository: CatholicOS/ontokit-api

Length of output: 48


🏁 Script executed:

# List test files to understand test coverage
find tests -name "*.py" -type f | head -20

Repository: CatholicOS/ontokit-api

Length of output: 693


🏁 Script executed:

# Search for RemoteSyncConfig model to understand the data structure
rg "class RemoteSyncConfig" --type py -A 15

Repository: CatholicOS/ontokit-api

Length of output: 4149


🏁 Script executed:

# Check if there are any existing comments or logic about file deletion handling
rg "delete\|removed" ontokit/api/routes/pull_requests.py -B 2 -A 2

Repository: CatholicOS/ontokit-api

Length of output: 48


🏁 Script executed:

# Search for run_remote_check_task to understand what happens after sync is triggered
rg "run_remote_check_task" --type py -B 2 -A 5

Repository: CatholicOS/ontokit-api

Length of output: 8190


🏁 Script executed:

# Look at the broader context of the webhook handler to confirm understanding
sed -n '640,730p' ontokit/api/routes/pull_requests.py

Repository: CatholicOS/ontokit-api

Length of output: 3765


🏁 Script executed:

# Check test_pull_requests_routes.py for webhook push tests
grep -n "test_push\|external_commits" tests/unit/test_pull_requests_routes.py

Repository: CatholicOS/ontokit-api

Length of output: 442


🏁 Script executed:

# Look at the actual test to see what scenarios are covered
sed -n '1,100p' tests/unit/test_pull_requests_routes.py

Repository: CatholicOS/ontokit-api

Length of output: 2348


🏁 Script executed:

# Look at the specific test cases for push webhook handling
sed -n '525,650p' tests/unit/test_pull_requests_routes.py

Repository: CatholicOS/ontokit-api

Length of output: 4855


🏁 Script executed:

# Look for any other tests related to file deletion in webhooks
grep -n "removed\|delete" tests/unit/test_pull_requests_routes.py

Repository: CatholicOS/ontokit-api

Length of output: 587


Include deleted files when checking whether the tracked file changed.

touched_files only aggregates added and modified commits. If an external push deletes sync_config.file_path, the sync check is never triggered, causing webhook-based sync to miss legitimate upstream changes.

🔧 Suggested fix
                     touched_files: set[str] = set()
                     for commit in external_commits:
-                        touched_files.update(commit.get("added", []))
-                        touched_files.update(commit.get("modified", []))
+                        touched_files.update(commit.get("added") or [])
+                        touched_files.update(commit.get("modified") or [])
+                        touched_files.update(commit.get("removed") or [])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/api/routes/pull_requests.py` around lines 695 - 700, The current
check builds touched_files from commit.get("added", []) and
commit.get("modified", []) and therefore misses deletions; update the loop that
aggregates touched_files (where external_commits is iterated and touched_files
is updated) to also include commit.get("removed", []) so that deletions of
sync_config.file_path will trigger the sync check (ensure the same set[str]
touched_files variable and the if sync_config.file_path in touched_files
condition are used).

@JohnRDOrazio JohnRDOrazio changed the base branch from main to dev April 9, 2026 02:29
Filter webhook push events by committer email to prevent feedback loops
when a project uses both GitHub Integration (push) and Upstream Source
Tracking (pull) on the same repository. OntoKit-authored commits are
detected via centralized committer identity constants and skipped before
enqueuing unnecessary upstream check tasks.

Closes #21

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JohnRDOrazio JohnRDOrazio force-pushed the feat/upstream-sync-loop-prevention branch from f1607e7 to 667f6ff Compare April 9, 2026 02:37
@JohnRDOrazio JohnRDOrazio changed the title feat: upstream sync loop prevention for self-authored commits feat: sync from remote loop prevention for self-authored commits Apr 9, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 93.75000% with 2 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
ontokit/api/routes/pull_requests.py 91.66% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement upstream sync loop prevention (filter OntoKit-authored commits)

1 participant