Skip to content

feat: AUTH_MODE, anonymous suggestions, seed script, and index fix#27

Open
damienriehl wants to merge 9 commits intodevfrom
feat/seed-project-script
Open

feat: AUTH_MODE, anonymous suggestions, seed script, and index fix#27
damienriehl wants to merge 9 commits intodevfrom
feat/seed-project-script

Conversation

@damienriehl
Copy link
Copy Markdown
Contributor

@damienriehl damienriehl commented Apr 3, 2026

Summary

AUTH_MODE (Phase 8)

  • AUTH_MODE=required|optional|disabled in config + auth module
  • ANONYMOUS_USER returned in disabled mode, existing behavior in required/optional
  • 11 tests covering all three modes

Anonymous Suggestions (Phase 10)

  • ontokit/core/anonymous_token.py — HMAC-signed tokens with 24h TTL
  • ontokit/api/routes/anonymous_suggestions.py — 5 endpoints (create, save, submit, discard, beacon) gated on AUTH_MODE != required
  • Rate limiting: 5 sessions per IP per hour
  • Honeypot validation: silent fake-success when bot fills hidden field
  • suggestion_sessions model extended with is_anonymous, submitter_name, submitter_email, client_ip
  • Alembic migration for new columns

Seed Script

  • scripts/seed-project.py — import OWL files with direct PostgreSQL indexing
  • Supports URL/file import, upstream sync config, index-only rebuild

Index Fix

  • ontology_index.py — include skos:altLabel, skos:prefLabel as annotations in class detail
  • Previously translations/synonyms were excluded from both labels and annotations

Test plan

  • python -m pytest tests/unit/test_auth_disabled.py -v — 11 tests pass
  • AUTH_MODE=optional — GET endpoints work without auth, POST returns 401
  • Anonymous suggestion flow: create session → save → submit with credit info
  • python scripts/seed-project.py --help — shows usage

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Configurable auth modes ("required", "optional", "disabled") and automatic anonymous viewer access when disabled
    • Anonymous suggestion flow: create/save/submit/discard anonymous sessions, token-based access, honeypot protection, and anonymous beacon endpoint
    • Anonymous session tokens for secure, time-limited access
  • Database Migration

    • Added anonymous suggestion fields (anonymous flag, submitter name/email, client IP)
  • New Scripts

    • Project seeding CLI to initialize projects, repos, and indexes without external services
  • Tests

    • Added coverage validating auth modes and anonymous behavior

damienriehl and others added 3 commits April 3, 2026 11:19
…odes

- Add auth_mode field (required/optional/disabled) to Settings class in config.py
- Add ANONYMOUS_USER constant to auth.py (id=anonymous, roles=[viewer])
- Add early returns in get_current_user, get_current_user_optional, get_current_user_with_token for disabled mode
- Optional mode: RequiredUser still 401s (write protection), OptionalUser returns None (browse works)
…disabled)

- Test ANONYMOUS_USER properties (id, roles, type, not superadmin)
- Test disabled mode: all three auth functions return ANONYMOUS_USER
- Test required mode: get_current_user raises 401, get_current_user_optional returns None
- Test optional mode: RequiredUser raises 401 (write protection), OptionalUser returns None (browse works)
Bypasses MinIO (for file upload) and Redis (for background jobs) by
directly creating the git repo, DB records, and PostgreSQL ontology
index. Supports downloading from URL, upstream sync config, and
index-only mode for rebuilding existing projects.

Usage: python scripts/seed-project.py --name "FOLIO" \
  --owl-url "https://raw.githubusercontent.com/.../FOLIO.owl" \
  --public --upstream-repo "alea-institute/FOLIO"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 3, 2026

📝 Walkthrough

Walkthrough

Adds anonymous suggestion support and a disable‑auth mode: new auth_mode setting and ANONYMOUS_USER behavior, anonymous token generation/verification, anonymous suggestion endpoints and service logic, DB schema and migration for anonymous metadata, a project seeding script, schema updates, and tests covering auth modes.

Changes

Cohort / File(s) Summary
Auth config & helpers
ontokit/core/config.py, ontokit/core/auth.py
Add Settings.auth_mode: str = "required" and module-level ANONYMOUS_USER; auth helpers short‑circuit to return anonymous user/token when auth_mode == "disabled" instead of raising 401.
Anonymous token
ontokit/core/anonymous_token.py
New HMAC-based anonymous token creation and verification functions (create_anonymous_token, verify_anonymous_token) with secret key validation and TTL handling.
Anonymous suggestion API
ontokit/api/routes/anonymous_suggestions.py, ontokit/api/routes/__init__.py
New FastAPI router for anonymous suggestion sessions (create, save, submit, discard, beacon); router registered under /projects and gated by auth_mode.
Suggestion session model & migration
ontokit/models/suggestion_session.py, alembic/versions/t8u9v0w1x2y3_add_anonymous_suggestion_fields.py
Add is_anonymous, submitter_name, submitter_email, client_ip columns to SuggestionSession and corresponding alembic migration.
Service layer changes
ontokit/services/suggestion_service.py, ontokit/services/ontology_index.py
Add anonymous session lifecycle methods (create/save/submit/discard), token generation, rate limiting, anonymous PR attribution; adjust ontology index label/annotation aggregation logic.
Schemas
ontokit/schemas/anonymous_suggestion.py, ontokit/schemas/suggestion.py
New Pydantic schemas for anonymous endpoints and add is_anonymous to SuggestionSessionSummary.
Project seeding script
scripts/seed-project.py
New executable to seed projects, create bare git repos, optionally parse RDF and invoke index rebuild; supports index-only and upstream sync options.
Tests
tests/unit/test_auth_disabled.py
New unit tests validating ANONYMOUS_USER and auth helper behavior across disabled, required, and optional modes.

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant API as API (anonymous_suggestions)
    participant Auth as Auth/token
    participant Service as SuggestionService
    participant DB as Database
    participant Git as Git Repo
    Client->>API: POST /projects/{id}/suggestions/anonymous/sessions (create)
    API->>Auth: create_anonymous_token(session_id)
    Auth-->>API: anonymous_token
    API->>Service: create_anonymous_session(project_id, client_ip)
    Service->>DB: insert session (is_anonymous=true)
    DB-->>Service: session persisted
    Service-->>API: AnonymousSessionCreateResponse (token)
    Client->>API: POST submit with X-Anonymous-Token
    API->>Auth: verify_anonymous_token(token)
    Auth-->>API: session_id (or None)
    API->>Service: submit_anonymous(project_id, session_id, data)
    Service->>Git: create branch, commit, push
    Service->>DB: update session -> SUBMITTED, store submitter fields
    Service->>Service: _create_pr_for_session -> call PR creation flow
    Service-->>API: AnonymousSubmitResponse
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement

Poem

🐰 In quiet code-burrows I hop and sing,

Tokens hum softly and anonymous springs,
Branches appear, RDF blossoms take flight,
Disabled auth lets a friendly hare write—
Seeded projects and tests snug as a spring night.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main changes: AUTH_MODE support, anonymous suggestions, the seed script, and the index fix—all primary components of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 95.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/seed-project-script

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
ontokit/core/config.py (1)

83-85: Use Literal type for auth_mode to ensure validation and type safety.

The app_env field on line 22 uses Literal["development", "staging", "production"] for validation. Apply the same pattern to auth_mode for consistency, runtime validation, and better MyPy strict mode support.

Suggested fix
-    # Auth mode: "required" (default), "optional" (browse without login, sign in for editing), "disabled" (no auth)
-    auth_mode: str = "required"
+    # Auth mode: "required" (default), "optional" (browse without login, sign in for editing), "disabled" (no auth)
+    auth_mode: Literal["required", "optional", "disabled"] = "required"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/core/config.py` around lines 83 - 85, Replace the plain str
annotation for auth_mode with a Literal type to enforce allowed values: change
auth_mode: str = "required" to auth_mode: Literal["required", "optional",
"disabled"] = "required", and add/import Literal from typing at the top of the
module; keep the default value "required" and ensure any runtime/config parsing
respects these options (reference the auth_mode field in the config class).
tests/unit/test_auth_disabled.py (1)

83-101: Consider adding test for get_current_user_with_token in required mode.

The TestAuthModeRequired class tests get_current_user and get_current_user_optional, but doesn't test get_current_user_with_token to verify it raises 401 when credentials are missing.

Suggested additional test
    `@pytest.mark.asyncio`
    `@patch`("ontokit.core.auth.settings")
    async def test_required_get_current_user_with_token_raises_401_without_credentials(self, mock_settings) -> None:  # noqa: ANN001
        """In required mode, get_current_user_with_token raises 401 when no credentials provided."""
        mock_settings.auth_mode = "required"
        with pytest.raises(HTTPException) as exc_info:
            await get_current_user_with_token(credentials=None)
        assert exc_info.value.status_code == 401
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_auth_disabled.py` around lines 83 - 101, Add a new async test
in TestAuthModeRequired to assert get_current_user_with_token raises a 401 when
credentials are None; patch ontokit.core.auth.settings to set
mock_settings.auth_mode = "required", call await
get_current_user_with_token(credentials=None) inside a
pytest.raises(HTTPException) context manager, and assert
exc_info.value.status_code == 401 to match the existing tests for
get_current_user and get_current_user_optional.
scripts/seed-project.py (2)

103-144: Synchronous subprocess calls block the event loop.

The create_git_repo function uses synchronous subprocess.run calls, which block the async event loop. For a CLI script this is acceptable, but consider using asyncio.subprocess for consistency with async patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/seed-project.py` around lines 103 - 144, The function create_git_repo
blocks the async loop by using synchronous subprocess.run; convert it to an
async function (async def create_git_repo(...)) and replace each subprocess.run
call with asyncio.create_subprocess_exec (or asyncio.create_subprocess_shell)
plus await proc.communicate() to capture stdout/stderr, then check
proc.returncode and raise or handle errors as before; keep the same git command
arguments and use text decoding on communicate() results; update any callers to
await create_git_repo and ensure the tempdir/file writes remain compatible (use
pathlib.write_bytes directly or offload blocking file ops with asyncio.to_thread
if needed).

247-248: Use async_sessionmaker for SQLAlchemy 2.0 async sessions.

The sessionmaker class is the synchronous variant. For async SQLAlchemy, use async_sessionmaker from sqlalchemy.ext.asyncio, which was introduced in SQLAlchemy 2.0 and is the recommended approach. When using async_sessionmaker, the class_=AsyncSession parameter is implicit and should be removed.

Suggested fix
-from sqlalchemy.orm import sessionmaker
+from sqlalchemy.ext.asyncio import async_sessionmaker

 ...

     engine = create_async_engine(str(settings.database_url))
-    async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
+    async_session = async_sessionmaker(engine, expire_on_commit=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/seed-project.py` around lines 247 - 248, The code uses sessionmaker
to create an async session which is the synchronous API; replace
sessionmaker(engine, class_=AsyncSession, expire_on_commit=False) with
async_sessionmaker from sqlalchemy.ext.asyncio so the async session is created
correctly: import async_sessionmaker, then set async_session =
async_sessionmaker(engine, expire_on_commit=False) (remove the
class_=AsyncSession argument) while leaving engine =
create_async_engine(str(settings.database_url)) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/seed-project.py`:
- Around line 230-234: Replace the synchronous urllib.request.urlopen call that
reads args.owl_url with an async HTTP client call: make the surrounding function
async (or create an async helper) and use httpx.AsyncClient with a timeout to
fetch the URL (await response.aread() or response.read()) and assign to
owl_content; ensure you call it via asyncio.run if needed, and add basic
exception handling for timeouts and HTTP errors so the code no longer performs
blocking I/O when reading args.owl_url.
- Around line 296-306: The script currently prints an error when
args.upstream_repo is not in "owner/name" format but continues execution; modify
the block that parses parts = args.upstream_repo.split("/") so that when
len(parts) != 2 it exits with a non-zero status (e.g., call sys.exit(1) or raise
SystemExit) after printing the error, preventing the later await
configure_upstream_sync(...) from running; ensure you import sys if needed and
keep the rest of the flow (session, project_id, filename,
configure_upstream_sync) unchanged.
- Around line 91-98: The INSERT into project_members in seed-project.py uses a
non-existent column joined_at; update the SQL to use created_at (matching the
ProjectMember model) and keep the NOW() value (i.e., INSERT ... created_at) so
the seed uses the correct column for timestamps; locate the INSERT block that
binds id/project_id/user_id and replace joined_at with created_at.

---

Nitpick comments:
In `@ontokit/core/config.py`:
- Around line 83-85: Replace the plain str annotation for auth_mode with a
Literal type to enforce allowed values: change auth_mode: str = "required" to
auth_mode: Literal["required", "optional", "disabled"] = "required", and
add/import Literal from typing at the top of the module; keep the default value
"required" and ensure any runtime/config parsing respects these options
(reference the auth_mode field in the config class).

In `@scripts/seed-project.py`:
- Around line 103-144: The function create_git_repo blocks the async loop by
using synchronous subprocess.run; convert it to an async function (async def
create_git_repo(...)) and replace each subprocess.run call with
asyncio.create_subprocess_exec (or asyncio.create_subprocess_shell) plus await
proc.communicate() to capture stdout/stderr, then check proc.returncode and
raise or handle errors as before; keep the same git command arguments and use
text decoding on communicate() results; update any callers to await
create_git_repo and ensure the tempdir/file writes remain compatible (use
pathlib.write_bytes directly or offload blocking file ops with asyncio.to_thread
if needed).
- Around line 247-248: The code uses sessionmaker to create an async session
which is the synchronous API; replace sessionmaker(engine, class_=AsyncSession,
expire_on_commit=False) with async_sessionmaker from sqlalchemy.ext.asyncio so
the async session is created correctly: import async_sessionmaker, then set
async_session = async_sessionmaker(engine, expire_on_commit=False) (remove the
class_=AsyncSession argument) while leaving engine =
create_async_engine(str(settings.database_url)) unchanged.

In `@tests/unit/test_auth_disabled.py`:
- Around line 83-101: Add a new async test in TestAuthModeRequired to assert
get_current_user_with_token raises a 401 when credentials are None; patch
ontokit.core.auth.settings to set mock_settings.auth_mode = "required", call
await get_current_user_with_token(credentials=None) inside a
pytest.raises(HTTPException) context manager, and assert
exc_info.value.status_code == 401 to match the existing tests for
get_current_user and get_current_user_optional.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 72398f10-540f-4f05-aea9-3b082538e4fb

📥 Commits

Reviewing files that changed from the base of the PR and between b8e0bab and ab81fb1.

📒 Files selected for processing (4)
  • ontokit/core/auth.py
  • ontokit/core/config.py
  • scripts/seed-project.py
  • tests/unit/test_auth_disabled.py

Comment on lines +91 to +98
await session.execute(
text("""
INSERT INTO project_members (id, project_id, user_id, role, joined_at)
VALUES (:id, :project_id, :user_id, 'owner', NOW())
ON CONFLICT (project_id, user_id) DO NOTHING
"""),
{"id": str(uuid4()), "project_id": str(project_id), "user_id": owner_id},
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the actual column name in the ProjectMember model
rg -n 'class ProjectMember' -A 20 --type=py

Repository: CatholicOS/ontokit-api

Length of output: 1586


Fix column name in INSERT statement: use created_at instead of joined_at.

The SQL statement references a non-existent joined_at column. The ProjectMember model defines the column as created_at (line 88 of ontokit/models/project.py), which will cause a database error during seed execution.

Suggested fix
     await session.execute(
         text("""
-            INSERT INTO project_members (id, project_id, user_id, role, joined_at)
-            VALUES (:id, :project_id, :user_id, 'owner', NOW())
+            INSERT INTO project_members (id, project_id, user_id, role, created_at)
+            VALUES (:id, :project_id, :user_id, 'owner', NOW())
             ON CONFLICT (project_id, user_id) DO NOTHING
         """),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await session.execute(
text("""
INSERT INTO project_members (id, project_id, user_id, role, joined_at)
VALUES (:id, :project_id, :user_id, 'owner', NOW())
ON CONFLICT (project_id, user_id) DO NOTHING
"""),
{"id": str(uuid4()), "project_id": str(project_id), "user_id": owner_id},
)
await session.execute(
text("""
INSERT INTO project_members (id, project_id, user_id, role, created_at)
VALUES (:id, :project_id, :user_id, 'owner', NOW())
ON CONFLICT (project_id, user_id) DO NOTHING
"""),
{"id": str(uuid4()), "project_id": str(project_id), "user_id": owner_id},
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/seed-project.py` around lines 91 - 98, The INSERT into
project_members in seed-project.py uses a non-existent column joined_at; update
the SQL to use created_at (matching the ProjectMember model) and keep the NOW()
value (i.e., INSERT ... created_at) so the seed uses the correct column for
timestamps; locate the INSERT block that binds id/project_id/user_id and replace
joined_at with created_at.

Comment on lines +230 to +234
if args.owl_url:
print(f"Downloading {args.owl_url}...")
with urllib.request.urlopen(args.owl_url) as response:
owl_content = response.read()
print(f" Downloaded {len(owl_content)} bytes")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use async HTTP client instead of synchronous urllib.request.

urllib.request.urlopen is synchronous I/O, which violates the coding guideline to use async/await for all I/O operations. Additionally, it lacks a timeout parameter, which could cause the script to hang indefinitely.

As per coding guidelines: "Use async/await for all I/O operations in Python code".

Suggested fix using httpx
-import urllib.request
+import httpx

 ...

-    if args.owl_url:
-        print(f"Downloading {args.owl_url}...")
-        with urllib.request.urlopen(args.owl_url) as response:
-            owl_content = response.read()
-        print(f"  Downloaded {len(owl_content)} bytes")
+    if args.owl_url:
+        print(f"Downloading {args.owl_url}...")
+        async with httpx.AsyncClient() as client:
+            response = await client.get(args.owl_url, timeout=60.0)
+            response.raise_for_status()
+            owl_content = response.content
+        print(f"  Downloaded {len(owl_content)} bytes")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/seed-project.py` around lines 230 - 234, Replace the synchronous
urllib.request.urlopen call that reads args.owl_url with an async HTTP client
call: make the surrounding function async (or create an async helper) and use
httpx.AsyncClient with a timeout to fetch the URL (await response.aread() or
response.read()) and assign to owl_content; ensure you call it via asyncio.run
if needed, and add basic exception handling for timeouts and HTTP errors so the
code no longer performs blocking I/O when reading args.owl_url.

Comment on lines +296 to +306
parts = args.upstream_repo.split("/")
if len(parts) != 2:
print(f"ERROR: --upstream-repo must be 'owner/name' format, got '{args.upstream_repo}'")
else:
await configure_upstream_sync(
session, project_id,
repo_owner=parts[0],
repo_name=parts[1],
branch=args.upstream_branch,
file_path=args.upstream_file or filename,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Script continues after error instead of exiting.

When --upstream-repo format is invalid, the script prints an error but continues execution and reports success. It should exit with a non-zero status code.

Suggested fix
             if args.upstream_repo:
                 parts = args.upstream_repo.split("/")
                 if len(parts) != 2:
                     print(f"ERROR: --upstream-repo must be 'owner/name' format, got '{args.upstream_repo}'")
+                    sys.exit(1)
                 else:
                     await configure_upstream_sync(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
parts = args.upstream_repo.split("/")
if len(parts) != 2:
print(f"ERROR: --upstream-repo must be 'owner/name' format, got '{args.upstream_repo}'")
else:
await configure_upstream_sync(
session, project_id,
repo_owner=parts[0],
repo_name=parts[1],
branch=args.upstream_branch,
file_path=args.upstream_file or filename,
)
parts = args.upstream_repo.split("/")
if len(parts) != 2:
print(f"ERROR: --upstream-repo must be 'owner/name' format, got '{args.upstream_repo}'")
sys.exit(1)
else:
await configure_upstream_sync(
session, project_id,
repo_owner=parts[0],
repo_name=parts[1],
branch=args.upstream_branch,
file_path=args.upstream_file or filename,
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/seed-project.py` around lines 296 - 306, The script currently prints
an error when args.upstream_repo is not in "owner/name" format but continues
execution; modify the block that parses parts = args.upstream_repo.split("/") so
that when len(parts) != 2 it exits with a non-zero status (e.g., call
sys.exit(1) or raise SystemExit) after printing the error, preventing the later
await configure_upstream_sync(...) from running; ensure you import sys if needed
and keep the rest of the flow (session, project_id, filename,
configure_upstream_sync) unchanged.

damienriehl and others added 5 commits April 3, 2026 17:45
- Add ontokit/core/anonymous_token.py with HMAC-signed create/verify functions (24h TTL, anon: prefix to prevent token type confusion with beacon tokens)
- Extend SuggestionSession model with is_anonymous, submitter_name, submitter_email, and client_ip columns
- Add ontokit/schemas/anonymous_suggestion.py with AnonymousSessionCreateResponse, AnonymousSubmitRequest (honeypot field aliased as 'website'), and AnonymousSubmitResponse
…e limiting

- Add ontokit/api/routes/anonymous_suggestions.py with create, save, submit, discard, and beacon endpoints
- All endpoints gate on AUTH_MODE != 'required' (return 403 otherwise)
- Rate limit: create endpoint checks for >= 5 anonymous sessions from same IP in last hour, returns 429
- Save/submit/discard authenticate via X-Anonymous-Token header using verify_anonymous_token()
- Submit endpoint silently returns fake success for honeypot-filled bot requests
- Add create_anonymous_session, save_anonymous, submit_anonymous, discard_anonymous methods to SuggestionService
- Register anonymous_suggestions router in ontokit/api/routes/__init__.py under /projects prefix
- Add is_anonymous field to SuggestionSessionSummary schema
- _build_summary: use submitter_name/email (credit info) over user_name/email for anonymous sessions
- _create_pr_for_session: show 'Submitted anonymously' or 'Submitted by {name}' for anonymous sessions
- Existing authenticated session summaries are unchanged (backward compatible)
Adds is_anonymous, submitter_name, submitter_email, client_ip columns
to suggestion_sessions table. These were added to the model in plan
10-01 but the migration was missing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The index path was only returning rdfs:label as labels and excluding
all LABEL_PROPERTIES (skos:altLabel, skos:prefLabel, etc.) from
annotations. This caused translations and synonyms stored as
skos:altLabel to be invisible in browse mode.

Fix: include non-rdfs:label entries from indexed_labels as annotations,
matching the RDFLib fallback path behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@damienriehl damienriehl changed the title feat: add seed-project.py and AUTH_MODE support feat: AUTH_MODE, anonymous suggestions, seed script, and index fix Apr 4, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ontokit/services/ontology_index.py (1)

659-684: ⚠️ Potential issue | 🟠 Major

Avoid double-counting label properties in annotations.

Line 662 now includes all non-comment annotation rows, and Lines 679-684 append non-rdfs:label
rows from IndexedLabel into the same annotations_by_prop. This can duplicate values for overlapping
properties.

💡 Suggested fix
+        non_rdfs_label_iris = [iri for iri, _ in LABEL_PROPERTIES if iri != rdfs_label_iri]
         annotations_result = await self.db.execute(
             select(IndexedAnnotation).where(
                 IndexedAnnotation.entity_id == entity.id,
                 IndexedAnnotation.property_iri != rdfs_comment_iri,
+                IndexedAnnotation.property_iri.notin_(non_rdfs_label_iris),
             )
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/services/ontology_index.py` around lines 659 - 684, The annotations
aggregation currently double-counts properties present in both IndexedAnnotation
and IndexedLabel; when iterating non_rdfs_labels_result (from
select(IndexedLabel)...), check against annotations_by_prop (the dict populated
from annotations_result/scalars()) and skip or merge only new values for any
property_iri already present. Update the loop that consumes
non_rdfs_labels_result to reference annotations_by_prop and either continue if
key in annotations_by_prop or deduplicate entries before appending so properties
from IndexedLabel do not duplicate those from IndexedAnnotation (use the
variables/objects IndexedLabel, IndexedAnnotation, annotations_result,
non_rdfs_labels_result, annotations_by_prop, and rdfs_label_iri to locate the
logic).
🧹 Nitpick comments (3)
ontokit/api/routes/__init__.py (1)

43-45: Minor tag naming inconsistency.

The tag "anonymous-suggestions" uses lowercase-hyphenated format while other tags use title case (e.g., "Suggestions", "Pull Requests"). Consider using "Anonymous Suggestions" for consistency in OpenAPI documentation.

Suggested fix
 router.include_router(
-    anonymous_suggestions.router, prefix="/projects", tags=["anonymous-suggestions"]
+    anonymous_suggestions.router, prefix="/projects", tags=["Anonymous Suggestions"]
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/api/routes/__init__.py` around lines 43 - 45, The OpenAPI tag for the
anonymous suggestions router is inconsistent; update the tag string passed to
router.include_router for anonymous_suggestions.router (currently
"anonymous-suggestions") to use title case "Anonymous Suggestions" to match
other tags like "Suggestions" and "Pull Requests" so documentation labels are
consistent.
ontokit/api/routes/anonymous_suggestions.py (1)

123-126: Honeypot response leaks detection to sophisticated bots.

Returning pr_number=0 for honeypot-triggered requests differs from legitimate submissions that return actual PR numbers. A sophisticated bot could detect this pattern and retry without filling the honeypot field. Consider returning a realistic-looking fake PR number or using a randomized value.

Suggested improvement
     if data.honeypot is not None and data.honeypot != "":
         # Silent fake success — do not create anything
-        return AnonymousSubmitResponse(pr_number=0, pr_url=None, status="submitted")
+        import random
+        fake_pr = random.randint(100, 9999)
+        return AnonymousSubmitResponse(pr_number=fake_pr, pr_url=None, status="submitted")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/api/routes/anonymous_suggestions.py` around lines 123 - 126, The
honeypot branch in anonymous_suggestions.py currently returns a fixed
pr_number=0 which leaks detection; change the honeypot handling in the function
that constructs AnonymousSubmitResponse so it returns a realistic-looking
randomized PR number (e.g., a random integer in the same range/format as real
PRs) and a matching pr_url pattern or None to match normal responses, keeping
status value identical to legitimate submissions; use data.honeypot and
AnonymousSubmitResponse as the reference points when implementing the
randomized/fake-but-plausible response.
ontokit/models/suggestion_session.py (1)

57-61: Consider adding an index for the rate-limiting query.

The anonymous session rate-limiting query in suggestion_service.py filters on (project_id, is_anonymous, client_ip, created_at). Without an index, this will require a full table scan as the suggestion_sessions table grows. Consider adding a partial index for anonymous sessions:

__table_args__ = (
    UniqueConstraint("project_id", "session_id", name="uq_suggestion_session"),
    Index(
        "ix_suggestion_sessions_rate_limit",
        "project_id", "client_ip", "created_at",
        postgresql_where=text("is_anonymous = true"),
    ),
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/models/suggestion_session.py` around lines 57 - 61, Add a partial
index to the SuggestionSession ORM model to support the rate-limiting query
(which filters on project_id, is_anonymous, client_ip, created_at): modify the
SuggestionSession class in suggestion_session.py to include a __table_args__
tuple that preserves the existing UniqueConstraint on (project_id, session_id)
and adds an Index named e.g. "ix_suggestion_sessions_rate_limit" over
("project_id", "client_ip", "created_at") with a PostgreSQL WHERE clause
limiting it to is_anonymous = true (use sqlalchemy.text for the postgresql_where
expression) so anonymous-session lookups used by the logic in
suggestion_service.py can use the index.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ontokit/api/routes/anonymous_suggestions.py`:
- Around line 151-176: The anonymous beacon handler calls
service.beacon_save(project_id, data, data.session_id) but beacon_save expects a
beacon token and will call verify_beacon_token(token); instead, after
verify_anonymous_token(token) returns verified_session_id you must load the
SuggestionSession for that session ID and extract its stored beacon_token, then
call await service.beacon_save(project_id, data, beacon_token); update
anonymous_beacon_save (and/or add a service method like beacon_save_anonymous on
SuggestionService) to use the session's beacon_token rather than
data.session_id, referencing verify_anonymous_token, anonymous_beacon_save,
SuggestionService.beacon_save, and SuggestionBeaconRequest to locate the code.

In `@ontokit/models/suggestion_session.py`:
- Around line 57-61: The Alembic revision
t8u9v0w1x2y3_add_anonymous_suggestion_fields.py adds columns without length
constraints that conflict with the SuggestionSession model fields
(submitter_name, submitter_email, client_ip) which use String(255)/String(45);
update the migration's op.add_column calls to use sa.String(255) for
submitter_name and submitter_email and sa.String(45) for client_ip so the DB
schema matches the model definitions.

In `@ontokit/services/ontology_index.py`:
- Around line 689-693: The code appends label-derived annotations with an empty
language tag (lbl.lang or ""), causing inconsistent semantics; update the
fallback to a concrete tag (e.g., use "und" or "en") by replacing the
empty-string fallback in the annotations_by_prop append (the place that uses
lbl.value and lbl.lang) so it writes lbl.lang or "und" (or "en") instead of
lbl.lang or "" to ensure consistent language metadata.

---

Outside diff comments:
In `@ontokit/services/ontology_index.py`:
- Around line 659-684: The annotations aggregation currently double-counts
properties present in both IndexedAnnotation and IndexedLabel; when iterating
non_rdfs_labels_result (from select(IndexedLabel)...), check against
annotations_by_prop (the dict populated from annotations_result/scalars()) and
skip or merge only new values for any property_iri already present. Update the
loop that consumes non_rdfs_labels_result to reference annotations_by_prop and
either continue if key in annotations_by_prop or deduplicate entries before
appending so properties from IndexedLabel do not duplicate those from
IndexedAnnotation (use the variables/objects IndexedLabel, IndexedAnnotation,
annotations_result, non_rdfs_labels_result, annotations_by_prop, and
rdfs_label_iri to locate the logic).

---

Nitpick comments:
In `@ontokit/api/routes/__init__.py`:
- Around line 43-45: The OpenAPI tag for the anonymous suggestions router is
inconsistent; update the tag string passed to router.include_router for
anonymous_suggestions.router (currently "anonymous-suggestions") to use title
case "Anonymous Suggestions" to match other tags like "Suggestions" and "Pull
Requests" so documentation labels are consistent.

In `@ontokit/api/routes/anonymous_suggestions.py`:
- Around line 123-126: The honeypot branch in anonymous_suggestions.py currently
returns a fixed pr_number=0 which leaks detection; change the honeypot handling
in the function that constructs AnonymousSubmitResponse so it returns a
realistic-looking randomized PR number (e.g., a random integer in the same
range/format as real PRs) and a matching pr_url pattern or None to match normal
responses, keeping status value identical to legitimate submissions; use
data.honeypot and AnonymousSubmitResponse as the reference points when
implementing the randomized/fake-but-plausible response.

In `@ontokit/models/suggestion_session.py`:
- Around line 57-61: Add a partial index to the SuggestionSession ORM model to
support the rate-limiting query (which filters on project_id, is_anonymous,
client_ip, created_at): modify the SuggestionSession class in
suggestion_session.py to include a __table_args__ tuple that preserves the
existing UniqueConstraint on (project_id, session_id) and adds an Index named
e.g. "ix_suggestion_sessions_rate_limit" over ("project_id", "client_ip",
"created_at") with a PostgreSQL WHERE clause limiting it to is_anonymous = true
(use sqlalchemy.text for the postgresql_where expression) so anonymous-session
lookups used by the logic in suggestion_service.py can use the index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 28db8a0d-f232-4e64-8329-cd0181080a9c

📥 Commits

Reviewing files that changed from the base of the PR and between ab81fb1 and 81c5e14.

📒 Files selected for processing (9)
  • alembic/versions/t8u9v0w1x2y3_add_anonymous_suggestion_fields.py
  • ontokit/api/routes/__init__.py
  • ontokit/api/routes/anonymous_suggestions.py
  • ontokit/core/anonymous_token.py
  • ontokit/models/suggestion_session.py
  • ontokit/schemas/anonymous_suggestion.py
  • ontokit/schemas/suggestion.py
  • ontokit/services/ontology_index.py
  • ontokit/services/suggestion_service.py
✅ Files skipped from review due to trivial changes (2)
  • alembic/versions/t8u9v0w1x2y3_add_anonymous_suggestion_fields.py
  • ontokit/schemas/anonymous_suggestion.py

Comment on lines +151 to +176
@router.post(
"/{project_id}/suggestions/anonymous/beacon",
status_code=status.HTTP_204_NO_CONTENT,
)
async def anonymous_beacon_save(
project_id: UUID,
data: SuggestionBeaconRequest,
service: Annotated[SuggestionService, Depends(get_service)],
token: str = Query(..., description="Anonymous session token for authentication"),
) -> Response:
"""Handle a sendBeacon flush for anonymous sessions.

Authenticated via 'token' query parameter (same pattern as authenticated beacon).
"""
_require_anonymous_mode()
verified_session_id = verify_anonymous_token(token)
if verified_session_id is None:
from fastapi import HTTPException

raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid or expired anonymous token",
)
# Delegate to the existing beacon_save (session lookup is by session_id, no user check)
await service.beacon_save(project_id, data, data.session_id)
return Response(status_code=status.HTTP_204_NO_CONTENT)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if there's a beacon_save_anonymous method or similar in the service
ast-grep --pattern 'async def beacon_save$_($$$) {
  $$$
}'

Repository: CatholicOS/ontokit-api

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Find the suggestion service file
fd -type f -name "*suggestion*service*"

Repository: CatholicOS/ontokit-api

Length of output: 236


🏁 Script executed:

#!/bin/bash
# Search for the beacon_save method definition
rg -A 30 "async def beacon_save" --type py

Repository: CatholicOS/ontokit-api

Length of output: 4580


🏁 Script executed:

#!/bin/bash
# Also search for any beacon_save_anonymous method
rg "beacon_save" --type py -B 2 -A 10

Repository: CatholicOS/ontokit-api

Length of output: 5167


🏁 Script executed:

#!/bin/bash
# Search for beacon_save_anonymous or similar methods in the service
rg "def beacon_save" ontokit/services/ -A 3

Repository: CatholicOS/ontokit-api

Length of output: 414


🏁 Script executed:

#!/bin/bash
# Check what verify_anonymous_token returns and understand the token flow
rg "verify_anonymous_token|beacon_token" -B 3 -A 3

Repository: CatholicOS/ontokit-api

Length of output: 14050


Critical bug: beacon_save expects a beacon token, not a session ID — all anonymous beacon saves will fail.

The beacon_save method signature is async def beacon_save(self, project_id: UUID, data: SuggestionBeaconRequest, token: str), and its implementation calls verify_beacon_token(token) to validate the token. However, line 175 passes data.session_id (a plain identifier string) instead of a beacon token. This will cause verify_beacon_token(data.session_id) to fail immediately with "Invalid or expired beacon token" on every call.

Each session stores two separate tokens: an anonymous_token for initial authentication (verified at line 166 to get verified_session_id) and a beacon_token stored in the SuggestionSession model for beacon operations. The current code extracts verified_session_id but never uses it.

Pass the session's stored beacon_token to service.beacon_save instead, or create a dedicated beacon_save_anonymous method in the service that bypasses beacon token re-verification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/api/routes/anonymous_suggestions.py` around lines 151 - 176, The
anonymous beacon handler calls service.beacon_save(project_id, data,
data.session_id) but beacon_save expects a beacon token and will call
verify_beacon_token(token); instead, after verify_anonymous_token(token) returns
verified_session_id you must load the SuggestionSession for that session ID and
extract its stored beacon_token, then call await service.beacon_save(project_id,
data, beacon_token); update anonymous_beacon_save (and/or add a service method
like beacon_save_anonymous on SuggestionService) to use the session's
beacon_token rather than data.session_id, referencing verify_anonymous_token,
anonymous_beacon_save, SuggestionService.beacon_save, and
SuggestionBeaconRequest to locate the code.

Comment on lines +57 to +61
# Anonymous session fields
is_anonymous: Mapped[bool] = mapped_column(Boolean, default=False, server_default="false")
submitter_name: Mapped[str | None] = mapped_column(String(255), nullable=True)
submitter_email: Mapped[str | None] = mapped_column(String(255), nullable=True)
client_ip: Mapped[str | None] = mapped_column(String(45), nullable=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Migration schema mismatch with model definitions.

The Alembic migration (per context snippet alembic/versions/t8u9v0w1x2y3_add_anonymous_suggestion_fields.py:18-22) uses sa.String() without length constraints for submitter_name, submitter_email, and client_ip, but the model specifies String(255), String(255), and String(45) respectively. On databases like MySQL that require explicit VARCHAR lengths, this mismatch will cause schema inconsistencies or failures.

Update the migration to include the length constraints:

op.add_column("suggestion_sessions", sa.Column("submitter_name", sa.String(255), nullable=True))
op.add_column("suggestion_sessions", sa.Column("submitter_email", sa.String(255), nullable=True))
op.add_column("suggestion_sessions", sa.Column("client_ip", sa.String(45), nullable=True))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/models/suggestion_session.py` around lines 57 - 61, The Alembic
revision t8u9v0w1x2y3_add_anonymous_suggestion_fields.py adds columns without
length constraints that conflict with the SuggestionSession model fields
(submitter_name, submitter_email, client_ip) which use String(255)/String(45);
update the migration's op.add_column calls to use sa.String(255) for
submitter_name and submitter_email and sa.String(45) for client_ip so the DB
schema matches the model definitions.

Comment on lines +689 to +693
annotations_by_prop[key].append(
{
"value": lbl.value,
"lang": lbl.lang or "",
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use a non-empty language fallback for label-derived annotation values.

On Line 692, lbl.lang or "" emits empty language tags for text labels. Prefer a concrete fallback
(e.g. "en" or "und") to keep response semantics consistent.

💡 Suggested fix
             annotations_by_prop[key].append(
                 {
                     "value": lbl.value,
-                    "lang": lbl.lang or "",
+                    "lang": lbl.lang or "en",
                 }
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ontokit/services/ontology_index.py` around lines 689 - 693, The code appends
label-derived annotations with an empty language tag (lbl.lang or ""), causing
inconsistent semantics; update the fallback to a concrete tag (e.g., use "und"
or "en") by replacing the empty-string fallback in the annotations_by_prop
append (the place that uses lbl.value and lbl.lang) so it writes lbl.lang or
"und" (or "en") instead of lbl.lang or "" to ensure consistent language
metadata.

@JohnRDOrazio JohnRDOrazio changed the base branch from main to dev April 9, 2026 02:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant