feat: AUTH_MODE, anonymous suggestions, seed script, and index fix #27
damienriehl wants to merge 9 commits into dev from
Conversation
…odes
- Add auth_mode field (required/optional/disabled) to Settings class in config.py
- Add ANONYMOUS_USER constant to auth.py (id=anonymous, roles=[viewer])
- Add early returns in get_current_user, get_current_user_optional, get_current_user_with_token for disabled mode
- Optional mode: RequiredUser still 401s (write protection), OptionalUser returns None (browse works)
…disabled)
- Test ANONYMOUS_USER properties (id, roles, type, not superadmin)
- Test disabled mode: all three auth functions return ANONYMOUS_USER
- Test required mode: get_current_user raises 401, get_current_user_optional returns None
- Test optional mode: RequiredUser raises 401 (write protection), OptionalUser returns None (browse works)
Bypasses MinIO (for file upload) and Redis (for background jobs) by directly creating the git repo, DB records, and PostgreSQL ontology index. Supports downloading from URL, upstream sync config, and index-only mode for rebuilding existing projects.

Usage:
python scripts/seed-project.py --name "FOLIO" \
  --owl-url "https://raw.githubusercontent.com/.../FOLIO.owl" \
  --public --upstream-repo "alea-institute/FOLIO"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
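The review below flags the seed script's blocking download. As a dependency-free alternative sketch (the helper name `fetch_owl` is illustrative, not from the PR), the blocking `urllib` call can be offloaded to a worker thread with `asyncio.to_thread` and bounded with a timeout:

```python
import asyncio
import pathlib
import tempfile
import urllib.request

async def fetch_owl(url: str, timeout: float = 60.0) -> bytes:
    """Offload the blocking urlopen call to a worker thread and bound it with a timeout."""
    def _get() -> bytes:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    return await asyncio.to_thread(_get)

# Exercise the helper against a local file:// URL so no network access is needed.
owl_path = pathlib.Path(tempfile.mkdtemp()) / "mini.owl"
owl_path.write_bytes(b"<rdf:RDF/>")
content = asyncio.run(fetch_owl(owl_path.as_uri()))
```

This keeps the event loop responsive without adding an httpx dependency; the review's httpx suggestion is equally valid.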
📝 Walkthrough
Adds anonymous suggestion support and a disable‑auth mode.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as API (anonymous_suggestions)
    participant Auth as Auth/token
    participant Service as SuggestionService
    participant DB as Database
    participant Git as Git Repo
    Client->>API: POST /projects/{id}/suggestions/anonymous/sessions (create)
    API->>Auth: create_anonymous_token(session_id)
    Auth-->>API: anonymous_token
    API->>Service: create_anonymous_session(project_id, client_ip)
    Service->>DB: insert session (is_anonymous=true)
    DB-->>Service: session persisted
    Service-->>API: AnonymousSessionCreateResponse (token)
    Client->>API: POST submit with X-Anonymous-Token
    API->>Auth: verify_anonymous_token(token)
    Auth-->>API: session_id (or None)
    API->>Service: submit_anonymous(project_id, session_id, data)
    Service->>Git: create branch, commit, push
    Service->>DB: update session -> SUBMITTED, store submitter fields
    Service->>Service: _create_pr_for_session -> call PR creation flow
    Service-->>API: AnonymousSubmitResponse
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
🧹 Nitpick comments (4)
ontokit/core/config.py (1)
83-85: Use `Literal` type for `auth_mode` to ensure validation and type safety.

The `app_env` field on line 22 uses `Literal["development", "staging", "production"]` for validation. Apply the same pattern to `auth_mode` for consistency, runtime validation, and better MyPy strict mode support.

Suggested fix
```diff
-    # Auth mode: "required" (default), "optional" (browse without login, sign in for editing), "disabled" (no auth)
-    auth_mode: str = "required"
+    # Auth mode: "required" (default), "optional" (browse without login, sign in for editing), "disabled" (no auth)
+    auth_mode: Literal["required", "optional", "disabled"] = "required"
```

🤖 Prompt for AI Agents
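For illustration only (stdlib, not the project's pydantic settings machinery), the runtime check that `Literal` enables can be sketched with `typing.get_args`; pydantic performs an equivalent validation automatically for `Literal`-annotated fields:

```python
from typing import Literal, get_args

AuthMode = Literal["required", "optional", "disabled"]

def validate_auth_mode(value: str) -> str:
    """Reject values outside the Literal's allowed set at runtime."""
    allowed = get_args(AuthMode)  # -> ("required", "optional", "disabled")
    if value not in allowed:
        raise ValueError(f"auth_mode must be one of {allowed}, got {value!r}")
    return value
```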
Verify each finding against the current code and only fix it if needed. In `@ontokit/core/config.py` around lines 83-85: Replace the plain str annotation for auth_mode with a Literal type to enforce allowed values: change auth_mode: str = "required" to auth_mode: Literal["required", "optional", "disabled"] = "required", and add/import Literal from typing at the top of the module; keep the default value "required" and ensure any runtime/config parsing respects these options (reference the auth_mode field in the config class).

tests/unit/test_auth_disabled.py (1)
83-101: Consider adding a test for `get_current_user_with_token` in required mode.

The `TestAuthModeRequired` class tests `get_current_user` and `get_current_user_optional`, but doesn't test `get_current_user_with_token` to verify it raises 401 when credentials are missing.

Suggested additional test
```python
@pytest.mark.asyncio
@patch("ontokit.core.auth.settings")
async def test_required_get_current_user_with_token_raises_401_without_credentials(self, mock_settings) -> None:  # noqa: ANN001
    """In required mode, get_current_user_with_token raises 401 when no credentials provided."""
    mock_settings.auth_mode = "required"
    with pytest.raises(HTTPException) as exc_info:
        await get_current_user_with_token(credentials=None)
    assert exc_info.value.status_code == 401
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_auth_disabled.py` around lines 83-101: Add a new async test in TestAuthModeRequired to assert get_current_user_with_token raises a 401 when credentials are None; patch ontokit.core.auth.settings to set mock_settings.auth_mode = "required", call await get_current_user_with_token(credentials=None) inside a pytest.raises(HTTPException) context manager, and assert exc_info.value.status_code == 401 to match the existing tests for get_current_user and get_current_user_optional.

scripts/seed-project.py (2)
103-144: Synchronous subprocess calls block the event loop.

The `create_git_repo` function uses synchronous `subprocess.run` calls, which block the async event loop. For a CLI script this is acceptable, but consider using `asyncio.subprocess` for consistency with async patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/seed-project.py` around lines 103 - 144, The function create_git_repo blocks the async loop by using synchronous subprocess.run; convert it to an async function (async def create_git_repo(...)) and replace each subprocess.run call with asyncio.create_subprocess_exec (or asyncio.create_subprocess_shell) plus await proc.communicate() to capture stdout/stderr, then check proc.returncode and raise or handle errors as before; keep the same git command arguments and use text decoding on communicate() results; update any callers to await create_git_repo and ensure the tempdir/file writes remain compatible (use pathlib.write_bytes directly or offload blocking file ops with asyncio.to_thread if needed).
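As a sketch of what the prompt above describes (the helper name `run_cmd` is illustrative), a subprocess can be run without blocking the event loop via `asyncio.create_subprocess_exec`; the demo invokes the Python interpreter rather than git so it stays self-contained:

```python
import asyncio
import sys

async def run_cmd(*args: str) -> str:
    """Run a command via the event loop's subprocess support instead of subprocess.run."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"{args[0]} failed: {stderr.decode().strip()}")
    return stdout.decode()

# Demo: run the interpreter; in the seed script this would be a git command.
output = asyncio.run(run_cmd(sys.executable, "-c", "print('ok')"))
```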
247-248: Use `async_sessionmaker` for SQLAlchemy 2.0 async sessions.

The `sessionmaker` class is the synchronous variant. For async SQLAlchemy, use `async_sessionmaker` from `sqlalchemy.ext.asyncio`, which was introduced in SQLAlchemy 2.0 and is the recommended approach. When using `async_sessionmaker`, the `class_=AsyncSession` parameter is implicit and should be removed.

Suggested fix

```diff
-from sqlalchemy.orm import sessionmaker
+from sqlalchemy.ext.asyncio import async_sessionmaker
 ...
     engine = create_async_engine(str(settings.database_url))
-    async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
+    async_session = async_sessionmaker(engine, expire_on_commit=False)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/seed-project.py` around lines 247 - 248, The code uses sessionmaker to create an async session which is the synchronous API; replace sessionmaker(engine, class_=AsyncSession, expire_on_commit=False) with async_sessionmaker from sqlalchemy.ext.asyncio so the async session is created correctly: import async_sessionmaker, then set async_session = async_sessionmaker(engine, expire_on_commit=False) (remove the class_=AsyncSession argument) while leaving engine = create_async_engine(str(settings.database_url)) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 72398f10-540f-4f05-aea9-3b082538e4fb
📒 Files selected for processing (4)
- ontokit/core/auth.py
- ontokit/core/config.py
- scripts/seed-project.py
- tests/unit/test_auth_disabled.py
```python
await session.execute(
    text("""
        INSERT INTO project_members (id, project_id, user_id, role, joined_at)
        VALUES (:id, :project_id, :user_id, 'owner', NOW())
        ON CONFLICT (project_id, user_id) DO NOTHING
    """),
    {"id": str(uuid4()), "project_id": str(project_id), "user_id": owner_id},
)
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Verify the actual column name in the ProjectMember model
rg -n 'class ProjectMember' -A 20 --type=py
```

Repository: CatholicOS/ontokit-api
Length of output: 1586
Fix column name in INSERT statement: use created_at instead of joined_at.
The SQL statement references a non-existent joined_at column. The ProjectMember model defines the column as created_at (line 88 of ontokit/models/project.py), which will cause a database error during seed execution.
Suggested fix

```diff
 await session.execute(
     text("""
-        INSERT INTO project_members (id, project_id, user_id, role, joined_at)
-        VALUES (:id, :project_id, :user_id, 'owner', NOW())
+        INSERT INTO project_members (id, project_id, user_id, role, created_at)
+        VALUES (:id, :project_id, :user_id, 'owner', NOW())
         ON CONFLICT (project_id, user_id) DO NOTHING
     """),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/seed-project.py` around lines 91 - 98, The INSERT into
project_members in seed-project.py uses a non-existent column joined_at; update
the SQL to use created_at (matching the ProjectMember model) and keep the NOW()
value (i.e., INSERT ... created_at) so the seed uses the correct column for
timestamps; locate the INSERT block that binds id/project_id/user_id and replace
joined_at with created_at.
```python
if args.owl_url:
    print(f"Downloading {args.owl_url}...")
    with urllib.request.urlopen(args.owl_url) as response:
        owl_content = response.read()
    print(f"  Downloaded {len(owl_content)} bytes")
```
Use async HTTP client instead of synchronous urllib.request.
urllib.request.urlopen is synchronous I/O, which violates the coding guideline to use async/await for all I/O operations. Additionally, it lacks a timeout parameter, which could cause the script to hang indefinitely.
As per coding guidelines: "Use async/await for all I/O operations in Python code".
Suggested fix using httpx

```diff
-import urllib.request
+import httpx
 ...
-    if args.owl_url:
-        print(f"Downloading {args.owl_url}...")
-        with urllib.request.urlopen(args.owl_url) as response:
-            owl_content = response.read()
-        print(f"  Downloaded {len(owl_content)} bytes")
+    if args.owl_url:
+        print(f"Downloading {args.owl_url}...")
+        async with httpx.AsyncClient() as client:
+            response = await client.get(args.owl_url, timeout=60.0)
+            response.raise_for_status()
+            owl_content = response.content
+        print(f"  Downloaded {len(owl_content)} bytes")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/seed-project.py` around lines 230 - 234, Replace the synchronous
urllib.request.urlopen call that reads args.owl_url with an async HTTP client
call: make the surrounding function async (or create an async helper) and use
httpx.AsyncClient with a timeout to fetch the URL (await response.aread() or
response.read()) and assign to owl_content; ensure you call it via asyncio.run
if needed, and add basic exception handling for timeouts and HTTP errors so the
code no longer performs blocking I/O when reading args.owl_url.
```python
parts = args.upstream_repo.split("/")
if len(parts) != 2:
    print(f"ERROR: --upstream-repo must be 'owner/name' format, got '{args.upstream_repo}'")
else:
    await configure_upstream_sync(
        session, project_id,
        repo_owner=parts[0],
        repo_name=parts[1],
        branch=args.upstream_branch,
        file_path=args.upstream_file or filename,
    )
```
Script continues after error instead of exiting.
When --upstream-repo format is invalid, the script prints an error but continues execution and reports success. It should exit with a non-zero status code.
Suggested fix

```diff
     if args.upstream_repo:
         parts = args.upstream_repo.split("/")
         if len(parts) != 2:
             print(f"ERROR: --upstream-repo must be 'owner/name' format, got '{args.upstream_repo}'")
+            sys.exit(1)
         else:
             await configure_upstream_sync(
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/seed-project.py` around lines 296 - 306, The script currently prints
an error when args.upstream_repo is not in "owner/name" format but continues
execution; modify the block that parses parts = args.upstream_repo.split("/") so
that when len(parts) != 2 it exits with a non-zero status (e.g., call
sys.exit(1) or raise SystemExit) after printing the error, preventing the later
await configure_upstream_sync(...) from running; ensure you import sys if needed
and keep the rest of the flow (session, project_id, filename,
configure_upstream_sync) unchanged.
- Add ontokit/core/anonymous_token.py with HMAC-signed create/verify functions (24h TTL, anon: prefix to prevent token type confusion with beacon tokens)
- Extend SuggestionSession model with is_anonymous, submitter_name, submitter_email, and client_ip columns
- Add ontokit/schemas/anonymous_suggestion.py with AnonymousSessionCreateResponse, AnonymousSubmitRequest (honeypot field aliased as 'website'), and AnonymousSubmitResponse
…e limiting
- Add ontokit/api/routes/anonymous_suggestions.py with create, save, submit, discard, and beacon endpoints
- All endpoints gate on AUTH_MODE != 'required' (return 403 otherwise)
- Rate limit: create endpoint checks for >= 5 anonymous sessions from same IP in last hour, returns 429
- Save/submit/discard authenticate via X-Anonymous-Token header using verify_anonymous_token()
- Submit endpoint silently returns fake success for honeypot-filled bot requests
- Add create_anonymous_session, save_anonymous, submit_anonymous, discard_anonymous methods to SuggestionService
- Register anonymous_suggestions router in ontokit/api/routes/__init__.py under /projects prefix
- Add is_anonymous field to SuggestionSessionSummary schema
- _build_summary: use submitter_name/email (credit info) over user_name/email for anonymous sessions
- _create_pr_for_session: show 'Submitted anonymously' or 'Submitted by {name}' for anonymous sessions
- Existing authenticated session summaries are unchanged (backward compatible)
Adds is_anonymous, submitter_name, submitter_email, client_ip columns to suggestion_sessions table. These were added to the model in plan 10-01 but the migration was missing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The index path was only returning rdfs:label as labels and excluding all LABEL_PROPERTIES (skos:altLabel, skos:prefLabel, etc.) from annotations. This caused translations and synonyms stored as skos:altLabel to be invisible in browse mode.

Fix: include non-rdfs:label entries from indexed_labels as annotations, matching the RDFLib fallback path behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
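The review below also warns that merging label rows into annotations can double-count overlapping properties. A self-contained sketch of a merge that skips `rdfs:label` and dedupes overlap (the tuple row shapes are simplified stand-ins for the ORM objects, not the project's API):

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def merge_label_annotations(annotation_rows, label_rows):
    """Group (property_iri, value, lang) rows into a dict keyed by property,
    skipping rdfs:label and deduping label rows already present as annotations."""
    annotations_by_prop: dict = {}
    for prop, value, lang in annotation_rows:
        annotations_by_prop.setdefault(prop, []).append((value, lang))
    for prop, value, lang in label_rows:
        if prop == RDFS_LABEL:
            continue  # rdfs:label is already surfaced as the primary label
        existing = annotations_by_prop.setdefault(prop, [])
        if (value, lang) not in existing:  # avoid double-counting overlap
            existing.append((value, lang))
    return annotations_by_prop
```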
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ontokit/services/ontology_index.py (1)
659-684: ⚠️ Potential issue | 🟠 Major

Avoid double-counting label properties in annotations.

Line 662 now includes all non-comment annotation rows, and lines 679-684 append non-`rdfs:label` rows from `IndexedLabel` into the same `annotations_by_prop`. This can duplicate values for overlapping properties.

💡 Suggested fix
```diff
+        non_rdfs_label_iris = [iri for iri, _ in LABEL_PROPERTIES if iri != rdfs_label_iri]
         annotations_result = await self.db.execute(
             select(IndexedAnnotation).where(
                 IndexedAnnotation.entity_id == entity.id,
                 IndexedAnnotation.property_iri != rdfs_comment_iri,
+                IndexedAnnotation.property_iri.notin_(non_rdfs_label_iris),
             )
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ontokit/services/ontology_index.py` around lines 659 - 684, The annotations aggregation currently double-counts properties present in both IndexedAnnotation and IndexedLabel; when iterating non_rdfs_labels_result (from select(IndexedLabel)...), check against annotations_by_prop (the dict populated from annotations_result/scalars()) and skip or merge only new values for any property_iri already present. Update the loop that consumes non_rdfs_labels_result to reference annotations_by_prop and either continue if key in annotations_by_prop or deduplicate entries before appending so properties from IndexedLabel do not duplicate those from IndexedAnnotation (use the variables/objects IndexedLabel, IndexedAnnotation, annotations_result, non_rdfs_labels_result, annotations_by_prop, and rdfs_label_iri to locate the logic).
🧹 Nitpick comments (3)
ontokit/api/routes/__init__.py (1)
43-45: Minor tag naming inconsistency.

The tag `"anonymous-suggestions"` uses lowercase-hyphenated format while other tags use title case (e.g., `"Suggestions"`, `"Pull Requests"`). Consider using `"Anonymous Suggestions"` for consistency in OpenAPI documentation.

Suggested fix

```diff
 router.include_router(
-    anonymous_suggestions.router, prefix="/projects", tags=["anonymous-suggestions"]
+    anonymous_suggestions.router, prefix="/projects", tags=["Anonymous Suggestions"]
 )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ontokit/api/routes/__init__.py` around lines 43-45: The OpenAPI tag for the anonymous suggestions router is inconsistent; update the tag string passed to router.include_router for anonymous_suggestions.router (currently "anonymous-suggestions") to use title case "Anonymous Suggestions" to match other tags like "Suggestions" and "Pull Requests" so documentation labels are consistent.

ontokit/api/routes/anonymous_suggestions.py (1)
123-126: Honeypot response leaks detection to sophisticated bots.

Returning `pr_number=0` for honeypot-triggered requests differs from legitimate submissions that return actual PR numbers. A sophisticated bot could detect this pattern and retry without filling the honeypot field. Consider returning a realistic-looking fake PR number or using a randomized value.

Suggested improvement

```diff
     if data.honeypot is not None and data.honeypot != "":
         # Silent fake success — do not create anything
-        return AnonymousSubmitResponse(pr_number=0, pr_url=None, status="submitted")
+        import random
+        fake_pr = random.randint(100, 9999)
+        return AnonymousSubmitResponse(pr_number=fake_pr, pr_url=None, status="submitted")
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ontokit/api/routes/anonymous_suggestions.py` around lines 123-126: The honeypot branch currently returns a fixed pr_number=0 which leaks detection; change the honeypot handling in the function that constructs AnonymousSubmitResponse so it returns a realistic-looking randomized PR number (e.g., a random integer in the same range/format as real PRs) and a matching pr_url pattern or None to match normal responses, keeping the status value identical to legitimate submissions; use data.honeypot and AnonymousSubmitResponse as the reference points when implementing the randomized fake-but-plausible response.

ontokit/models/suggestion_session.py (1)
57-61: Consider adding an index for the rate-limiting query.

The anonymous session rate-limiting query in `suggestion_service.py` filters on `(project_id, is_anonymous, client_ip, created_at)`. Without an index, this will require a full table scan as the `suggestion_sessions` table grows. Consider adding a partial index for anonymous sessions:

```python
__table_args__ = (
    UniqueConstraint("project_id", "session_id", name="uq_suggestion_session"),
    Index(
        "ix_suggestion_sessions_rate_limit",
        "project_id",
        "client_ip",
        "created_at",
        postgresql_where=text("is_anonymous = true"),
    ),
)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ontokit/models/suggestion_session.py` around lines 57 - 61, Add a partial index to the SuggestionSession ORM model to support the rate-limiting query (which filters on project_id, is_anonymous, client_ip, created_at): modify the SuggestionSession class in suggestion_session.py to include a __table_args__ tuple that preserves the existing UniqueConstraint on (project_id, session_id) and adds an Index named e.g. "ix_suggestion_sessions_rate_limit" over ("project_id", "client_ip", "created_at") with a PostgreSQL WHERE clause limiting it to is_anonymous = true (use sqlalchemy.text for the postgresql_where expression) so anonymous-session lookups used by the logic in suggestion_service.py can use the index.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ontokit/api/routes/anonymous_suggestions.py`:
- Around line 151-176: The anonymous beacon handler calls
service.beacon_save(project_id, data, data.session_id) but beacon_save expects a
beacon token and will call verify_beacon_token(token); instead, after
verify_anonymous_token(token) returns verified_session_id you must load the
SuggestionSession for that session ID and extract its stored beacon_token, then
call await service.beacon_save(project_id, data, beacon_token); update
anonymous_beacon_save (and/or add a service method like beacon_save_anonymous on
SuggestionService) to use the session's beacon_token rather than
data.session_id, referencing verify_anonymous_token, anonymous_beacon_save,
SuggestionService.beacon_save, and SuggestionBeaconRequest to locate the code.
In `@ontokit/models/suggestion_session.py`:
- Around line 57-61: The Alembic revision
t8u9v0w1x2y3_add_anonymous_suggestion_fields.py adds columns without length
constraints that conflict with the SuggestionSession model fields
(submitter_name, submitter_email, client_ip) which use String(255)/String(45);
update the migration's op.add_column calls to use sa.String(255) for
submitter_name and submitter_email and sa.String(45) for client_ip so the DB
schema matches the model definitions.
In `@ontokit/services/ontology_index.py`:
- Around line 689-693: The code appends label-derived annotations with an empty
language tag (lbl.lang or ""), causing inconsistent semantics; update the
fallback to a concrete tag (e.g., use "und" or "en") by replacing the
empty-string fallback in the annotations_by_prop append (the place that uses
lbl.value and lbl.lang) so it writes lbl.lang or "und" (or "en") instead of
lbl.lang or "" to ensure consistent language metadata.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 28db8a0d-f232-4e64-8329-cd0181080a9c
📒 Files selected for processing (9)
- alembic/versions/t8u9v0w1x2y3_add_anonymous_suggestion_fields.py
- ontokit/api/routes/__init__.py
- ontokit/api/routes/anonymous_suggestions.py
- ontokit/core/anonymous_token.py
- ontokit/models/suggestion_session.py
- ontokit/schemas/anonymous_suggestion.py
- ontokit/schemas/suggestion.py
- ontokit/services/ontology_index.py
- ontokit/services/suggestion_service.py
✅ Files skipped from review due to trivial changes (2)
- alembic/versions/t8u9v0w1x2y3_add_anonymous_suggestion_fields.py
- ontokit/schemas/anonymous_suggestion.py
```python
@router.post(
    "/{project_id}/suggestions/anonymous/beacon",
    status_code=status.HTTP_204_NO_CONTENT,
)
async def anonymous_beacon_save(
    project_id: UUID,
    data: SuggestionBeaconRequest,
    service: Annotated[SuggestionService, Depends(get_service)],
    token: str = Query(..., description="Anonymous session token for authentication"),
) -> Response:
    """Handle a sendBeacon flush for anonymous sessions.

    Authenticated via 'token' query parameter (same pattern as authenticated beacon).
    """
    _require_anonymous_mode()
    verified_session_id = verify_anonymous_token(token)
    if verified_session_id is None:
        from fastapi import HTTPException

        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid or expired anonymous token",
        )
    # Delegate to the existing beacon_save (session lookup is by session_id, no user check)
    await service.beacon_save(project_id, data, data.session_id)
    return Response(status_code=status.HTTP_204_NO_CONTENT)
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check if there's a beacon_save_anonymous method or similar in the service
ast-grep --pattern 'async def beacon_save$_($$$) {
  $$$
}'
```

Repository: CatholicOS/ontokit-api
Length of output: 48

🏁 Script executed:

```shell
#!/bin/bash
# Find the suggestion service file
fd -type f -name "*suggestion*service*"
```

Repository: CatholicOS/ontokit-api
Length of output: 236

🏁 Script executed:

```shell
#!/bin/bash
# Search for the beacon_save method definition
rg -A 30 "async def beacon_save" --type py
```

Repository: CatholicOS/ontokit-api
Length of output: 4580

🏁 Script executed:

```shell
#!/bin/bash
# Also search for any beacon_save_anonymous method
rg "beacon_save" --type py -B 2 -A 10
```

Repository: CatholicOS/ontokit-api
Length of output: 5167

🏁 Script executed:

```shell
#!/bin/bash
# Search for beacon_save_anonymous or similar methods in the service
rg "def beacon_save" ontokit/services/ -A 3
```

Repository: CatholicOS/ontokit-api
Length of output: 414

🏁 Script executed:

```shell
#!/bin/bash
# Check what verify_anonymous_token returns and understand the token flow
rg "verify_anonymous_token|beacon_token" -B 3 -A 3
```

Repository: CatholicOS/ontokit-api
Length of output: 14050
Critical bug: beacon_save expects a beacon token, not a session ID — all anonymous beacon saves will fail.
The beacon_save method signature is async def beacon_save(self, project_id: UUID, data: SuggestionBeaconRequest, token: str), and its implementation calls verify_beacon_token(token) to validate the token. However, line 175 passes data.session_id (a plain identifier string) instead of a beacon token. This will cause verify_beacon_token(data.session_id) to fail immediately with "Invalid or expired beacon token" on every call.
Each session stores two separate tokens: an anonymous_token for initial authentication (verified at line 166 to get verified_session_id) and a beacon_token stored in the SuggestionSession model for beacon operations. The current code extracts verified_session_id but never uses it.
Pass the session's stored beacon_token to service.beacon_save instead, or create a dedicated beacon_save_anonymous method in the service that bypasses beacon token re-verification.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ontokit/api/routes/anonymous_suggestions.py` around lines 151 - 176, The
anonymous beacon handler calls service.beacon_save(project_id, data,
data.session_id) but beacon_save expects a beacon token and will call
verify_beacon_token(token); instead, after verify_anonymous_token(token) returns
verified_session_id you must load the SuggestionSession for that session ID and
extract its stored beacon_token, then call await service.beacon_save(project_id,
data, beacon_token); update anonymous_beacon_save (and/or add a service method
like beacon_save_anonymous on SuggestionService) to use the session's
beacon_token rather than data.session_id, referencing verify_anonymous_token,
anonymous_beacon_save, SuggestionService.beacon_save, and
SuggestionBeaconRequest to locate the code.
```python
    # Anonymous session fields
    is_anonymous: Mapped[bool] = mapped_column(Boolean, default=False, server_default="false")
    submitter_name: Mapped[str | None] = mapped_column(String(255), nullable=True)
    submitter_email: Mapped[str | None] = mapped_column(String(255), nullable=True)
    client_ip: Mapped[str | None] = mapped_column(String(45), nullable=True)
```
Migration schema mismatch with model definitions.
The Alembic migration (per context snippet alembic/versions/t8u9v0w1x2y3_add_anonymous_suggestion_fields.py:18-22) uses sa.String() without length constraints for submitter_name, submitter_email, and client_ip, but the model specifies String(255), String(255), and String(45) respectively. On databases like MySQL that require explicit VARCHAR lengths, this mismatch will cause schema inconsistencies or failures.
Update the migration to include the length constraints:
```python
op.add_column("suggestion_sessions", sa.Column("submitter_name", sa.String(255), nullable=True))
op.add_column("suggestion_sessions", sa.Column("submitter_email", sa.String(255), nullable=True))
op.add_column("suggestion_sessions", sa.Column("client_ip", sa.String(45), nullable=True))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ontokit/models/suggestion_session.py` around lines 57 - 61, The Alembic
revision t8u9v0w1x2y3_add_anonymous_suggestion_fields.py adds columns without
length constraints that conflict with the SuggestionSession model fields
(submitter_name, submitter_email, client_ip) which use String(255)/String(45);
update the migration's op.add_column calls to use sa.String(255) for
submitter_name and submitter_email and sa.String(45) for client_ip so the DB
schema matches the model definitions.
```python
annotations_by_prop[key].append(
    {
        "value": lbl.value,
        "lang": lbl.lang or "",
    }
)
```
Use a non-empty language fallback for label-derived annotation values.
On Line 692, lbl.lang or "" emits empty language tags for text labels. Prefer a concrete fallback
(e.g. "en" or "und") to keep response semantics consistent.
💡 Suggested fix
```diff
 annotations_by_prop[key].append(
     {
         "value": lbl.value,
-        "lang": lbl.lang or "",
+        "lang": lbl.lang or "en",
     }
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ontokit/services/ontology_index.py` around lines 689 - 693, The code appends
label-derived annotations with an empty language tag (lbl.lang or ""), causing
inconsistent semantics; update the fallback to a concrete tag (e.g., use "und"
or "en") by replacing the empty-string fallback in the annotations_by_prop
append (the place that uses lbl.value and lbl.lang) so it writes lbl.lang or
"und" (or "en") instead of lbl.lang or "" to ensure consistent language
metadata.
Summary

AUTH_MODE (Phase 8)
- `AUTH_MODE=required|optional|disabled` in config + auth module
- `ANONYMOUS_USER` returned in disabled mode; existing behavior in required/optional

Anonymous Suggestions (Phase 10)
- `ontokit/core/anonymous_token.py` — HMAC-signed tokens with 24h TTL
- `ontokit/api/routes/anonymous_suggestions.py` — 5 endpoints (create, save, submit, discard, beacon) gated on `AUTH_MODE != required`
- `suggestion_sessions` model extended with `is_anonymous`, `submitter_name`, `submitter_email`, `client_ip`

Seed Script
- `scripts/seed-project.py` — import OWL files with direct PostgreSQL indexing

Index Fix
- `ontology_index.py` — include `skos:altLabel`, `skos:prefLabel` as annotations in class detail

Test plan
- `python -m pytest tests/unit/test_auth_disabled.py -v` — 11 tests pass
- `AUTH_MODE=optional` — GET endpoints work without auth, POST returns 401
- `python scripts/seed-project.py --help` — shows usage

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Database Migration
New Scripts
Tests