feat: implement deposition upload workflow with semantics domain#62

Merged
rorybyrne merged 3 commits into main from 030-feat-implement-deposition
Feb 17, 2026

@rorybyrne
Contributor

Summary

  • Semantics domain: Ontology and Schema aggregates with CRUD commands/queries, OBO Graphs import, and full DI wiring
  • Deposition domain: Convention aggregate, enhanced deposition lifecycle (DRAFT → IN_VALIDATION → IN_REVIEW → ACCEPTED/REJECTED), spreadsheet template generation/parsing, file storage with checksums, return-to-draft on validation failure
  • Frontend: Deposit wizard (4-step flow), deposition detail page, SDK rewrite with separate auth/search/deposition modules, mock layer for development
  • Infrastructure: Postgres repositories for ontologies/schemas/conventions, cross-domain read adapters, openpyxl spreadsheet adapter, local file storage adapter, worker transaction cleanup

Test plan

  • 399 unit tests pass (just test)
  • Ruff lint clean (just lint)
  • Web lint + build pass (pnpm lint && pnpm build)
  • Manual: create ontology → schema → convention → deposition → template → upload → submit → validate → record

Closes #30

Add new semantics domain (ontologies + schemas) and enhance the
deposition domain with conventions, spreadsheet-based metadata entry,
file management, and the validation return-to-draft pipeline.

New domains and aggregates:
- Ontology: immutable versioned term collections with hierarchy support
- Schema: typed field definitions referencing ontologies
- Convention: submission templates bundling schema + file requirements

Deposition enhancements:
- Full lifecycle: create → metadata → files → submit → return-to-draft
- Spreadsheet template generation with ontology-backed dropdowns
- Spreadsheet upload with structural validation
- File upload with SHA-256 checksums and convention enforcement
- Domain events for all deposition mutations

Infrastructure:
- OBO Graphs parser for ontology import from external sources
- Postgres repositories for ontology, schema, convention
- Cross-domain reader adapters (SchemaReader, OntologyReader)
- LocalFileStorageAdapter with atomic writes
- OpenpyxlSpreadsheetAdapter for .xlsx generation and parsing
- REST routes: /ontologies, /schemas, /conventions, /depositions
- Database migration for new tables

Closes #30

396 unit tests passing.
- Deposit wizard UI with 4-step flow (convention, template, metadata, files)
- Deposition detail page at /deposition/[srn] (client component for auth)
- SDK layer rewrite: separate http, auth, search, deposition modules
- Mock deposition SDK with tests for demo/development
- Auth service: fix session commit, configurable base_role
- Worker: remove explicit commits (UOW scope handles it)
- Vector index: skip records with no indexable content gracefully
- SRN deserialization fix for versioned resources
- Seed script for sample convention and schema
- Header navigation with Deposit link
@github-actions

github-actions Bot commented Feb 15, 2026

Code Coverage

Package Line Rate Complexity Health
. 73% 0
application 0% 0
application.api 100% 0
application.api.rest 0% 0
application.api.v1 82% 0
application.api.v1.routes 9% 0
application.event 100% 0
cli 40% 0
cli.commands 18% 0
cli.util 53% 0
domain 100% 0
domain.auth 100% 0
domain.auth.command 97% 0
domain.auth.event 100% 0
domain.auth.model 92% 0
domain.auth.port 99% 0
domain.auth.query 93% 0
domain.auth.service 90% 0
domain.auth.util 100% 0
domain.auth.util.di 81% 0
domain.curation 100% 0
domain.curation.adapter 100% 0
domain.curation.command 100% 0
domain.curation.event 100% 0
domain.curation.handler 92% 0
domain.curation.model 100% 0
domain.curation.port 100% 0
domain.curation.query 100% 0
domain.curation.service 100% 0
domain.deposition 100% 0
domain.deposition.adapter 100% 0
domain.deposition.command 69% 0
domain.deposition.event 100% 0
domain.deposition.handler 100% 0
domain.deposition.model 98% 0
domain.deposition.port 100% 0
domain.deposition.query 61% 0
domain.deposition.service 98% 0
domain.export 100% 0
domain.export.adapter 100% 0
domain.export.command 100% 0
domain.export.event 100% 0
domain.export.model 100% 0
domain.export.port 100% 0
domain.export.query 100% 0
domain.export.service 100% 0
domain.index 100% 0
domain.index.event 100% 0
domain.index.handler 76% 0
domain.index.model 84% 0
domain.index.service 100% 0
domain.record 100% 0
domain.record.adapter 100% 0
domain.record.command 100% 0
domain.record.event 100% 0
domain.record.handler 100% 0
domain.record.model 100% 0
domain.record.port 100% 0
domain.record.query 100% 0
domain.record.service 100% 0
domain.search 100% 0
domain.search.adapter 100% 0
domain.search.command 100% 0
domain.search.event 100% 0
domain.search.model 100% 0
domain.search.port 100% 0
domain.search.query 100% 0
domain.search.service 100% 0
domain.semantics 100% 0
domain.semantics.command 31% 0
domain.semantics.event 100% 0
domain.semantics.handler 100% 0
domain.semantics.model 100% 0
domain.semantics.port 100% 0
domain.semantics.query 0% 0
domain.semantics.service 100% 0
domain.semantics.util 100% 0
domain.semantics.util.di 0% 0
domain.shared 91% 0
domain.shared.authorization 87% 0
domain.shared.model 90% 0
domain.shared.port 100% 0
domain.source 100% 0
domain.source.event 100% 0
domain.source.handler 0% 0
domain.source.model 76% 0
domain.source.schedule 0% 0
domain.source.service 92% 0
domain.validation 100% 0
domain.validation.adapter 100% 0
domain.validation.command 0% 0
domain.validation.event 100% 0
domain.validation.handler 83% 0
domain.validation.model 100% 0
domain.validation.port 100% 0
domain.validation.query 100% 0
domain.validation.service 0% 0
infrastructure 100% 0
infrastructure.auth 0% 0
infrastructure.event 66% 0
infrastructure.http 36% 0
infrastructure.index 0% 0
infrastructure.index.vector 73% 0
infrastructure.messaging 100% 0
infrastructure.oci 0% 0
infrastructure.persistence 63% 0
infrastructure.persistence.adapter 75% 0
infrastructure.persistence.mappers 55% 0
infrastructure.persistence.repository 32% 0
infrastructure.source 0% 0
sdk 100% 0
sdk.index 100% 0
sdk.source 100% 0
util 100% 0
util.di 25% 0
Summary 58% (3441 / 5952) 0

@rorybyrne
Contributor Author

@greptile

@greptile-apps
Contributor

greptile-apps Bot commented Feb 15, 2026

Greptile Summary

Large feature PR adding two new bounded contexts (Semantics and Deposition) with a full deposition upload workflow: ontology/schema/convention CRUD, deposition lifecycle (DRAFT → IN_VALIDATION → IN_REVIEW → ACCEPTED/REJECTED), spreadsheet template generation/parsing, file storage with checksums, and a frontend deposit wizard. Also includes an SDK rewrite splitting the monolithic OSAClient into namespace modules (auth/search/deposition) with a mock layer, and moves session commit responsibility from individual worker call sites to the session teardown in DI.

  • Security: Path traversal in file storage — User-supplied filenames are used directly in filesystem paths in LocalFileStorageAdapter without sanitization. A malicious filename like ../../etc/crontab could write outside the deposition directory. This must be fixed before merge.
  • Security: Missing ownership checks — Any authenticated user with DEPOSITOR role can upload to, delete from, or submit any other user's deposition. The service layer does not verify the caller owns the deposition.
  • Security: Content-Disposition header injection — Unsanitized filenames are interpolated into HTTP response headers, enabling potential header injection.
  • Migration mismatch: convention_srn is added as nullable=True in the migration but declared nullable=False in the table definition and domain model. Existing rows would be left with NULL in that column, which can cause runtime errors when they are loaded.
  • Architecture is well-structured — The domain model design (aggregates, value objects, ports, services) follows clean architecture patterns consistently. Cross-domain read adapters properly decouple semantics from deposition. The event-driven validation pipeline with return-to-draft is well thought out.
  • Worker commit cleanup — Moving session.commit() from ~8 individual call sites to the session generator teardown is a nice simplification, though it means read-only operations also trigger commits.

Confidence Score: 2/5

  • This PR has a path traversal vulnerability and missing authorization checks that should be addressed before merging.
  • The architecture and domain modeling are strong, but the file storage path traversal vulnerability is a critical security issue that could allow arbitrary file writes. Combined with missing ownership verification on deposition operations and the migration/table nullability mismatch, these issues need to be resolved before this is safe to merge. The 399 passing tests are encouraging but don't cover the identified security gaps.
  • Pay close attention to server/osa/infrastructure/persistence/adapter/storage.py (path traversal), server/osa/domain/deposition/service/deposition.py (missing ownership checks), server/migrations/versions/add_deposition_tables.py (nullable mismatch), and server/osa/application/api/v1/routes/depositions.py (header injection, memory usage).

Important Files Changed

| Filename | Overview |
| --- | --- |
| server/osa/infrastructure/persistence/adapter/storage.py | Path traversal vulnerability: user-supplied filenames used directly in filesystem paths without sanitization in save_file, get_file, and delete_file. |
| server/migrations/versions/add_deposition_tables.py | Migration adds convention_srn as nullable=True but table definition and domain model require it as non-nullable. Creates ontologies, schemas, conventions tables correctly. |
| server/osa/domain/deposition/service/deposition.py | Core deposition service with clean lifecycle management (create, upload, submit), but lacks ownership verification: any DEPOSITOR can modify any deposition. |
| server/osa/application/api/v1/routes/depositions.py | Full CRUD routes for depositions with file upload/download. Content-Disposition header injection risk and full file buffering in memory for uploads. |
| server/osa/domain/deposition/model/aggregate.py | Well-structured aggregate with proper state guards (_require_draft), clear state transitions, and appropriate domain validation logic. |
| server/osa/infrastructure/persistence/adapter/spreadsheet.py | Openpyxl adapter for template generation and parsing. Only parses row 3 (single data row), which may be intentional for v1 but limits multi-row spreadsheets. |
| server/osa/infrastructure/persistence/di.py | Added auto-commit to session teardown, replacing manual commits throughout worker. Significant behavioral change that commits even for read-only operations. |
| server/osa/domain/shared/model/srn.py | Extended SRN system with OntologySRN, ConventionSRN types and added model_validator for string parsing and model_serializer for string output. Clean Pydantic integration. |
| server/osa/domain/auth/service/auth.py | Added base_role auto-assignment for new users. Clean implementation with proper logging, though verbose debug logging could be cleaned up before production. |
| web/src/lib/sdk/index.ts | SDK refactored from single OSAClient to namespace pattern (auth/search/deposition) with mock layer support. Clean separation of concerns. |
| web/src/components/deposit/DepositWizard.tsx | 4-step deposit wizard with convention selection, template download, metadata upload, and data file upload. No error handling shown to user on submission failure. |
| server/osa/infrastructure/persistence/tables.py | Added ontologies, ontology_terms, schemas, and conventions tables with proper indexes and constraints. convention_srn on depositions correctly marked as non-nullable (mismatched with migration). |

Sequence Diagram

sequenceDiagram
    participant U as User/Frontend
    participant API as FastAPI Routes
    participant CMD as Command Handlers
    participant SVC as DepositionService
    participant CONV as ConventionRepo
    participant REPO as DepositionRepo
    participant FS as FileStorage
    participant OB as Outbox
    participant W as Worker
    participant VH as ValidateDeposition

    U->>API: POST /depositions {convention_srn}
    API->>CMD: CreateDepositionHandler.run()
    CMD->>SVC: create(convention_srn, owner_id)
    SVC->>CONV: get(convention_srn)
    CONV-->>SVC: Convention
    SVC->>REPO: save(deposition)
    SVC->>OB: append(DepositionCreatedEvent)
    SVC-->>U: DepositionCreated {srn}

    U->>API: POST /depositions/{srn}/spreadsheet
    API->>CMD: UploadSpreadsheetHandler.run()
    CMD->>SVC: update_metadata(srn, parsed_data)
    SVC->>REPO: save(deposition)
    SVC-->>U: SpreadsheetUploaded

    U->>API: POST /depositions/{srn}/files
    API->>CMD: UploadFileHandler.run()
    CMD->>SVC: upload_file(srn, filename, content, size)
    SVC->>CONV: get(convention_srn) [validate file reqs]
    SVC->>FS: save_file(srn, filename, content, size)
    FS-->>SVC: DepositionFile
    SVC->>REPO: save(deposition)
    SVC->>OB: append(FileUploadedEvent)
    SVC-->>U: FileUploaded

    U->>API: POST /depositions/{srn}/submit
    API->>CMD: SubmitDepositionHandler.run()
    CMD->>SVC: submit(srn)
    SVC->>REPO: save(deposition [IN_VALIDATION])
    SVC->>OB: append(DepositionSubmittedEvent)
    SVC-->>U: DepositionSubmitted

    W->>OB: claim(DepositionSubmittedEvent)
    W->>VH: handle(event)
    VH->>OB: append(ValidationCompleted or ValidationFailed)

    Note over W,VH: On ValidationFailed:
    W->>OB: claim(ValidationFailed)
    W->>REPO: get(deposition)
    W->>REPO: save(deposition [DRAFT])

Last reviewed commit: d22a102
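
The lifecycle named in the summary and traced in the sequence diagram can be read as a small state machine. The sketch below is an illustration only, not the PR's aggregate: the `Status` enum, the `ALLOWED` table, and the `transition` helper are hypothetical names, and the transition set is inferred from the lifecycle string (DRAFT → IN_VALIDATION → IN_REVIEW → ACCEPTED/REJECTED) plus the return-to-draft path on validation failure.

```python
from enum import Enum


class Status(str, Enum):
    DRAFT = "DRAFT"
    IN_VALIDATION = "IN_VALIDATION"
    IN_REVIEW = "IN_REVIEW"
    ACCEPTED = "ACCEPTED"
    REJECTED = "REJECTED"


# Assumed transition table, inferred from the lifecycle described in the PR.
ALLOWED: dict[Status, set[Status]] = {
    Status.DRAFT: {Status.IN_VALIDATION},                    # submit
    Status.IN_VALIDATION: {Status.IN_REVIEW, Status.DRAFT},  # pass, or return to draft
    Status.IN_REVIEW: {Status.ACCEPTED, Status.REJECTED},
    Status.ACCEPTED: set(),
    Status.REJECTED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Illegal transition: {current.value} -> {target.value}")
    return target
```

A guard such as the `_require_draft` check mentioned in the file overview would correspond to rejecting any mutation whose current status is not `Status.DRAFT`.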

@greptile-apps greptile-apps Bot left a comment

156 files reviewed, 7 comments

) -> DepositionFile:
    dep_dir = self._dep_dir(deposition_id)
    dep_dir.mkdir(parents=True, exist_ok=True)
    target = dep_dir / filename

Path traversal vulnerability via unsanitized filename

The filename parameter comes directly from user input (file.filename in the upload route) and is used to construct a filesystem path without any sanitization. A malicious user could upload a file with a name like ../../etc/passwd or ../../../important_file to write outside the deposition directory.

This affects save_file, get_file, and delete_file — all of which construct paths as dep_dir / filename.

You should sanitize the filename to strip path separators and ensure the resolved path stays within the deposition directory. For example:

Suggested change
- target = dep_dir / filename
+ target = dep_dir / Path(filename).name
+ if not target.resolve().is_relative_to(dep_dir.resolve()):
+     raise ValueError(f"Invalid filename: {filename}")
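
As a slightly fuller illustration of the suggestion above, the check could live in one helper shared by save_file, get_file, and delete_file. This is a sketch under assumptions, not the adapter's actual code: `safe_target` is a hypothetical name, and rejecting empty, `.`, and `..` names is an extra precaution beyond what the review comment asks for.

```python
from pathlib import Path


def safe_target(dep_dir: Path, filename: str) -> Path:
    """Return a path for `filename` that resolves inside `dep_dir`, or raise ValueError."""
    # Treat backslashes as separators too, then keep only the final path component.
    name = Path(filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"Invalid filename: {filename!r}")
    base = dep_dir.resolve()
    target = (base / name).resolve()
    # Defence in depth: reject anything (for example a symlink) that still escapes the base.
    if not target.is_relative_to(base):
        raise ValueError(f"Invalid filename: {filename!r}")
    return target
```

With this helper, a name such as `../../etc/crontab` is reduced to `crontab` inside the deposition directory. Whether to reduce such names silently or reject them outright is a design choice the review leaves open.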

Comment on lines +81 to +84
op.add_column(
    "depositions",
    sa.Column("convention_srn", sa.String(), nullable=True),
)

Migration/table definition nullability mismatch

The migration adds convention_srn as nullable=True, but the SQLAlchemy table definition in tables.py:29 declares it as nullable=False, and the domain model requires convention_srn as a mandatory field on Deposition. This mismatch means:

  1. The migration will create the column as nullable in the database
  2. SQLAlchemy's table metadata believes it's non-nullable
  3. Any existing deposition rows would need a convention_srn value to satisfy the constraint

If this runs against a database with existing depositions, the ALTER TABLE ADD COLUMN with nullable=True will succeed, but those rows will have NULL for convention_srn, which will cause errors when the ORM tries to load them. The migration should either set a server_default or include a data migration step for existing rows, then alter to NOT NULL.

Suggested change
- op.add_column(
-     "depositions",
-     sa.Column("convention_srn", sa.String(), nullable=True),
- )
+ op.add_column(
+     "depositions",
+     sa.Column("convention_srn", sa.String(), nullable=False, server_default=""),
+ )
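
The difference between the two column definitions can be seen on a throwaway database. The sketch below uses Python's built-in `sqlite3` as a stand-in for the project's Postgres/Alembic setup, so it only illustrates the behaviour for pre-existing rows. The `convention_nullable` column and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE depositions (srn TEXT PRIMARY KEY)")
conn.execute("INSERT INTO depositions (srn) VALUES ('existing-deposition')")

# Nullable add: succeeds, but the pre-existing row is left holding NULL.
conn.execute("ALTER TABLE depositions ADD COLUMN convention_nullable TEXT")

# NOT NULL add with a default: the pre-existing row reads back as ''.
conn.execute(
    "ALTER TABLE depositions ADD COLUMN convention_srn TEXT NOT NULL DEFAULT ''"
)

row = conn.execute(
    "SELECT convention_nullable, convention_srn FROM depositions"
).fetchone()
print(row)  # (None, '')
```

An empty string satisfies the constraint but is presumably not a meaningful SRN, so a data migration that backfills real values for existing rows may still be needed, as the comment above notes.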

Comment on lines +85 to +127
async def upload_file(
    self,
    srn: DepositionSRN,
    filename: str,
    content: bytes,
    size: int,
) -> Deposition:
    dep = await self.get(srn)
    convention = await self.convention_repo.get(dep.convention_srn)
    if convention is None:
        raise NotFoundError(f"Convention not found: {dep.convention_srn}")

    reqs = convention.file_requirements

    # Validate file type
    ext = _get_extension(filename)
    if reqs.accepted_types and ext not in reqs.accepted_types:
        raise ValidationError(f"File type '{ext}' not accepted. Allowed: {reqs.accepted_types}")

    # Validate file size
    if size > reqs.max_file_size:
        raise ValidationError(f"File size {size} exceeds maximum {reqs.max_file_size}")

    # Validate max count
    if len(dep.files) >= reqs.max_count:
        raise ValidationError(
            f"Maximum {reqs.max_count} files allowed, already have {len(dep.files)}"
        )

    # Store the file and get back the DepositionFile VO
    saved_file = await self.file_storage.save_file(srn, filename, content, size)
    dep.add_file(saved_file)
    await self.deposition_repo.save(dep)

    event = FileUploadedEvent(
        id=EventId(uuid4()),
        deposition_id=srn,
        filename=saved_file.name,
        size=saved_file.size,
        checksum=saved_file.checksum,
    )
    await self.outbox.append(event)
    return dep

No ownership verification for file operations

The upload_file, delete_file, update_metadata, and submit methods on DepositionService do not verify that the requesting user owns the deposition. The command handlers receive a principal but only check the role (e.g., at_least(Role.DEPOSITOR)), and the service methods accept the SRN without validating ownership.

This means any authenticated user with the DEPOSITOR role can upload files to, delete files from, modify metadata of, or submit another user's deposition by knowing (or guessing) its SRN.

Consider adding an ownership check in the service layer, e.g.:

async def upload_file(self, srn, filename, content, size, *, caller_id: UserId):
    dep = await self.get(srn)
    if dep.owner_id != caller_id:
        raise AuthorizationError("Not the owner of this deposition")
    ...

Or leverage the existing @writes(owner(…)) decorator pattern used in PostgresDepositionRepository.save.
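
Because the same check is needed in upload_file, delete_file, update_metadata, and submit, one option is to centralise it in a single loader. The following is a self-contained sketch with stand-in types: `_get_owned`, the in-memory `store`, and the simplified `Deposition` dataclass are hypothetical and only mirror the shape of the snippet above.

```python
from dataclasses import dataclass, field


class AuthorizationError(Exception):
    """Raised when the caller does not own the deposition."""


@dataclass
class Deposition:
    srn: str
    owner_id: str
    files: list[str] = field(default_factory=list)


class DepositionService:
    def __init__(self, store: dict[str, Deposition]) -> None:
        self._store = store  # in-memory stand-in for the repository

    async def _get_owned(self, srn: str, caller_id: str) -> Deposition:
        """Load a deposition and refuse access unless `caller_id` owns it."""
        dep = self._store[srn]
        if dep.owner_id != caller_id:
            raise AuthorizationError("Not the owner of this deposition")
        return dep

    async def upload_file(self, srn: str, filename: str, *, caller_id: str) -> Deposition:
        dep = await self._get_owned(srn, caller_id)
        dep.files.append(filename)
        return dep

    async def delete_file(self, srn: str, filename: str, *, caller_id: str) -> Deposition:
        dep = await self._get_owned(srn, caller_id)
        dep.files.remove(filename)
        return dep
```

Routing every mutating method through one loader makes it harder for a newly added method to skip the check.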

Comment on lines +119 to +130
@router.get("/{srn:path}/files/{filename}")
async def download_file(
    srn: str,
    filename: str,
    handler: FromDishka[DownloadFileHandler],
) -> StreamingResponse:
    result = await handler.run(DownloadFile(srn=DepositionSRN.parse(srn), filename=filename))
    return StreamingResponse(
        result.stream,
        media_type=result.content_type or "application/octet-stream",
        headers={"Content-Disposition": f'attachment; filename="{result.filename}"'},
    )

Content-Disposition header injection via unsanitized filename

The filename path parameter and the result.filename value are interpolated directly into the Content-Disposition header without sanitization. A filename containing a " character or newlines could break out of the header value, enabling HTTP response header injection.

Consider sanitizing the filename in the response header, for example by stripping or escaping special characters:

safe_name = result.filename.replace('"', '_').replace('\n', '_').replace('\r', '_')
headers={"Content-Disposition": f'attachment; filename="{safe_name}"'},
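
An alternative to character replacement, sketched here rather than taken from the PR, is to emit both an ASCII fallback and the percent-encoded `filename*` form described in RFC 6266. The helper name `content_disposition` and the `download` fallback name are assumptions for illustration.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment Content-Disposition value that cannot break out of the header."""
    # Drop non-printable characters (including CR and LF) so the value stays on one line.
    cleaned = "".join(ch for ch in filename if ch.isprintable()) or "download"
    # ASCII fallback for the quoted form: no quotes, backslashes, or non-ASCII characters.
    fallback = "".join(
        ch if ch.isascii() and ch not in '"\\' else "_" for ch in cleaned
    )
    # Extended form carries the full name, percent-encoded as UTF-8.
    encoded = quote(cleaned, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

Clients that understand `filename*` get the original name back, while the quoted fallback stays within plain ASCII.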

Comment on lines +107 to +116
) -> FileUploaded:
    content = await file.read()
    return await handler.run(
        UploadFileCommand(
            srn=DepositionSRN.parse(srn),
            filename=file.filename or "unknown",
            content=content,
            size=len(content),
        )
    )

Entire file content loaded into memory

await file.read() loads the full uploaded file into memory as a bytes object before passing it to the command. For large data files (the convention max_file_size can be up to ~10 GB based on the mock data), this will cause memory exhaustion and crash the server.

Consider streaming the content to disk in chunks rather than buffering the whole file in memory. The UploadFile object from FastAPI wraps a SpooledTemporaryFile that supports chunked reading.
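
A chunked copy along these lines might look as follows. This is a minimal sketch, not the PR's storage adapter: `stream_to_disk`, `AsyncReadable`, and the 1 MiB `CHUNK_SIZE` are hypothetical, and the only assumption about the upload object is an async `read(size)` method, which FastAPI's UploadFile provides.

```python
import hashlib
from pathlib import Path
from typing import Protocol

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


class AsyncReadable(Protocol):
    async def read(self, size: int = -1) -> bytes: ...


async def stream_to_disk(source: AsyncReadable, target: Path, max_size: int) -> tuple[int, str]:
    """Copy `source` to `target` in chunks; return (bytes written, SHA-256 hex digest)."""
    digest = hashlib.sha256()
    written = 0
    try:
        with target.open("wb") as out:
            while chunk := await source.read(CHUNK_SIZE):
                written += len(chunk)
                if written > max_size:
                    raise ValueError(f"File exceeds maximum size of {max_size} bytes")
                digest.update(chunk)
                out.write(chunk)
    except Exception:
        # Do not leave a partial file behind when the copy fails.
        target.unlink(missing_ok=True)
        raise
    return written, digest.hexdigest()
```

Memory use is bounded by the chunk size, and the size limit and checksum are computed during the copy instead of on a fully buffered bytes object. The disk writes here are blocking, so a production version might hand them to a thread pool.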

Comment on lines +41 to +49
    srn: str,
    handler: FromDishka[DownloadTemplateHandler],
) -> StreamingResponse:
    result = await handler.run(DownloadTemplate(convention_srn=ConventionSRN.parse(srn)))
    return StreamingResponse(
        iter([result.content]),
        media_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        headers={"Content-Disposition": f'attachment; filename="{result.filename}"'},
    )

Route order: /{srn:path}/template must be declared before /{srn:path}

FastAPI evaluates routes in declaration order. The GET /{srn:path} route at line 54 uses a path parameter that will match everything including /some-srn/template. Since /{srn:path} is declared after /{srn:path}/template here, this specific router is fine.

However, in depositions.py, the same pattern is used and the GET /{srn:path} route (line 159) is declared after the more specific sub-path routes — which is correct. Just flagging this as something to be aware of if routes are reordered in the future. The current order is correct for this file.
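
The ordering concern can be illustrated without FastAPI by a toy first-match router. This is a simulation under assumptions, not the framework's implementation: the `{srn:path}` parameter is modelled as the regex `.+`, and the handler names are hypothetical.

```python
import re

# Regex stand-ins for the two route templates, in the order the review calls correct.
ROUTES = [
    ("download_template", re.compile(r"^/(?P<srn>.+)/template$")),
    ("get_convention", re.compile(r"^/(?P<srn>.+)$")),
]


def resolve(path: str, routes=ROUTES) -> tuple[str, str]:
    """Return (handler name, captured srn) for the first route that matches."""
    for name, pattern in routes:
        match = pattern.match(path)
        if match:
            return name, match["srn"]
    raise LookupError(path)


print(resolve("/some-srn/template"))
# ('download_template', 'some-srn')
print(resolve("/some-srn/template", routes=list(reversed(ROUTES))))
# ('get_convention', 'some-srn/template')
```

With the catch-all first, the `/template` suffix is swallowed into the captured SRN, which is the failure mode the comment warns about if routes are reordered.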

@@ -48,6 +69,7 @@ async def get_session(
 ) -> AsyncIterable[AsyncSession]:
     async with session_factory() as session:
         yield session
+        await session.commit()

Auto-commit added to session teardown

This await session.commit() in the session generator is a significant behavioral change — it means every UOW scope now auto-commits when the session is yielded back. This is intentional (it replaces the manual session.commit() calls that were removed from the worker), but it also means that any query-only operations (reads, listing) will trigger a commit, which is unnecessary overhead and could commit unintended writes if a read handler accidentally mutates state.

The worker cleanup removing manual commits makes sense given this change, but consider whether auto-committing for read-only operations is desirable. A common pattern is to only commit when writes are detected, or to use separate read/write session factories.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
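
One way to picture the separate read/write option is a pair of providers, only one of which commits at teardown. The sketch below uses a stand-in session class so it runs without a database. `FakeSession`, `get_write_session`, and `get_read_session` are hypothetical names and do not reflect the project's DI wiring.

```python
from collections.abc import AsyncIterator


class FakeSession:
    """Stand-in for an async database session; it only counts commits."""

    def __init__(self) -> None:
        self.commits = 0

    async def commit(self) -> None:
        self.commits += 1


async def get_write_session(session: FakeSession) -> AsyncIterator[FakeSession]:
    yield session
    # Runs when the scope is torn down normally: persist whatever the handler wrote.
    await session.commit()


async def get_read_session(session: FakeSession) -> AsyncIterator[FakeSession]:
    # No commit at teardown, so a read handler cannot persist accidental mutations.
    yield session
```

Command handlers would depend on the write provider and query handlers on the read provider. The trade-off is that each handler has to declare which kind of scope it needs.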

- Make convention_srn non-nullable in depositions table migration
- Add filename sanitization for Content-Disposition headers to prevent
 CRLF injection attacks
- Implement path traversal protection in LocalFileStorageAdapter by
 validating filenames and ensuring resolved paths stay within bounds
- Add comprehensive tests for security fixes
@rorybyrne rorybyrne merged commit ba41300 into main Feb 17, 2026
6 checks passed
@rorybyrne rorybyrne deleted the 030-feat-implement-deposition branch February 17, 2026 11:14