feat: implement deposition upload workflow with semantics domain #30

@rorybyrne

Implement the deposition upload workflow end-to-end — from ontology and schema definitions through to published records — with spreadsheet-based metadata entry (template generation + upload/validation).

Frontend pages are tracked separately in #57.

Architecture Overview

Semantics Domain          Deposition Domain                  Downstream
─────────────────         ──────────────────                 ──────────
Ontology ◄──────────┐     Convention ◄──── Deposition ──► Validation ──► Curation ──► Record
Schema ──────────────┘          │               │                            │
  (typed fields ref ontologies) │          (metadata validated       (validator metrics
                                │           against schema)          attached to Record)
                           bundles:
                           - Schema ref
                           - File requirements
                           - Validator refs

Domain boundaries

Domain                  Aggregates               Responsibility
──────────────────────  ───────────────────────  ──────────────
Semantics (new)         Ontology, Schema         Defining meaning: terms, hierarchies, typed metadata structure
Deposition (enhanced)   Convention, Deposition   Submission lifecycle: templates, drafts, uploads, submission
Validation (existing)   ValidationRun            Execute validators against submissions; emit structured metrics/metadata
Curation (existing)                              Human/auto approval gate
Record (existing)       Record                   Immutable published records, with validator-emitted metrics attached

Semantics Domain

New domain: osa/domain/semantics/

Ontology

Replaces the spec's "Vocabulary" concept. A versioned collection of Terms with optional hierarchical relationships. Can be imported from external sources (UBERON, Mondo, NCBI Taxonomy) or created within OSA.

Note: The existing VocabSRN type will be removed and replaced with OntologySRN. We are pre-launch so no deprecation is needed.

SRN: urn:osa:{node}:onto:{id}@{version}
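
The SRN pattern above (and the schema and conv patterns later in this issue) share the shape urn:osa:{node}:{type}:{id}@{version}. A minimal parsing sketch, assuming integer versions; the names `SRN` and `parse_srn` are hypothetical and the existing OSA SRN types may look different:

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for urn:osa:{node}:{type}:{id}@{version}.
# Assumes the version is an integer, as in the examples in this issue.
_SRN_RE = re.compile(
    r"^urn:osa:(?P<node>[^:@]+):(?P<kind>[a-z]+):(?P<id>[^:@]+)@(?P<version>\d+)$"
)


@dataclass(frozen=True)
class SRN:
    node: str
    kind: str  # e.g. "onto", "schema", "conv"
    id: str
    version: int

    def __str__(self) -> str:
        return f"urn:osa:{self.node}:{self.kind}:{self.id}@{self.version}"


def parse_srn(text: str) -> SRN:
    """Split an SRN string into its parts, or raise ValueError."""
    match = _SRN_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid SRN: {text!r}")
    return SRN(match["node"], match["kind"], match["id"], int(match["version"]))


onto = parse_srn("urn:osa:localhost:onto:sex@1")
```

Keeping the type segment generic means one parser could serve OntologySRN and the other SRN kinds; whether that is preferable to one class per kind is a design choice left open here.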

Each Term has:

  • term_id: canonical identifier (e.g. UBERON:0001950)
  • label: human-readable name (e.g. "neocortex")
  • synonyms: alternative names
  • parent_ids: optional is_a / part_of relationships (empty for flat ontologies)
  • definition: optional description
  • deprecated: whether the term is deprecated

A flat ontology (like osa:sex with 4 terms) and a deep hierarchy (like UBERON with 16k terms) use the same abstraction. Flat is just "no parent relationships."
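
A rough sketch of how one abstraction can cover both cases. The class names mirror the terms used above, but the term IDs ("osa:sex:male", "EX:1", ...) and the `is_flat` helper are illustrative assumptions, not the planned API:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Term:
    term_id: str
    label: str
    synonyms: tuple = ()
    parent_ids: tuple = ()  # empty for flat ontologies
    definition: Optional[str] = None
    deprecated: bool = False


@dataclass
class Ontology:
    srn: str
    terms: dict = field(default_factory=dict)  # term_id -> Term

    def add(self, term: Term) -> None:
        self.terms[term.term_id] = term

    def is_flat(self) -> bool:
        # "Flat" simply means no term declares a parent.
        return all(not t.parent_ids for t in self.terms.values())


# Flat ontology: the four osa:sex terms (IDs are hypothetical).
sex = Ontology("urn:osa:localhost:onto:sex@1")
for label in ("male", "female", "mixed", "unknown"):
    sex.add(Term(term_id=f"osa:sex:{label}", label=label))

# Hierarchical ontology: a made-up two-term example, not real UBERON data.
tree = Ontology("urn:osa:localhost:onto:example@1")
tree.add(Term(term_id="EX:1", label="root"))
tree.add(Term(term_id="EX:2", label="child", parent_ids=("EX:1",)))
```

Here `sex.is_flat()` is true and `tree.is_flat()` is false, yet both are stored and looked up the same way.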

Baseline ontologies (for initial testing — real ontology imports like UBERON/NCBI Taxonomy are a future issue):

  • osa:sex — male, female, mixed, unknown
  • osa:boolean — true, false
  • osa:data-format — common scientific file formats
  • osa:license — open data licenses

Schema

Defines the metadata structure for a class of depositions. A list of typed fields where categorical data is ontology-backed.

SRN: urn:osa:{node}:schema:{id}@{version}

Field types:

Type     Purpose                            Constraints
───────  ─────────────────────────────────  ───────────
text     Free text (title, description)     Optional min/max length, regex
number   Numeric (sample count, age)        Optional unit, range, integer_only
date     Temporal (collection date)         ISO format
boolean  Flag
term     Ontology-backed categorical value  References an Ontology SRN; optional root_term constraint
url      Links (DOI, ORCID)                 Valid URL, optional pattern

Each field specifies: name, type, type-specific config, required (bool), cardinality (exactly_one, one_or_more, zero_or_more), description.
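
One possible in-memory shape for a field definition, as a sketch only. `FieldDef`, `Cardinality.allows`, and the choice to collect type-specific keys into a `config` dict are assumptions for illustration; the cardinality check is assumed to apply to the number of values actually supplied:

```python
from dataclasses import dataclass, field
from enum import Enum

FIELD_TYPES = {"text", "number", "date", "boolean", "term", "url"}


class Cardinality(str, Enum):
    EXACTLY_ONE = "exactly_one"
    ONE_OR_MORE = "one_or_more"
    ZERO_OR_MORE = "zero_or_more"

    def allows(self, count: int) -> bool:
        """Return True if `count` supplied values satisfy this cardinality."""
        if self is Cardinality.EXACTLY_ONE:
            return count == 1
        if self is Cardinality.ONE_OR_MORE:
            return count >= 1
        return count >= 0


@dataclass
class FieldDef:
    name: str
    type: str
    required: bool
    cardinality: Cardinality
    description: str = ""
    config: dict = field(default_factory=dict)  # type-specific settings

    @classmethod
    def from_dict(cls, raw: dict) -> "FieldDef":
        if raw["type"] not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {raw['type']!r}")
        common = {"name", "type", "required", "cardinality", "description"}
        return cls(
            name=raw["name"],
            type=raw["type"],
            required=raw["required"],
            cardinality=Cardinality(raw["cardinality"]),
            description=raw.get("description", ""),
            # Everything else (ontology, root_term, range, ...) is type config.
            config={k: v for k, v in raw.items() if k not in common},
        )


tissue = FieldDef.from_dict({
    "name": "tissue", "type": "term",
    "ontology": "urn:osa:localhost:onto:uberon@1",
    "root_term": "UBERON:0000479",
    "required": True, "cardinality": "one_or_more",
})
```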

Example schema:

{
  "srn": "urn:osa:localhost:schema:scrnaseq-metadata@1",
  "title": "scRNA-seq Metadata",
  "fields": [
    { "name": "title", "type": "text", "required": true, "cardinality": "exactly_one", "description": "Dataset title" },
    { "name": "tissue", "type": "term", "ontology": "urn:osa:localhost:onto:uberon@1", "root_term": "UBERON:0000479", "required": true, "cardinality": "one_or_more" },
    { "name": "organism", "type": "term", "ontology": "urn:osa:localhost:onto:ncbi-taxonomy@1", "required": true, "cardinality": "exactly_one" },
    { "name": "sex", "type": "term", "ontology": "urn:osa:localhost:onto:sex@1", "required": true, "cardinality": "exactly_one" },
    { "name": "sample_count", "type": "number", "integer_only": true, "range": [1, null], "required": true, "cardinality": "exactly_one" },
    { "name": "collection_date", "type": "date", "required": false, "cardinality": "exactly_one" },
    { "name": "doi", "type": "url", "pattern": "^https://doi.org/", "required": false, "cardinality": "exactly_one" }
  ]
}

Deposition Domain (enhanced)

Convention

Lives in the Deposition domain. The user-facing "submission template" — what a depositor selects to know what's expected.

SRN: urn:osa:{node}:conv:{id}@{version}

A Convention bundles:

  • Schema reference (SRN) — what metadata to provide
  • Validator references — which OCI validators to run on submission
  • File requirements — accepted file types, min/max count, size limits
  • Description — human-readable explanation of what this convention is for
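
The bundle above could be modelled roughly as follows. This is a sketch under assumptions: the attribute names, the `check_files` helper, the ".h5ad" extension, and the validator image reference are all hypothetical examples rather than agreed design:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileRequirements:
    accepted_extensions: tuple  # e.g. (".h5ad",)
    min_count: int = 1
    max_count: Optional[int] = None
    max_size_bytes: Optional[int] = None


@dataclass(frozen=True)
class Convention:
    srn: str
    schema_srn: str
    validator_refs: tuple  # OCI image references
    file_requirements: FileRequirements
    description: str = ""


def check_files(reqs: FileRequirements, files: list) -> list:
    """Check (filename, size_in_bytes) pairs; return human-readable problems."""
    problems = []
    if len(files) < reqs.min_count:
        problems.append(f"expected at least {reqs.min_count} file(s)")
    if reqs.max_count is not None and len(files) > reqs.max_count:
        problems.append(f"expected at most {reqs.max_count} file(s)")
    for name, size in files:
        if not name.lower().endswith(reqs.accepted_extensions):
            problems.append(f"{name}: file type not accepted")
        if reqs.max_size_bytes is not None and size > reqs.max_size_bytes:
            problems.append(f"{name}: larger than {reqs.max_size_bytes} bytes")
    return problems


conv = Convention(
    srn="urn:osa:localhost:conv:scrnaseq@1",
    schema_srn="urn:osa:localhost:schema:scrnaseq-metadata@1",
    validator_refs=("registry.example/validators/scrnaseq-qc:1",),  # made up
    file_requirements=FileRequirements((".h5ad",), max_count=2, max_size_bytes=1_000),
    description="Single-cell RNA-seq datasets",
)
```

For instance, `check_files(conv.file_requirements, [("cells.h5ad", 10)])` returns an empty list, while a `.txt` file would produce one "file type not accepted" entry.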

Deposition lifecycle

DRAFT ──► SUBMITTED ──► IN_REVIEW ──► ACCEPTED ──► RECORD
  │                                       │
editable                              published
add files                             immutable

  • Draft: Metadata and files are editable. Created against a Convention.
  • Submitted: Immutable. Validation pipeline triggered automatically.
  • In Review: Validation complete. Awaiting curation decision.
  • Accepted: Curator approved. Record created with validator metrics attached.

Note: Uses the existing DepositionStatus values (DRAFT, SUBMITTED, IN_REVIEW, ACCEPTED, REJECTED).
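
The lifecycle can be read as a small state machine. The sketch below uses a stand-in enum (the real DepositionStatus already exists in the codebase and its values may differ), and it assumes REJECTED is only reachable from IN_REVIEW, which the diagram does not state:

```python
from enum import Enum


class DepositionStatus(Enum):  # stand-in for the existing enum
    DRAFT = "draft"
    SUBMITTED = "submitted"
    IN_REVIEW = "in_review"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


# Allowed transitions. The REJECTED edge is an assumption.
ALLOWED = {
    DepositionStatus.DRAFT: {DepositionStatus.SUBMITTED},
    DepositionStatus.SUBMITTED: {DepositionStatus.IN_REVIEW},
    DepositionStatus.IN_REVIEW: {DepositionStatus.ACCEPTED, DepositionStatus.REJECTED},
    DepositionStatus.ACCEPTED: set(),
    DepositionStatus.REJECTED: set(),
}


def transition(current: DepositionStatus, target: DepositionStatus) -> DepositionStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


def is_editable(status: DepositionStatus) -> bool:
    # Only drafts accept metadata edits and new files.
    return status is DepositionStatus.DRAFT


status = transition(DepositionStatus.DRAFT, DepositionStatus.SUBMITTED)
```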

Validators and Record Metrics

Validators are OCI containers referenced by a Convention. When a deposition is submitted, the validators defined in its Convention are executed. Validators emit structured results (metrics, metadata, checks) which are attached to the published Record.

This means:

  • No separate "Trait" abstraction is needed — validator results are the searchable surface
  • Records carry rich, queryable validator output (e.g. "cell_viability: 94.2%") rather than binary stamps
  • Search can filter over validator-emitted metrics directly (e.g. "show me records where cell_viability > 90%")
  • A boolean "all checks passed" can be derived at query time without baking it into the model
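
To make the last two points concrete, here is an in-memory sketch of filtering on a metric and deriving "all checks passed" on demand. The `ValidatorResult` and `Record` shapes, the record identifiers, and the validator name are hypothetical; a real implementation would presumably push the filter into the database query:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ValidatorResult:
    validator: str
    metrics: dict = field(default_factory=dict)  # name -> numeric value
    checks: dict = field(default_factory=dict)   # name -> passed?


@dataclass
class Record:
    srn: str
    results: list = field(default_factory=list)

    def metric(self, name: str) -> Optional[float]:
        """First value found for `name` across validator results, if any."""
        for result in self.results:
            if name in result.metrics:
                return result.metrics[name]
        return None

    def all_checks_passed(self) -> bool:
        # Derived at query time; vacuously True when no checks were emitted.
        return all(all(r.checks.values()) for r in self.results)


def filter_by_metric(records: list, name: str, minimum: float) -> list:
    """Records whose metric `name` is strictly greater than `minimum`."""
    return [
        r for r in records
        if (value := r.metric(name)) is not None and value > minimum
    ]


records = [
    Record("rec:a@1", [ValidatorResult("qc", {"cell_viability": 94.2}, {"has_counts": True})]),
    Record("rec:b@1", [ValidatorResult("qc", {"cell_viability": 71.0}, {"has_counts": False})]),
]
viable = filter_by_metric(records, "cell_viability", 90)
```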

Design note: The Trait concept from the spec is deferred. If a formal Trait registry is needed later, it can be layered on top of the validator→metrics model without breaking changes.

Record Versioning — Out of Scope

Record versioning (updating existing records via new depositions, rec:abc123@2) is deferred to a future issue. All published records will be @1 for now.

Spreadsheet-Based Metadata

Template generation

Server generates .xlsx from a Convention's Schema:

  • Column headers = field names
  • Description row = field descriptions, type info, requirements
  • For small ontology-backed fields (sex, license, boolean): Excel data validation dropdowns
  • For large ontology-backed fields (tissue, disease): column header includes ontology name + instructions to enter term IDs
  • Required fields visually distinguished (bold headers, coloured columns)
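
The layout rules above can be sketched as a pure planning step that decides headers, the description row, and which columns get dropdowns, leaving the actual .xlsx writing to a spreadsheet library. The function name, the dropdown threshold of 20 terms, and the use of the ontology SRN as the "ontology name" in headers are all assumptions:

```python
DROPDOWN_LIMIT = 20  # hypothetical cut-off between "small" and "large" ontologies


def plan_template(fields: list, ontology_terms: dict, dropdown_limit: int = DROPDOWN_LIMIT) -> dict:
    """Plan spreadsheet columns for a schema.

    `fields` are schema field dicts; `ontology_terms` maps an ontology SRN to
    its list of allowed values. Ontologies missing from the map are treated
    as large (assumption).
    """
    headers, descriptions, dropdowns, bold_columns = [], [], {}, []
    for column, spec in enumerate(fields):
        header = spec["name"]
        requirement = "required" if spec.get("required") else "optional"
        parts = [spec.get("description", ""), f"type: {spec['type']}", requirement]
        if spec["type"] == "term":
            terms = ontology_terms.get(spec["ontology"])
            if terms is not None and len(terms) <= dropdown_limit:
                dropdowns[column] = list(terms)  # becomes a data validation list
            else:
                header = f"{spec['name']} ({spec['ontology']})"
                parts.append("enter term IDs")
        headers.append(header)
        descriptions.append("; ".join(p for p in parts if p))
        if spec.get("required"):
            bold_columns.append(column)
    return {
        "headers": headers,
        "descriptions": descriptions,
        "dropdowns": dropdowns,
        "bold_columns": bold_columns,
    }


plan = plan_template(
    [
        {"name": "sex", "type": "term", "ontology": "urn:osa:localhost:onto:sex@1", "required": True},
        {"name": "tissue", "type": "term", "ontology": "urn:osa:localhost:onto:uberon@1", "required": True},
        {"name": "collection_date", "type": "date", "required": False},
    ],
    {"urn:osa:localhost:onto:sex@1": ["male", "female", "mixed", "unknown"]},
)
```

Separating the plan from the writer keeps the layout rules testable without opening a spreadsheet file.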

Upload and validation

  • Parse .xlsx server-side, validate each cell against the Schema
  • Type checking (text, number, date, boolean, url)
  • Term validation (is this a valid term in the referenced ontology?)
  • Required field checking
  • Cardinality checking
  • Return structured validation results for display
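
A sketch of the per-row checks, operating on a row that has already been parsed out of the .xlsx into a dict of cell strings. Several things here are assumptions: multi-valued cells are split on ";", term values are compared against a plain set of allowed IDs, cardinality is only checked when a value is present, and text length/regex constraints are omitted for brevity:

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urlparse


def check_value(spec: dict, value: str, ontology_terms: dict) -> Optional[str]:
    """Return an error message for one value, or None if it is valid."""
    kind = spec["type"]
    if kind == "text":
        return None
    if kind == "number":
        try:
            number = float(value)
        except ValueError:
            return "not a number"
        if spec.get("integer_only") and not number.is_integer():
            return "must be an integer"
        low, high = spec.get("range", [None, None])
        if low is not None and number < low:
            return f"below minimum {low}"
        if high is not None and number > high:
            return f"above maximum {high}"
        return None
    if kind == "date":
        try:
            date.fromisoformat(value)
        except ValueError:
            return "not an ISO date"
        return None
    if kind == "boolean":
        return None if value.lower() in {"true", "false"} else "not a boolean"
    if kind == "url":
        parsed = urlparse(value)
        if not (parsed.scheme and parsed.netloc):
            return "not a valid URL"
        pattern = spec.get("pattern")
        if pattern and not re.search(pattern, value):
            return "does not match required pattern"
        return None
    if kind == "term":
        allowed = ontology_terms.get(spec["ontology"], set())
        return None if value in allowed else "unknown term"
    return f"unknown field type {kind!r}"


def validate_row(fields: list, row: dict, ontology_terms: dict) -> list:
    """Validate one spreadsheet row; return a list of structured errors."""
    errors = []
    for spec in fields:
        name = spec["name"]
        raw = (row.get(name) or "").strip()
        values = [v.strip() for v in raw.split(";") if v.strip()]
        if not values:
            if spec.get("required"):
                errors.append({"field": name, "error": "required field is empty"})
            continue
        if spec.get("cardinality") == "exactly_one" and len(values) != 1:
            errors.append({"field": name, "error": "expected exactly one value"})
        for value in values:
            problem = check_value(spec, value, ontology_terms)
            if problem:
                errors.append({"field": name, "value": value, "error": problem})
    return errors


FIELDS = [
    {"name": "sex", "type": "term", "ontology": "urn:osa:localhost:onto:sex@1",
     "required": True, "cardinality": "exactly_one"},
    {"name": "sample_count", "type": "number", "integer_only": True,
     "range": [1, None], "required": True, "cardinality": "exactly_one"},
    {"name": "doi", "type": "url", "pattern": "^https://doi.org/",
     "required": False, "cardinality": "exactly_one"},
]
TERMS = {"urn:osa:localhost:onto:sex@1": {"male", "female", "mixed", "unknown"}}

errors = validate_row(FIELDS, {"sex": "male", "sample_count": "0"}, TERMS)
```

Each error carries the field name (and the offending value where there is one), which should be enough for a UI to highlight the matching cell.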

File Storage

FileStore (port)
├── LocalFileStore        # Development
└── S3FileStore           # Production (S3-compatible, future)

  • Draft files: temporary, deletable
  • Published files: permanent, immutable
  • Checksum verification on upload
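
A minimal sketch of the port and the local adapter. The method names (`put`, `delete`, `publish`), the SHA-256 checksum, the `draft/` and `published/` directory split, and the assumption that keys are flat file names are all illustrative choices, not the agreed interface:

```python
import hashlib
from pathlib import Path
from typing import Protocol


class ChecksumMismatch(Exception):
    """Raised when uploaded bytes do not match the declared checksum."""


class FileStore(Protocol):
    def put(self, key: str, data: bytes, expected_sha256: str) -> None: ...
    def delete(self, key: str) -> None: ...
    def publish(self, key: str) -> None: ...


class LocalFileStore:
    """Development adapter: drafts and published files in sibling folders."""

    def __init__(self, root: Path) -> None:
        self._draft = root / "draft"
        self._published = root / "published"
        self._draft.mkdir(parents=True, exist_ok=True)
        self._published.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes, expected_sha256: str) -> None:
        # Verify the checksum before anything is written.
        actual = hashlib.sha256(data).hexdigest()
        if actual != expected_sha256:
            raise ChecksumMismatch(f"{key}: expected {expected_sha256}, got {actual}")
        (self._draft / key).write_bytes(data)

    def delete(self, key: str) -> None:
        # Only draft files are deletable; published files are never touched.
        (self._draft / key).unlink()

    def publish(self, key: str) -> None:
        target = self._published / key
        if target.exists():
            raise FileExistsError(f"{key} is already published")
        (self._draft / key).rename(target)
```

An S3-compatible adapter would implement the same three methods, so the deposition service only depends on the port.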

Out of Scope (future issues)

  • Frontend pages (convention catalog, upload page, dashboard), tracked in #57
  • Real ontology imports (UBERON, NCBI Taxonomy via OWL/OBO pipeline)
  • Ontology browser UI (rich typeahead, tree explorer)
  • Ontology term search / autocomplete
  • Guided wizard / form-based metadata entry
  • Hierarchical ontology queries (closure table traversal)
  • Embedding-based fuzzy term resolution
  • S3FileStore adapter
  • Resumable uploads
  • Collaboration on drafts
  • Batch deposition (multiple rows = multiple depositions)
  • Record versioning (updating existing records)
  • Trait registry abstraction

Acceptance Criteria

  • Ontologies can be created with terms (flat, via API) using a basic test ontology
  • Schemas can be created with typed fields referencing ontologies
  • Conventions can be created bundling a schema + validator refs + file requirements
  • Template .xlsx can be downloaded for a convention
  • Filled spreadsheet can be uploaded and validated against schema
  • Term fields are validated against referenced ontologies
  • Deposition can be created, metadata provided, files uploaded, and submitted
  • Submitted deposition triggers validators defined in its convention
  • Validator results are attached to published Record as queryable metrics
  • Full pipeline works: submit → validate → curate → record
  • Only owner can edit/submit their depositions
