Summary
Make S3 (or CAS blob storage) the canonical, durable store for hook-produced features (e.g., protein pocket predictions, alignment metrics, QC checks). PostgreSQL materialization becomes an optional read-optimized projection that can be rebuilt from the canonical source at any time.
Motivation
Currently, hook-produced features flow through a one-way pipeline:
Hook k8s Job → writes features.json to disk → InsertRecordFeatures reads it
→ creates dynamic PG table → inserts rows → features.json is ephemeral
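For concreteness, the rows inside features.json might look like the struct below. This is a hypothetical sketch: the field names and types are assumptions for illustration, not the actual hook schema.

```go
// PocketFeature is a hypothetical shape for one row of a hook's
// features.json output. Field names and types are illustrative
// assumptions, not the real schema.
type PocketFeature struct {
	RecordID string  `json:"record_id"` // record the feature belongs to
	PocketID int     `json:"pocket_id"` // index of the predicted pocket
	Score    float64 `json:"score"`     // prediction confidence
	Method   string  `json:"method"`    // hook/tool that produced the value
}
```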
PostgreSQL is the sole source of truth. If the database is lost or a feature table needs to be rebuilt (schema change, index update, bug fix), the original data is gone. The features.json written by the hook is a transport artifact that is not durably stored.
This is fragile for a system that stores scientific data. Hook outputs are computed artifacts that may be expensive to regenerate (hours of compute, external API calls). They should be stored durably alongside the record's data files.
Proposed Architecture
Hook k8s Job → writes features.json
→ stored durably in S3/CAS as a canonical artifact (source of truth)
→ optionally materialized into PG for fast SQL queries (read projection)
This aligns with the existing CQRS pattern: S3 is the write model, PG feature tables are the read model. Materialization is an event-driven projection that can be replayed.
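A minimal sketch of the proposed write path, assuming the AWS SDK for Go v2; the bucket name, the records/&lt;id&gt;/features/&lt;hook&gt;/&lt;sha256&gt;.json key scheme, and the "features-stored" event are all invented for illustration, not the system's actual conventions:

```go
package main

import (
	"bytes"
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// StoreFeatures uploads a hook's features.json to S3 under a
// content-addressed key, making the blob the canonical, durable copy.
// The bucket layout and key scheme here are invented for this sketch.
func StoreFeatures(ctx context.Context, client *s3.Client, bucket, recordID, hookName string, data []byte) (string, error) {
	sum := sha256.Sum256(data) // the content hash doubles as the CAS address
	key := fmt.Sprintf("records/%s/features/%s/%s.json",
		recordID, hookName, hex.EncodeToString(sum[:]))

	_, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket:      aws.String(bucket),
		Key:         aws.String(key),
		Body:        bytes.NewReader(data),
		ContentType: aws.String("application/json"),
	})
	if err != nil {
		return "", fmt.Errorf("store features: %w", err)
	}
	// The caller may now emit a "features-stored" event; a materializer
	// consuming that event is what projects the blob into PG.
	return key, nil
}

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		panic(err)
	}
	data, err := os.ReadFile("features.json")
	if err != nil {
		panic(err)
	}
	key, err := StoreFeatures(ctx, s3.NewFromConfig(cfg), "example-bucket",
		"rec-123", "pocket-predictor", data)
	if err != nil {
		panic(err)
	}
	fmt.Println("canonical copy at", key)
}
```

Because the key is derived from the payload's hash, re-uploading identical output is idempotent and distinct outputs never overwrite each other, which is part of what makes replaying the projection safe.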
Key Properties
- S3 is canonical: feature data is durable, versioned by content hash (when CAS lands), and always recoverable
- PG is optional: operators can choose which features to materialize for SQL query performance
- Rebuildable: PG feature tables can be dropped and rebuilt from S3 at any time (e.g., after schema evolution); see the rebuild sketch after this list
- Decoupled: feature storage policy is independent of how records are created (deposition, harvest, fork, import)
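Rebuilding is then a pure replay of the canonical blobs. A minimal sketch, assuming a features_pocket table, a Postgres driver behind database/sql, and the hypothetical PocketFeature shape from the Motivation sketch (the table name, key prefix, and column names are all illustrative):

```go
import (
	"context"
	"database/sql"
	"encoding/json"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// RebuildFeatureTable empties a PG feature table and repopulates it
// from the canonical S3 blobs. The table name, key prefix, and the
// PocketFeature shape (from the earlier sketch) are illustrative
// assumptions.
func RebuildFeatureTable(ctx context.Context, db *sql.DB, client *s3.Client, bucket, prefix string) error {
	if _, err := db.ExecContext(ctx, `TRUNCATE features_pocket`); err != nil {
		return err
	}
	pages := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	})
	for pages.HasMorePages() {
		page, err := pages.NextPage(ctx)
		if err != nil {
			return err
		}
		for _, obj := range page.Contents {
			out, err := client.GetObject(ctx, &s3.GetObjectInput{
				Bucket: aws.String(bucket),
				Key:    obj.Key,
			})
			if err != nil {
				return err
			}
			var rows []PocketFeature
			err = json.NewDecoder(out.Body).Decode(&rows)
			out.Body.Close()
			if err != nil {
				return err
			}
			// A production rebuild would batch these inside a transaction.
			for _, r := range rows {
				if _, err := db.ExecContext(ctx,
					`INSERT INTO features_pocket (record_id, pocket_id, score, method)
					 VALUES ($1, $2, $3, $4)`,
					r.RecordID, r.PocketID, r.Score, r.Method); err != nil {
					return err
				}
			}
		}
	}
	return nil
}
```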
Relationship to Other Issues
Open Questions
- Should materialization be opt-in per hook definition (e.g., materialize: true in HookDefinition, as sketched after this list), or should all features be materialized by default with an opt-out?
- Should the canonical S3 representation be the raw features.json from the hook, or a normalized format?
- When PG tables are rebuilt from S3, how is schema evolution handled (e.g., hook adds a new column)?
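On the first question, the opt-in variant might be a single field on the hook spec. A hypothetical sketch (the fields besides materialize are invented for illustration):

```go
// Hypothetical extension of HookDefinition with an opt-in flag.
// Only Materialize is the point of this sketch; the other fields
// are invented for illustration.
type HookDefinition struct {
	Name        string `yaml:"name"`
	Image       string `yaml:"image"`       // k8s Job image that computes the features
	Materialize bool   `yaml:"materialize"` // if true, project features.json into a PG table
}
```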