From 46d9aaf5b17ec996809a8c0a500c072e7a37f8f1 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 21 Jan 2026 13:29:48 -0600
Subject: [PATCH 01/10] docs: Expand type system diagram with MySQL/PostgreSQL native types

- Show both MySQL and PostgreSQL native type layers
- Add uuid core type mapping to BINARY(16) / UUID
- Add table showing native type equivalents and portable alternatives
- Clarify that native types can be used directly at cost of portability

Co-Authored-By: Claude Opus 4.5
---
 src/explanation/type-system.md | 46 +++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index c3f1a4fc..719d38e4 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -24,13 +24,23 @@ graph TB
         varchar
         json
         bytes
+        uuid
     end
-    subgraph "Layer 1: Native"
+    subgraph "Layer 1: Native (MySQL)"
         INT["INT"]
         DOUBLE["DOUBLE"]
-        VARCHAR["VARCHAR"]
-        JSON_N["JSON"]
+        VARCHAR_M["VARCHAR"]
+        JSON_M["JSON"]
         BLOB["LONGBLOB"]
+        BIN16["BINARY(16)"]
+    end
+    subgraph "Layer 1: Native (PostgreSQL)"
+        INTEGER["INTEGER"]
+        DOUBLE_P["DOUBLE PRECISION"]
+        VARCHAR_P["VARCHAR"]
+        JSON_P["JSON"]
+        BYTEA["BYTEA"]
+        UUID_P["UUID"]
     end
 
     blob --> bytes
@@ -38,23 +48,41 @@ graph TB
     npy --> json
     object --> json
     hash --> json
+
     bytes --> BLOB
-    json --> JSON_N
+    bytes --> BYTEA
+    json --> JSON_M
+    json --> JSON_P
     int32 --> INT
+    int32 --> INTEGER
     float64 --> DOUBLE
-    varchar --> VARCHAR
+    float64 --> DOUBLE_P
+    varchar --> VARCHAR_M
+    varchar --> VARCHAR_P
+    uuid --> BIN16
+    uuid --> UUID_P
 ```
 
+Core types provide **portability** — the same table definition works on both MySQL and PostgreSQL. Native types can be used directly but sacrifice cross-backend compatibility.
+
 ## Layer 1: Native Database Types
 
-Backend-specific types (MySQL, PostgreSQL). **Discouraged for direct use.**
+Backend-specific types. **Can be used directly at the cost of portability.**
 
 ```python
-# Native types (avoid)
-column : TINYINT UNSIGNED
-column : MEDIUMBLOB
+# Native types — work but not portable
+column : TINYINT UNSIGNED  # MySQL only
+column : MEDIUMBLOB  # MySQL only (use BYTEA on PostgreSQL)
+column : SERIAL  # PostgreSQL only
 ```
 
+| MySQL | PostgreSQL | Portable Alternative |
+|-------|------------|---------------------|
+| `LONGBLOB` | `BYTEA` | `bytes` |
+| `BINARY(16)` | `UUID` | `uuid` |
+| `SMALLINT` | `SMALLINT` | `int16` |
+| `DOUBLE` | `DOUBLE PRECISION` | `float64` |
+
 ## Layer 2: Core DataJoint Types
 
 Standardized, scientist-friendly types that work identically across backends.

From 05e4e5be4994a21fbb8054ba254b4db6a7db3122 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 21 Jan 2026 13:37:42 -0600
Subject: [PATCH 02/10] docs: Clarify schema-addressed vs hash-addressed storage terminology

- Define hash-addressed (content hash, deduplication) vs schema-addressed
  (mirrors database structure, browsable paths)
- Fix section: "Schema-Addressed" not "Path-Addressed"
- Add Addressing column to built-in codecs table
- Update description to mention schema-addressed
- Update plugin codecs to explicitly note schema-addressed
- Expand Storage Modes section with clear definitions

Co-Authored-By: Claude Opus 4.5
---
 src/explanation/type-system.md | 50 ++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index 719d38e4..b964e1b6 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -134,24 +134,24 @@ Codec types use angle bracket notation:
 
 ### Built-in Codecs
 
-| Codec | Database | Object Store | Returns |
-|-------|----------|--------------|---------|
-| `` | ✅ | ✅ `` | Python object |
-| `` | ✅ | ✅ `` | Local file path |
-| `` | ❌ | ✅ | NpyRef (lazy) |
-| `` | ❌ | ✅ | ObjectRef |
-| `` | ❌ | ✅ | bytes |
-| `` | ❌ | ✅ | ObjectRef |
+| Codec | Database | Object Store | Addressing | Returns |
+|-------|----------|--------------|------------|---------|
+| `` | ✅ | ✅ `` | Hash | Python object |
+| `` | ✅ | ✅ `` | Hash | Local file path |
+| `` | ❌ | ✅ | Schema | NpyRef (lazy) |
+| `` | ❌ | ✅ | Schema | ObjectRef |
+| `` | ❌ | ✅ | Hash | bytes |
+| `` | ❌ | ✅ | — | ObjectRef |
 
 ### Plugin Codecs
 
-Additional codecs are available as separately installed packages. This ecosystem is actively expanding—new codecs are added as community needs arise.
+Additional schema-addressed codecs are available as separately installed packages. This ecosystem is actively expanding—new codecs are added as community needs arise.
 
 | Package | Codec | Description | Repository |
 |---------|-------|-------------|------------|
-| `dj-zarr-codecs` | `` | Zarr arrays with lazy chunked access | [datajoint/dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs) |
-| `dj-figpack-codecs` | `` | Interactive browser visualizations | [datajoint/dj-figpack-codecs](https://github.com/datajoint/dj-figpack-codecs) |
-| `dj-photon-codecs` | `` | Photon imaging data formats | [datajoint/dj-photon-codecs](https://github.com/datajoint/dj-photon-codecs) |
+| `dj-zarr-codecs` | `` | Schema-addressed Zarr arrays with lazy chunked access | [datajoint/dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs) |
+| `dj-figpack-codecs` | `` | Schema-addressed interactive browser visualizations | [datajoint/dj-figpack-codecs](https://github.com/datajoint/dj-figpack-codecs) |
+| `dj-photon-codecs` | `` | Schema-addressed photon imaging data formats | [datajoint/dj-photon-codecs](https://github.com/datajoint/dj-photon-codecs) |
 
 **Installation and discovery:**
 
@@ -236,7 +236,7 @@ class Config(dj.Manual):
 
 ### `` — NumPy Arrays as .npy Files
 
-Stores NumPy arrays as standard `.npy` files with lazy loading. Returns `NpyRef` which provides metadata access (shape, dtype) without downloading.
+Schema-addressed storage for NumPy arrays as standard `.npy` files. Returns `NpyRef` which provides metadata access (shape, dtype) without downloading.
 
 ```python
 class Recording(dj.Computed):
@@ -269,19 +269,21 @@ result = np.mean(ref) # Downloads automatically
 - **Safe bulk fetch**: Fetching many rows doesn't download until needed
 - **Memory mapping**: `ref.load(mmap_mode='r')` for random access to large arrays
 
-### `` — Path-Addressed Storage
+### `` — Schema-Addressed Storage
 
-For large/complex file structures (Zarr, HDF5). Path derived from primary key.
+Schema-addressed storage for files and folders. Path mirrors the database structure: `{schema}/{table}/{pk}/{attribute}`.
 
 ```python
 class ProcessedData(dj.Computed):
     definition = """
     -> Recording
     ---
-    zarr_data : # Stored at {schema}/{table}/{pk}/
+    results : # Stored at {schema}/{table}/{pk}/results/
     """
 ```
 
+Accepts files, folders, or bytes. Returns `ObjectRef` for lazy access.
+
 ### `` — Portable References
 
 References to independently-managed files with portable paths.
@@ -297,11 +299,17 @@ class RawData(dj.Manual):
 
 ## Storage Modes
 
-| Mode | Database | Object Store | Use Case |
-|------|----------|--------------|----------|
-| Database | Data | — | Small data |
-| Hash-addressed | Metadata | Deduplicated | Large/repeated data |
-| Path-addressed | Metadata | PK-based path | Complex files |
+Object store codecs use one of two addressing schemes:
+
+**Hash-addressed** — Path derived from content hash (e.g., `_hash/ab/cd/abcd1234...`). Provides automatic deduplication—identical content stored once. Used by ``, ``, ``.
+
+**Schema-addressed** — Path mirrors database structure: `{schema}/{table}/{pk}/{attribute}`. Human-readable, browsable paths that reflect your data organization. No deduplication. Used by ``, ``, and plugin codecs (``, ``, ``).
+
+| Mode | Database | Object Store | Deduplication | Use Case |
+|------|----------|--------------|---------------|----------|
+| Database | Data | — | — | Small data |
+| Hash-addressed | Metadata | Content hash path | ✅ Automatic | Large/repeated data |
+| Schema-addressed | Metadata | Schema-mirrored path | ❌ None | Complex files, browsable storage |
 
 ## Custom Codecs
 

From 4373f8632ee6b2fa3b2c3dedb263d4b7b41f0645 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 21 Jan 2026 13:39:00 -0600
Subject: [PATCH 03/10] docs: Add edge labels showing backend-specific type mappings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add MySQL/PostgreSQL labels to bytes and uuid edges in diagram
- Expand explanation with explicit examples:
  - bytes → LONGBLOB (MySQL) / BYTEA (PostgreSQL)
  - uuid → BINARY(16) (MySQL) / UUID (PostgreSQL)

Co-Authored-By: Claude Opus 4.5
---
 src/explanation/type-system.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index b964e1b6..ad2cab65 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -49,8 +49,8 @@ graph TB
     object --> json
     hash --> json
 
-    bytes --> BLOB
-    bytes --> BYTEA
+    bytes -->|MySQL| BLOB
+    bytes -->|PostgreSQL| BYTEA
     json --> JSON_M
     json --> JSON_P
     int32 --> INT
@@ -59,11 +59,11 @@ graph TB
     float64 --> DOUBLE_P
     varchar --> VARCHAR_M
     varchar --> VARCHAR_P
-    uuid --> BIN16
-    uuid --> UUID_P
+    uuid -->|MySQL| BIN16
+    uuid -->|PostgreSQL| UUID_P
 ```
 
-Core types provide **portability** — the same table definition works on both MySQL and PostgreSQL. Native types can be used directly but sacrifice cross-backend compatibility.
+Core types provide **portability** — the same table definition works on both MySQL and PostgreSQL. For example, `bytes` maps to `LONGBLOB` on MySQL but `BYTEA` on PostgreSQL; `uuid` maps to `BINARY(16)` on MySQL but native `UUID` on PostgreSQL. Native types can be used directly but sacrifice cross-backend compatibility.
 
 ## Layer 1: Native Database Types
 

From 07b0c1c2b3c58503cf40d0d86e1d20165abec55d Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 21 Jan 2026 13:44:11 -0600
Subject: [PATCH 04/10] docs: Fix broken link to schema-addressed storage section

---
 src/reference/specs/npy-codec.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/reference/specs/npy-codec.md b/src/reference/specs/npy-codec.md
index a3ee8b8f..f6a44892 100644
--- a/src/reference/specs/npy-codec.md
+++ b/src/reference/specs/npy-codec.md
@@ -287,4 +287,4 @@ arr = np.load('/path/to/store/my_schema/recording/recording_id=1/waveform.npy')
 
 - [Type System Specification](type-system.md) - Complete type system overview
 - [Codec API](codec-api.md) - Creating custom codecs
-- [Object Storage](type-system.md#object--path-addressed-storage) - Path-addressed storage details
+- [Object Storage](type-system.md#object--schema-addressed-storage) - Schema-addressed storage details

From b926b0429319f6054ef0877009d5729c2614c0c5 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 21 Jan 2026 13:46:03 -0600
Subject: [PATCH 05/10] docs: Simplify type system diagram with combined native types

Merge MySQL/PostgreSQL boxes into single "Native Types" layer with
backend-specific types shown inline, e.g.:
- LONGBLOB (MySQL) / BYTEA (PG)
- BINARY(16) (MySQL) / UUID (PG)

Co-Authored-By: Claude Opus 4.5
---
 src/explanation/type-system.md | 36 +++++++++++-----------------------
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index ad2cab65..892eac70 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -26,21 +26,13 @@ graph TB
         bytes
         uuid
     end
-    subgraph "Layer 1: Native (MySQL)"
-        INT["INT"]
-        DOUBLE["DOUBLE"]
-        VARCHAR_M["VARCHAR"]
-        JSON_M["JSON"]
-        BLOB["LONGBLOB"]
-        BIN16["BINARY(16)"]
-    end
-    subgraph "Layer 1: Native (PostgreSQL)"
-        INTEGER["INTEGER"]
-        DOUBLE_P["DOUBLE PRECISION"]
-        VARCHAR_P["VARCHAR"]
-        JSON_P["JSON"]
-        BYTEA["BYTEA"]
-        UUID_P["UUID"]
+    subgraph "Layer 1: Native Types"
+        INT["INT / INTEGER"]
+        DOUBLE["DOUBLE / DOUBLE PRECISION"]
+        VARCHAR_N["VARCHAR"]
+        JSON_N["JSON"]
+        BYTES_N["LONGBLOB (MySQL) / BYTEA (PG)"]
+        UUID_N["BINARY(16) (MySQL) / UUID (PG)"]
     end
 
     blob --> bytes
@@ -49,18 +41,12 @@ graph TB
     object --> json
     hash --> json
 
-    bytes -->|MySQL| BLOB
-    bytes -->|PostgreSQL| BYTEA
-    json --> JSON_M
-    json --> JSON_P
+    bytes --> BYTES_N
+    json --> JSON_N
     int32 --> INT
-    int32 --> INTEGER
     float64 --> DOUBLE
-    float64 --> DOUBLE_P
-    varchar --> VARCHAR_M
-    varchar --> VARCHAR_P
-    uuid -->|MySQL| BIN16
-    uuid -->|PostgreSQL| UUID_P
+    varchar --> VARCHAR_N
+    uuid --> UUID_N
 ```
 
 Core types provide **portability** — the same table definition works on both MySQL and PostgreSQL. For example, `bytes` maps to `LONGBLOB` on MySQL but `BYTEA` on PostgreSQL; `uuid` maps to `BINARY(16)` on MySQL but native `UUID` on PostgreSQL. Native types can be used directly but sacrifice cross-backend compatibility.

From 6bff32b750a0a9f4e2d50cae5613aab107da685a Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 21 Jan 2026 13:49:32 -0600
Subject: [PATCH 06/10] =?UTF-8?q?docs:=20Show=20codec=20chaining=20in=20di?=
 =?UTF-8?q?agram=20(=20=E2=86=92=20)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add as separate node showing how it chains through for
hash-addressed object storage with deduplication.

Co-Authored-By: Claude Opus 4.5
---
 src/explanation/type-system.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index 892eac70..ab8f6d46 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -12,6 +12,7 @@ database efficiency with Python convenience.
 graph TB
     subgraph "Layer 3: Codecs"
         blob["‹blob›"]
+        blob_at["‹blob@›"]
         attach["‹attach›"]
         npy["‹npy@›"]
         object["‹object@›"]
@@ -36,10 +37,11 @@ graph TB
     end
 
     blob --> bytes
+    blob_at --> hash
+    hash --> json
     attach --> bytes
     npy --> json
     object --> json
-    hash --> json
 
     bytes --> BYTES_N
     json --> JSON_N

From 3abb750a789824b2b9ac1d3c94bd0037e8e10dbc Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 21 Jan 2026 13:51:54 -0600
Subject: [PATCH 07/10] docs: Move MySQL/PostgreSQL label to layer title for cleaner boxes

---
 src/explanation/type-system.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index ab8f6d46..ce32ff46 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -27,13 +27,13 @@ graph TB
         bytes
         uuid
     end
-    subgraph "Layer 1: Native Types"
+    subgraph "Layer 1: Native Types (MySQL / PostgreSQL)"
         INT["INT / INTEGER"]
         DOUBLE["DOUBLE / DOUBLE PRECISION"]
         VARCHAR_N["VARCHAR"]
         JSON_N["JSON"]
-        BYTES_N["LONGBLOB (MySQL) / BYTEA (PG)"]
-        UUID_N["BINARY(16) (MySQL) / UUID (PG)"]
+        BYTES_N["LONGBLOB / BYTEA"]
+        UUID_N["BINARY(16) / UUID"]
     end
 
     blob --> bytes

From 23de4d3844527861f8855d91195407ea373f2092 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 21 Jan 2026 13:53:53 -0600
Subject: [PATCH 08/10] docs: Add to diagram showing hash-addressed chaining

---
 src/explanation/type-system.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index ce32ff46..ad845422 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -14,6 +14,7 @@ graph TB
         blob["‹blob›"]
         blob_at["‹blob@›"]
         attach["‹attach›"]
+        attach_at["‹attach@›"]
         npy["‹npy@›"]
         object["‹object@›"]
         hash["‹hash@›"]
@@ -38,8 +39,9 @@ graph TB
 
     blob --> bytes
     blob_at --> hash
-    hash --> json
     attach --> bytes
+    attach_at --> hash
+    hash --> json
     npy --> json
     object --> json
 

From 9d16c23db7fe19bd49ed3591fd141409fad8f1cc Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 21 Jan 2026 14:02:21 -0600
Subject: [PATCH 09/10] docs: Rename 'custom' to 'plugin' in diagram

---
 src/explanation/type-system.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index ad845422..5bd15f6b 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -18,7 +18,7 @@ graph TB
         npy["‹npy@›"]
         object["‹object@›"]
         hash["‹hash@›"]
-        custom["‹custom›"]
+        plugin["‹plugin›"]
     end
     subgraph "Layer 2: Core Types"
         int32

From bb774bc7d254057f2346aba9dab13f017432baab Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Wed, 21 Jan 2026 14:04:32 -0600
Subject: [PATCH 10/10] docs: Add to diagram, chains to json

---
 src/explanation/type-system.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index 5bd15f6b..e606c3b2 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -17,6 +17,7 @@ graph TB
         attach_at["‹attach@›"]
         npy["‹npy@›"]
         object["‹object@›"]
+        filepath["‹filepath@›"]
         hash["‹hash@›"]
         plugin["‹plugin›"]
     end
@@ -44,6 +45,7 @@ graph TB
     hash --> json
     npy --> json
     object --> json
+    filepath --> json
 
     bytes --> BYTES_N
     json --> JSON_N