diff --git a/.gts-spec b/.gts-spec index e088287..48f15e0 160000 --- a/.gts-spec +++ b/.gts-spec @@ -1 +1 @@ -Subproject commit e0882879577e7427f759677e9cf2eac7031d978c +Subproject commit 48f15e0e2fdb14f8cda10b02f1f13b4253173959 diff --git a/docs/001-macro-alignment-adr.md b/docs/001-macro-alignment-adr.md new file mode 100644 index 0000000..80ee9de --- /dev/null +++ b/docs/001-macro-alignment-adr.md @@ -0,0 +1,708 @@ +# ADR-001: Align gts-rust Macro with GTS Specification + +- **Status**: Proposed +- **Date**: 2026-03-06 +- **Authors**: Andre Smith +- **Issue**: [#72 — struct_to_gts_schema: gts_type field blocks Deserialize for structs](https://github.com/GlobalTypeSystem/gts-rust/issues/72) + +--- + +## 1. Context + +The `#[struct_to_gts_schema]` attribute macro is the primary integration point between Rust structs and the Global Type System. It currently serves three purposes: compile-time validation, JSON Schema generation, and runtime API generation. While the macro delivers significant value, its monolithic design has led to accumulated friction as the GTS specification and user requirements have evolved. + +Issue [#72](https://github.com/GlobalTypeSystem/gts-rust/issues/72) exposed a concrete symptom — the macro requires base structs to declare a `gts_type: GtsSchemaId` or `id: GtsInstanceId` field, but does not properly handle deserialization of that field. Users are forced into workarounds like `#[serde(skip_serializing, default = "dummy_fn")]`, which are fragile and pollute consumer code. + +However, the root cause runs deeper than a missing `#[serde(skip)]`. The macro conflates multiple orthogonal concerns into a single attribute, makes assumptions that are not grounded in the GTS specification, and silently manipulates user-controlled serde behavior in ways that are difficult to reason about. 
+ +This ADR proposes a redesign that decomposes the macro into focused, composable units that align with the GTS specification and follow Rust ecosystem conventions. + +--- + +## 2. Current Design + +### 2.1 What the macro does today + +The `#[struct_to_gts_schema]` attribute macro accepts five required parameters: + +```rust +#[struct_to_gts_schema( + dir_path = "schemas", + base = true, // or base = ParentStruct + schema_id = "gts.x.core.events.type.v1~", + description = "Base event type", + properties = "event_type,id,tenant_id,payload" +)] +``` + +In a single invocation, the macro performs all of the following: + +1. **Validates the schema ID format** against GTS identifier rules +2. **Validates version consistency** between the struct name suffix (e.g., `V1`) and the schema ID version (e.g., `v1~`) +3. **Validates parent-child inheritance** — segment count, parent schema ID match +4. **Validates the `properties` list** — every listed property must exist as a struct field +5. **Requires id/type fields** — base structs must have either an `id: GtsInstanceId` or a type field (`type`/`gts_type`/etc.) of type `GtsSchemaId`, but not both +6. **Injects serde derives** — automatically adds `Serialize`, `Deserialize`, `JsonSchema` for base structs +7. **Removes serde derives** — strips `Serialize`/`Deserialize` from nested structs and emits compile errors if the user adds them manually +8. **Injects serde attributes** — adds `#[serde(bound(...))]` on the struct and `#[serde(serialize_with, deserialize_with)]` on generic fields for base structs +9. **Implements `GtsSchema` trait** — with `SCHEMA_ID`, `GENERIC_FIELD`, and schema composition methods +10. **Implements `GtsSerialize`/`GtsDeserialize`** — custom serialization traits for nested structs +11. **Implements `GtsNoDirectSerialize`/`GtsNoDirectDeserialize`** — marker traits that block direct serde usage on nested structs +12. 
**Generates runtime API** — `gts_schema_id()`, `gts_base_schema_id()`, `gts_make_instance_id()`, `gts_instance_json()`, schema string methods + +### 2.2 The `base` attribute's dual role + +The `base` attribute conflates two orthogonal concepts: + +| `base` value | GTS meaning | Serialization meaning | +|---|---|---| +| `base = true` | Root type in GTS hierarchy | Gets direct `Serialize`/`Deserialize` via serde derives | +| `base = ParentStruct` | Child type inheriting from parent | Blocked from direct serialization; uses `GtsSerialize`/`GtsDeserialize` instead | + +These are separate concerns. A root GTS type might not need direct serialization. A child type might need standalone serialization for testing or debugging. The current design couples them. + +### 2.3 The `properties` parameter + +The `properties` parameter is a comma-separated string listing which struct fields should appear in the generated JSON Schema: + +```rust +properties = "event_type,id,tenant_id,payload" +``` + +This requires the user to duplicate the field list — once in the struct definition and once in `properties`. If a field is added to the struct but not to `properties`, it silently becomes invisible to the schema. If a field is listed in `properties` but doesn't exist, the macro catches it — but the inverse (forgotten field) is not caught. + +### 2.4 The id/type field requirement + +The macro requires every base struct to have a GTS identity field. On `main`, `validate_field_types` (line 260) enforces: + +``` +Base structs must have either an ID field (one of: $id, id, gts_id, gtsId) of type +GtsInstanceId OR a GTS Type field (one of: type, gts_type, gtsType, schema) of type +GtsSchemaId +``` + +This is not grounded in the GTS specification (see [Section 3](#3-motivation-from-gts-specification) below). 
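The rule quoted above can be restated as a small, dependency-free sketch. This is an illustration only, not the real `validate_field_types`: the `Field` alias and the `check_identity_rule` function are hypothetical names, and fields are modelled as plain `(name, type name)` pairs rather than parsed syntax trees. The accepted field names are taken from the error message in section 2.4, and the "but not both" condition from item 5 in section 2.1.

```rust
/// Field names the current macro accepts for a GTS instance ID (from the error message).
const ID_FIELD_NAMES: [&str; 4] = ["$id", "id", "gts_id", "gtsId"];
/// Field names the current macro accepts for a GTS type field (from the error message).
const TYPE_FIELD_NAMES: [&str; 4] = ["type", "gts_type", "gtsType", "schema"];

/// Hypothetical stand-in for a parsed struct field: (field name, Rust type name).
type Field<'a> = (&'a str, &'a str);

/// Illustrative restatement of the current rule for base structs:
/// exactly one of "ID field of type GtsInstanceId" or "type field of type GtsSchemaId".
fn check_identity_rule(fields: &[Field<'_>]) -> Result<(), String> {
    let has_id = fields
        .iter()
        .any(|(name, ty)| ID_FIELD_NAMES.contains(name) && *ty == "GtsInstanceId");
    let has_type = fields
        .iter()
        .any(|(name, ty)| TYPE_FIELD_NAMES.contains(name) && *ty == "GtsSchemaId");

    match (has_id, has_type) {
        (true, true) => Err("base struct has both an ID field and a type field".to_string()),
        (false, false) => Err("base struct has neither an ID field nor a type field".to_string()),
        _ => Ok(()),
    }
}
```

Under this sketch, a plain data entity with only `subject: String` and `description: String` (the issue #72 case) is rejected, and a struct with `id: Uuid` is rejected as well, because only `GtsInstanceId` counts as an ID field.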
+ +### 2.5 The parallel serialization system + +For nested structs, the macro creates a shadow serialization system: + +- `GtsSerialize` / `GtsDeserialize` — parallel traits to `serde::Serialize` / `serde::Deserialize` +- `GtsSerializeWrapper` / `GtsDeserializeWrapper` — bridge types between the two systems +- `GtsNoDirectSerialize` / `GtsNoDirectDeserialize` — marker traits that cause compile errors if serde traits are also implemented +- `serialize_gts` / `deserialize_gts` — helper functions used via `#[serde(serialize_with, deserialize_with)]` + +This exists to solve a real problem: a nested payload struct like `AuditPayloadV1` produces incomplete JSON when serialized alone (it lacks the base event fields). But the solution enforces a type-level restriction for what is fundamentally a usage concern, and creates significant cognitive overhead. + +--- + +## 3. Motivation from GTS Specification + +The GTS specification (v0.8) provides clear guidance that contradicts several assumptions baked into the current macro design. + +### 3.1 Identity fields are not universally required + +The spec defines five categories of JSON documents ([spec §11.1, Rule C](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)): + +1. **GTS entity schemas** — have `$schema` and `$id` starting with `gts://` +2. **Non-GTS schemas** — have `$schema` but no GTS `$id` +3. **Instances of unknown/non-GTS schemas** — no `$schema`, no determinable GTS identity +4. **Well-known GTS instances** — identified by a GTS instance ID in an `id` field +5. **Anonymous GTS instances** — opaque `id` (UUID), GTS type in a separate `type` field + +Categories 4 and 5 require GTS identity fields. 
But the spec examples include schemas that produce instances with **no GTS identity field at all**: + +- `gts.x.commerce.orders.order.v1.0~` — the Order schema has `id: uuid` (a plain business identifier, not a `GtsInstanceId`) and no `type` field +- `gts.x.core.idp.contact.v1.0~` — the Contact schema has `id: uuid` and no `type` field + +These are valid GTS schemas — they have a `$id` in their schema document — but their instances are pure data objects. They are referenced by other GTS entities (e.g., an event's `subjectType` points to `gts.x.commerce.orders.order.v1.0~`) but they don't self-identify at the instance level via GTS. + +**Conclusion**: The macro's requirement that every base struct must have a `GtsInstanceId` or `GtsSchemaId` field is not justified by the spec. + +### 3.2 The `type` field is a polymorphic discriminator, not a static constant + +The base event schema (`gts.x.core.events.type.v1~`) defines its `type` property as: + +```json +"type": { + "description": "Identifier of the event type in GTS format.", + "type": "string", + "x-gts-ref": "/$id" +} +``` + +The `x-gts-ref: "/$id"` annotation means the field value references the current schema's `$id`. In a child schema (`gts.x.core.events.type.v1~x.commerce.orders.order_placed.v1.0~`), this is narrowed to: + +```json +"type": { + "const": "gts.x.core.events.type.v1~x.commerce.orders.order_placed.v1.0~" +} +``` + +The `type` value is the **child's** full chained schema ID — not the base struct's own `SCHEMA_ID`. This is visible in the spec's instance examples: + +```json +{ + "type": "gts.x.core.events.type.v1~x.commerce.orders.order_placed.v1.0~", + "id": "7a1d2f34-5678-49ab-9012-abcdef123456" +} +``` + +And in the macro's own README examples: + +```rust +let event = BaseEventV1 { + event_type: PlaceOrderDataV1::gts_schema_id().clone(), // child's ID, not base's + ... 
+}; +``` + +**Conclusion**: The `type` field is a runtime value set by the user — it depends on which concrete child type the instance represents. It is not something the macro can or should auto-populate from the struct's own `SCHEMA_ID`. + +### 3.3 The `id` field follows different patterns depending on entity type + +The spec shows multiple patterns: + +| Schema | `id` field | `type`/`gtsId` field | Pattern | +|---|---|---|---| +| `events.type.v1~` | UUID | Full chained schema ID | Anonymous instance | +| `events.type_combined.v1~` | Chained GTS ID + UUID | (omitted) | Combined anonymous | +| `events.topic.v1~` | GTS instance ID | (none) | Well-known instance | +| `compute.vm.v1~` | UUID | `gtsId`: schema ID | Hybrid | +| `compute.vm_state.v1~` | (none) | `gtsId`: schema ID | Well-known (type-only) | +| `orders.order.v1.0~` | UUID | (none) | Plain data entity | +| `idp.contact.v1.0~` | UUID | (none) | Plain data entity | + +**Conclusion**: There is no single "correct" identity pattern. The choice is domain-specific and implementation-defined ([spec §11.1](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)). The macro should support these patterns, not mandate one. + +### 3.4 Field names are implementation-defined + +From [spec §11.1](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories): + +> *"The exact field names used for instance IDs and instance types are **implementation-defined** and may be **configuration-driven** (different systems may look for identifiers in different fields)."* + +And [spec §9.1](https://github.com/GlobalTypeSystem/gts-spec#91---identifier-reference-in-json-and-json-schema): + +> *"Field naming: typically `id` (alternatives: `gtsId`, `gts_id`)"* +> *"Field naming: `type` (alternatives: `gtsType`, `gts_type`)"* + +The macro already supports multiple field name variants. 
This is appropriate, but the rigid requirement around field presence is not. + +### 3.5 Schema identity lives at the schema level, not the instance level + +Every GTS schema has a `$id` in its JSON Schema document. The macro's `schema_id` attribute captures this. The generated `SCHEMA_ID` constant, `gts_schema_id()` method, and `gts_make_instance_id()` method provide runtime access. + +Instance-level identity (the `id` or `type` field on a JSON object) is a separate concern — it's how a specific JSON document identifies itself at runtime. Not every schema produces instances that need self-identification. + +**Conclusion**: Schema-level identity (the `schema_id` macro attribute) and instance-level identity (the `id`/`type` struct field) should be treated as independent concerns. + +--- + +## 4. Proposed Design + +### 4.1 Decompose into focused macros + +Replace the single `#[struct_to_gts_schema]` with composable, single-responsibility macros and field-level attributes: + +#### `#[derive(GtsSchema)]` — Schema identity and metadata + +A derive macro that handles the pure GTS metadata concern: + +```rust +#[derive(Debug, Clone, GtsSchema)] +#[gts( + schema_id = "gts.x.core.events.type.v1~", + description = "Base event type with common fields", +)] +pub struct BaseEventV1 { + #[serde(rename = "type")] + pub event_type: GtsSchemaId, + pub id: Uuid, + pub tenant_id: Uuid, + pub sequence_id: u64, + pub payload: P, +} +``` + +**Responsibilities:** +- Validate the `schema_id` format against GTS identifier rules +- Validate version consistency between struct name and schema ID +- Implement the `GtsSchema` trait (`SCHEMA_ID`, `GENERIC_FIELD`, schema composition methods) +- Generate `gts_schema_id()`, `gts_base_schema_id()`, `gts_make_instance_id()` +- Generate `gts_schema_with_refs_as_string()` and similar convenience methods +- Store `dir_path` and `description` as associated constants for CLI schema generation + +**Automatic derives:** +- The derive macro automatically adds 
`schemars::JsonSchema` if not already present. This is required because the `GtsSchema` trait implementation uses `schemars::schema_for!(Self)` internally for runtime schema generation. Unlike `Serialize`/`Deserialize`, `JsonSchema` is a direct dependency of the GTS schema system and cannot be meaningfully omitted. + +**What it does NOT do:** +- No `Serialize`/`Deserialize` injection or removal — user-controlled +- No serde attribute manipulation (except on generic fields — see [4.5](#45-gts-aware-serde-for-generic-fields)) +- No field requirements (no mandatory id/type) +- No `properties` parameter — the schema is derived from the struct's fields (see [4.2](#42-remove-the-properties-parameter)) + +#### `#[gts(extends = ParentStruct)]` — Inheritance declaration + +An attribute within the `#[gts(...)]` namespace that declares parent-child relationships: + +```rust +#[derive(Debug, Clone, GtsSchema)] +#[gts( + schema_id = "gts.x.core.events.type.v1~x.core.audit.event.v1~", + description = "Audit event with user context", + extends = BaseEventV1, +)] +pub struct AuditPayloadV1 { + pub user_agent: String, + pub user_id: Uuid, + pub ip_address: String, + pub data: D, +} +``` + +**Responsibilities:** +- Validate that the schema ID has the correct segment count (multi-segment for child types) +- Validate at compile time that the parent's `SCHEMA_ID` matches the parent segment in `schema_id` +- Validate that the parent struct has exactly one generic parameter +- Generate `allOf` + `$ref` schema composition + +**What it does NOT do:** +- Does not control serialization behavior — that remains the user's choice (see [4.4](#44-let-users-control-serde)) + +#### Field-level attributes — Opt-in GTS semantics + +Instead of requiring id/type fields and hardcoding field-name recognition, provide opt-in field-level attributes: + +```rust +#[gts(type_field)] // Marks this as the GTS type discriminator +pub event_type: GtsSchemaId, + +#[gts(instance_id)] // Marks this as the GTS 
instance ID +pub id: GtsInstanceId, + +#[gts(skip)] // Exclude from generated JSON schema +pub internal_cache: HashMap, +``` + +**`#[gts(type_field)]`:** +- Validates that the field type is `GtsSchemaId` +- In the generated JSON Schema, annotates the property with `"x-gts-ref": "/$id"` per [spec §9.6](https://github.com/GlobalTypeSystem/gts-spec#96---x-gts-ref-support) +- Can only appear once per struct, and is mutually exclusive with `#[gts(instance_id)]` + +**`#[gts(instance_id)]`:** +- Validates that the field type is `GtsInstanceId` +- In the generated JSON Schema, annotates the property with `"x-gts-ref": "/$id"` per [spec §9.6](https://github.com/GlobalTypeSystem/gts-spec#96---x-gts-ref-support) +- Can only appear once per struct, and is mutually exclusive with `#[gts(type_field)]` + +**`#[gts(skip)]`:** +- Excludes the field from the generated JSON Schema properties +- Does not affect serde behavior (use `#[serde(skip)]` for that) + +These attributes are **all optional**. A struct without any `#[gts(type_field)]` or `#[gts(instance_id)]` annotation is valid — it represents a data entity like `order.v1.0~` or `contact.v1.0~` from the spec. + +### 4.2 Remove the `properties` parameter + +The current `properties = "event_type,id,tenant_id,payload"` parameter manually lists which fields appear in the schema. This is redundant with the struct definition itself. + +**New behavior**: All named fields are included in the JSON Schema by default. To exclude a field, use `#[gts(skip)]` (schema-only exclusion) or `#[serde(skip)]` (also excluded from serialization). 
+ +```rust +#[derive(Debug, Clone, Serialize, Deserialize, GtsSchema)] +#[gts( + schema_id = "gts.x.core.events.type.v1~", + description = "Base event type", +)] +pub struct BaseEventV1 { + #[gts(type_field)] + #[serde(rename = "type")] + pub event_type: GtsSchemaId, + + pub id: Uuid, + pub tenant_id: Uuid, + pub sequence_id: u64, + pub payload: P, + + #[gts(skip)] + pub internal_metadata: Option, // not in schema, but still serializable +} +``` + +**Migration**: The `properties` parameter is removed. Fields previously omitted from `properties` should add `#[gts(skip)]`. The `dir_path` parameter is retained as it controls file output location for the CLI. + +### 4.3 Replace `base` with `extends` + +The current `base` attribute has two forms: +- `base = true` — "this is a root type" +- `base = ParentStruct` — "this inherits from ParentStruct" + +The `base = true` case carries no information — it simply means "not a child." This is the default state and doesn't need to be declared. The child case is better expressed as `extends = ParentStruct`, which reads naturally and only appears when needed. + +```rust +// Root type — no `extends`, this is the default +#[derive(GtsSchema)] +#[gts(schema_id = "gts.x.core.events.type.v1~", description = "...")] +pub struct BaseEventV1 { ... } + +// Child type — declares parent explicitly +#[derive(GtsSchema)] +#[gts( + schema_id = "gts.x.core.events.type.v1~x.core.audit.event.v1~", + description = "...", + extends = BaseEventV1, +)] +pub struct AuditPayloadV1 { ... } +``` + +**Validation rules remain the same:** +- Without `extends`: schema ID must have exactly 1 segment +- With `extends = Parent`: schema ID must have 2+ segments, and the parent segment must match `Parent::SCHEMA_ID` + +### 4.4 Let users control serde + +The current macro silently adds `Serialize`/`Deserialize` derives for base structs and silently removes them for nested structs. This is surprising behavior that fights against Rust conventions. 
+ +**New behavior**: The macro does **not** inject or remove any serde derives. Users explicitly control their serialization: + +```rust +// Base struct — user adds Serialize/Deserialize themselves +#[derive(Debug, Serialize, Deserialize, GtsSchema)] +#[gts(schema_id = "gts.x.core.events.type.v1~", description = "...")] +pub struct BaseEventV1 { ... } + +// Nested struct — user decides whether it's directly serializable +#[derive(Debug, GtsSchema)] +#[gts( + schema_id = "gts.x.core.events.type.v1~x.core.audit.event.v1~", + description = "...", + extends = BaseEventV1, +)] +pub struct AuditPayloadV1 { ... } +``` + +**Impact on nested struct serialization:** + +The current design blocks nested structs from implementing `Serialize`/`Deserialize` to prevent users from accidentally serializing incomplete JSON. This is a valid safety concern, but the current approach is heavy-handed. + +**Decision: Retain current blocking as default, with opt-out.** Nested structs (those with `extends`) will continue to be blocked from deriving `Serialize`/`Deserialize` by default, since direct serialization produces incomplete JSON and is a real source of bugs. Users who understand the tradeoff (testing, debugging, standalone use cases) can opt out: + +```rust +#[derive(Debug, Serialize, Deserialize, GtsSchema)] +#[gts( + schema_id = "...", + extends = BaseEventV1, + allow_direct_serde, // opt-out: allow Serialize/Deserialize +)] +pub struct AuditPayloadV1 { ... } +``` + +Without `allow_direct_serde`, deriving `Serialize`/`Deserialize` on a nested struct will produce a compile error directing the user to either remove the derives or add `allow_direct_serde`. + +The `GtsSerialize`/`GtsDeserialize` trait system is retained regardless. It remains necessary for bridging generic fields in base structs where the generic parameter only implements `GtsSerialize`, not `serde::Serialize`. The key change is that users control whether a struct *also* implements serde directly. 
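The bridging role described above can be sketched without serde. The block below is a simplified model under stated assumptions: the real `GtsSerialize` trait bridges to `serde::Serialize` and its exact signature is not shown in this ADR, so the trait here just renders a JSON string. `AuditPayload`, `BaseEvent`, and `to_json` are hypothetical names chosen for illustration. The point is only the shape of the delegation: the nested type implements the parallel trait, and the base type is the single entry point that produces a complete document.

```rust
/// Simplified stand-in for the crate's `GtsSerialize` (assumption: the real trait
/// bridges to serde; this one only renders a JSON string, with no escaping).
trait GtsSerialize {
    fn gts_serialize(&self) -> String;
}

/// Nested payload: implements only the parallel trait, so it offers no
/// standalone serialization entry point of its own.
struct AuditPayload {
    user_id: String,
}

impl GtsSerialize for AuditPayload {
    fn gts_serialize(&self) -> String {
        format!("{{\"user_id\":\"{}\"}}", self.user_id)
    }
}

/// Base struct: owns the complete document and delegates its generic field.
struct BaseEvent<P: GtsSerialize> {
    event_type: String,
    payload: P,
}

impl<P: GtsSerialize> BaseEvent<P> {
    /// Plays the role that `Serialize` plus `serialize_with` on the generic
    /// field plays in the ADR: the payload is reached only through the base.
    fn to_json(&self) -> String {
        format!(
            "{{\"type\":\"{}\",\"payload\":{}}}",
            self.event_type,
            self.payload.gts_serialize()
        )
    }
}
```

In this model, allowing direct serialization of the nested type would correspond to giving `AuditPayload` its own public entry point in addition to the trait impl, which is the choice `allow_direct_serde` is meant to make explicit.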
+ +**How serialization works without `Serialize` on nested structs:** + +When a nested struct does not derive `Serialize`/`Deserialize` (the default), it can still be serialized through the base struct. The chain works as follows: + +1. The base struct (e.g., `BaseEventV1>`) derives `Serialize` and has `#[serde(serialize_with = "gts::serialize_gts")]` on its generic field +2. `serialize_gts` calls `GtsSerialize::gts_serialize()` on the nested struct +3. The macro generates an explicit `GtsSerialize` impl for the nested struct (this is not the blanket impl — it's a custom implementation that handles generic field wrapping via `GtsSerializeWrapper`) +4. The same applies in reverse for deserialization via `GtsDeserialize` + +The macro **must** continue generating explicit `GtsSerialize`/`GtsDeserialize` implementations for nested structs. Without `Serialize`/`Deserialize` derives, the blanket impls (`impl GtsSerialize for T`) do not apply, so these explicit impls are the only path for nested struct serialization. + +**Instance serialization methods (`gts_instance_json`, etc.):** + +The current macro generates `gts_instance_json()`, `gts_instance_json_as_string()`, and `gts_instance_json_as_string_pretty()` for base structs. These call `serde_json::to_value(self)` which requires `Serialize`. Since the new design does not inject `Serialize`, these methods must be generated with a `where Self: serde::Serialize` bound so they are only available when the user has derived `Serialize`. If `Serialize` is not derived, the methods simply won't exist — no compile error unless the user tries to call them. + +**Unit struct handling:** + +The current macro provides custom `Serialize`/`Deserialize` implementations for unit structs (both base and nested) that serialize as `{}` instead of `null` and accept both `{}` and `null` on deserialization. 
This behavior is retained in the new design: +- Base unit structs: custom `Serialize`/`Deserialize` impls generated by the macro +- Nested unit structs: custom `GtsSerialize`/`GtsDeserialize` impls generated by the macro + +### 4.5 GTS-aware serde for generic fields + +The current macro injects `#[serde(bound(...))]` and `#[serde(serialize_with, deserialize_with)]` attributes on base structs with generic parameters. This is necessary because the generic parameter `P` may only implement `GtsSerialize`/`GtsDeserialize` (not direct serde traits), and serde needs to be told how to handle it. + +This behavior is retained, but made explicit. When the derive macro detects a generic field of type `P` where `P: GtsSchema`, it adds the appropriate serde bounds and delegation attributes. This is the one case where the macro manipulates serde attributes, and it is justified because the `GtsSchema` bound is user-declared and the serde bridging is a direct consequence of the GTS type system. + +The macro should emit a note in documentation (or via `#[doc]`) explaining what serde attributes were added and why. + +--- + +## 5. Migration Path + +### 5.1 Before (current) + +```rust +#[derive(Debug)] +#[struct_to_gts_schema( + dir_path = "schemas", + base = true, + schema_id = "gts.x.core.events.type.v1~", + description = "Base event type definition", + properties = "event_type,id,tenant_id,sequence_id,payload" +)] +pub struct BaseEventV1
<P>
{ + #[serde(rename = "type")] + pub event_type: GtsSchemaId, + pub id: Uuid, + pub tenant_id: Uuid, + pub sequence_id: u64, + pub payload: P, +} + +#[derive(Debug)] +#[struct_to_gts_schema( + dir_path = "schemas", + base = BaseEventV1, + schema_id = "gts.x.core.events.type.v1~x.core.audit.event.v1~", + description = "Audit event with user context", + properties = "user_agent,user_id,ip_address,data" +)] +pub struct AuditPayloadV1 { + pub user_agent: String, + pub user_id: Uuid, + pub ip_address: String, + pub data: D, +} +``` + +### 5.2 After (proposed) + +```rust +#[derive(Debug, Serialize, Deserialize, GtsSchema)] +#[gts( + dir_path = "schemas", + schema_id = "gts.x.core.events.type.v1~", + description = "Base event type definition", +)] +pub struct BaseEventV1 { + #[gts(type_field)] + #[serde(rename = "type")] + pub event_type: GtsSchemaId, + pub id: Uuid, + pub tenant_id: Uuid, + pub sequence_id: u64, + pub payload: P, +} + +#[derive(Debug, GtsSchema)] +#[gts( + dir_path = "schemas", + schema_id = "gts.x.core.events.type.v1~x.core.audit.event.v1~", + description = "Audit event with user context", + extends = BaseEventV1, +)] +pub struct AuditPayloadV1 { + pub user_agent: String, + pub user_id: Uuid, + pub ip_address: String, + pub data: D, +} +``` + +### 5.3 The issue #72 case — data entity without GTS identity + +```rust +// BEFORE: forced to add dead gts_type field with fragile serde workaround +#[derive(Debug, Clone)] +#[struct_to_gts_schema( + dir_path = "schemas", + base = true, + schema_id = "gts.cf.core.errors.quota_violation.v1~", + description = "A single quota violation entry", + properties = "subject,description" +)] +pub struct QuotaViolationV1 { + #[allow(dead_code)] + #[serde(skip_serializing, default = "dummy_gts_schema_id")] + gts_type: gts::GtsSchemaId, // unwanted field, serde workaround + pub subject: String, + pub description: String, +} + +// AFTER: clean data entity, no GTS identity field needed +#[derive(Debug, Clone, Serialize, 
Deserialize, GtsSchema)] +#[gts( + dir_path = "schemas", + schema_id = "gts.cf.core.errors.quota_violation.v1~", + description = "A single quota violation entry", +)] +pub struct QuotaViolationV1 { + pub subject: String, + pub description: String, +} +``` + +--- + +## 6. Summary of Changes + +| Concern | Current behavior | Proposed behavior | +|---|---|---| +| **Macro entry point** | Single `#[struct_to_gts_schema]` attribute macro | `#[derive(GtsSchema)]` derive macro + `#[gts(...)]` attributes | +| **Schema identity** | Provided via `schema_id` param | Same — via `#[gts(schema_id = "...")]` | +| **Inheritance** | `base = true` / `base = Parent` | Absent (default = root) / `extends = Parent` | +| **Properties list** | Manual `properties = "a,b,c"` | Automatic from struct fields; `#[gts(skip)]` to exclude | +| **Id/type fields** | Required on all base structs | Optional; opt-in via `#[gts(type_field)]` / `#[gts(instance_id)]` | +| **Serde derives** | Silently injected (base) or removed (nested) | User-controlled; macro does not add or remove derives | +| **Serde attributes on generic fields** | Silently injected | Retained (necessary), but documented | +| **Nested struct serialization blocking** | Always blocked via marker traits | Blocked by default; opt-out via `#[gts(allow_direct_serde)]` | +| **GtsSerialize/GtsDeserialize** | Explicit impls generated for nested structs | Retained — explicit impls still generated for nested structs | +| **`GtsSchema` trait** | Implemented by macro | Same — implemented by `#[derive(GtsSchema)]` | +| **Runtime API** | Generated methods on struct | Same — generated by derive; `gts_instance_json()` gated on `Self: Serialize` | +| **CLI schema generation** | Uses `dir_path` and `properties` | Uses `dir_path`; properties derived from fields | +| **JsonSchema derive** | Auto-added | Auto-added by `#[derive(GtsSchema)]` (required for `schemars::schema_for!`) | + +--- + +## 7. 
Schema Output Impact + +A key concern: does the redesign change the generated JSON Schemas? + +### 7.1 Structurally identical output + +With correct migration (adding `#[gts(skip)]` to fields previously omitted from `properties`), the generated schemas are **structurally identical**: + +- `$id`, `$schema`, `title`, `type: "object"` — unchanged +- `properties` object — same fields included +- `required` array — derived the same way (non-`Option` fields are required) +- `additionalProperties: false` — unchanged +- `allOf` + `$ref` structure for child types — unchanged +- Generic field nesting via `wrap_in_nesting_path` — unchanged +- `GtsInstanceId`/`GtsSchemaId` field representation — unchanged (via `json_schema_value()`) + +### 7.2 One improvement: spec-correct `x-gts-ref` on identity fields + +The current macro generates all `GtsSchemaId` fields with a generic `x-gts-ref`: + +```json +"type": { "type": "string", "format": "gts-schema-id", "x-gts-ref": "gts.*" } +``` + +But the GTS spec examples ([§9.6](https://github.com/GlobalTypeSystem/gts-spec#96---x-gts-ref-support)) use a more precise self-reference annotation on identity fields — `"x-gts-ref": "/$id"` — meaning "this field's value must equal the current schema's `$id`": + +```json +"type": { "type": "string", "x-gts-ref": "/$id" } +``` + +This distinction matters. Consider a base event schema with two `GtsSchemaId` fields: +- `type` — identifies *this* entity's schema (should be `"x-gts-ref": "/$id"`) +- `subjectType` — references *another* entity's schema (should be `"x-gts-ref": "gts.*"`) + +The current macro treats both identically. 
The new field-level attributes fix this: + +```rust +#[gts(type_field)] +#[serde(rename = "type")] +pub event_type: GtsSchemaId, // → "x-gts-ref": "/$id" + +pub subject_type: GtsSchemaId, // → "x-gts-ref": "gts.*" (from schemars JsonSchema impl) +``` + +This brings the generated schemas closer to the spec examples: +- `events.type.v1~` schema: `"type"` property has `"x-gts-ref": "/$id"` (spec ref: [events.type.v1~.schema.json](/.gts-spec/examples/events/schemas/gts.x.core.events.type.v1~.schema.json)) +- `events.topic.v1~` schema: `"id"` property has `"x-gts-ref": "/$id"` (spec ref: [events.topic.v1~.schema.json](/.gts-spec/examples/events/schemas/gts.x.core.events.topic.v1~.schema.json)) +- `modules.capability.v1~` schema: `"id"` property has `"x-gts-ref": "/$id"` (spec ref: [capability.v1~.schema.json](/.gts-spec/examples/modules/schemas/gts.x.core.modules.capability.v1~.schema.json)) + +### 7.3 Summary + +| Aspect | Change? | Notes | +|---|---|---| +| Schema structure (`$id`, `allOf`, `$ref`, `properties`, `required`) | No | Identical output | +| `additionalProperties` | No | Same behavior | +| `GtsSchemaId` / `GtsInstanceId` field representation | No | Same `json_schema_value()` | +| `x-gts-ref` on `#[gts(type_field)]` / `#[gts(instance_id)]` fields | Yes — improved | Changes from `"gts.*"` to `"/$id"`, matching spec examples | +| `x-gts-ref` on other GTS fields (e.g., `subjectType`) | No | Retains `"gts.*"` from schemars impl | + +--- + +## 8. Decisions + +1. **Nested struct serialization policy**: Retain current blocking as default, with opt-out via `#[gts(allow_direct_serde)]`. See [section 4.4](#44-let-users-control-serde). + +2. **Backwards compatibility**: `#[struct_to_gts_schema]` will be deprecated. Both the old and new macros will coexist during a migration period. The old macro will emit a deprecation warning pointing users to the migration guide. + +3. **`dir_path` location**: Remains per-struct for now. 
A future enhancement may support crate-level configuration with per-struct overrides. + +## 9. Decisions on Pre-existing Issues + +The following are pre-existing issues in the current macro that will be addressed as part of the redesign: + +1. **Fix nested struct deserializer field renaming**: The current macro's `GtsDeserialize` impl for nested structs generates a field identifier enum with `#[serde(field_identifier, rename_all = "snake_case")]`. This assumes all incoming JSON field names are snake_case. Fields with `#[serde(rename = "someOtherName")]` are handled correctly during serialization (via `get_serde_rename()`), but on deserialization the blanket `rename_all = "snake_case"` on the field identifier enum does not match camelCase or other naming conventions in incoming JSON, so serialization and deserialization can disagree on a field's name. The redesign will generate the field identifier enum with explicit per-field `#[serde(rename = "...")]` attributes that respect the user's serde renames, rather than applying a blanket `rename_all`. + +2. **Include `description` in runtime schemas**: The `description` parameter is currently stored as `GTS_SCHEMA_DESCRIPTION` but is only used by the CLI for file-based schema generation. The runtime-generated schemas (via `gts_schema_with_refs()`) omit it, even though the GTS spec example schemas consistently include `description` (e.g., `events.type.v1~.schema.json`, `events.topic.v1~.schema.json`, `compute.vm.v1~.schema.json`). The redesign will include `description` in runtime-generated schemas to match the spec. + +## 10. Compile-Fail Test Migration + +The current macro has 31 compile-fail tests. 
The redesign affects their status: + +| Test | Current behavior | Redesign status | +|---|---|---| +| `base_struct_missing_id` | Error: must have id or type field | **Removed** — id/type fields no longer required | +| `base_struct_wrong_id_type` | Error: id must be `GtsInstanceId` | **Modified** — only validates if `#[gts(instance_id)]` is present | +| `base_struct_wrong_gts_type` | Error: type must be `GtsSchemaId` | **Modified** — only validates if `#[gts(type_field)]` is present | +| `base_struct_both_id_and_type` | Error: cannot have both | **Retained** — `#[gts(type_field)]` and `#[gts(instance_id)]` are mutually exclusive | +| `nested_direct_serialize` | Error: cannot derive Serialize | **Retained** — blocked by default; `allow_direct_serde` to opt out | +| `nested_direct_serialize_cfg_attr` | Error: same, via cfg_attr | **Retained** | +| Version mismatch tests (6 cases) | Error: version inconsistency | **Retained** as-is | +| Schema ID format tests | Error: invalid GTS identifier | **Retained** as-is | +| `base_true_multi_segment` | Error: base=true needs 1 segment | **Retained** — no `extends` with multi-segment errors similarly | +| `base_parent_mismatch` | Error: parent ID doesn't match | **Retained** as-is | +| `base_parent_no_generic` | Error: parent must have generic | **Retained** as-is | +| `multiple_type_generics` | Error: only 1 generic allowed | **Retained** as-is | +| `non_gts_generic` | Error: generic must impl GtsSchema | **Retained** as-is | +| `tuple_struct` | Error: not supported | **Retained** as-is | +| `missing_schema_id` | Error: required attribute | **Retained** — still required in `#[gts(...)]` | +| `missing_description` | Error: required attribute | **Retained** — still required in `#[gts(...)]` | +| `missing_file_path` | Error: dir_path required | **Retained** | +| `missing_properties` | Error: required attribute | **Removed** — `properties` parameter eliminated | +| `missing_property` | Error: property not in struct | **Removed** — 
`properties` parameter eliminated | +| `unknown_attribute` | Error: unrecognized attribute | **Modified** — new attribute names (`extends`, `type_field`, etc.) | + +## 11. Open Questions + +1. **CLI impact**: The CLI currently scans source files for `#[struct_to_gts_schema]` annotations to extract metadata (schema ID, properties, description). The new `#[derive(GtsSchema)]` + `#[gts(...)]` pattern requires updating the CLI parser. The removal of `properties` means the CLI must parse struct fields directly (respecting `#[gts(skip)]` and `#[serde(skip)]` attributes). This is deferred for a separate design discussion. + +2. **`GTS_SCHEMA_PROPERTIES` constant**: ~~The current macro generates `GTS_SCHEMA_PROPERTIES: &'static str` as a comma-separated string derived from the `properties` parameter. With `properties` removed, this constant could either be auto-generated from struct fields or removed entirely.~~ **Resolved**: The derive macro will auto-generate this constant from the struct's field names, excluding fields with `#[gts(skip)]` or `#[serde(skip)]`. The struct fields are already available to the macro at compile time — no user input needed. + +3. **Schema traits (`x-gts-traits-schema` / `x-gts-traits`)**: The GTS spec ([§9.7](https://github.com/GlobalTypeSystem/gts-spec#97---schema-traits-x-gts-traits-schema--x-gts-traits)) defines a trait system for schema-level metadata — semantic annotations like retention rules, topic associations, and processing directives that are not part of the instance data model. The current macro does not generate these, and this ADR does not address them. 
Examples from the spec: + + - Base event schema defines a trait schema: + ```json + "x-gts-traits-schema": { + "type": "object", + "properties": { + "topicRef": { "x-gts-ref": "gts.x.core.events.topic.v1~" }, + "retention": { "type": "string", "default": "P30D" } + } + } + ``` + - Child schemas provide trait values (with immutability — once set by an ancestor, descendants cannot override): + ```json + "x-gts-traits": { + "topicRef": "gts.x.core.events.topic.v1~x.commerce._.orders.v1", + "retention": "P90D" + } + ``` + + This is a significant spec feature that needs its own design. The macro redesign should be compatible with future trait support (e.g., via `#[gts(traits_schema = "...")]` or a separate derive), but the design is deferred. + +--- + +## 12. References + +- [GTS Specification v0.8](/.gts-spec/README.md) + - Section 3.7 — Well-known and Anonymous Instances + - Section 9.1 — Identifier reference in JSON and JSON Schema + - Section 9.6 — `x-gts-ref` support + - Section 11.1 — JSON document categories (Rule C) + - Section 11.2 — JSON and JSON Schema examples +- [Issue #72 — struct_to_gts_schema: gts_type field blocks Deserialize](https://github.com/GlobalTypeSystem/gts-rust/issues/72) +- Spec examples: + - [`order.v1.0~.schema.json`](/.gts-spec/examples/events/schemas/gts.x.commerce.orders.order.v1.0~.schema.json) — data entity without GTS identity field + - [`contact.v1.0~.schema.json`](/.gts-spec/examples/events/schemas/gts.x.core.idp.contact.v1.0~.schema.json) — data entity without GTS identity field + - [`events.type.v1~.schema.json`](/.gts-spec/examples/events/schemas/gts.x.core.events.type.v1~.schema.json) — anonymous instance with `type` field using `x-gts-ref: "/$id"` + - [`events.topic.v1~.schema.json`](/.gts-spec/examples/events/schemas/gts.x.core.events.topic.v1~.schema.json) — well-known instance with `id` field using `x-gts-ref: "/$id"` + - 
[`compute.vm.v1~.schema.json`](/.gts-spec/examples/typespec/vms/schemas/gts.x.infra.compute.vm.v1~.schema.json) — hybrid pattern with `gtsId` + UUID `id` diff --git a/docs/001-macro-alignment-implementation-plan.md b/docs/001-macro-alignment-implementation-plan.md new file mode 100644 index 0000000..233a1da --- /dev/null +++ b/docs/001-macro-alignment-implementation-plan.md @@ -0,0 +1,441 @@ +# Implementation Plan: ADR-001 Macro Redesign + +This plan implements the redesign specified in [ADR-001](./001-macro-alignment-adr.md). The work is structured in phases that can each be independently tested and merged. + +Each phase references the GTS specification sections that justify the design choices. Spec references link to the [GTS Specification](https://github.com/GlobalTypeSystem/gts-spec). + +--- + +## Phase 1: New derive macro skeleton with attribute parsing + +**Goal**: Create `#[derive(GtsSchema)]` with `#[gts(...)]` attribute parsing. No code generation yet — just parsing, validation, and error reporting. + +### 1.1 Attribute parsing + +Create the `GtsSchema` derive macro entry point in `gts-macros/src/lib.rs` (alongside the existing `struct_to_gts_schema`). 
Parse `#[gts(...)]` attributes into a struct: + +```rust +struct GtsAttrs { + dir_path: String, + schema_id: String, + description: String, + extends: Option, // None = root type + allow_direct_serde: bool, +} +``` + +Parse field-level attributes: + +```rust +enum GtsFieldAttr { + TypeField, // #[gts(type_field)] + InstanceId, // #[gts(instance_id)] + Skip, // #[gts(skip)] +} +``` + +**Spec justification:** +- `schema_id` — maps to the `$id` in JSON Schema documents [[Spec §9.1](https://github.com/GlobalTypeSystem/gts-spec#91---identifier-reference-in-json-and-json-schema), [§11.1 Rule C](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)] +- `extends` — models left-to-right inheritance via chained identifiers [[Spec §2.2](https://github.com/GlobalTypeSystem/gts-spec#22-chained-identifiers), [§3.2](https://github.com/GlobalTypeSystem/gts-spec#32-gts-types-inheritance)] +- `type_field` / `instance_id` — optional, maps to anonymous instance `type` field [[Spec §3.7](https://github.com/GlobalTypeSystem/gts-spec#37-well-known-and-anonymous-instances), [§11.1 Rule C](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)] or well-known instance `id` field. Made optional because not all GTS schemas require instance-level identity (e.g., `order.v1.0~`, `contact.v1.0~` have plain UUID `id` fields with no GTS semantics) [[Spec §11.1 Rule C](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)]. 
**This is the fix for Issue #72.** +- Field names are implementation-defined [[Spec §11.1](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)] + +### 1.2 Validation (carried over from current macro) + +Implement all validations that apply to the new design: + +- `schema_id` format validation via `gts_id::validate_gts_id()` [Spec §2.1, §2.3, §8.1](https://github.com/GlobalTypeSystem/gts-spec) +- Version match between struct name suffix and schema ID [Spec §4](https://github.com/GlobalTypeSystem/gts-spec#4-versioning) +- Segment count: no `extends` → single segment; `extends` → multi-segment [[Spec §2.2](https://github.com/GlobalTypeSystem/gts-spec#22-chained-identifiers)] +- Only named structs (no tuple structs, enums) +- Max 1 generic type parameter (GTS inheritance is single-chain, not multi-branch) [[Spec §3.2](https://github.com/GlobalTypeSystem/gts-spec#32-gts-types-inheritance)] +- `#[gts(type_field)]` must be on a `GtsSchemaId` field — the `type` field value is a GTS type identifier (ending with `~`) [[Spec §3.7](https://github.com/GlobalTypeSystem/gts-spec#37-well-known-and-anonymous-instances), [§11.1 Rule C](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)] +- `#[gts(instance_id)]` must be on a `GtsInstanceId` field — the `id` field value is a GTS instance identifier (no trailing `~`) [[Spec §3.7](https://github.com/GlobalTypeSystem/gts-spec#37-well-known-and-anonymous-instances), [§11.1 Rule C](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)] +- `#[gts(type_field)]` and `#[gts(instance_id)]` are mutually exclusive — a schema's instances follow either the well-known or anonymous pattern [[Spec §3.7](https://github.com/GlobalTypeSystem/gts-spec#37-well-known-and-anonymous-instances)] +- At most one `#[gts(type_field)]` and one `#[gts(instance_id)]` 
per struct +- Unknown `#[gts(...)]` attributes emit clear errors + +### 1.3 Tests for Phase 1 + +**Compile-fail tests** (`tests/compile_fail_v2/`): + +| Test | Validates | Spec reference | +|---|---|---| +| `missing_schema_id` | `#[gts(...)]` without `schema_id` | §9.1 — schema `$id` is mandatory | +| `missing_description` | `#[gts(...)]` without `description` | Spec examples consistently include `description` | +| `missing_dir_path` | `#[gts(...)]` without `dir_path` | Implementation requirement for CLI | +| `invalid_gts_id` | Malformed schema_id string | §2.1, §2.3, §8.1 — identifier format rules | +| `version_mismatch` | Struct `V1` with schema `v2~` | §4 — version consistency | +| `root_multi_segment` | No `extends` but multi-segment schema_id | §2.2 — single segment = base type, multi-segment = derived | +| `extends_single_segment` | `extends = Parent` with single-segment schema_id | §2.2 — derived types must chain | +| `tuple_struct` | `#[derive(GtsSchema)]` on tuple struct | JSON Schema `"type": "object"` maps to named fields | +| `enum_not_supported` | `#[derive(GtsSchema)]` on enum | JSON Schema `"type": "object"` maps to structs | +| `multiple_generics` | Struct with 2+ type params | §3.2 — inheritance is single-chain | +| `type_field_wrong_type` | `#[gts(type_field)]` on `String` field | §3.7 — type field must be a GTS type identifier (ending with `~`) | +| `instance_id_wrong_type` | `#[gts(instance_id)]` on `Uuid` field | §3.7 — well-known id must be a GTS instance identifier | +| `both_type_and_instance` | Same struct has both `#[gts(type_field)]` and `#[gts(instance_id)]` | §3.7 — well-known and anonymous are distinct patterns | +| `duplicate_type_field` | Two fields with `#[gts(type_field)]` | One identity field per entity | +| `unknown_gts_attr` | `#[gts(nonexistent)]` | Fail-fast on typos | +| `extends_parent_mismatch` | Parent's `SCHEMA_ID` doesn't match parent segment | §3.2 — chain must be valid derivation | +| `extends_parent_no_generic` | 
Parent struct has no generic parameter | Inheritance requires a slot for child properties | +| `nested_direct_serialize` | `extends` struct with `Serialize` derive (no `allow_direct_serde`) | Nested structs produce incomplete JSON without base envelope | +| `nested_direct_serialize_cfg_attr` | Same via `cfg_attr` | Same as above | + +--- + +## Phase 2: GtsSchema trait implementation and runtime API + +**Goal**: Generate the `GtsSchema` trait impl and all runtime methods. At this point, the new macro produces the same runtime API as the old one. + +### 2.1 Auto-derive JsonSchema + +If the struct does not already derive `schemars::JsonSchema`, inject it. This is required for `schemars::schema_for!(Self)` used in `gts_schema_with_refs()`. + +### 2.2 GtsSchema trait implementation + +Generate the `GtsSchema` trait impl with: +- `SCHEMA_ID` — from `schema_id` attribute [[Spec §9.1](https://github.com/GlobalTypeSystem/gts-spec#91---identifier-reference-in-json-and-json-schema)] +- `GENERIC_FIELD` — detected from struct fields (field whose type matches the generic param) +- `gts_schema_with_refs()` / `gts_schema_with_refs_allof()` — runtime schema generation using `schemars::schema_for!(Self)`, resolving `$ref` for `GtsSchemaId`/`GtsInstanceId` +- `innermost_schema_id()`, `innermost_schema()`, `collect_nesting_path()` — for generic base structs [[Spec §3.2](https://github.com/GlobalTypeSystem/gts-spec#32-gts-types-inheritance)] +- `wrap_in_nesting_path()` — inherited from trait default + +For `extends` structs: +- `allOf` + `$ref` schema composition [[Spec §9.1](https://github.com/GlobalTypeSystem/gts-spec#91---identifier-reference-in-json-and-json-schema), [§3.2](https://github.com/GlobalTypeSystem/gts-spec#32-gts-types-inheritance)] +- Compile-time assertion that parent's `SCHEMA_ID` matches [[Spec §3.1](https://github.com/GlobalTypeSystem/gts-spec#31-gts-types)] +- Property nesting under parent's generic field + +### 2.3 Runtime API methods + +Generate on the struct 
impl: +- `gts_schema_id() -> &'static GtsSchemaId` (LazyLock) [[Spec §9.1](https://github.com/GlobalTypeSystem/gts-spec#91---identifier-reference-in-json-and-json-schema)] +- `gts_base_schema_id() -> Option<&'static GtsSchemaId>` (LazyLock) [[Spec §3.2](https://github.com/GlobalTypeSystem/gts-spec#32-gts-types-inheritance)] +- `gts_make_instance_id(segment) -> GtsInstanceId` [[Spec §3.7](https://github.com/GlobalTypeSystem/gts-spec#37-well-known-and-anonymous-instances)] +- `gts_schema_with_refs_as_string() -> String` +- `gts_schema_with_refs_as_string_pretty() -> String` +- `gts_instance_json(&self) -> Value` — with `where Self: Serialize` bound +- `gts_instance_json_as_string(&self) -> String` — with `where Self: Serialize` bound +- `gts_instance_json_as_string_pretty(&self) -> String` — with `where Self: Serialize` bound + +Generate associated constants: +- `GTS_SCHEMA_FILE_PATH` — from `dir_path` + `schema_id` +- `GTS_SCHEMA_DESCRIPTION` — from `description` +- `GTS_SCHEMA_PROPERTIES` — auto-generated from fields (excluding `#[gts(skip)]` / `#[serde(skip)]`) +- `BASE_SCHEMA_ID` — `Option<&str>`, `Some(parent_segment)` for `extends`, `None` otherwise + +### 2.4 Include `description` in runtime schemas + +The `gts_schema_with_refs_allof()` method should include `"description"` in the generated JSON schema output, sourced from `GTS_SCHEMA_DESCRIPTION`. This aligns with every spec example schema (e.g., `events.type.v1~.schema.json`, `events.topic.v1~.schema.json`, `compute.vm.v1~.schema.json`), all of which include a `description` field. + +### 2.5 `x-gts-ref` on identity fields + +When generating the runtime schema, if a field has `#[gts(type_field)]` or `#[gts(instance_id)]`, override its schema property to use `"x-gts-ref": "/$id"` instead of the default `"x-gts-ref": "gts.*"` from `json_schema_value()`. 
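The override rule in 2.5 can be sketched as a small lookup. This is a minimal illustration under assumptions, not the macro's actual code generation: `x_gts_ref_for` and `x_gts_refs` are hypothetical helper names, and the schema is reduced to a property-name map so the sketch stays std-only.

```rust
use std::collections::BTreeMap;

/// Field-level `#[gts(...)]` markers, mirroring the plan's `GtsFieldAttr` enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum GtsFieldAttr {
    TypeField,
    InstanceId,
    Skip,
}

/// Hypothetical helper: choose the `x-gts-ref` value for a GTS-typed field.
/// Returns `None` for skipped fields, which are left out of the schema.
fn x_gts_ref_for(attr: Option<GtsFieldAttr>) -> Option<&'static str> {
    match attr {
        Some(GtsFieldAttr::Skip) => None,
        // Identity fields refer to the schema's own `$id`.
        Some(GtsFieldAttr::TypeField) | Some(GtsFieldAttr::InstanceId) => Some("/$id"),
        // Unannotated GTS fields keep the generic reference.
        None => Some("gts.*"),
    }
}

/// Hypothetical helper: build a property-name -> `x-gts-ref` map
/// for a list of GTS-typed fields.
fn x_gts_refs(fields: &[(&str, Option<GtsFieldAttr>)]) -> BTreeMap<String, &'static str> {
    fields
        .iter()
        .filter_map(|(name, attr)| x_gts_ref_for(*attr).map(|r| (name.to_string(), r)))
        .collect()
}
```

Keeping the decision in one function means the generic default and the identity-field override cannot drift apart as more field attributes are added.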
+ +**Spec justification** [[Spec §9.6](https://github.com/GlobalTypeSystem/gts-spec#96---x-gts-ref-support)]: +- `"x-gts-ref": "/$id"` — relative self-reference; field value must equal the current schema's `$id`. Used on identity fields that identify *this* entity (e.g., `type` on events, `id` on topics). +- `"x-gts-ref": "gts.*"` — generic reference; field must be any valid GTS identifier. Used on fields that reference *other* entities (e.g., `subjectType` referencing an order schema). + +The spec examples consistently use `"/$id"` on identity fields: +- `events.type.v1~` → `"type"` property has `"x-gts-ref": "/$id"` +- `events.topic.v1~` → `"id"` property has `"x-gts-ref": "/$id"` +- `modules.capability.v1~` → `"id"` property has `"x-gts-ref": "/$id"` +- `compute.vm_state.v1~` → `"gtsId"` property has `"x-gts-ref": "/$id"` + +### 2.6 Tests for Phase 2 + +**Integration tests** (`tests/v2_integration_tests.rs`): + +| Test | Validates | Spec reference | +|---|---|---| +| `base_struct_schema_id` | `gts_schema_id()` returns correct value | §9.1 — `$id` access | +| `base_struct_schema_constants` | `SCHEMA_ID`, `GTS_SCHEMA_FILE_PATH`, `GTS_SCHEMA_DESCRIPTION`, `GTS_SCHEMA_PROPERTIES` | §9.1 | +| `base_struct_instance_id` | `gts_make_instance_id()` produces correct format | §3.7 — instance = schema chain + segment | +| `base_struct_schema_output` | `gts_schema_with_refs()` produces correct JSON schema structure | §9.1, §11.1 Rule C cat. 1 | +| `base_struct_schema_has_description` | Runtime schema includes `description` field | Spec examples consistently include `description` | +| `base_struct_no_gts_identity_field` | Struct without id/type field compiles and produces valid schema | **§11.1 Rule C** — not all schemas need identity fields (e.g., `order.v1.0~`, `contact.v1.0~`). 
**Fixes Issue #72.** | +| `base_struct_with_type_field` | `#[gts(type_field)]` field gets `x-gts-ref: "/$id"` in schema | §9.6 — `/$id` self-reference | +| `base_struct_with_instance_id` | `#[gts(instance_id)]` field gets `x-gts-ref: "/$id"` in schema | §9.6 — `/$id` self-reference | +| `base_struct_other_gts_fields` | Non-annotated `GtsSchemaId` fields retain `x-gts-ref: "gts.*"` | §9.6 — `gts.*` generic reference | +| `base_struct_gts_skip` | `#[gts(skip)]` field excluded from schema properties | ADR §4.2 | +| `base_struct_serde_skip` | `#[serde(skip)]` field excluded from schema properties | ADR §4.2 | +| `base_struct_properties_auto` | `GTS_SCHEMA_PROPERTIES` matches struct fields minus skipped | ADR §4.2 — auto-derived from fields | +| `base_struct_with_generic` | Generic base struct with `GENERIC_FIELD` set correctly | §3.2 — generic field is the extension point | +| `base_struct_no_generic` | Non-generic base struct (leaf type) | Leaf types have no extension point | +| `base_struct_schema_pretty` | `gts_schema_with_refs_as_string_pretty()` is valid formatted JSON | — | +| `base_struct_base_schema_id_none` | Root type returns `None` for `gts_base_schema_id()` | §2.2 — single-segment = no parent | + +**Schema structure tests** (`tests/v2_schema_structure_tests.rs`): + +| Test | Validates | Spec reference | +|---|---|---| +| `schema_has_id` | `$id` field is `gts://` + schema_id | §9.1 — `$id` must use `gts://` prefix | +| `schema_has_json_schema_ref` | `$schema` is `http://json-schema.org/draft-07/schema#` | §11.1 Rule A — `$schema` presence = schema document | +| `schema_type_object` | `type` is `"object"` | JSON Schema standard for struct-like types | +| `schema_additional_properties_false` | `additionalProperties` is `false` | §4.2 — closed content model for type safety | +| `schema_required_fields` | Non-`Option` fields appear in `required` | JSON Schema `required` semantics | +| `schema_optional_fields_not_required` | `Option` fields not in `required` | 
JSON Schema `required` semantics | +| `schema_uuid_format` | `Uuid` fields have `format: "uuid"` | §9.10 — UUID support for instance IDs | +| `schema_gts_schema_id_format` | `GtsSchemaId` fields have `format: "gts-schema-id"` | §9.6 — GTS identifier reference | +| `schema_gts_instance_id_format` | `GtsInstanceId` fields have `format: "gts-instance-id"` | §9.6 — GTS identifier reference | + +--- + +## Phase 3: Serialization — serde attribute injection and GtsSerialize/GtsDeserialize + +**Goal**: Handle the serialization aspects — serde bound injection on generic fields, GtsSerialize/GtsDeserialize for nested structs, direct-serde blocking. + +### 3.1 Serde attribute injection on generic fields + +For base structs with a generic parameter `P`: +- Add `#[serde(bound(serialize = "P: GtsSerialize", deserialize = "P: GtsDeserialize<'de>"))]` to the struct +- Add `#[serde(serialize_with = "gts::serialize_gts", deserialize_with = "gts::deserialize_gts")]` to the generic field + +### 3.2 GtsSerialize/GtsDeserialize for nested structs + +For structs with `extends`: +- Generate explicit `GtsSerialize` impl (custom `SerializeStruct` with `GtsSerializeWrapper` for generic fields) +- Generate explicit `GtsDeserialize` impl (custom visitor with field identifier enum) +- The field identifier enum must use explicit per-field `#[serde(rename = "...")]` respecting the user's serde renames — **not** `rename_all = "snake_case"` (fixing the pre-existing bug) + +### 3.3 Direct serde blocking for nested structs + +For structs with `extends` (and without `allow_direct_serde`): +- Implement `GtsNoDirectSerialize` and `GtsNoDirectDeserialize` marker traits +- These conflict with the blanket impls if the user also derives `Serialize`/`Deserialize` + +If `allow_direct_serde` is set, skip the marker trait impls. 
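The blocking mechanism in 3.3 relies on Rust's coherence rules. The sketch below illustrates the technique under assumptions: `DirectSerialize` is a hypothetical stand-in for `serde::Serialize` (so the example needs no external crates), and the blanket impl is assumed to be keyed on that trait. The real trait definitions in `gts` may differ.

```rust
/// Hypothetical stand-in for `serde::Serialize`.
trait DirectSerialize {
    fn to_json(&self) -> String;
}

/// Marker trait, named after the one in this plan.
trait GtsNoDirectSerialize {}

/// Assumed blanket impl: every directly serializable type carries the marker.
impl<T: DirectSerialize> GtsNoDirectSerialize for T {}

/// Root struct: direct serialization is allowed.
struct BaseEventV1;

impl DirectSerialize for BaseEventV1 {
    fn to_json(&self) -> String {
        "{}".to_string()
    }
}

/// Nested (`extends`) struct: the derive would emit the explicit marker impl.
struct AuditPayloadV1;

impl GtsNoDirectSerialize for AuditPayloadV1 {}

// If the user also made `AuditPayloadV1` directly serializable, the explicit
// impl above would overlap with the blanket impl and the compiler would
// reject the crate with E0119 (conflicting implementations):
//
// impl DirectSerialize for AuditPayloadV1 {
//     fn to_json(&self) -> String { "{}".to_string() }
// }

/// Compiles only for types that carry the marker, whichever way they got it.
fn has_marker<T: GtsNoDirectSerialize>(_value: &T) -> bool {
    true
}
```

With `allow_direct_serde` set, the derive would simply not emit the explicit marker impl, so nothing conflicts with a user-supplied `Serialize`.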
+ +### 3.4 Unit struct handling + +- Base unit structs: generate custom `Serialize`/`Deserialize` that handles `{}` and `null` +- Nested unit structs: generate custom `GtsSerialize`/`GtsDeserialize` with same behavior + +### 3.5 Instance serialization methods + +Generate `gts_instance_json()`, `gts_instance_json_as_string()`, `gts_instance_json_as_string_pretty()` with `where Self: serde::Serialize` bound. + +### 3.6 Tests for Phase 3 + +**Serialization tests** (`tests/v2_serialization_tests.rs`): + +| Test | Validates | +|---|---| +| `base_struct_serialize` | Base struct serializes to JSON correctly | +| `base_struct_deserialize` | Base struct deserializes from JSON correctly | +| `base_struct_roundtrip` | Serialize → deserialize produces identical struct | +| `base_struct_with_nested_serialize` | `BaseEventV1>` serializes with nested fields | +| `base_struct_with_nested_deserialize` | Same type deserializes from JSON | +| `base_struct_with_nested_roundtrip` | Full roundtrip through the generic chain | +| `nested_struct_gts_serialize` | `GtsSerialize` impl works correctly | +| `nested_struct_gts_deserialize` | `GtsDeserialize` impl works correctly | +| `serde_rename_respected` | `#[serde(rename = "type")]` appears in serialized JSON | +| `serde_rename_in_deserialize` | Deserialization reads renamed field correctly | +| `generic_field_serde_rename` | Renamed generic field nests correctly | +| `unit_struct_serialize` | Unit struct serializes to `{}` | +| `unit_struct_deserialize_object` | Unit struct deserializes from `{}` | +| `unit_struct_deserialize_null` | Unit struct deserializes from `null` | +| `nested_unit_struct` | Nested unit struct through GtsSerialize chain | +| `instance_json_methods` | `gts_instance_json()` returns correct `serde_json::Value` | +| `no_gts_identity_field_roundtrip` | Struct without id/type field round-trips correctly (issue #72) | + +**Compile-fail tests** (add to `tests/compile_fail_v2/`): + +| Test | Validates | +|---|---| +| 
`nested_direct_serialize` | `extends` + `Serialize` without `allow_direct_serde` fails | +| `nested_direct_serialize_cfg_attr` | Same via `cfg_attr` | +| `allow_direct_serde_on_root` | `allow_direct_serde` without `extends` is an error (or warning) | + +--- + +## Phase 4: Inheritance chain tests + +**Goal**: Verify that multi-level inheritance produces correct schemas and serialization. These tests mirror the existing `inheritance_tests.rs` patterns. + +### 4.1 Tests for Phase 4 + +**Inheritance tests** (`tests/v2_inheritance_tests.rs`): + +Define a test hierarchy: + +```rust +// Level 1: Base event (root, generic) +#[derive(Debug, Serialize, Deserialize, GtsSchema)] +#[gts(dir_path = "schemas", schema_id = "gts.x.core.events.type.v1~", description = "Base event type")] +pub struct BaseEventV1<P> { + #[gts(type_field)] + #[serde(rename = "type")] + pub event_type: GtsSchemaId, + pub id: Uuid, + pub tenant_id: Uuid, + pub sequence_id: u64, + pub payload: P, +} + +// Level 2: Audit payload (nested, generic) +#[derive(Debug, GtsSchema)] +#[gts(dir_path = "schemas", schema_id = "gts.x.core.events.type.v1~x.core.audit.event.v1~", + description = "Audit event with user context", extends = BaseEventV1)] +pub struct AuditPayloadV1<D> { + pub user_agent: String, + pub user_id: Uuid, + pub ip_address: String, + pub data: D, +} + +// Level 3: Place order data (nested, non-generic, leaf) +#[derive(Debug, GtsSchema)] +#[gts(dir_path = "schemas", schema_id = "gts.x.core.events.type.v1~x.core.audit.event.v1~x.marketplace.orders.purchase.v1~", + description = "Order placement audit event", extends = AuditPayloadV1)] +pub struct PlaceOrderDataV1 { + pub order_id: Uuid, + pub product_id: Uuid, +} +``` + +| Test | Validates | Spec reference | +|---|---|---| +| `two_level_schema` | Child schema has `allOf` with `$ref` to parent | §3.2, §9.1 — derived type uses `allOf` + `$ref` | +| `three_level_schema` | 3-level chain produces correct nested `allOf` | §3.2 — `A~B~C` left-to-right
inheritance | +| `three_level_serialize` | Full 3-level instance serializes with correct nesting | §11.2 Example #2 — nested instance structure | +| `three_level_deserialize` | Full 3-level instance deserializes correctly | §11.2 — instance roundtrip | +| `three_level_roundtrip` | Serialize → deserialize preserves all fields | — | +| `child_schema_id` | Child `gts_schema_id()` returns full chained ID | §2.2 — chained identifier format | +| `child_base_schema_id` | Child `gts_base_schema_id()` returns parent's ID | §3.2 — parent segment extraction | +| `child_instance_id` | Child `gts_make_instance_id()` appends to full chain | §3.7 — instance = chain + segment | +| `innermost_schema_id` | `BaseEventV1::<AuditPayloadV1<PlaceOrderDataV1>>::innermost_schema_id()` returns leaf ID | §3.1 — rightmost type resolution | +| `generic_field_detection` | `GENERIC_FIELD` is `Some("payload")` for base, `Some("data")` for audit, `None` for leaf | §3.2 — generic field is the extension point | +| `unit_struct_child` | Unit struct as leaf in inheritance chain | Empty derived types (no new properties) | +| `schema_additional_properties` | Each nesting level has `additionalProperties: false` | §4.2 — closed content model | +| `nesting_path` | `collect_nesting_path()` returns correct path through chain | §3.2 — path from outer to inner | + +--- + +## Phase 5: Parity validation and deprecation + +**Goal**: Verify that the new macro produces identical behavior to the old one, then deprecate the old macro. + +### 5.1 Schema parity tests + +Create tests that generate schemas from both the old and new macros for equivalent struct definitions, and assert the schemas are identical (except for the `x-gts-ref` improvement on identity fields).
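The "identical except for `x-gts-ref` on identity fields" comparison could look roughly like the sketch below. It is deliberately simplified and rests on assumptions: each schema is flattened to a property-name to `x-gts-ref` map (real parity tests would more likely compare full JSON values), and `RefMap`, `ref_map`, and `parity_violations` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Flattened stand-in for a schema: property name -> `x-gts-ref` value.
type RefMap = BTreeMap<String, String>;

/// Convenience constructor for the flattened schema stand-in.
fn ref_map(pairs: &[(&str, &str)]) -> RefMap {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Returns the properties that break parity between old and new output.
/// On identity fields, the only tolerated difference is `"gts.*"` -> `"/$id"`;
/// every other property must be unchanged.
fn parity_violations(old: &RefMap, new: &RefMap, identity_fields: &[&str]) -> Vec<String> {
    // Union of property names from both schemas, sorted and de-duplicated.
    let mut names: Vec<&String> = old.keys().chain(new.keys()).collect();
    names.sort();
    names.dedup();

    names
        .into_iter()
        .filter(|name| {
            let o = old.get(*name).map(String::as_str);
            let n = new.get(*name).map(String::as_str);
            let expected_upgrade = identity_fields.contains(&name.as_str())
                && o == Some("gts.*")
                && n == Some("/$id");
            // A violation is any difference that is not the expected upgrade.
            o != n && !expected_upgrade
        })
        .cloned()
        .collect()
}
```

Returning the offending property names, rather than a bare boolean, keeps assertion failures readable when a parity test does break.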
+ +**Parity tests** (`tests/v2_parity_tests.rs`): + +| Test | Validates | +|---|---| +| `base_event_schema_parity` | Old and new macros produce same base event schema | +| `child_event_schema_parity` | Old and new macros produce same child schema | +| `three_level_schema_parity` | Old and new macros produce same 3-level schemas | +| `instance_json_parity` | Old and new macros produce same serialized instance JSON | +| `deserialization_parity` | Same JSON deserializes identically under both macros | +| `topic_schema_parity` | Well-known instance (id field) schema parity | + +### 5.2 New capability tests + +Tests for features that the old macro couldn't support: + +| Test | Validates | Spec reference | +|---|---|---| +| `data_entity_no_identity` | Struct without id/type field (issue #72) | **§11.1 Rule C** — data entities like `order.v1.0~` and `contact.v1.0~` have no GTS identity fields | +| `data_entity_roundtrip` | Serialize/deserialize without id/type field | §11.1 — instances of unknown/non-GTS schemas are valid | +| `gts_skip_field` | `#[gts(skip)]` excludes from schema but not serde | ADR §4.2 | +| `allow_direct_serde_nested` | `allow_direct_serde` enables Serialize on nested struct | ADR §4.4 | +| `description_in_runtime_schema` | `description` appears in `gts_schema_with_refs()` output | Spec examples — all schemas include `description` | +| `x_gts_ref_self_reference` | `#[gts(type_field)]` produces `"x-gts-ref": "/$id"` | §9.6 — `/$id` relative self-reference | +| `x_gts_ref_cross_reference` | Non-annotated `GtsSchemaId` retains `"x-gts-ref": "gts.*"` | §9.6 — `gts.*` generic reference | + +### 5.3 Deprecate old macro + +Add `#[deprecated]` to `struct_to_gts_schema` with a message pointing to the migration guide. The old macro continues to work but emits warnings. + +### 5.4 Update README + +Rewrite `gts-macros/README.md` to document the new `#[derive(GtsSchema)]` API, with migration examples from the ADR. 
+ +--- + +## Phase 6: Serde rename fix for nested deserializer + +**Goal**: Fix the pre-existing bug where the nested struct deserializer uses `rename_all = "snake_case"` instead of respecting per-field serde renames. + +### 6.1 Implementation + +In the `GtsDeserialize` code generation for nested structs, replace: + +```rust +#[serde(field_identifier, rename_all = "snake_case")] +enum Field { + #field_idents, + #[serde(other)] + Unknown, +} +``` + +With per-field rename attributes: + +```rust +#[serde(field_identifier)] +enum Field { + #[serde(rename = "user_agent")] + user_agent, + #[serde(rename = "userId")] // respects user's #[serde(rename)] + user_id, + #[serde(other)] + Unknown, +} +``` + +### 6.2 Tests + +| Test | Validates | +|---|---| +| `nested_camel_case_deserialize` | Nested struct with `#[serde(rename = "camelCase")]` deserializes correctly | +| `nested_mixed_renames` | Struct with mix of renamed and non-renamed fields | +| `nested_rename_roundtrip` | Serialize → deserialize with renamed fields | + +--- + +## Implementation Order and Dependencies + +``` +Phase 1 ──→ Phase 2 ──→ Phase 3 ──→ Phase 4 ──→ Phase 5 + (parse) (trait) (serde) (inherit) (parity) + │ + Phase 6 + (rename fix) +``` + +- Phases 1-4 are sequential — each builds on the previous +- Phase 5 (parity) requires all prior phases +- Phase 6 (rename fix) can be done independently after Phase 3 + +Each phase should be a separate PR with its own tests passing before merge. 
+ +--- + +## File Structure + +``` +gts-macros/ +├── src/ +│ ├── lib.rs # Both old + new macro entry points +│ ├── gts_schema_derive.rs # New: #[derive(GtsSchema)] implementation +│ ├── gts_attrs.rs # New: #[gts(...)] attribute parsing +│ ├── gts_field_attrs.rs # New: field-level attribute parsing +│ ├── gts_validation.rs # New: validation logic (extracted + new) +│ ├── gts_codegen.rs # New: code generation (trait impl, runtime API) +│ └── gts_serde.rs # New: serde-related code generation +├── tests/ +│ ├── compile_fail/ # Existing (old macro) +│ ├── compile_fail_v2/ # New: compile-fail tests for new macro +│ ├── integration_tests.rs # Existing (old macro) +│ ├── inheritance_tests.rs # Existing (old macro) +│ ├── v2_integration_tests.rs # New: integration tests +│ ├── v2_inheritance_tests.rs # New: inheritance chain tests +│ ├── v2_serialization_tests.rs # New: serialization tests +│ ├── v2_schema_structure_tests.rs # New: schema output tests +│ ├── v2_parity_tests.rs # New: old vs new comparison +│ └── ... +``` + +The new macro source is split into focused modules rather than a single 1800-line file. The old macro source remains in `lib.rs` until deprecation is complete. diff --git a/docs/001-macro-proposal.md b/docs/001-macro-proposal.md new file mode 100644 index 0000000..2d95f88 --- /dev/null +++ b/docs/001-macro-proposal.md @@ -0,0 +1,483 @@ +# Proposal: Align gts-rust Macro with GTS Specification + +**ADR**: [001-macro-alignment-adr.md](./001-macro-alignment-adr.md) | **Implementation Plan**: [001-macro-alignment-implementation-plan.md](./001-macro-alignment-implementation-plan.md) +**Issue**: [#72 - gts_type field blocks Deserialize](https://github.com/GlobalTypeSystem/gts-rust/issues/72) +**Branch**: `gts-macro-proposal` + +--- + +## 1. Purpose + +The `#[struct_to_gts_schema]` macro has been the primary integration point between Rust structs and the Global Type System. 
It delivers compile-time validation, JSON Schema generation, and a runtime API from a single annotation. As the GTS specification has matured and usage has grown, several areas have emerged where the macro's assumptions can be brought into closer alignment with the spec. + +This proposal evolves the macro from `#[struct_to_gts_schema]` to `#[derive(GtsSchema)]` with `#[gts(...)]` attributes. The goals are: + +- Align field and identity requirements with the GTS specification (v0.8) +- Support the full range of GTS document categories ([Spec §11.1, Rule C](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)) +- Enable spec-correct `x-gts-ref` annotations ([Spec §9.6](https://github.com/GlobalTypeSystem/gts-spec#96---x-gts-ref-support)) +- Give users explicit control over serde derives while preserving safety defaults +- Structure the codebase for future spec features like schema traits ([§9.7](https://github.com/GlobalTypeSystem/gts-spec#97---schema-traits-x-gts-traits-schema--x-gts-traits)) + +All existing compile-time validations, runtime behavior, and schema output are preserved. The old macro continues to work alongside the new one during migration. + +--- + +## 2. Opportunities for Alignment + +### 2.1 Supporting all GTS document categories + +The current macro requires every base struct to declare either a `GtsSchemaId` field (for anonymous instances) or a `GtsInstanceId` field (for well-known instances). This was a reasonable default when the macro was written, as the primary use case was event types that always carry identity fields. + +However, the GTS specification (v0.8) defines **five** categories of JSON documents ([Spec §11.1, Rule C](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)). Only two of the five require identity fields: + +| Category | Identity field required? | Example | +|---|---|---| +| 1. 
GTS entity schemas | No (identity is `$id` in the schema document) | Any `.schema.json` file | +| 2. Non-GTS schemas | No | Third-party JSON Schemas | +| 3. Instances of unknown/non-GTS schemas | No | Opaque JSON payloads | +| 4. **Well-known GTS instances** | **Yes** -- GTS instance ID in `id` field | Event topics, modules | +| 5. **Anonymous GTS instances** | **Yes** -- GTS type ID in `type` field | Events, audit records | + +The spec includes concrete examples of GTS schemas whose instances have no GTS identity field: + +- `gts.x.commerce.orders.order.v1.0~` -- Order schema. The `id` field is a plain UUID, not a `GtsInstanceId`. There is no `type` field. +- `gts.x.core.idp.contact.v1.0~` -- Contact schema. Same pattern: UUID `id`, no GTS identity. + +These are valid GTS entity schemas (category 1) that produce instances falling under category 3. They are referenced by other GTS types (e.g., an event's `subjectType` references the order schema) but their instances do not self-identify via GTS. + +The spec notes this explicitly ([§11.1](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories)): + +> *"The exact field names used for instance IDs and instance types are **implementation-defined** and may be **configuration-driven** (different systems may look for identifiers in different fields)."* + +This gap surfaced as Issue #72, where data entity structs are forced to add a dummy `gts_type` field with fragile serde workarounds to satisfy the macro's requirement. + +### 2.2 Distinguishing self-reference from cross-reference + +The GTS specification defines two kinds of `x-gts-ref` annotations on schema properties ([§9.6](https://github.com/GlobalTypeSystem/gts-spec#96---x-gts-ref-support)): + +- **`"x-gts-ref": "/$id"`** -- Self-reference. The field's value must equal the current schema's `$id`. Used on fields that identify *this* entity. +- **`"x-gts-ref": "gts.*"`** -- Cross-reference. 
The field's value can be any valid GTS identifier. Used on fields that reference *other* entities. + +This distinction is visible in the spec's example schemas. The base event schema (`gts.x.core.events.type.v1~.schema.json`) demonstrates both on the same struct: + +```json +{ + "type": { + "description": "Identifier of the event type in GTS format.", + "type": "string", + "x-gts-ref": "/$id" + }, + "subjectType": { + "description": "GTS type identifier of the entity this event is about.", + "type": "string", + "x-gts-ref": "gts.*" + } +} +``` + +The module schema (`gts.x.core.modules.module.v1~.schema.json`) shows the same pattern: + +```json +{ + "type": { "x-gts-ref": "/$id" }, + "capabilities": { + "items": { "x-gts-ref": "gts.x.core.modules.capability.v1~" } + } +} +``` + +The current macro treats all `GtsSchemaId` fields identically, generating `"x-gts-ref": "gts.*"` for every one. It does not yet have a mechanism to distinguish a field that identifies *this* entity from a field that references *another* entity. This proposal adds that mechanism through field-level annotations. + +### 2.3 Making serde derives visible + +The current macro automatically adds `Serialize`, `Deserialize`, and `JsonSchema` derives to base structs, and blocks `Serialize`/`Deserialize` on nested structs. This approach successfully prevents nested structs from producing incomplete JSON, which was the original design goal. + +The tradeoff is that users cannot see which traits are derived by reading the struct definition. The macro's serde attribute injection (`#[serde(bound(...))]`, `#[serde(serialize_with)]`) is invisible in source code. Issue #72 arose in part because the macro's serde handling for identity fields didn't account for deserialization correctly -- a problem that's harder to diagnose when the serde configuration isn't visible. 
+ +This proposal makes all derives explicit while preserving the same safety default: nested structs are still blocked from direct serialization unless the user opts out with `allow_direct_serde`. + +### 2.4 Auto-deriving properties from struct fields + +The current macro requires `properties = "event_type,id,tenant_id,payload"` -- a comma-separated string that lists which fields appear in the schema. This serves as both a schema surface declaration and a typo check (the macro validates that every listed property exists as a field). + +The tradeoff is that the more dangerous direction isn't caught: if a field is added to the struct but omitted from `properties`, it silently disappears from the generated JSON Schema. For a system focused on diffable API contracts, this means a schema diff would show no change even though the wire format changed. + +This proposal auto-derives properties from struct fields, catching changes in both directions. Fields can be excluded from the schema with `#[gts(skip)]` or `#[serde(skip)]`. + +### 2.5 Clearer inheritance declaration + +The current macro uses `base` to declare a struct's position in the hierarchy: + +| `base` value | Meaning | +|---|---| +| `base = true` | Root type (no parent) | +| `base = ParentStruct` | Child type inheriting from parent | + +`base = true` is the default state and carries no information. This proposal removes the need to declare root types explicitly -- the absence of `extends` means root -- and uses `extends = ParentStruct` for child types, which reads more naturally in the context of GTS's left-to-right inheritance model ([§2.2](https://github.com/GlobalTypeSystem/gts-spec#22-chained-identifiers), [§3.2](https://github.com/GlobalTypeSystem/gts-spec#32-gts-types-inheritance)). + +--- + +## 3. 
What Changes + +### 3.1 Entry point: Derive macro with `#[gts(...)]` attributes + +The single `#[struct_to_gts_schema]` attribute macro evolves into `#[derive(GtsSchema)]` with `#[gts(...)]` attributes at both the struct and field level. + +**Before:** + +```rust +#[derive(Debug)] +#[struct_to_gts_schema( + dir_path = "schemas", + base = true, + schema_id = "gts.x.core.events.type.v1~", + description = "Base event type", + properties = "event_type,id,tenant_id,payload" +)] +pub struct BaseEventV1

<P> { + #[serde(rename = "type")] + pub event_type: GtsSchemaId, + pub id: Uuid, + pub tenant_id: Uuid, + pub payload: P, +} +``` + +**After:** + +```rust +#[derive(Debug, Serialize, Deserialize, JsonSchema, GtsSchema)] +#[gts( + dir_path = "schemas", + schema_id = "gts.x.core.events.type.v1~", + description = "Base event type", +)] +pub struct BaseEventV1<P: GtsSchema> { + #[gts(type_field)] + #[serde(rename = "type")] + pub event_type: GtsSchemaId, + pub id: Uuid, + pub tenant_id: Uuid, + pub payload: P, +} +``` + +**What changed and why:** + +| Change | Reason | +|---|---| +| `Serialize`, `Deserialize`, `JsonSchema` are explicit | User controls all derives. No hidden injection. | +| `base = true` removed | Root types are the default -- no declaration needed. | +| `properties = "..."` removed | Properties are auto-derived from struct fields. | +| `#[gts(type_field)]` added to `event_type` | Explicit opt-in marks this as the identity field ([§9.6](https://github.com/GlobalTypeSystem/gts-spec#96---x-gts-ref-support): `"x-gts-ref": "/$id"`). | +| `P: GtsSchema` bound is visible | Generic constraint is in source, not injected. | + +### 3.2 Inheritance: `extends` replaces `base` + +**Before:** + +```rust +#[struct_to_gts_schema( + dir_path = "schemas", + base = BaseEventV1, + schema_id = "gts.x.core.events.type.v1~x.core.audit.event.v1~", + description = "Audit event", + properties = "user_agent,data" +)] +pub struct AuditPayloadV1<D> { + pub user_agent: String, + pub data: D, +} +``` + +**After:** + +```rust +#[derive(Debug, JsonSchema, GtsSchema)] +#[gts( + dir_path = "schemas", + schema_id = "gts.x.core.events.type.v1~x.core.audit.event.v1~", + description = "Audit event", + extends = BaseEventV1, +)] +pub struct AuditPayloadV1<D> { + pub user_agent: String, + pub data: D, +} +``` + +`extends = BaseEventV1` reads as what it means: this type extends the base event type.
The `allOf` + `$ref` schema composition is generated from this declaration, following the GTS chained identifier model ([§2.2](https://github.com/GlobalTypeSystem/gts-spec#22-chained-identifiers), [§3.2](https://github.com/GlobalTypeSystem/gts-spec#32-gts-types-inheritance)): + +> *"Multiple GTS identifiers can be chained with `~` to express derivation and conformance. The chain follows **left-to-right inheritance** semantics."* + +The compile-time validations remain identical: +- Schema ID segment count must match `extends` presence ([§2.2](https://github.com/GlobalTypeSystem/gts-spec#22-chained-identifiers)) +- Parent's `SCHEMA_ID` must match the parent segment in `schema_id` ([§3.2](https://github.com/GlobalTypeSystem/gts-spec#32-gts-types-inheritance)) +- Parent struct must have exactly one generic parameter + +### 3.3 Optional identity fields with explicit annotations + +**Before (Issue #72):** + +```rust +// Forced to add a dead field with serde workaround +#[struct_to_gts_schema( + dir_path = "schemas", + base = true, + schema_id = "gts.cf.core.errors.quota_violation.v1~", + description = "A quota violation entry", + properties = "subject,description" +)] +pub struct QuotaViolationV1 { + #[allow(dead_code)] + #[serde(skip_serializing, default = "dummy_gts_schema_id")] + gts_type: GtsSchemaId, // unwanted, breaks Deserialize (#72) + pub subject: String, + pub description: String, +} +``` + +**After:** + +```rust +#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, GtsSchema)] +#[gts( + dir_path = "schemas", + schema_id = "gts.cf.core.errors.quota_violation.v1~", + description = "A quota violation entry", +)] +pub struct QuotaViolationV1 { + pub subject: String, + pub description: String, +} +``` + +No dummy field. No serde workaround. The struct represents what the GTS spec intends -- a data entity schema whose instances don't carry GTS identity fields, like `order.v1.0~` or `contact.v1.0~` in the spec examples. 
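The reason the dummy field can go away is the category split summarized in section 2.1: only two of the five document categories in Spec §11.1, Rule C carry a GTS identity field. As a rough illustration, that split could be modeled as below. `DocumentCategory` and `identity_field` are hypothetical names, and the field names `id` and `type` are simply the ones used in the section 2.1 table; the spec leaves the actual names implementation-defined.

```rust
/// The five document categories from Spec §11.1, Rule C.
/// Variant names are illustrative, not taken from the spec or from gts-rust.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DocumentCategory {
    GtsEntitySchema,
    NonGtsSchema,
    NonGtsInstance,
    WellKnownInstance,
    AnonymousInstance,
}

impl DocumentCategory {
    /// JSON field expected to carry the GTS identity, if the category needs one.
    /// Assumption: `id` / `type` as in the table in section 2.1.
    fn identity_field(self) -> Option<&'static str> {
        match self {
            DocumentCategory::WellKnownInstance => Some("id"),
            DocumentCategory::AnonymousInstance => Some("type"),
            // Schemas and non-GTS instances do not self-identify via GTS.
            _ => None,
        }
    }
}
```

Under this model, instances of `QuotaViolationV1` fall into the category with no identity field, so nothing in the struct has to exist purely for the macro's benefit.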
+ +When identity fields *are* needed, they are annotated explicitly: + +```rust +// Well-known instance (Spec §3.7: named instance with GTS instance ID) +#[gts(instance_id)] +pub id: GtsInstanceId, // generates "x-gts-ref": "/$id" + +// Anonymous instance (Spec §3.7: opaque id + GTS type discriminator) +#[gts(type_field)] +#[serde(rename = "type")] +pub event_type: GtsSchemaId, // generates "x-gts-ref": "/$id" + +// Cross-reference (Spec §9.6: reference to another entity's schema) +pub subject_type: GtsSchemaId, // generates "x-gts-ref": "gts.*" +``` + +This maps directly to the spec's distinction in [§9.6](https://github.com/GlobalTypeSystem/gts-spec#96---x-gts-ref-support): + +> *"`x-gts-ref": "/$id"` -- relative self-reference; field value must equal the current schema's `$id`"* +> +> *"`x-gts-ref": "gts.*"` -- field must be a valid GTS identifier; optionally resolve against a registry"* + +The field-level attributes are validated at compile time: +- `#[gts(type_field)]` must be on a `GtsSchemaId` field +- `#[gts(instance_id)]` must be on a `GtsInstanceId` field +- The two are mutually exclusive (a schema's instances are either well-known or anonymous, per [§3.7](https://github.com/GlobalTypeSystem/gts-spec#37-well-known-and-anonymous-instances)) +- At most one of each per struct + +### 3.4 User-controlled serialization + +The macro no longer injects or removes serde derives. Users explicitly declare `Serialize` and `Deserialize` where needed. + +Nested structs (those with `extends`) are still blocked from direct serialization by default -- serializing a nested payload alone produces incomplete JSON (missing the base event envelope). This safety behavior, which was an intentional and valuable part of the original design, is preserved via marker trait conflicts (`GtsNoDirectSerialize` / `GtsNoDirectDeserialize`). 
The user can opt out with `allow_direct_serde` for testing or standalone use: + +```rust +#[derive(Debug, Serialize, Deserialize, JsonSchema, GtsSchema)] +#[gts( + schema_id = "gts.x.core.events.type.v1~x.core.audit.event.v1~", + extends = BaseEventV1, + allow_direct_serde, +)] +pub struct AuditPayloadV1 { ... } +``` + +Without `allow_direct_serde`, deriving `Serialize` on a nested struct produces a compile error. + +### 3.5 Auto-derived properties + +The `properties` parameter is replaced with auto-derivation from struct fields. All named fields are included in the generated JSON Schema by default. To exclude a field: + +```rust +#[gts(skip)] // excluded from schema, still serializable +pub internal_cache: String, + +#[serde(skip)] // excluded from both schema and serialization +pub runtime_state: String, +``` + +--- + +## 4. Schema Output + +The generated JSON Schemas are **structurally identical** between old and new macros, verified by 17 parity tests that compare both macros' output on equivalent struct definitions. + +### 4.1 Unchanged + +- `$id` with `gts://` prefix ([§9.1](https://github.com/GlobalTypeSystem/gts-spec#91---identifier-reference-in-json-and-json-schema)) +- `$schema` set to `http://json-schema.org/draft-07/schema#` +- `type: "object"`, `additionalProperties: false` +- `properties` and `required` arrays +- `allOf` + `$ref` composition for inherited types ([§9.1](https://github.com/GlobalTypeSystem/gts-spec#91---identifier-reference-in-json-and-json-schema)) +- Generic field nesting via `wrap_in_nesting_path` +- `GtsSchemaId` / `GtsInstanceId` inline representation + +### 4.2 Improvements + +**`description` included in runtime schemas.** The current macro stores the `description` attribute but omits it from `gts_schema_with_refs()` output. The updated macro includes it, consistent with every spec example schema (`events.type.v1~`, `events.topic.v1~`, `orders.order.v1.0~`, `modules.module.v1~` -- all include `description`). 
+ +**Spec-correct `x-gts-ref` on identity fields.** As described in section 3.3, annotated identity fields generate `"x-gts-ref": "/$id"` while unannotated `GtsSchemaId` fields retain `"x-gts-ref": "gts.*"`. + +### 4.3 Example output + +```json +{ + "$id": "gts://gts.x.core.events.type.v1~", + "$schema": "http://json-schema.org/draft-07/schema#", + "description": "Base event type", + "type": "object", + "additionalProperties": false, + "properties": { + "type": { + "type": "string", + "format": "gts-schema-id", + "title": "GTS Schema ID", + "description": "GTS schema identifier", + "x-gts-ref": "/$id" + }, + "id": { "type": "string", "format": "uuid" }, + "tenant_id": { "type": "string", "format": "uuid" }, + "payload": { "type": "object" } + }, + "required": ["type", "id", "tenant_id", "payload"] +} +``` + +Compare the `type` property above with the spec's base event schema (`gts.x.core.events.type.v1~.schema.json`): + +```json +"type": { + "description": "Identifier of the event type in GTS format.", + "type": "string", + "x-gts-ref": "/$id" +} +``` + +Both use `"x-gts-ref": "/$id"` on the type discriminator field. The current macro generates `"x-gts-ref": "gts.*"` here. + +--- + +## 5. Extensibility + +The current macro is ~1,843 lines in a single file (`lib.rs`). The updated implementation splits into focused modules: + +``` +gts-macros/src/ + lib.rs Entry points (current + updated macro) + gts_schema_derive.rs #[derive(GtsSchema)] orchestration + gts_attrs.rs Struct-level #[gts(...)] parsing + gts_field_attrs.rs Field-level #[gts(...)] parsing + gts_validation.rs All compile-time validations + gts_codegen.rs GtsSchema trait impl + runtime API generation + gts_serde.rs GtsSerialize/GtsDeserialize + serde blocking +``` + +This structure is designed to grow with the GTS specification. Concrete examples: + +**Adding a new field-level attribute** (e.g., `#[gts(sensitive)]` to mark PII fields in the schema): +1. 
Add a variant to the `GtsFieldAttr` enum in `gts_field_attrs.rs` +2. Add parsing for the new keyword (3 lines) +3. Add validation rules in `gts_validation.rs` +4. Generate the schema annotation in `gts_codegen.rs` + +No other modules are touched. + +**Adding schema traits** ([§9.7](https://github.com/GlobalTypeSystem/gts-spec#97---schema-traits-x-gts-traits-schema--x-gts-traits) -- `x-gts-traits-schema` / `x-gts-traits`): The spec defines a trait system for schema-level metadata like retention rules and topic associations. The current macro doesn't support this. The modular design accommodates it via new struct-level attributes (e.g., `#[gts(traits_schema = "...")]`) following the same parse-validate-generate pipeline. The spec examples show this pattern: + +```json +"x-gts-traits-schema": { + "properties": { + "topicRef": { "x-gts-ref": "gts.x.core.events.topic.v1~" }, + "retention": { "type": "string", "default": "P30D" } + } +}, +"x-gts-traits": { + "topicRef": "gts.x.core.events.topic.v1~x.commerce._.orders.v1.0", + "retention": "P90D" +} +``` + +**Adding new struct-level attributes**: A new option in `#[gts(...)]` is a key-value pair in `gts_attrs.rs` + validation + codegen. The parsing infrastructure handles it uniformly. + +--- + +## 6. 
What Stays the Same + +The proposal preserves all existing runtime behavior: + +- **`GtsSchema` trait** -- `SCHEMA_ID`, `GENERIC_FIELD`, `gts_schema_with_refs()`, `gts_schema_with_refs_allof()`, `innermost_schema_id()`, `innermost_schema()`, `collect_nesting_path()`, `wrap_in_nesting_path()` +- **`GtsSerialize` / `GtsDeserialize`** trait system for nested structs, including `GtsSerializeWrapper` / `GtsDeserializeWrapper` bridge types +- **Serde blocking** for nested structs via `GtsNoDirectSerialize` / `GtsNoDirectDeserialize` marker traits (default behavior) +- **Runtime API** -- `gts_schema_id()`, `gts_base_schema_id()`, `gts_make_instance_id()`, `gts_instance_json()`, schema string methods +- **Associated constants** -- `SCHEMA_ID`, `GENERIC_FIELD`, `GTS_SCHEMA_FILE_PATH`, `GTS_SCHEMA_DESCRIPTION`, `GTS_SCHEMA_PROPERTIES`, `BASE_SCHEMA_ID` +- **Compile-time validations** -- schema ID format, version consistency, segment count, parent assertions, single generic parameter +- **Unit struct handling** -- `{}` / `null` serialization for both base and nested unit structs + +--- + +## 7. 
Test Coverage + +235 tests pass, covering both current and updated macros: + +| Test suite | Count | What it validates | +|---|---|---| +| `compile_fail_tests` (v1) | 31 | Current macro compile-time error cases | +| `compile_fail_v2_tests` | 21 | Updated macro compile-time error cases | +| `integration_tests` (v1) | 45 | Current macro runtime behavior | +| `v2_integration_tests` | 22 | Updated macro runtime behavior | +| `v2_inheritance_tests` | 14 | Multi-level inheritance chains (2-level, 3-level) | +| `v2_serialization_tests` | 10 | Serialize / deserialize round-trips | +| `v2_serde_rename_tests` | 5 | Per-field `#[serde(rename)]` handling | +| `v2_parity_tests` | 17 | Current vs updated macro output comparison | +| `inheritance_tests` (v1) | 45 | Current macro inheritance chains | +| `inheritance_tests_mixed` | 7 | Mixed current/updated macro interop | +| Other | 18 | Pretty printing, serde rename (v1) | + +The **17 parity tests** are the most critical -- they define equivalent structs using both macros and assert identical schema output, serialization output, deserialization behavior, trait constants, and runtime API results. + +--- + +## 8. Migration + +Both macros coexist. The current macro continues to work without changes. + +Migration per struct: + +1. Replace `#[struct_to_gts_schema(...)]` with `#[derive(GtsSchema)]` + `#[gts(...)]` +2. Add `#[derive(JsonSchema)]` and `#[derive(Serialize, Deserialize)]` where needed +3. Replace `base = true` with nothing; replace `base = Parent` with `extends = Parent` +4. Remove `properties = "..."` -- add `#[gts(skip)]` to fields that were excluded +5. Add `#[gts(type_field)]` or `#[gts(instance_id)]` to identity fields if present +6. Remove dummy identity fields that existed only to satisfy the current macro's requirement + +--- + +## 9. 
Specification References + +| Spec section | Topic | How this proposal uses it | +|---|---|---| +| [§2.2](https://github.com/GlobalTypeSystem/gts-spec#22-chained-identifiers) | Chained identifiers | `extends` models left-to-right inheritance via chained `~` segments | +| [§3.2](https://github.com/GlobalTypeSystem/gts-spec#32-gts-types-inheritance) | Type inheritance | Compile-time validation of parent-child segment matching; `allOf` + `$ref` generation | +| [§3.7](https://github.com/GlobalTypeSystem/gts-spec#37-well-known-and-anonymous-instances) | Well-known vs anonymous instances | `#[gts(instance_id)]` for well-known, `#[gts(type_field)]` for anonymous -- both optional | +| [§4.1](https://github.com/GlobalTypeSystem/gts-spec#41-compatibility-modes) | Versioning | Version match validation between struct name suffix and schema ID | +| [§9.1](https://github.com/GlobalTypeSystem/gts-spec#91---identifier-reference-in-json-and-json-schema) | `$id` and `$ref` conventions | Generated schemas use `gts://` prefix on `$id` and `$ref` | +| [§9.6](https://github.com/GlobalTypeSystem/gts-spec#96---x-gts-ref-support) | `x-gts-ref` support | Identity fields get `"/$id"`, cross-reference fields get `"gts.*"` | +| [§9.7](https://github.com/GlobalTypeSystem/gts-spec#97---schema-traits-x-gts-traits-schema--x-gts-traits) | Schema traits | Not yet implemented; modular design accommodates future support | +| [§11.1](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories), Rule C | Five document categories | Identity fields made optional -- not all schemas produce self-identifying instances | +| [§11.1](https://github.com/GlobalTypeSystem/gts-spec#111-global-rules-schema-vs-instance-normalization-and-document-categories) | Implementation-defined field names | Field annotations replace hardcoded name matching |
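
The field-level rules from section 3.3 (type checks, mutual exclusion, at most one of each) and the `x-gts-ref` selection from section 4.2 can be sketched together in plain Rust. This is a simplified model under stated assumptions, not the macro's implementation: `FieldType`, `Field`, `validate_identity_fields`, and `x_gts_ref` are hypothetical, and the `GtsFieldAttr` variants shown are guesses at how the enum mentioned in section 5 might look. The proposal does not say what an unannotated `GtsInstanceId` field generates, so the sketch emits nothing for that case.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldType { GtsSchemaId, GtsInstanceId, Other }

/// Assumed variants; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GtsFieldAttr { TypeField, InstanceId }

struct Field {
    name: &'static str,
    ty: FieldType,
    attr: Option<GtsFieldAttr>,
}

/// Applies the section 3.3 rules to a struct's fields.
fn validate_identity_fields(fields: &[Field]) -> Result<(), String> {
    let (mut type_fields, mut instance_ids) = (0, 0);
    for f in fields {
        match f.attr {
            Some(GtsFieldAttr::TypeField) => {
                if f.ty != FieldType::GtsSchemaId {
                    return Err(format!("`{}`: type_field requires GtsSchemaId", f.name));
                }
                type_fields += 1;
            }
            Some(GtsFieldAttr::InstanceId) => {
                if f.ty != FieldType::GtsInstanceId {
                    return Err(format!("`{}`: instance_id requires GtsInstanceId", f.name));
                }
                instance_ids += 1;
            }
            None => {}
        }
    }
    if type_fields > 1 || instance_ids > 1 {
        return Err("at most one type_field and one instance_id per struct".into());
    }
    if type_fields == 1 && instance_ids == 1 {
        return Err("type_field and instance_id are mutually exclusive".into());
    }
    Ok(())
}

/// Chooses the `x-gts-ref` value for a field, if any.
fn x_gts_ref(field: &Field) -> Option<&'static str> {
    match (field.attr, field.ty) {
        (Some(_), _) => Some("/$id"),                   // annotated identity field
        (None, FieldType::GtsSchemaId) => Some("gts.*"), // cross-reference
        _ => None, // assumption: no annotation for anything else
    }
}
```

For the `BaseEventV1` example, an annotated `event_type` yields `"/$id"`, an unannotated `subject_type: GtsSchemaId` yields `"gts.*"`, and a `Uuid` field yields no annotation, matching the example output in section 4.3.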