[API Proposal]: System.Text.Json union type support #127299

@eiriktsarpalis

Background and motivation

The C# compiler just introduced support for union types. Sibling proposals for closed hierarchies and closed enums are not yet available in the compiler preview, so this issue is scoped to unions only. System.Text.Json support for closed hierarchies and closed enums will be tracked separately once their compiler features land.

This issue is a sub-issue of the broader STJ union/closed-types umbrella (#125449), extracting the API surface that is relevant today.

The goal is twofold:

  1. Provide out-of-the-box serialization support for simple union types whose union cases present no structural ambiguity (e.g. union Result(int, string)).
  2. Provide a case classifier abstraction for unions whose cases map to the same JSON token kind (e.g. union Pet(Cat, Dog), where both cases serialize as JSON objects). This abstraction is also extended to the existing polymorphic type infrastructure.

Design summary

Serialization: no discriminator

Union values serialize transparently — the wrapper produced by the C# union keyword is unpacked and the underlying case value is written using its own JSON contract. There is no envelope object, no $type field, no tagging of any kind:

union Result(int, string);

JsonSerializer.Serialize<Result>(new Result(42));      // 42
JsonSerializer.Serialize<Result>(new Result("hello")); // "hello"

This is a deliberate departure from the polymorphism support exposed via [JsonPolymorphic] / [JsonDerivedType], where derived types are written with a $type discriminator. Unions don't have a natural discriminator: any case can be picked by the union's constructors, and two distinct case constructors can produce equal values. Synthesising an artificial discriminator (e.g. the case type name) would push that arbitrary choice into the wire format and lock STJ into it forever. Users who want a tagged representation can keep using polymorphic hierarchies, or attach a custom converter.

Deserialization: first-token dispatch

Without a discriminator on the wire, the converter has to recover the case type from the JSON value itself. The chosen mechanism is to look at a single thing — the first token of the value — and pick the unique union case whose declared type is compatible with that token kind. The mapping is fixed:

JsonTokenType    Compatible case types
Number           numeric primitives (int, long, double, decimal, …)
String           string, DateTime, DateTimeOffset, Guid, TimeSpan, Uri, char, byte[], enums
True / False     bool
StartObject      objects and dictionaries
StartArray       arrays and collections
Null             null

Selection is O(1) and does not require read-ahead. union Result(int, string) works cleanly out of the box because int is the only case compatible with Number and string is the only case compatible with String.

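Because no case is ever speculatively parsed and then retried, this approach avoids the $O(n^2)$ deserialization time that trial-based matching can incur on recursive/nested union values, a potential vector for denial-of-service (DoS) attacks.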
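The dispatch rule can be illustrated with today's Utf8JsonReader. The following is only a minimal sketch, not the proposed converter: FirstTokenDispatch, IsCompatible and SelectCase are hypothetical names, the compatibility check covers just a handful of the types from the table, and returning null for "no unique match" is an assumption made for brevity.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.Json;

Type[] resultCases = { typeof(int), typeof(string) };
Console.WriteLine(FirstTokenDispatch.SelectCase(resultCases, "42"));        // System.Int32
Console.WriteLine(FirstTokenDispatch.SelectCase(resultCases, "\"hello\"")); // System.String

static class FirstTokenDispatch
{
    // Hypothetical, deliberately partial version of the token/type table.
    static bool IsCompatible(Type caseType, JsonTokenType token) => token switch
    {
        JsonTokenType.Number => caseType == typeof(int) || caseType == typeof(long)
                             || caseType == typeof(double) || caseType == typeof(decimal),
        JsonTokenType.String => caseType == typeof(string) || caseType == typeof(DateTime)
                             || caseType == typeof(Guid) || caseType.IsEnum,
        JsonTokenType.True or JsonTokenType.False => caseType == typeof(bool),
        _ => false,
    };

    // Returns the unique compatible case, or null when zero or several cases match.
    public static Type? SelectCase(Type[] cases, string json)
    {
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
        reader.Read(); // Position on the first token; nothing beyond it is inspected.
        JsonTokenType token = reader.TokenType;

        Type[] matches = cases.Where(c => IsCompatible(c, token)).ToArray();
        return matches.Length == 1 ? matches[0] : null;
    }
}
```

Only one Read() call is made per value, which is what keeps selection independent of the size of the payload.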

Ambiguous unions

The token-only rule is intentionally narrow, so a number of perfectly valid unions cannot be disambiguated by it:

  • union Num(int, long) — both cases are Number.
  • union When(DateTime, DateTimeOffset) — both cases are String.
  • union Pet(Cat, Dog) — both cases are StartObject.

For these unions, the metadata layer throws InvalidOperationException when deserialization is attempted, and the source generator emits the diagnostic SYSLIB1227 so that the failure surfaces at compile time. The user is then expected to attach a custom JsonTypeClassifier that decides which case applies.
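One way to picture the ambiguity check is to group the case types by the token kind they expect and look for any group with more than one member. The sketch below is an illustration under stated assumptions, not the metadata layer's actual logic: AmbiguityCheck, KindOf and FindAmbiguousKind are hypothetical names, collections are omitted, and every type not listed is treated as a JSON object.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

Console.WriteLine(AmbiguityCheck.FindAmbiguousKind(typeof(int), typeof(string)) is null);      // True
Console.WriteLine(AmbiguityCheck.FindAmbiguousKind(typeof(int), typeof(long)));                // Number
Console.WriteLine(AmbiguityCheck.FindAmbiguousKind(typeof(DateTime), typeof(DateTimeOffset))); // String

static class AmbiguityCheck
{
    static readonly HashSet<Type> Numbers =
        new() { typeof(int), typeof(long), typeof(double), typeof(decimal) };
    static readonly HashSet<Type> Strings =
        new() { typeof(string), typeof(DateTime), typeof(DateTimeOffset), typeof(Guid) };

    // Hypothetical, partial mapping from a case type to its expected first token kind.
    // True stands in for both boolean tokens; unlisted types are assumed to be objects.
    static JsonTokenType KindOf(Type t) =>
        Numbers.Contains(t) ? JsonTokenType.Number
        : Strings.Contains(t) ? JsonTokenType.String
        : t == typeof(bool) ? JsonTokenType.True
        : JsonTokenType.StartObject;

    // Returns the first token kind claimed by more than one case, or null if there is none.
    public static JsonTokenType? FindAmbiguousKind(params Type[] cases) =>
        cases.GroupBy(KindOf)
             .Where(g => g.Count() > 1)
             .Select(g => (JsonTokenType?)g.Key)
             .FirstOrDefault();
}
```

Because the check depends only on the declared case types, it can run once when metadata is built rather than on every value.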

Why not built-in structural matching or content sniffing?

Two natural-looking alternatives were considered and rejected as defaults:

  • Structural matching, where the converter parses the value and tries each case type until one succeeds, would in principle resolve union Pet(Cat, Dog) automatically. But it requires unbounded read-ahead, costs O(n) on every value, and silently chooses an arm when more than one case structurally matches — which is precisely the case where the user most needs an error.
  • Content sniffing for ambiguous string forms (e.g. attempting to parse "2024-05-01" as DateTime first, falling back to string) is culture-sensitive, security-sensitive, and produces results that depend on which parsers happen to accept which inputs.
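The "silently chooses an arm" problem can be reproduced with the existing serializer. The sketch below assumes a hypothetical try-each-case helper (TryEach is not part of the proposal) and reuses the Cat and Dog record shapes from this issue. Both records accept the payload because missing constructor parameters fall back to default values, so the result depends purely on declaration order.

```csharp
using System;
using System.Text.Json;

string json = "{\"Name\":\"Rex\"}";

// Hypothetical structural matching: try each case in order, keep the first that parses.
object? pet = TryEach(json, typeof(Cat), typeof(Dog));
Console.WriteLine(pet); // Cat { Name = Rex, Lives = 0 }

static object? TryEach(string json, params Type[] cases)
{
    foreach (Type caseType in cases)
    {
        try
        {
            return JsonSerializer.Deserialize(json, caseType);
        }
        catch (JsonException)
        {
            // This case rejected the payload; fall through to the next one.
        }
    }

    return null;
}

record Dog(string? Name, string? Breed);
record Cat(string? Name, int Lives);
```

Swapping the order of the two types in the call yields a Dog instead, with no error in either direction.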

Custom classifiers

Union case classification can be customized for non-trivial shapes by implementing a JsonTypeClassifierFactory that produces a JsonTypeClassifier delegate. This follows a pattern similar to authoring custom converters:

record Dog(string? Name, string? Breed);
record Cat(string? Name, int Lives);

[JsonUnion(TypeClassifier = typeof(PetClassifier))]
union Pet(Cat, Dog);

public sealed class PetClassifier : JsonTypeClassifierFactory<Pet>
{
    public override JsonTypeClassifier CreateJsonClassifier(
        JsonTypeClassifierContext context, JsonSerializerOptions options)
    {
        // The classifier delegate is being passed a pre-buffered
        // defensive copy of the underlying Utf8JsonReader.
        return static (ref Utf8JsonReader reader) =>
        {
            if (reader.TokenType is not JsonTokenType.StartObject)
            {
                return null;
            }

            while (reader.Read() && reader.TokenType is not JsonTokenType.EndObject)
            {
                Debug.Assert(reader.TokenType is JsonTokenType.PropertyName);

                if (reader.ValueTextEquals("Breed"u8)) return typeof(Dog);
                if (reader.ValueTextEquals("Lives"u8)) return typeof(Cat);

                reader.Read(); // Advance to the value of the property.
                reader.Skip(); // Skip the value.
            }

            return null;
        };
    }
}
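The "defensive copy" mentioned in the comment can be demonstrated with the existing reader alone. Utf8JsonReader is a struct, so assigning it to a second variable produces an independent cursor over the same buffer. The following is a sketch under assumptions: Peek.Classify is a hypothetical stand-in for the classifier delegate and returns a case name as a string, since the proposed types do not exist yet.

```csharp
using System;
using System.Text;
using System.Text.Json;

byte[] utf8 = Encoding.UTF8.GetBytes("{\"Name\":\"Rex\",\"Breed\":\"Lab\"}");
var reader = new Utf8JsonReader(utf8);
reader.Read(); // Now positioned on StartObject.

Utf8JsonReader copy = reader; // Struct copy: advancing it leaves 'reader' untouched.
string? caseName = Peek.Classify(ref copy);

Console.WriteLine(caseName);         // Dog
Console.WriteLine(reader.TokenType); // StartObject

static class Peek
{
    // Hypothetical classifier body: scans top-level property names only.
    public static string? Classify(ref Utf8JsonReader r)
    {
        while (r.Read() && r.TokenType is not JsonTokenType.EndObject)
        {
            if (r.ValueTextEquals("Breed"u8)) return "Dog";
            if (r.ValueTextEquals("Lives"u8)) return "Cat";

            r.Read(); // Advance to the property value.
            r.Skip(); // Skip nested objects or arrays; a no-op for primitive values.
        }

        return null;
    }
}
```

After classification the original reader is still on StartObject, so the selected case's converter can read the value from the beginning.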

A higher-level alternative parses the value into a JsonNode first — simpler to write, more allocation:

public sealed class JsonNodePetClassifier : JsonTypeClassifierFactory<Pet>
{
    public override JsonTypeClassifier CreateJsonClassifier(
        JsonTypeClassifierContext context, JsonSerializerOptions options)
    {
        return static (ref Utf8JsonReader reader) =>
        {
            if (JsonNode.Parse(ref reader) is JsonObject obj)
            {
                if (obj.ContainsKey("Breed")) return typeof(Dog);
                if (obj.ContainsKey("Lives")) return typeof(Cat);
            }

            return null; // No union case identified, fail deserialization.
        };
    }
}

Below is an example of implementing a multi-type classifier factory that performs precomputations:

JsonSerializerOptions options = new() { Classifiers = { new PropertyBasedClassifier() } };

public sealed class PropertyBasedClassifier : JsonTypeClassifierFactory
{
    // Multivariate factory: applies to any union type. Override CanClassify to scope it
    // to a specific set of types if needed.
    public override bool CanClassify(Type declaringType) => true;

    public override JsonTypeClassifier CreateJsonClassifier(JsonTypeClassifierContext context, JsonSerializerOptions options)
    {
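        // Precompute a property-name-to-type index; this happens once per classifier.
        Dictionary<string, Type> propertyIndex = new(StringComparer.OrdinalIgnoreCase);
        HashSet<string> sharedNames = new(StringComparer.OrdinalIgnoreCase);
        foreach (JsonUnionCaseInfo unionCase in context.UnionCases)
        {
            foreach (JsonPropertyInfo prop in options.GetTypeInfo(unionCase.CaseType).Properties)
            {
                // A name declared by more than one case cannot identify a case.
                if (!propertyIndex.TryAdd(prop.Name, unionCase.CaseType))
                {
                    sharedNames.Add(prop.Name);
                }
            }
        }

        // Drop shared names (e.g. "Name" on both Cat and Dog) so they never decide the match.
        foreach (string name in sharedNames)
        {
            propertyIndex.Remove(name);
        }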

        return (ref Utf8JsonReader reader) =>
        {
            // Deserialization hot path
            if (reader.TokenType is not JsonTokenType.StartObject)
            {
                return null;
            }

            while (reader.Read() && reader.TokenType is not JsonTokenType.EndObject)
            {
                Debug.Assert(reader.TokenType is JsonTokenType.PropertyName);

                if (propertyIndex.TryGetValue(reader.GetString()!, out Type? match))
                {
                    return match;
                }

                reader.Read();
                reader.Skip();
            }

            return null;
        };
    }
}

Null handling

C# unions accept null if and only if at least one case constructor parameter is declared nullable, and dispatch the null value to a nullable case's constructor. Any number of cases may be declared nullable — union(int?, string?, bool?, Dog?) is well-formed. By the union spec, passing null to any nullable case yields the same null-holding union value, so all such constructors are observationally equivalent on null:

union Pet(Dog?, Cat, Bird?);   // multiple nullable cases — allowed

Pet pet1 = (Dog?)null;
Pet pet2 = (Bird?)null;
Debug.Assert(pet1 == pet2);    // same canonical null union value
// Pet pet = (Cat?)null;       // would error: Cat case is not nullable

JsonUnionConverter mirrors those semantics:

  • Serialization. A null underlying case value writes JSON null. (Reachable only when the case is nullable; otherwise the union value cannot itself be null.)
  • Deserialization. When the reader is positioned on JsonTokenType.Null, the converter short-circuits: any configured JsonTypeClassifier is bypassed and the value is dispatched directly to a nullable case's constructor with a null argument. If no case is nullable, JsonException is thrown. Custom classifier authors therefore do not need to handle the Null token, and existing classifier implementations are unaffected by per-union nullability.
  • Multiple nullable cases. Explicitly supported and not rejected. The deserializer always picks the first declared nullable case at the Null token. The choice is semantically transparent and round-trip equality is preserved, because all nullable case constructors collapse to the same null union value per the spec.
  • Source-generated metadata. Per-case nullability is carried on JsonUnionCaseInfo.IsNullable (see API additions below). The source generator emits it from the constructor parameter's nullability annotation and inserts a null-dispatch prologue in the generated UnionConstructor so the right case constructor is invoked for null without re-walking case metadata at runtime.

Schema generation

JsonSchemaExporter emits an anyOf schema composed from JsonTypeInfo.UnionCases, with shared JsonSchemaType values hoisted to the parent. For example, union(string, int) produces:

{
  "anyOf": [
    { "type": "string" },
    { "type": "integer" }
  ]
}

Schema output is classifier-invariant — the exporter does not invoke TypeClassifier, so swapping or removing a custom classifier does not change the generated schema. This mirrors polymorphic hierarchies, whose schema depends only on the registered DerivedTypes list and not on runtime discriminator resolution.

Each union case whose JsonUnionCaseInfo.IsNullable is true adds JsonSchemaType.Null to its sub-schema. For union(string, int?):

{
  "anyOf": [
    { "type": "string" },
    { "type": ["integer", "null"] }
  ]
}

Source-generated metadata

For each compiler union, the source generator emits a CreateUnionInfo<T>(...) call populating JsonUnionInfoValues<T>. The case list is sorted most-derived-first using the topological-sort helpers shared with the rest of the source generator, so that the switch-based dispatch in the generated constructor and deconstructor never selects a base case before a derived one. Per-case nullability is recorded on JsonUnionCaseInfo.IsNullable and threads through both directions of the bridge:

class Dog;
class Lab : Dog;
class Bird;

union Pet(Dog, Lab, Bird?);

Generated metadata:

var unionInfo = new JsonUnionInfoValues<Pet>
{
    UnionCases = new JsonUnionCaseInfo[]
    {
        new JsonUnionCaseInfo(typeof(Lab)),                          // most-derived first
        new JsonUnionCaseInfo(typeof(Dog)),
        new JsonUnionCaseInfo(typeof(Bird)) { IsNullable = true },  // nullable case
    },
    UnionConstructor = static (Type _, object? value) => value switch
    {
        Lab caseValue0 => new Pet(caseValue0),
        Dog caseValue1 => new Pet(caseValue1),
        Bird caseValue2 => new Pet(caseValue2),
        null => new Pet((Bird?)null), // null dispatches to the nullable case
    },
    UnionDeconstructor = static (Pet value) => value switch
    {
        Lab caseValue0 => (typeof(Lab), caseValue0),
        Dog caseValue1 => (typeof(Dog), caseValue1),
        Bird caseValue2 => (typeof(Bird), caseValue2),
        null => (typeof(Bird), null), // null reports the nullable case type
    },
};

The Type parameter of UnionConstructor is unused in the generated form because pattern matching on value is sufficient to pick the right union case constructor; it is preserved on the public delegate for hand-written contracts that may want to dispatch on case type explicitly.
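The most-derived-first ordering matters because a type pattern for Dog also matches a Lab instance. The sketch below is a hypothetical stand-in for the generator's topological-sort helpers, whose real implementation is not shown in this issue. CaseOrdering and MostDerivedFirst are illustrative names, and the approach (repeatedly emitting a type with no pending derived type) is an assumption chosen for clarity rather than efficiency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Type[] ordered = CaseOrdering.MostDerivedFirst(typeof(Dog), typeof(Lab), typeof(Bird));
Console.WriteLine(string.Join(", ", ordered.Select(t => t.Name))); // Lab, Dog, Bird

static class CaseOrdering
{
    // Repeatedly emits the first pending type that no other pending type derives from.
    public static Type[] MostDerivedFirst(params Type[] cases)
    {
        var pending = new List<Type>(cases);
        var result = new List<Type>();

        while (pending.Count > 0)
        {
            Type next = pending.First(
                t => !pending.Any(other => other != t && t.IsAssignableFrom(other)));
            pending.Remove(next);
            result.Add(next);
        }

        return result.ToArray();
    }
}

class Dog { }
class Lab : Dog { }
class Bird { }
```

With this order, a switch that tests Lab before Dog routes a Lab value to the Lab case, matching the generated metadata shown above.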

API proposal

namespace System.Text.Json.Serialization;

// Classifier abstraction

public delegate Type? JsonTypeClassifier(ref Utf8JsonReader reader);

public sealed class JsonTypeClassifierContext
{
    public Type DeclaringType { get; }
    public IReadOnlyList<JsonUnionCaseInfo> UnionCases { get; }
    public IReadOnlyList<JsonDerivedType> DerivedTypes { get; }
    public string? TypeDiscriminatorPropertyName { get; }
}

public abstract class JsonTypeClassifierFactory
{
    public abstract bool CanClassify(Type declaringType);
    public abstract JsonTypeClassifier CreateJsonClassifier(
        JsonTypeClassifierContext context, JsonSerializerOptions options);
}

public abstract class JsonTypeClassifierFactory<T> : JsonTypeClassifierFactory
{
    public sealed override bool CanClassify(Type declaringType) => declaringType == typeof(T);
}

// New options-level surface (JsonSerializerOptions itself is declared in System.Text.Json)

public sealed partial class JsonSerializerOptions
{
    public IList<JsonTypeClassifierFactory> Classifiers { get; }
}

public sealed partial class JsonSourceGenerationOptionsAttribute
{
    public Type[]? Classifiers { get; set; }
}

// New attribute APIs

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct, AllowMultiple = false, Inherited = false)]
public sealed class JsonUnionAttribute : JsonAttribute
{
    [DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicParameterlessConstructor)]
    public Type? TypeClassifier { get; set; }
}

public sealed partial class JsonPolymorphicAttribute
{
    [DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicParameterlessConstructor)]
    public Type? TypeClassifier { get; set; }
}

namespace System.Text.Json.Serialization.Metadata;

// Contract customization surface for union types

public sealed class JsonUnionCaseInfo
{
    public JsonUnionCaseInfo(Type caseType);
    public Type CaseType { get; }
    public bool IsNullable { get; init; }
}

public enum JsonTypeInfoKind
{
    /** Existing values **/
    // None = 0,
    // Object = 1,
    // Enumerable = 2,
    // Dictionary = 3,

    Union = 4,
}

public abstract partial class JsonTypeInfo
{
    public JsonTypeClassifier? TypeClassifier { get; set; }
    public IList<JsonUnionCaseInfo>? UnionCases { get; set; }
    public Func<Type, object?, object>? UnionConstructor { get; set; }
    public Func<object, (Type? CaseType, object? CaseValue)>? UnionDeconstructor { get; set; }
}

public sealed partial class JsonTypeInfo<T>
{
    public new Func<Type, object?, T>? UnionConstructor { get; set; }
    public new Func<T, (Type? CaseType, object? CaseValue)>? UnionDeconstructor { get; set; }
}

We also need to expose a few APIs specifically for consumption by the source generator:

[EditorBrowsable(EditorBrowsableState.Never)]
public sealed class JsonUnionInfoValues<T>
{
    public IList<JsonUnionCaseInfo>? UnionCases { get; init; }
    public Func<Type, object?, T>? UnionConstructor { get; init; }
    public JsonTypeClassifier? TypeClassifier { get; init; }
    public Func<T, (Type? CaseType, object? CaseValue)>? UnionDeconstructor { get; init; }
}

[EditorBrowsable(EditorBrowsableState.Never)]
public static partial class JsonMetadataServices
{
    public static JsonTypeInfo<T> CreateUnionInfo<T>(JsonSerializerOptions options, JsonUnionInfoValues<T> unionInfo) where T : notnull;
}

Alternatives considered

  • $type on union serialization: unions lack a natural discriminator, particularly when union cases are not objects. Discussed under Serialization: no discriminator.
  • Structural matching / content sniffing as defaults: discussed under Why not built-in structural matching or content sniffing?.
  • Built-in classifier factories: deferred. The current design exposes only the abstraction; common policies can be added later as concrete factory subclasses without breaking existing code.

Risks

  • Breaking changes: none — all new surface.
  • Performance: default deserialization is O(1) and zero-buffering. Custom classifiers may opt into read-ahead, with the cost paid only when configured.

Prototype

Branch: json-unions — commit b11e3d00c23.

The prototype additionally implements closed-hierarchy/closed-enum support and the InferDerivedTypes API surface, which are deliberately out of scope for this issue and will be proposed separately once the corresponding compiler features ship.
