Use discriminator to parse correct union type #131

Merged
joel-bach merged 4 commits into Haskell-OpenAPI-Code-Generator:master from mjgpy3:respect-discrim
Mar 17, 2026

Contributor

@mjgpy3 mjgpy3 commented Mar 12, 2026

The docs that I'm referencing are here: https://swagger.io/docs/specification/v3_0/data-models/inheritance-and-polymorphism/#discriminator

They are a little confusing, under-prescriptive (IMHO), and lack specificity,
so I may have gotten some things wrong here. I figure, though, that as a first
usable pass this might be "good enough", and we can iron out the edge cases
should they appear.

What's `discriminator`?

It specifies a `propertyName` which is used as a key to discriminate between
the different types that a value might take (e.g. under `oneOf`).

I'm fixing this in my Freckle day job because we have a few types which don't
have meaningful distinction in structure (e.g. `{ tag: 'comma' } | { tag: 'newline' }`). Without `discriminator`, or something like it, our generated
parsers just pick the first case when they should respect `tag`.
`discriminator` is designed to give this hint to clients.
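To make the idea concrete, here is a minimal hand-written sketch of a discriminator-driven parser for the `comma`/`newline` case, assuming aeson with `OverloadedStrings`. The type and constructor names (`Comma`, `Newline`, `Separator`) are hypothetical, and this is not the generator's actual output; in particular, failing when the discriminator is missing is an assumption made only for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), eitherDecode, withObject, (.:), (.:?))
import Data.Text (Text)

-- Hypothetical variants with identical structure: only the value of "tag" differs.
newtype Comma = Comma {commaTag :: Text} deriving (Eq, Show)

newtype Newline = Newline {newlineTag :: Text} deriving (Eq, Show)

instance FromJSON Comma where
  parseJSON = withObject "Comma" $ \obj -> Comma <$> obj .: "tag"

instance FromJSON Newline where
  parseJSON = withObject "Newline" $ \obj -> Newline <$> obj .: "tag"

data Separator
  = SeparatorComma Comma
  | SeparatorNewline Newline
  deriving (Eq, Show)

instance FromJSON Separator where
  parseJSON val = withObject "Separator" dispatch val
    where
      -- Read the discriminator property first, then pick the variant parser.
      dispatch obj = do
        tag <- obj .:? "tag"
        case tag :: Maybe Text of
          Just "comma" -> SeparatorComma <$> parseJSON val
          Just "newline" -> SeparatorNewline <$> parseJSON val
          Just _ -> fail "No match for discriminator property"
          -- Assumption for this sketch: a missing discriminator is an error.
          Nothing -> fail "Missing discriminator property"

main :: IO ()
main = do
  print (eitherDecode "{\"tag\": \"newline\"}" :: Either String Separator)
  print (eitherDecode "{\"tag\": \"tab\"}" :: Either String Separator)
```

Without the `case` on `tag`, a parser that simply tries `Comma` first would accept both inputs as `SeparatorComma`, which is the behaviour described above.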

Key notes

  • The docs say this should work in `anyOf` but I can't really comprehend how
    that could work. I'm just ignoring it for now and putting it in the "later
    improvement" bucket since Freckle doesn't need that feature.
  • The docs also say that `mapping` is optional. If it's not present, then
    the schema's name is used as the property value that's checked. The
    `Lizard` test case tries this.
  • I don't really know how far to take schema validation in this codebase. For
    example, the docs don't really say what should happen if there's a `ref` in
    `mapping` that's not in the `oneOf` (or vice versa). I could check this and
    log a warning but I opted for less until I hear otherwise. I really wish the
    schema was better structured to just eliminate these cases but here we are.
  • This is my first Template Haskell so please lemme know if I should be doing
    something different.
  • The indexed-variant setting was tricky. I think I've gotten it right here but it could use some checking (perhaps I should write a test?).
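The optional-`mapping` rule from the notes above can be sketched in isolation. This is a simplified model, not the generator's code: `discriminatorCases` and `schemaNameFromRef` are hypothetical names, and refs and discriminator values are plain `String`s here rather than the generator's own types.

```haskell
import qualified Data.Map.Strict as Map

type DiscriminatorValue = String

type SchemaRef = String

-- Last path segment of a reference,
-- e.g. "#/components/schemas/gecko" gives "gecko".
schemaNameFromRef :: SchemaRef -> String
schemaNameFromRef = reverse . takeWhile (/= '/') . reverse

-- Association of discriminator values to schema references.
discriminatorCases ::
  Maybe (Map.Map DiscriminatorValue SchemaRef) ->
  [SchemaRef] ->
  [(DiscriminatorValue, SchemaRef)]
discriminatorCases explicitMapping oneOfRefs =
  case explicitMapping of
    -- An explicit `mapping` in the spec wins.
    Just mapping -> Map.toList mapping
    -- Otherwise each schema's own name is the expected discriminator value.
    Nothing -> [(schemaNameFromRef ref, ref) | ref <- oneOfRefs]

main :: IO ()
main = do
  let lizardRefs = ["#/components/schemas/gecko", "#/components/schemas/gilaMonster"]
  print (discriminatorCases Nothing lizardRefs)
  print (discriminatorCases (Just (Map.fromList [("cat", "#/components/schemas/Cat")])) lizardRefs)
```

Note that in this sketch the fallback uses the schema name exactly as written in the spec, so a lower-case schema such as `gecko` is matched against the string `"gecko"`.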

mjgpy3 added 2 commits March 10, 2026 16:12
The example (I believe) is good. Changes that are made to the generated code
will help me check that my changes to the generator are correct.

This is documented/defined here: https://swagger.io/docs/specification/v3_0/data-models/inheritance-and-polymorphism/#discriminator

Contributor Author

mjgpy3 commented Mar 12, 2026

Should be sufficient to resolve #125 and provide the functionality that Freckle needs.

Contributor Author

mjgpy3 commented Mar 12, 2026

Ping @joel-bach to help ensure you see this.

cc @chris-martin

Comment on lines +373 to +397
Lizard:
  type: object
  oneOf:
    - $ref: '#/components/schemas/gecko'
    - $ref: '#/components/schemas/gilaMonster'
  discriminator:
    propertyName: lizardType
gecko:
  type: object
  properties:
    hasTail:
      type: boolean
    lizardType:
      type: string
  required:
    - lizardType
gilaMonster:
  type: object
  properties:
    hasTail:
      type: boolean
    lizardType:
      type: string
  required:
    - lizardType
Contributor

Is there a reason only the lizard schema names start with a lower case letter? Might be good to keep them consistent to clarify that that's not something pertinent to what the test is covering, because currently it stands out and makes me wonder.

Suggested change
Lizard:
  type: object
  oneOf:
    - $ref: '#/components/schemas/Gecko'
    - $ref: '#/components/schemas/GilaMonster'
  discriminator:
    propertyName: lizardType
Gecko:
  type: object
  properties:
    hasTail:
      type: boolean
    lizardType:
      type: string
  required:
    - lizardType
GilaMonster:
  type: object
  properties:
    hasTail:
      type: boolean
    lizardType:
      type: string
  required:
    - lizardType

Contributor Author

@mjgpy3 mjgpy3 Mar 13, 2026

@chris-martin I made them lower-case to test that we're not accidentally using Haskell-ified names as the discriminator values matched in the `case obj .:? propertyName` branches.

Member

I think it's great to test this behavior but it would make it clearer if you'd use more distinct values in the mapping, e.g. guppieType or guppieDiscriminator (etc.) so it's clear we're not just lowercasing some value. And then you can keep the rest consistent with the casing. What do you think?

Comment on lines +435 to +438
oneOfSchemaRefs = do
  (ref, (_, name')) <- Map.toList schemaLookupFromRef
  pure (name', ref)
propertyNamesWithReferences = maybe oneOfSchemaRefs Map.toList $ OAS.discriminatorObjectMapping disc
Contributor

@chris-martin chris-martin Mar 12, 2026

Some comments would help me here

Suggested change
-- Association of discriminator values to schema names
propertyNamesWithReferences =
  case OAS.discriminatorObjectMapping disc of
    Just objectMapping ->
      -- When present, use the associations explicitly given by the `mapping` property in the spec
      Map.toList objectMapping
    Nothing -> do
      -- When the spec does not provide a `mapping` property, default to using the names of the schemas as discriminator values.
      (ref, (_, name')) <- Map.toList schemaLookupFromRef
      pure (name', ref)

Contributor Author

@mjgpy3 mjgpy3 Mar 13, 2026

I thought about commenting this stuff well but there were almost no comments around here. That's probably a bad excuse though. I can add some if the maintainer(s) agree(s).

Member

Please feel free to add comments! I also prefer case over maybe here but both are fine.

Contributor

chris-martin commented Mar 12, 2026

> The docs say this should work in anyOf but I can't really comprehend how that could work.

I think I can explain.

If the member schemas of your union have their discriminator field pinned to a particular value using enum (or const, as of OpenAPI 3.1), then the parsers are mutually exclusive. This is what any "normal" union, e.g. one generated from a Haskell sum type, will look like.

components:
  schemas:
    Pet:
      type: object
      oneOf: # ⚠️
       - $ref: '#/components/schemas/Cat'
       - $ref: '#/components/schemas/Dog'
      discriminator:
        propertyName: petType
        mapping:
          cat: '#/components/schemas/Cat'
          dog: '#/components/schemas/Dog'
    Cat:
      type: object
      properties:
        petType:
          type: string
          enum: ['cat'] # ⚠️
      required:
        - petType
    Dog:
      type: object
      properties:
        petType:
          type: string
          enum: ['dog'] # ⚠️
      required:
        - petType

In this situation, I believe it doesn't matter whether you use oneOf or anyOf, since the only difference is that oneOf asserts that any value satisfying more than one schema is invalid (in other words, consumers are instructed to fail rather than to resolve ambiguity arbitrarily or via the discriminator); but since there are no such values, the assertion is vacuous. In a case like above,

  • oneOf is probably a better choice, just as a cue to the reader that this isn't a weird type.
  • If a discriminator definition is offered, it is offered only for the sake of documentation/performance and is not strictly necessary.

(Aside: One of the bugs in this library, iirc, is that an enum with a single value won't be validated at all - e.g. parseJSON @Cat will happily parse the value {"petType": "wizard"} - and so the generated parsers for Dog and Cat will not actually be mutually exclusive. Fixing that bug would be another way to address the issue for us without considering the discriminator.)
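As a rough illustration of that aside, here is a hand-written sketch, assuming aeson, of variant parsers that do enforce the single-value enum. The helper `requirePetType` and the field-less `Cat`/`Dog` types are hypothetical and are not what the generator currently emits.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Control.Monad (unless)
import Data.Aeson (FromJSON (..), Object, eitherDecode, withObject, (.:))
import Data.Aeson.Types (Parser)
import Data.Text (Text)

data Cat = Cat deriving (Eq, Show)

data Dog = Dog deriving (Eq, Show)

data Pet = PetCat Cat | PetDog Dog deriving (Eq, Show)

-- Hypothetical helper: fail unless "petType" holds the one value the enum allows.
requirePetType :: Text -> Object -> Parser ()
requirePetType expected obj = do
  actual <- obj .: "petType"
  unless (actual == expected) $
    fail ("petType must be " <> show expected <> " but was " <> show actual)

instance FromJSON Cat where
  parseJSON = withObject "Cat" $ \obj -> Cat <$ requirePetType "cat" obj

instance FromJSON Dog where
  parseJSON = withObject "Dog" $ \obj -> Dog <$ requirePetType "dog" obj

-- With the enum enforced, at most one alternative can succeed,
-- so the order of the alternatives no longer changes the result.
instance FromJSON Pet where
  parseJSON val = (PetCat <$> parseJSON val) <|> (PetDog <$> parseJSON val)

main :: IO ()
main = do
  print (eitherDecode "{\"petType\": \"dog\"}" :: Either String Pet)
  print (eitherDecode "{\"petType\": \"wizard\"}" :: Either String Cat)
```

Because each variant rejects the other's `petType`, first-match parsing and discriminator-based parsing agree for this schema.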

But what if the Dog and Cat petTypes were not pinned with an enum in this way?

components:
  schemas:
    Pet:
      type: object
      anyOf: # ⚠️
       - $ref: '#/components/schemas/Cat'
       - $ref: '#/components/schemas/Dog'
      discriminator:
        propertyName: petType
        mapping:
          cat: '#/components/schemas/Cat'
          dog: '#/components/schemas/Dog'
    Cat:
      type: object
      properties:
        petType:
          type: string
          # ⚠️
      required:
        - petType
    Dog:
      type: object
      properties:
        petType:
          type: string
          # ⚠️
      required:
        - petType

This is dumb, but technically allowed. In this case, the discriminator is the only thing that tells a consumer whether what they're looking at is a Cat or a Dog. Without a discriminator, to interpret {"petType": "dog"} as a Pet, you would have to decide arbitrarily whether it is a Cat or a Dog.

I think your sense is right that most cases where you'd be using anyOf are not cases where it would be possible to use discriminator to describe anything of interest, but the above illustrates one exception. Whether this exception is significant to anyone or merely a technical curiosity, I cannot say. Really the larger conclusion to me from this thought exercise is that anyOf is crazy and one ought to avoid it altogether. But, that's me as a Haskeller: I like all my unions disjoint.

Comment on lines +63 to +67
GHC.Maybe.Just propertyName -> case propertyName :: Data.Text.Internal.Text of
{"guppie" -> FishGuppie Data.Functor.<$> Data.Aeson.Types.FromJSON.parseJSON val;
"minnow" -> FishMinnow Data.Functor.<$> Data.Aeson.Types.FromJSON.parseJSON val;
"shark" -> FishShark Data.Functor.<$> Data.Aeson.Types.FromJSON.parseJSON val;
_unmatched -> Control.Monad.Fail.fail "No match for discriminator property"}}}) val}
Contributor

Just commenting here to highlight for other reviewers: This case expression is the substantive effect of the PR.

Contributor

@chris-martin chris-martin left a comment

I have always really struggled to read Template Haskell code, but I think the results in the golden output mostly speak for themselves.

@NorfairKing
Contributor

> I have always really struggled to read Template Haskell code, but I think the results in the golden output mostly speak for themselves.

Golden tests FTW!

Member

@joel-bach joel-bach left a comment

Thank you very much for your work @mjgpy3! This looks good to me 👍

Regarding anyOf: I agree with @chris-martin's sentiment:

> Really the larger conclusion to me from this thought exercise is that anyOf is crazy and one ought to avoid it altogether. But, that's me as a Haskeller: I like all my unions disjoint.

If someone wants to use this and has a legitimate case, they can let us know and provide a contribution. From my POV, your PR addresses the main use case which makes it worth merging.

> I don't really know how far to take schema validation in this codebase.

I am fine with the current approach. This is not a schema validator and there are better tools for that out there. If the generated code does not compile due to a nonsense schema (which I think is currently the case), it's not ideal but at least you'll get a hint that something's up. If you want to add a warning, that's even better, but it's not a requirement from my side.

> This is my first template Haskell so please lemme know if I should be doing
> something different.

It is fine from my POV; as the others said, the golden test output is the relevant part here.

> The indexed-variant setting was tricky. I think I've gotten it right here but could use some checking (perhaps I should write a test?)

Ideally, we'd have multiple configurations for the golden tests (with minimal specs to demonstrate the specific setting). I do not have the time on my hands to adjust this atm but if you want to give it a crack, feel free. I'd suggest doing that in a separate PR though. But in any case, I am fine with the current implementation of this.

One part which I would very much like to have is a level 3 test (essentially an e2e test). It's not a strict requirement to merge this PR but it would be helpful to not only verify that the code looks correct in the output but also does the deserialization correctly. Let me know if you can do this and if you need pointers.

I've also let the CI run and the only part failing is the formatting. Could you make sure to let the ormolu formatting run in the pre-commit hook?

Nothing -> []
Just (n, caseName) -> do
  let
    suffix = if OAO.settingUseNumberedVariantConstructors settings then "Variant" <> T.pack (show n) else ""
Member

Ideally, you'd extract the suffix generation above and reuse it here but I will not block the PR on this.
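One possible shape for such an extraction, as a sketch only: `variantSuffix` is a hypothetical name, its `Bool` argument stands in for `OAO.settingUseNumberedVariantConstructors settings`, and the index is assumed to be an `Int`.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical shared helper: the suffix appended to a variant constructor name.
-- The Bool stands in for the numbered-variant setting read from the options.
variantSuffix :: Bool -> Int -> Text
variantSuffix useNumberedVariants n
  | useNumberedVariants = "Variant" <> T.pack (show n)
  | otherwise = ""

main :: IO ()
main = do
  print (variantSuffix True 2)
  print (variantSuffix False 2)
```

Both call sites could then build constructor names through the same function, so the numbered and unnumbered naming schemes cannot drift apart.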

mjgpy3 added 2 commits March 16, 2026 08:39
**Why?**

To show that we're not just following some incorrect convention.
@mjgpy3 mjgpy3 requested a review from joel-bach March 16, 2026 12:40
Contributor Author

mjgpy3 commented Mar 16, 2026

Thanks @joel-bach. I believe that I've applied the more important suggestions. Please let me know your thoughts!

Member

@joel-bach joel-bach left a comment

This PR is ready to be merged. Are you planning to address other parts of my comment here or in a separate PR?

Contributor Author

mjgpy3 commented Mar 16, 2026

@joel-bach I can try to follow up on some of those points in a separate PR if that's okay.

@joel-bach
Member

> I can try to follow up on some of those points in a separate PR if that's okay

That's fine with me, I'll merge the PR 👍

@joel-bach joel-bach merged commit c01eecc into Haskell-OpenAPI-Code-Generator:master Mar 17, 2026
9 checks passed