
Support for the YAML 1.2 Core and JSON schemas [Take 2]#555

Open
perlpunk wants to merge 3 commits into
yaml:mainfrom
perlpunk:yaml12-take2

@perlpunk
Member

@perlpunk perlpunk commented Sep 22, 2021

Supersedes #512

This is a draft and subject to discussion.
See also #486

(For #512: Thanks to @SUSE for another hackweek! I had four days of work time dedicated to an open source project of my choice. https://hackweek.suse.com/20/projects/yaml-1-dot-2-schema-support-for-pyyaml)
Thanks to @SUSE for a volunteer day I used to make this continuation to my previous PR.

This PR depends on #483

Introduction

For a quick overview of the schema changes between YAML 1.1 and 1.2, look here: https://perlpunk.github.io/yaml-test-schema/schemas.html

YAML 1.2 also changed the syntax, but this pull request is only about the schema changes.
As an example, in 1.1, Y, yes, NO, on etc. are resolved as booleans.
This sounds convenient, but also means that all these 22 different strings must be quoted if they are not meant as booleans. A very common obstacle is the country code for Norway, NO ("Norway Problem").
In YAML 1.2 this was improved by reducing the list of boolean representations.

Other types have been improved as well. The 1.1 regular expression for float accepts . and ._ as floats, although neither string contains a single digit.
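
A minimal sketch of the stricter 1.2 behaviour, assuming the float pattern given for the Core Schema in the YAML 1.2 spec. This is not the regular expression used in this PR, and `is_core_float` is a hypothetical helper name. Note that the pattern also matches plain integers such as 23; a resolver would normally try the int pattern first.

```python
import re

# Float pattern of the YAML 1.2 Core Schema (assumed from the spec,
# not taken from this PR). Every alternative requires at least one digit.
CORE_FLOAT = re.compile(r'[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?')


def is_core_float(scalar: str) -> bool:
    """Return True if the whole scalar matches the Core Schema float pattern."""
    return CORE_FLOAT.fullmatch(scalar) is not None


if __name__ == '__main__':
    # '.' and '._' contain no digit, so they stay strings under this pattern.
    for sample in ['.', '._', '.5', '1.', '1e27', '+0.3e3', '0001.23']:
        print(f'{sample!r:10} float: {is_core_float(sample)}')
```

Under this pattern an exponent needs neither a decimal point nor a sign, so 1e27 counts as a float.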

While the 1.2 Core Schema, the recommended default for 1.2, still allows a few variations (true, True and TRUE, etc.), the 1.2 JSON Schema is there to match JSON behaviour regarding types, so it allows only true and false.
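
The three boolean sets can be compared side by side. The sketch below only uses plain Python sets built from the spellings listed in the YAML 1.1 bool type and the YAML 1.2 spec; `is_bool` and the schema labels are hypothetical names for illustration, not part of PyYAML or this PR.

```python
# The 22 boolean spellings of the YAML 1.1 bool type.
YAML11_BOOLS = {
    'y', 'Y', 'yes', 'Yes', 'YES', 'n', 'N', 'no', 'No', 'NO',
    'true', 'True', 'TRUE', 'false', 'False', 'FALSE',
    'on', 'On', 'ON', 'off', 'Off', 'OFF',
}
# YAML 1.2 Core Schema: only case variants of true/false remain.
CORE_BOOLS = {'true', 'True', 'TRUE', 'false', 'False', 'FALSE'}
# YAML 1.2 JSON Schema: exactly the JSON literals.
JSON_BOOLS = {'true', 'false'}

SCHEMAS = {'1.1': YAML11_BOOLS, '1.2 core': CORE_BOOLS, '1.2 json': JSON_BOOLS}


def is_bool(scalar: str, schema: str) -> bool:
    """Return True if an unquoted scalar is a boolean in the given schema."""
    return scalar in SCHEMAS[schema]


if __name__ == '__main__':
    for sample in ['NO', 'yes', 'TRUE', 'true']:
        row = {name: is_bool(sample, name) for name in SCHEMAS}
        print(sample, row)
```

The country code NO is a boolean only in the 1.1 set, which is the Norway Problem in one line.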

Current State

PyYAML implements the 1.1 types (with a few changes, like leaving out the single character booleans y, Y etc.), and it was never updated to support any of the 1.2 schemas.

Problem

Besides the problems with the 1.1 types mentioned above, more and more libraries are being created or updated for YAML 1.2, probably thanks in part to the relatively new YAML Test Suite. PyYAML should be able to read and write YAML files used or produced by those libraries.

This PR

The PyYAML SafeLoader, which is currently the recommended Loader if you don't need special behaviour, implements YAML 1.1 types.

I added tagsets for yaml11, json and core.
This way people can try out a YAML 1.2 Loader with little code:

    import yaml

    # init_tags() and CommonDumper are added by this PR
    class MyCoreLoader(yaml.BaseLoader): pass
    class MyCoreDumper(yaml.CommonDumper): pass
    MyCoreLoader.init_tags('core')
    MyCoreDumper.init_tags('core')
    data = yaml.load("- TRUE\n- yes\n", Loader=MyCoreLoader)

Out of Scope

One problem is that PyYAML's callbacks are class based, and while I was able to make the code a bit more compact via a dictionary of types/callbacks, there are still method calls which must be in a certain class.
The !!merge << key for example needs special handling.

This makes it tedious to add custom Loaders. Turning the class based approach into an instance based one is on our wishlist.

One example use case we have in mind: you want to use the 1.2 CoreLoader, but on top of that you want it to recognize timestamps and merge keys.
Or you want a very basic loader that should treat everything as a string except booleans and null.
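
With released PyYAML, that second use case can be approximated through the existing class based API, which also illustrates the per-class registration described above. This is only a sketch under assumptions: `StringsBoolNullLoader`, `construct_bool`, `construct_null` and `load_basic` are hypothetical names, only lowercase true/false, null, ~ and the empty scalar are recognized, and PyYAML must be installed. It does not use anything added by this PR.

```python
import re

import yaml


class StringsBoolNullLoader(yaml.BaseLoader):
    """BaseLoader keeps every scalar a string; we add bool and null on top."""


def construct_bool(loader, node):
    # The resolver below only tags 'true' and 'false' as bool.
    return loader.construct_scalar(node) == 'true'


def construct_null(loader, node):
    return None


# Resolvers and constructors are registered on the class, not on an instance.
StringsBoolNullLoader.add_implicit_resolver(
    'tag:yaml.org,2002:bool', re.compile(r'^(?:true|false)$'), ['t', 'f'])
StringsBoolNullLoader.add_implicit_resolver(
    'tag:yaml.org,2002:null', re.compile(r'^(?:null|~|)$'), ['n', '~', ''])
StringsBoolNullLoader.add_constructor('tag:yaml.org,2002:bool', construct_bool)
StringsBoolNullLoader.add_constructor('tag:yaml.org,2002:null', construct_null)


def load_basic(text):
    """Load YAML where everything except booleans and null stays a string."""
    return yaml.load(text, Loader=StringsBoolNullLoader)


if __name__ == '__main__':
    print(load_basic("- true\n- 'true'\n- yes\n- 23\n- ~\n"))
```

Quoted scalars are never passed through implicit resolvers, so 'true' in quotes stays a string.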

Example

    import yaml

    class MyCoreLoader(yaml.BaseLoader): pass
    class MyJSONLoader(yaml.BaseLoader): pass
    class MyCoreDumper(yaml.CommonDumper): pass
    class MyJSONDumper(yaml.CommonDumper): pass

    MyCoreLoader.init_tags('core')
    MyJSONLoader.init_tags('json')

    MyCoreDumper.init_tags('core')
    MyJSONDumper.init_tags('json')

    input = """
    - TRUE
    - yes
    - ~
    - true
    #- .inf
    #- 23
    #- #empty
    #- !!str #empty
    #- 010
    #- 0o10
    #- 0b100
    #- 0x20
    #- -0x20
    #- 1_000
    #- 3:14
    #- 0011
    #- +0
    #- 0001.23
    #- !!str +0.3e3
    #- +0.3e3
    #- &x foo
    #- *x
    #- 1e27
    #- 1x+27
    """

    print('--------------------------------------------- BaseLoader')
    data = yaml.load(input, Loader=yaml.BaseLoader)
    print(data)
    print('--------------------------------------------- SafeLoader')
    data = yaml.load(input, Loader=yaml.SafeLoader)
    print(data)
    print('--------------------------------------------- CoreLoader')
    data = yaml.load(input, Loader=MyCoreLoader)
    print(data)
    print('--------------------------------------------- JSONLoader')
    data = yaml.load(input, Loader=MyJSONLoader)
    print(data)

    print('--------------------------------------------- SafeDumper')
    out = yaml.dump(data, Dumper=yaml.SafeDumper)
    print(out)
    print('--------------------------------------------- MyCoreDumper')
    out = yaml.dump(data, Dumper=MyCoreDumper)
    print(out)
    print('--------------------------------------------- MyJSONDumper')
    out = yaml.dump(data, Dumper=MyJSONDumper)
    print(out)
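
Several of the commented-out samples in the input above exercise integer notations. As a rough guide to what to expect when enabling them, here is a sketch of integer resolution assuming the int patterns given for the Core Schema in the YAML 1.2 spec. How the 'core' tagset of this PR treats each sample is not shown here, and `core_int` is a hypothetical helper name.

```python
import re

# Integer patterns of the YAML 1.2 Core Schema (assumed from the spec),
# each paired with the base used to convert a matching scalar.
CORE_INT_PATTERNS = [
    (re.compile(r'[-+]?[0-9]+'), 10),    # decimal, leading zeros allowed
    (re.compile(r'0o[0-7]+'), 8),        # octal needs the 0o prefix
    (re.compile(r'0x[0-9a-fA-F]+'), 16)  # hexadecimal, no sign allowed
]


def core_int(scalar: str):
    """Return the int for a scalar matching a Core int pattern, else None."""
    for pattern, base in CORE_INT_PATTERNS:
        if pattern.fullmatch(scalar):
            return int(scalar, base)
    return None


if __name__ == '__main__':
    samples = ['010', '0o10', '0b100', '0x20', '-0x20',
               '1_000', '3:14', '0011', '+0']
    for sample in samples:
        print(f'{sample!r:8} -> {core_int(sample)!r}')
```

Under these patterns 010 is decimal ten rather than octal eight, and binary, underscore-separated and sexagesimal forms stay strings.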

@perlpunk perlpunk changed the base branch from master to release/6.0 September 22, 2021 15:30
@perlpunk perlpunk marked this pull request as draft September 22, 2021 15:30
@perlpunk perlpunk force-pushed the yaml12-take2 branch 3 times, most recently from 11f107a to ecf1ed4 September 22, 2021 18:07
@perlpunk perlpunk marked this pull request as ready for review September 22, 2021 18:09
@perlpunk perlpunk changed the title from "WIP Support for the YAML 1.2 Core and JSON schemas [Take 2]" to "Support for the YAML 1.2 Core and JSON schemas [Take 2]" Sep 22, 2021
@shelper

shelper commented Oct 27, 2021

any updates on this?

@kislyuk

kislyuk commented Jul 21, 2022

@perlpunk anything we can do to help push this along?

@ssbarnea

I wonder when this will be merged...

@perlpunk
Member Author

@ingydotnet

@ingydotnet
Member

@ssbarnea I'll bring up the task of putting out a new pyyaml release, with the release team.

I suspect this would be merged into the next release.

@SubaruArai

@ingydotnet 6.0.1 is out, can we have an eta on the next major/minor release that'll merge this pr?

@perlpunk perlpunk changed the base branch from release/6.0 to main November 10, 2023 20:42
so that other classes inheriting from it can use them

* Move methods from SafeConstructor to BaseConstructor
* Move methods from SafeRepresenter to BaseRepresenter
More and more YAML libraries are implementing YAML 1.2, either new ones
simply starting with 1.2 or older ones adding support for it.

YAML 1.2 also changed the syntax, but this pull request is only about the
schema changes.

As an example, in 1.1, Y, yes, NO, on etc. are resolved as booleans.

This sounds convenient, but also means that all these 22 different strings must
be quoted if they are not meant as booleans. A very common obstacle is the
country code for Norway, NO ("Norway Problem").

In YAML 1.2 this was improved by reducing the list of boolean representations.

Other types have been improved as well. The 1.1 regular expression for float
accepts . and ._ as floats, although neither string contains a single digit.

While the 1.2 Core Schema, the recommended default for 1.2, still allows a few
variations (true, True and TRUE, etc.), the 1.2 JSON Schema is there to match
JSON behaviour regarding types, so it allows only true and false.

Note that this implementation of the YAML JSON Schema might not be exactly like
the spec defines it (all plain scalars not resolving to numbers, null or
booleans would be an error).

Short usage example:

    class MyCoreLoader(yaml.BaseLoader): pass
    class MyCoreDumper(yaml.CommonDumper): pass
    MyCoreLoader.init_tags('core')
    MyCoreDumper.init_tags('core')
    data = yaml.load(input, Loader=MyCoreLoader)
    output = yaml.dump(data, Dumper=MyCoreDumper)

Detailed example code to play with:

    import yaml

    class MyCoreLoader(yaml.BaseLoader): pass
    MyCoreLoader.init_tags('core')

    class MyJSONLoader(yaml.BaseLoader): pass
    MyJSONLoader.init_tags('json')

    class MyCoreDumper(yaml.CommonDumper): pass
    MyCoreDumper.init_tags('core')

    class MyJSONDumper(yaml.CommonDumper): pass
    MyJSONDumper.init_tags('json')

    input = """
    - TRUE
    - yes
    - ~
    - true
    #- .inf
    #- 23
    #- #empty
    #- !!str #empty
    #- 010
    #- 0o10
    #- 0b100
    #- 0x20
    #- -0x20
    #- 1_000
    #- 3:14
    #- 0011
    #- +0
    #- 0001.23
    #- !!str +0.3e3
    #- +0.3e3
    #- &x foo
    #- *x
    #- 1e27
    #- 1x+27
    """

    print('--------------------------------------------- BaseLoader')
    data = yaml.load(input, Loader=yaml.BaseLoader)
    print(data)
    print('--------------------------------------------- SafeLoader')
    data = yaml.load(input, Loader=yaml.SafeLoader)
    print(data)
    print('--------------------------------------------- CoreLoader')
    data = yaml.load(input, Loader=MyCoreLoader)
    print(data)
    print('--------------------------------------------- JSONLoader')
    data = yaml.load(input, Loader=MyJSONLoader)
    print(data)

    print('--------------------------------------------- SafeDumper')
    out = yaml.dump(data, Dumper=yaml.SafeDumper)
    print(out)
    print('--------------------------------------------- MyCoreDumper')
    out = yaml.dump(data, Dumper=MyCoreDumper)
    print(out)
    print('--------------------------------------------- MyJSONDumper')
    out = yaml.dump(data, Dumper=MyJSONDumper)
    print(out)
This way people can play with it, but we don't promise this wrapper will stay
around forever, and the newly created classes CommonDumper/CommonRepresenter
aren't exposed.

    MyCoreLoader = yaml.experimental_12_Core_loader()
    data = yaml.load(input, Loader=MyCoreLoader)

    MyCoreDumper = yaml.experimental_12_Core_dumper()
    out = yaml.dump(data, Dumper=MyCoreDumper)
@perlpunk
Member Author

I just created a yamlcore package that allows you to use YAML 1.2 Core Tags on top of the PyYAML BaseLoader. As this PR is blocked on the API redesign, I decided to create something that users can use today already.
Feedback welcome, it is my first package on pypi :)

@God-damnit-all

> The contents of this PR are included in #700; we've spent much of this past week iterating on that locally in preparation for an upcoming PyYAML 7.0.0a1 in the next couple of weeks (an update/replacement to #700 should happen today with the recent changes).

Where can I find PyYAML 7.0.0a1 so that I can test it?

@ugomancz

ugomancz commented Apr 8, 2025

@nitzmahone Hi, any updates on either this or #700?

@geyang

geyang commented May 11, 2025

looks like this is ready to be merged @perlpunk

@God-damnit-all

> looks like this is ready to be merged @perlpunk

It sure is.

@shermansiu

@nitzmahone Could you please look at this? Because you're the most active maintainer?

@geyang

geyang commented Aug 10, 2025

> The contents of this PR are included in #700; we've spent much of this past week iterating on that locally in preparation for an upcoming PyYAML 7.0.0a1 in the next couple of weeks (an update/replacement to #700 should happen today with the recent changes).

does this mean this is already merged?

@shermansiu

No, #700 is a WIP PR that has not been merged yet

@shermansiu

@ingydotnet Ingy, could you please look at this and merge it? It has two approvals and is written by perlpunk, who's a yaml member.

Also, does PyYAML need more maintainers?

@shermansiu

Hmm, it seems like the plan is to not merge this PR in favor of #700, which has been delayed. I think it may be worth trying to complete the remaining to-do items for that PR to get it merged

@perlpunk
Member Author

In case you haven't read it, you can also use https://pypi.org/project/yamlcore/ on top of PyYAML for YAML 1.2.
(I recently noticed that it doesn't work if the libyaml part wasn't built, so it unsuccessfully tries to import CLoader etc., but that should be easy to fix.)

@shermansiu

Thanks for answering, perlpunk! It's good to know that there's a package that we can already use!

But in the meantime, is there anything we can do to help accelerate the process for some of the PRs?

@perlpunk
Member Author

perlpunk commented Aug 12, 2025

I just assume it's missing time, because this is work done in people's free time :-/
(edit: I personally can't help much with #700 as my python knowledge isn't good enough for this)

@shermansiu

Gotcha, thanks! In that case, I suppose it might be worth creating a continuation of #700 that completes all of the remaining to-do items!

@danijar

danijar commented Sep 18, 2025

Hi team, would it be possible to merge the fix for scientific notation please?

There was a simple PR in #174 that has been marked as "superseded" by #555 yet has not been merged for over six years.

This bug is the only reason preventing me from using PyYAML.

@shermansiu

#555 itself has also been superseded by #700, which is incomplete because no one has had the time to finish it.

Labels

None yet

Projects

Status: In Progress
