validate json #17

@mikeAdamss

What is this

We are going to want to validate json metadata files against specific schemas.

I do mean a json schema, not a schema defined by Python dataclasses (note: if you think of the api poc, jsonschema is what powers the schema validation there, with the schema being derived from dataclasses; here we're just using it directly).

A json schema is language agnostic, so the goal here is to avoid hard coding data models in a python only way.

Jsonschema has good base-level support; all we're after here is a thin wrapper to add some usability and quality of life.

Please put this code in /dpytools/validation/json/*

What to do

We're aiming for something like:

# data_dict is for passing in a dictionary to be validated
validate_json_schema("/path/to/schema.json", data_dict=some_dictionary)

# msg adds helpful context when debugging (i.e. "what we're validating")
validate_json_schema("/path/to/schema.json", data_dict=some_dictionary, msg="Some helpful message should this validation fail")

validate_json_schema("/path/to/schema.json", data_path="/path/to/some/json", msg="Some helpful message should this validation fail")

# indent should pretty print the json contents of the error to make it
# more easily parsed by humans
validate_json_schema("/path/to/schema.json", data_dict=some_dictionary, indent=2)
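One way msg and indent could shape the failure text is sketched below. The helper name format_failure and the layout of the message are assumptions for illustration only, not part of the requested API.

```python
import json
from typing import Any, Optional


def format_failure(
    reason: str,
    data: Any,
    msg: Optional[str] = None,
    indent: Optional[int] = None,
) -> str:
    """Build the text of a validation failure (hypothetical helper).

    reason: the underlying validator's description of what went wrong.
    data:   the json content that failed validation.
    msg:    optional caller-supplied context, placed first when given.
    indent: optional indent width; None keeps the json on a single line.
    """
    lines = []
    if msg:
        lines.append(msg)
    lines.append(f"Reason: {reason}")
    # json.dumps only adds newlines and indentation when indent is not None.
    lines.append("Data: " + json.dumps(data, indent=indent))
    return "\n".join(lines)


print(
    format_failure(
        "'count' is a required property",
        {"matches": "*.sdmx"},
        msg="Validating ingress metadata",
        indent=2,
    )
)
```

The wrapper could then raise its exception with this string, so the caller sees the context first and the offending json last.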

Nuances/requirements

  • The positional argument (the schema) should be one of:
    • a local file
    • a url
  • Where the above is a url raise a NotImplementedError, don't worry about remote schemas right now.
  • Raise a ValueError if someone tries to pass in both data_dict and data_path.
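
The argument checks above could be sketched roughly as follows, using only the standard library. The helper name load_schema_and_data is hypothetical, treating only http/https as "a url" is an assumption, and so is raising a ValueError when neither data_dict nor data_path is given (the issue doesn't cover that case).

```python
import json
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


def load_schema_and_data(
    schema_path: str,
    data_dict: Optional[Dict] = None,
    data_path: Optional[str] = None,
) -> Tuple[Dict, Any]:
    """Check the arguments, then load the schema and the data to validate."""
    # Only one data source may be supplied.
    if data_dict is not None and data_path is not None:
        raise ValueError("Pass either data_dict or data_path, not both.")
    # Assumption: supplying neither is also an error.
    if data_dict is None and data_path is None:
        raise ValueError("Pass one of data_dict or data_path.")

    # Assumption: "a url" means an http or https scheme. Checking the scheme
    # explicitly avoids mistaking a Windows drive letter for one.
    if urlparse(str(schema_path)).scheme in ("http", "https"):
        raise NotImplementedError("Remote schemas are not supported yet.")

    with open(schema_path) as f:
        schema = json.load(f)

    if data_path is not None:
        with open(data_path) as f:
            data = json.load(f)
    else:
        data = data_dict

    return schema, data
```

Keeping these checks apart from the jsonschema call means they can be unit tested without any schema at all.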

Example json

{
    "schema": "airflow.schemas.ingress.sdmx.v1.schema.json",
    "required_files": [
        {
            "matches": "*.sdmx",
            "count": 1 
        }
    ],
    "supplementary_distributions": [
        {
            "matches": "*.sdmx",
            "count": 1 
        }
    ],
    "priority": 1,
    "contact": [
      "jobloggs@ons.gov.uk"
    ],
    "pipeline": "default"
}

Example schema

{
    "$id": "airflow.schemas.ingress.sdmx.v1.json",
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
      "schema": {
        "type": "string"
      },
      "required_files": {
        "type": "array",
        "items": [
          {
            "type": "object",
            "properties": {
              "matches": {
                "type": "string"
              },
              "count": {
                "type": "integer"
              }
            },
            "required": [
              "matches",
              "count"
            ]
          }
        ]
      },
      "supplementary_distributions": {
        "type": "array",
        "items": [
          {
            "type": "object",
            "properties": {
              "matches": {
                "type": "string"
              },
              "count": {
                "type": "integer"
              }
            },
            "required": [
              "matches",
              "count"
            ]
          }
        ]
      },
      "priority": {
        "type": "integer"
      },
      "pipeline": {
        "type": "string"
      }
    },
    "required": [
      "schema",
      "required_files",
      "supplementary_distributions",
      "priority",
      "pipeline"
    ]
  }

Don't manually write json schemas btw; the one above was created with this (I just added an id).

Please do use this example in testing and add a few more json + schema shapes of your own devising.
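
A test along these lines might start from something like the sketch below. It calls jsonschema directly (the wrapper doesn't exist yet) and uses a trimmed copy of the example schema with supplementary_distributions left out for brevity. The is_valid helper and the VALID/SCHEMA names are hypothetical.

```python
from jsonschema import ValidationError, validate

# Trimmed copy of the example schema (supplementary_distributions omitted).
SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "schema": {"type": "string"},
        "required_files": {
            "type": "array",
            "items": [
                {
                    "type": "object",
                    "properties": {
                        "matches": {"type": "string"},
                        "count": {"type": "integer"},
                    },
                    "required": ["matches", "count"],
                }
            ],
        },
        "priority": {"type": "integer"},
        "pipeline": {"type": "string"},
    },
    "required": ["schema", "required_files", "priority", "pipeline"],
}

VALID = {
    "schema": "airflow.schemas.ingress.sdmx.v1.schema.json",
    "required_files": [{"matches": "*.sdmx", "count": 1}],
    "priority": 1,
    "contact": ["jobloggs@ons.gov.uk"],
    "pipeline": "default",
}


def is_valid(data: dict) -> bool:
    """Return True if data satisfies SCHEMA, False on a ValidationError."""
    try:
        validate(instance=data, schema=SCHEMA)
    except ValidationError:
        return False
    return True
```

Useful failing shapes to add: a missing required key, a wrong type (e.g. priority as a string), and a required_files entry without count.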

Acceptance Criteria

  • Can validate against local schemas using both data_dict= and data_path=.
  • Exception can (optionally) include a message and nicely indented json output.
  • NotImplementedError is raised if positional argument is a url.
  • Accompanying readme explaining usage, with user-friendly examples.
