Description
What is this
We need to validate JSON metadata files against specific schemas.
I do mean a JSON schema, not a schema defined by Python dataclasses (note: if you think of the API PoC, `jsonschema` is what powers the schema validation there, with the schema being derived from dataclasses; here we're just using it directly instead).
A JSON schema is language-agnostic, so the goal is to avoid hard-coding data models in a Python-only way.
`jsonschema` has good baseline support; all we're after is a thin wrapper that adds some usability and quality of life.
Please put this code in `/dpytools/validation/json/*`.
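For reference, here is a sketch of the baseline `jsonschema` behaviour the wrapper would sit on. `jsonschema.validate` raises a `ValidationError` whose message says what failed and where, but nothing about *what* was being validated, and it does not show the data. That is the gap the wrapper's extra options are meant to fill. The schema and data here are made up for illustration.

```python
import jsonschema

# Illustrative schema and data (not from the issue): "priority" must be an integer.
schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {"priority": {"type": "integer"}},
    "required": ["priority"],
}
bad_data = {"priority": "high"}

error_message, error_path = None, None
try:
    # Raises on failure; the validator class is chosen from the "$schema" keyword.
    jsonschema.validate(instance=bad_data, schema=schema)
except jsonschema.ValidationError as err:
    error_message = err.message           # what went wrong
    error_path = list(err.absolute_path)  # where in the data it went wrong

print(error_message)  # 'high' is not of type 'integer'
print(error_path)     # ['priority']
```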
What to do
We're aiming for something like:
```python
# data_dict is for passing in a dictionary to be validated
validate_json_schema("/path/to/schema.json", data_dict=some_dictionary)

# msg is to include helpful context when debugging (i.e. "what we're validating")
validate_json_schema("/path/to/schema.json", data_dict=some_dictionary, msg="Some helpful message should this validation fail")
validate_json_schema("/path/to/schema.json", data_path="/path/to/some/json", msg="Some helpful message should this validation fail")

# indent should pretty-print the json contents of the error to make it
# more easily parsed by humans
validate_json_schema("/path/to/schema.json", data_dict=some_dictionary, indent=2)
```

Nuances/requirements
- The positional argument should be one of:
  - a local file
  - a URL
- Where the above is a URL, raise a `NotImplementedError`; don't worry about remote schemas right now.
- Raise a `ValueError` if someone tries to pass in both `data_dict` and `data_path`.
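A minimal sketch of how the wrapper could look, under a few assumptions the issue doesn't spell out: the positional parameter name `schema_path` is hypothetical, a URL is detected by its scheme via `urllib.parse.urlparse`, passing *neither* `data_dict` nor `data_path` is also treated as a `ValueError`, and failures are re-raised as `jsonschema.ValidationError` with the extra context folded into the message. It is one way to meet the requirements above, not the final implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Union
from urllib.parse import urlparse

import jsonschema


def _is_url(location: str) -> bool:
    # Assumed heuristic: anything with a network scheme counts as a URL.
    return urlparse(location).scheme in ("http", "https", "ftp")


def validate_json_schema(
    schema_path: Union[str, Path],
    data_dict: Optional[dict] = None,
    data_path: Optional[Union[str, Path]] = None,
    msg: str = "",
    indent: Optional[int] = None,
) -> None:
    """Validate a dict, or the contents of a JSON file, against a local JSON schema.

    Raises jsonschema.ValidationError, optionally prefixed with `msg` and
    followed by the data pretty-printed with `indent`, if validation fails.
    """
    if _is_url(str(schema_path)):
        raise NotImplementedError("Validation against remote schemas is not supported yet.")
    if data_dict is not None and data_path is not None:
        raise ValueError("Pass either data_dict or data_path, not both.")
    if data_dict is None and data_path is None:
        # Assumption: the issue doesn't cover this case; failing early seems safest.
        raise ValueError("One of data_dict or data_path must be provided.")

    schema = json.loads(Path(schema_path).read_text(encoding="utf-8"))
    if data_path is not None:
        data_dict = json.loads(Path(data_path).read_text(encoding="utf-8"))

    try:
        # jsonschema picks the draft from the schema's "$schema" keyword.
        jsonschema.validate(instance=data_dict, schema=schema)
    except jsonschema.ValidationError as err:
        location = "/".join(str(part) for part in err.absolute_path) or "<root>"
        parts = [msg] if msg else []
        parts += [f"Reason: {err.message}", f"Location: {location}"]
        if indent is not None:
            parts.append("Data:\n" + json.dumps(data_dict, indent=indent))
        # Assumption: re-raise the same exception type, with the context in its message.
        raise jsonschema.ValidationError("\n".join(parts)) from err


# Usage sketch with a throwaway schema file.
failure_text = ""
with tempfile.TemporaryDirectory() as tmp:
    schema_file = Path(tmp) / "schema.json"
    schema_file.write_text(json.dumps({
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "object",
        "properties": {"priority": {"type": "integer"}},
        "required": ["priority"],
    }), encoding="utf-8")

    validate_json_schema(schema_file, data_dict={"priority": 1})  # valid: returns None
    try:
        validate_json_schema(
            schema_file,
            data_dict={"priority": "high"},
            msg="Validating the pipeline config",
            indent=2,
        )
    except jsonschema.ValidationError as err:
        failure_text = err.message
print(failure_text)
```

Re-raising `ValidationError` keeps any existing `except jsonschema.ValidationError` handling in callers working; a dedicated exception class would be an equally reasonable choice.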
Example JSON

```json
{
  "schema": "airflow.schemas.ingress.sdmx.v1.schema.json",
  "required_files": [
    {
      "matches": "*.sdmx",
      "count": 1
    }
  ],
  "supplementary_distributions": [
    {
      "matches": "*.sdmx",
      "count": 1
    }
  ],
  "priority": 1,
  "contact": [
    "jobloggs@ons.gov.uk"
  ],
  "pipeline": "default"
}
```

Example schema

```json
{
  "$id": "airflow.schemas.ingress.sdmx.v1.json",
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "schema": {
      "type": "string"
    },
    "required_files": {
      "type": "array",
      "items": [
        {
          "type": "object",
          "properties": {
            "matches": {
              "type": "string"
            },
            "count": {
              "type": "integer"
            }
          },
          "required": [
            "matches",
            "count"
          ]
        }
      ]
    },
    "supplementary_distributions": {
      "type": "array",
      "items": [
        {
          "type": "object",
          "properties": {
            "matches": {
              "type": "string"
            },
            "count": {
              "type": "integer"
            }
          },
          "required": [
            "matches",
            "count"
          ]
        }
      ]
    },
    "priority": {
      "type": "integer"
    },
    "pipeline": {
      "type": "string"
    }
  },
  "required": [
    "schema",
    "required_files",
    "supplementary_distributions",
    "priority",
    "pipeline"
  ]
}
```
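One detail worth covering in tests: the generated schema gives `items` as a list, which in draft-04 means positional (tuple) validation. Only the first element of `required_files` is checked against the object schema, and any later elements are accepted unchecked. The sketch below uses `jsonschema.Draft4Validator` with a trimmed copy of the `required_files` property to show the difference from the single-schema form of `items`. The malformed second entry is invented for illustration.

```python
from jsonschema import Draft4Validator

# Trimmed copy of the per-entry schema used by "required_files".
entry_schema = {
    "type": "object",
    "properties": {"matches": {"type": "string"}, "count": {"type": "integer"}},
    "required": ["matches", "count"],
}

# As generated: "items" is a list, i.e. positional (tuple) validation.
tuple_form = {"type": "array", "items": [entry_schema]}
# Alternative: "items" is a single schema applied to every element.
list_form = {"type": "array", "items": entry_schema}

required_files = [
    {"matches": "*.sdmx", "count": 1},
    {"matches": 5},  # malformed: wrong type and missing "count"
]

tuple_ok = Draft4Validator(tuple_form).is_valid(required_files)
list_ok = Draft4Validator(list_form).is_valid(required_files)
print(tuple_ok)  # True: only index 0 is checked
print(list_ok)   # False: every entry is checked
```

If every entry is meant to match, the single-schema form is the one to use; if only the first is, the generated form is fine as-is.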
Don't write JSON schemas by hand, by the way; the one above was created with this (I just added an `$id`).
Please do use this example in testing, and add a few more JSON + schema shapes of your own devising.
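As a starting point for those extra shapes, here is a sketch of a small table of made-up test cases, each pairing a schema with one instance that should pass and one that should fail. It checks each schema is itself valid, then checks both instances, using `jsonschema.validators.validator_for` so the draft is taken from each schema's `$schema`. The shapes (nested object with an enum, a pattern-constrained array, a closed object) are illustrative inventions; in the real tests they would more likely live as `.json` fixture files read by the wrapper.

```python
from jsonschema.validators import validator_for

DRAFT7 = "http://json-schema.org/draft-07/schema#"

# Each case: (description, schema, an instance that should pass, one that should fail).
CASES = [
    (
        "nested object with an enum",
        {
            "$schema": DRAFT7,
            "type": "object",
            "properties": {
                "dataset": {
                    "type": "object",
                    "properties": {"frequency": {"enum": ["monthly", "quarterly", "annual"]}},
                    "required": ["frequency"],
                }
            },
            "required": ["dataset"],
        },
        {"dataset": {"frequency": "monthly"}},
        {"dataset": {"frequency": "weekly"}},
    ),
    (
        "non-empty array of e-mail-like strings",
        {
            "$schema": DRAFT7,
            "type": "array",
            "items": {"type": "string", "pattern": "^[^@\\s]+@[^@\\s]+$"},
            "minItems": 1,
        },
        ["jobloggs@ons.gov.uk"],
        [],
    ),
    (
        "closed object (no extra keys allowed)",
        {
            "$schema": DRAFT7,
            "type": "object",
            "properties": {"pipeline": {"type": "string"}},
            "additionalProperties": False,
        },
        {"pipeline": "default"},
        {"pipeline": "default", "unexpected": True},
    ),
]

results = []
for description, schema, good, bad in CASES:
    cls = validator_for(schema)  # choose the draft from "$schema"
    cls.check_schema(schema)     # the schema itself must be valid
    validator = cls(schema)
    results.append((description, validator.is_valid(good), validator.is_valid(bad)))

for description, good_ok, bad_ok in results:
    print(f"{description}: valid passes={good_ok}, invalid rejected={not bad_ok}")
```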
Acceptance Criteria
- Can validate against local schemas using both `data_dict=` and `data_path=`.
- Exception can (optionally) include a message and nicely indented JSON output.
- `NotImplementedError` is raised if the positional argument is a URL.
- Accompanying README explaining usage with examples.
- README with user-friendly examples included.