Add Strict Metaschema Validation#1101
Conversation
d4cdf05 to
0b93060
Compare
|
Thanks for this @holmanb. I'm just starting the review amd will have some significant feedback tomorrow. In your suggested commit message could you capture the intent behind the strict_metaschema being a testing time validation of config module definitions rather than runtime validation. |
blackboxsw
left a comment
There was a problem hiding this comment.
@holmanb thanks for putting this together. A couple of minor requests inline and a general question for you about moving toward pytest to avoid all the stdout/error context manager work.
TheRealFalcon
left a comment
There was a problem hiding this comment.
A few inline questions/comments, but overall looking good.
blackboxsw
left a comment
There was a problem hiding this comment.
I think do need to include property descriptions in docs. We might be able to drop the individual module documentation from the command-line, but we need to fixup the argparse setup for the devel schema subcommand.
b4443e7 to
89ebf39
Compare
1ab87a1 to
3779ff2
Compare
blackboxsw
left a comment
There was a problem hiding this comment.
Thanks @holmanb any of these remaining comments are nits and I can +1 once you make any changes you think are relevant. Thank you for the restructure, I agree it is easier to read/maintain here and functionally works will with no degradation of existing behavior that I can see.
Approve pending any comments you decide to address and a passing CI run
blackboxsw
left a comment
There was a problem hiding this comment.
Combined diff of my musings for your thoughts. I think we want to minimally make the changes to simplify the get_json_validator return value to a 2-tuple instead of 3-tuple. But other diffs are included in case you find them reasonable.
diff --git a/cloudinit/config/schema.py b/cloudinit/config/schema.py
index 61f8cd9d..0c46cbdc 100644
--- a/cloudinit/config/schema.py
+++ b/cloudinit/config/schema.py
@@ -100,40 +100,48 @@ def get_jsonschema_validator():
# assignment-operation pylint warning which appears because this
# code path isn't written against the latest jsonschema).
types['string'] = (str, bytes) # pylint: disable=E1137
+ # disable bottom-level keys in meta-schema
+ strict_metaschema = deepcopymeta_schema=Draft4Validator.META_SCHEMA)
+ strict_metaschema["additionalProperties"] = False
cloudinitValidator = create(
- meta_schema=Draft4Validator.META_SCHEMA,
+ meta_schema=strict_metaschema,
validators=Draft4Validator.VALIDATORS,
version="draft4",
default_types=types)
- return (cloudinitValidator, FormatChecker, None)
+ return (cloudinitValidator, FormatChecker)
-def validate_cloudconfig_metaschema(schema: dict, throw=True):
+def validate_cloudconfig_metaschema(
+ validator, schema: dict, throw: bool = True
+):
"""Validate provided schema meets the metaschema definition. Return strict
Validator and FormatChecker for use in validation
+ @param validator: Draft4Validator instance used to validate the schema
@param schema: schema to validate
@param throw: Sometimes the validator and checker are required, even if
- the schema is invalid. Toggle for whether to raise SchemaErrors
-
- @returns: Tuple(jsonschema.Validator, jsonschema.FormatChecker, err)
+ the schema is invalid. Toggle for whether to raise
+ SchemaValidationError orlog warnings.
@raises: ImportError when jsonschema is not present
- @raises: jsonschema.exceptions.SchemaError if the schema is invalid
+ @raises: SchemaValidationError when the schema is invalid
"""
from jsonschema.exceptions import SchemaError
- (cloudinitValidator, FormatChecker, _) = get_jsonschema_validator()
try:
-
- # disable bottom-level keys
- cloudinitValidator.META_SCHEMA["additionalProperties"] = False
- cloudinitValidator.check_schema(schema)
- except SchemaError as e:
+ validator.check_schema(schema)
+ except SchemaError as err:
+ # Raise SchemaValidationError so we don't have jsonschema imports
+ # at call sites.
if throw:
- raise e
- return (cloudinitValidator, FormatChecker, e)
- return (cloudinitValidator, FormatChecker, None)
+ raise SchemaValidationError(
+ errors=(('.'.join([str(p) for p in err.path]), err.message),)
+ )
+ logging.warning(
+ "Meta-schema validation failed, attempting to validate config"
+ " anyway: %s",
+ err
+ )
def validate_cloudconfig_schema(
@@ -154,20 +162,14 @@ def validate_cloudconfig_schema(
against the provided schema.
"""
try:
- (cloudinitValidator, FormatChecker, err) = (
- get_jsonschema_validator()
- if not strict_metaschema
- else validate_cloudconfig_metaschema(schema, throw=False)
- )
+ (cloudinitValidator, FormatChecker) = get_jsonschema_validator()
+ if strict_metaschema:
+ validate_cloudconfig_metaschema(
+ cloudinitValidator, schema, throw=False
+ )
except ImportError:
logging.debug("Ignoring schema validation. jsonschema is not present")
return
- if err:
- logging.warning(
- "Meta-schema validation failed, attempting to validate config"
- " anyway: %s",
- err,
- )
validator = cloudinitValidator(schema, format_checker=FormatChecker())
errors = ()
for error in sorted(validator.iter_errors(config), key=lambda e: e.path):dfabeaa to
c93fde0
Compare
Thanks for the help on this. I wasn't sure about the best approach to the failure paths with this one. Besides one item, I think these are all improvements.
The problem with this is that get_jsonschema_validator() conditionally produces validator/formatchecker differently based on jsonschema version. The change proposed here only applies to the pre-3.0 jsonschema branch. I think we want both branches to be capable of strict validation. Assuming I don't see anything else in need of change I plan on applying the other proposals you made, in addition to a modified version of this one. Thanks! |
4aea9ce to
7dfe838
Compare
blackboxsw
left a comment
There was a problem hiding this comment.
Thank you for all the discussion and work here @holmanb +1. LGTM
Proposed Commit Message
Test Steps
Many of the files were only touched for function name/arg/schema/meta updates, however schema.py, test_schema.py, test_cli.py had the most substantial changes.
Checklist: