Skip to content

Add Strict Metaschema Validation#1101

Merged
blackboxsw merged 11 commits into
canonical:mainfrom
holmanb:holmanb/metaschema
Dec 6, 2021
Merged

Add Strict Metaschema Validation#1101
blackboxsw merged 11 commits into
canonical:mainfrom
holmanb:holmanb/metaschema

Conversation

@holmanb
Copy link
Copy Markdown
Member

@holmanb holmanb commented Nov 5, 2021

Proposed Commit Message

Improve schema validation.

This adds strict validation of config module definitions at testing
time, with plumbing included for future runtime validation. This
eliminates a class of bugs resulting from schemas that have definitions
that are incorrect, but get interpreted by jsonschema as
"additionalProperties" that are therefore ignored.


- Add strict meta-schema for jsonschema unit test validation
- Separate schema from module metadata structure
- Improve type annotations for various functions and data types

Cleanup:
- Remove unused jsonschema "required" elements 
- Eliminate manual memoization in schema.py:get_schema(),
        reference module.__doc__ directly

Test Steps

Many of the files were only touched for function name/arg/schema/meta updates, however schema.py, test_schema.py, test_cli.py had the most substantial changes.

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@holmanb holmanb force-pushed the holmanb/metaschema branch 2 times, most recently from d4cdf05 to 0b93060 Compare November 16, 2021 19:52
Comment thread tests/unittests/test_handler/test_schema.py Outdated
@holmanb holmanb changed the title WIP: metaschema testing Add Strict Metaschema Validation Nov 18, 2021
@holmanb holmanb marked this pull request as ready for review November 18, 2021 22:45
Comment thread doc/rtd/conf.py Outdated
@blackboxsw
Copy link
Copy Markdown
Collaborator

Thanks for this @holmanb. I'm just starting the review amd will have some significant feedback tomorrow. In your suggested commit message could you capture the intent behind the strict_metaschema being a testing time validation of config module definitions rather than runtime validation.

Comment thread cloudinit/config/schema.py
Comment thread cloudinit/config/schema.py Outdated
Comment thread cloudinit/config/schema.py
Comment thread tests/unittests/test_handler/test_schema.py
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@holmanb thanks for putting this together. A couple of minor requests inline and a general question for you about moving toward pytest to avoid all the stdout/error context manager work.

Comment thread cloudinit/config/schema.py
Comment thread tests/unittests/test_cli.py Outdated
Comment thread tests/unittests/test_cli.py Outdated
Comment thread tests/unittests/test_cli.py
Comment thread tests/unittests/test_handler/test_schema.py
Copy link
Copy Markdown
Contributor

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few inline questions/comments, but overall looking good.

Comment thread doc/rtd/conf.py Outdated
Comment thread doc/rtd/conf.py Outdated
Comment thread cloudinit/config/schema.py Outdated
Comment thread cloudinit/config/schema.py Outdated
Comment thread cloudinit/config/schema.py Outdated
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think do need to include property descriptions in docs. We might be able to drop the individual module documentation from the command-line, but we need to fixup the argparse setup for the devel schema subcommand.

Comment thread cloudinit/config/schema.py Outdated
Comment thread cloudinit/config/schema.py
Comment thread tests/unittests/test_handler/test_schema.py
Comment thread doc/rtd/conf.py
@holmanb holmanb force-pushed the holmanb/metaschema branch 2 times, most recently from 1ab87a1 to 3779ff2 Compare November 24, 2021 15:33
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @holmanb any of these remaining comments are nits and I can +1 once you make any changes you think are relevant. Thank you for the restructure, I agree it is easier to read/maintain here and functionally works will with no degradation of existing behavior that I can see.

Approve pending any comments you decide to address and a passing CI run

Comment thread cloudinit/config/schema.py Outdated
Comment thread cloudinit/config/schema.py Outdated
Comment thread tests/unittests/test_handler/test_schema.py Outdated
Comment thread tests/unittests/test_handler/test_schema.py Outdated
Comment thread cloudinit/config/schema.py Outdated
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined diff of my musings for your thoughts. I think we want to minimally make the changes to simplify the get_json_validator return value to a 2-tuple instead of 3-tuple. But other diffs are included in case you find them reasonable.

diff --git a/cloudinit/config/schema.py b/cloudinit/config/schema.py
index 61f8cd9d..0c46cbdc 100644
--- a/cloudinit/config/schema.py
+++ b/cloudinit/config/schema.py
@@ -100,40 +100,48 @@ def get_jsonschema_validator():
         # assignment-operation pylint warning which appears because this
         # code path isn't written against the latest jsonschema).
         types['string'] = (str, bytes)  # pylint: disable=E1137
+        # disable bottom-level keys in meta-schema
+        strict_metaschema = deepcopymeta_schema=Draft4Validator.META_SCHEMA)
+        strict_metaschema["additionalProperties"] = False
         cloudinitValidator = create(
-            meta_schema=Draft4Validator.META_SCHEMA,
+            meta_schema=strict_metaschema,
             validators=Draft4Validator.VALIDATORS,
             version="draft4",
             default_types=types)
-    return (cloudinitValidator, FormatChecker, None)
+    return (cloudinitValidator, FormatChecker)
 
 
-def validate_cloudconfig_metaschema(schema: dict, throw=True):
+def validate_cloudconfig_metaschema(
+    validator, schema: dict, throw: bool = True
+): 
     """Validate provided schema meets the metaschema definition. Return strict
     Validator and FormatChecker for use in validation
+    @param validator: Draft4Validator instance used to validate the schema
     @param schema: schema to validate
     @param throw: Sometimes the validator and checker are required, even if
-        the schema is invalid. Toggle for whether to raise SchemaErrors
-
-    @returns: Tuple(jsonschema.Validator, jsonschema.FormatChecker, err)
+        the schema is invalid. Toggle for whether to raise
+        SchemaValidationError orlog warnings.
 
     @raises: ImportError when jsonschema is not present
-    @raises: jsonschema.exceptions.SchemaError if the schema is invalid
+    @raises: SchemaValidationError when the schema is invalid
     """
 
     from jsonschema.exceptions import SchemaError
 
-    (cloudinitValidator, FormatChecker, _) = get_jsonschema_validator()
     try:
-
-        # disable bottom-level keys
-        cloudinitValidator.META_SCHEMA["additionalProperties"] = False
-        cloudinitValidator.check_schema(schema)
-    except SchemaError as e:
+        validator.check_schema(schema)
+    except SchemaError as err:
+        # Raise SchemaValidationError so we don't have jsonschema imports
+        # at call sites.
         if throw:
-            raise e
-        return (cloudinitValidator, FormatChecker, e)
-    return (cloudinitValidator, FormatChecker, None)
+            raise SchemaValidationError(
+              errors=(('.'.join([str(p) for p in err.path]), err.message),)
+            )
+        logging.warning(
+            "Meta-schema validation failed, attempting to validate config"
+            " anyway: %s",
+            err
+        )
 
 
 def validate_cloudconfig_schema(
@@ -154,20 +162,14 @@ def validate_cloudconfig_schema(
         against the provided schema.
     """
     try:
-        (cloudinitValidator, FormatChecker, err) = (
-            get_jsonschema_validator()
-            if not strict_metaschema
-            else validate_cloudconfig_metaschema(schema, throw=False)
-        )
+        (cloudinitValidator, FormatChecker) = get_jsonschema_validator()
+        if strict_metaschema:
+            validate_cloudconfig_metaschema(
+                cloudinitValidator, schema, throw=False
+            )
     except ImportError:
         logging.debug("Ignoring schema validation. jsonschema is not present")
         return
-    if err:
-        logging.warning(
-            "Meta-schema validation failed, attempting to validate config"
-            " anyway: %s",
-            err,
-        )
     validator = cloudinitValidator(schema, format_checker=FormatChecker())
     errors = ()
     for error in sorted(validator.iter_errors(config), key=lambda e: e.path):

Comment thread cloudinit/config/schema.py Outdated
Comment thread cloudinit/config/schema.py
Comment thread cloudinit/config/schema.py
Comment thread cloudinit/config/schema.py Outdated
Comment thread cloudinit/config/schema.py Outdated
@holmanb holmanb force-pushed the holmanb/metaschema branch from dfabeaa to c93fde0 Compare December 3, 2021 21:09
@holmanb
Copy link
Copy Markdown
Member Author

holmanb commented Dec 3, 2021

Combined diff of my musings for your thoughts. I think we want to minimally make the changes to simplify the get_json_validator return value to a 2-tuple instead of 3-tuple. But other diffs are included in case you find them reasonable.

Thanks for the help on this. I wasn't sure about the best approach to the failure paths with this one. Besides one item, I think these are all improvements.

diff --git a/cloudinit/config/schema.py b/cloudinit/config/schema.py
index 61f8cd9d..0c46cbdc 100644
--- a/cloudinit/config/schema.py
+++ b/cloudinit/config/schema.py
@@ -100,40 +100,48 @@ def get_jsonschema_validator():
         # assignment-operation pylint warning which appears because this
         # code path isn't written against the latest jsonschema).
         types['string'] = (str, bytes)  # pylint: disable=E1137
+        # disable bottom-level keys in meta-schema
+        strict_metaschema = deepcopymeta_schema=Draft4Validator.META_SCHEMA)
+        strict_metaschema["additionalProperties"] = False
         cloudinitValidator = create(
-            meta_schema=Draft4Validator.META_SCHEMA,
+            meta_schema=strict_metaschema,
             validators=Draft4Validator.VALIDATORS,
             version="draft4",
             default_types=types)
-    return (cloudinitValidator, FormatChecker, None)
+    return (cloudinitValidator, FormatChecker)

The problem with this is that get_jsonschema_validator() conditionally produces validator/formatchecker differently based on jsonschema version. The change proposed here only applies to the pre-3.0 jsonschema branch. I think we want both branches to be capable of strict validation. Assuming I don't see anything else in need of change I plan on applying the other proposals you made, in addition to a modified version of this one.

Thanks!

@blackboxsw blackboxsw self-assigned this Dec 3, 2021
@holmanb holmanb force-pushed the holmanb/metaschema branch from 4aea9ce to 7dfe838 Compare December 6, 2021 17:10
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the discussion and work here @holmanb +1. LGTM

@blackboxsw blackboxsw merged commit bedac77 into canonical:main Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants