
API based allow list script #223

Closed
jon-betts wants to merge 4 commits into main from api-based-add-to-allow

Conversation

@jon-betts
Contributor

@jon-betts jon-betts commented Mar 12, 2021

This adds an API and calls it from the script instead of doing things locally. This is of course a lot more complicated, but it has some benefits:

  • It completely avoids any syncing issues
  • The changes take effect immediately on the target system
  • It provides a mechanism we can re-use in future

Review notes

  • This adds a new API end-point at POST:/ui/api/rule
  • This accepts JSON:API style bodies like:
{
    "data": {
        "type": "AllowRule",
        "meta": {"url": "http://please.let.me.annotate.com"}
    }
}
  • It responds with bodies like:
{
    "data": {
        "type": "AllowRule",
        "id": 23464,
        "attributes": {
            "hash": "97342342347abdef23423423",
            "tags": ["manual"],
            "force": false,
            "rule": "http://please.let.me.annotate.com"
        }
    }
}
  • We use JSON schema to completely vet the data structure to keep the view simple
  • The JSON schema functions are separated from the view, but they aren't very fancy
  • After that there's checking and inserting as before
  • The script has been converted to call this API
  • Due to the complexity of calling REST APIs, the script is a similar size to before
  • The admin page has been altered to display the correct command to run it

Testing notes

Possible improvements

  • We don't respond with the correct content type for JSON:API. It really should be application/vnd.api+json.
  • We could package up the validation stuff in a way to make it very easy to declare as part of the view definition

@jon-betts jon-betts added the wip label Mar 12, 2021
@jon-betts jon-betts self-assigned this Mar 12, 2021
"required": ["url"],

"properties": {
"url": {"type": "string", "format": "public-url"}
Contributor Author

@jon-betts jon-betts Mar 12, 2021

"format": "public-url" is the most interesting part of this schema.

This is a custom format defined by us. It's totally valid to put whatever you like here in JSON schema. There are a few default formats understood by all standard JSON schema validators, but any they don't understand are just ignored.
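
The PR does not show how "public-url" is checked, so the following is only one plausible reading: an http(s) URL whose host is not localhost or a private, loopback or link-local IP literal. The function name `is_public_url` is hypothetical. With the `jsonschema` library, a function like this would typically be registered on a `FormatChecker` and passed to the validator, since formats without a registered checker are not enforced.

```python
import ipaddress
from urllib.parse import urlsplit


def is_public_url(value):
    """Guess whether `value` is a URL on the public internet (assumed rules)."""
    if not isinstance(value, str):
        return False

    try:
        parts = urlsplit(value)
        host = parts.hostname
    except ValueError:
        # urlsplit rejects some malformed inputs, such as broken IPv6 hosts
        return False

    if parts.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost":
        return False

    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as an ordinary host name
        return True

    return not (address.is_private or address.is_loopback or address.is_link_local)
```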

Contributor Author

I decided to make the url part of the meta for this object rather than the attributes as:

  • The final rule object doesn't actually include the URL value as provided
  • This means upon getting a response, you'd get one without this attribute
  • It is however required to create it

"required": ["type", "meta"],

"properties": {
"type": {"enum": ["AllowRule"]},
Contributor Author

The type is mandatory for JSON:API. I chose the class name as the name for the type as:

  • It's not directly tied to the DB
  • It might allow us to be fancy one day and automatically pick the right DB class or something

<style>
code {
word-break: break-all;
}
Contributor Author

The command gets very long with the session JWT, so enable word wrapping so that we can copy it without scrolling horizontally.

Comment thread checkmate/validation.py

:raise MalformedJSONBody: If the JSON cannot be decoded or the body
does not conform to the schema provided
"""
Contributor Author

I'm sure we could do a fancy decorator like we do in h for applying the schema to a view. This works for now and achieves the main aim of making this not view specific. If we find we are using this a lot I think that would be a good time to think about making it slicker (rule of 3?)

Contributor

I don't think we need to do it but a lighter weight nicety could be to add this as a request method: request.validated_json_body(validator). Similar to the built-in request.json_body property
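
For context, a helper of this kind might look roughly like the sketch below. It is not the PR's implementation: it uses only the standard library, `validate` is a hypothetical callable that yields error messages (standing in for the JSON schema validator), and `request` is assumed to expose the raw bytes as `request.body`. In Pyramid, the request-method variant suggested above could be wired up with `config.add_request_method`.

```python
import json


class MalformedJSONBody(Exception):
    """The JSON body is malformed in some way (stand-in for the PR's class)."""


def get_validated_json_body(request, validate):
    """Decode the request body as JSON and check it with `validate`.

    :raise MalformedJSONBody: If the JSON cannot be decoded or `validate`
        reports any errors
    """
    try:
        body = json.loads(request.body)
    except ValueError as err:  # Includes json.JSONDecodeError
        raise MalformedJSONBody("Body is not valid JSON") from err

    errors = list(validate(body))
    if errors:
        raise MalformedJSONBody("; ".join(errors))

    return body
```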

"""Render an HTML version of a blocked URL with explanation."""

body = get_validated_json_body(request, _ALLOW_RULE_VALIDATOR)
url = body["data"]["meta"]["url"]
Contributor Author

We know this is totally valid as it has just come through the schema. If it were the wrong shape, we couldn't get this far, as an exception would already have been raised.


rule = AllowRule(rule=rule_string, hash=hex_hash, tags=["manual"])
request.db.add(rule)
request.db.flush() # Make sure an id is allocated before we serialise
Contributor Author

The id field is mandatory in JSON:API

Contributor

I think a nice way to do this might be to implement a JSON:API renderer, so the view just does return rule and the renderer turns the returned object into JSON. Depending on when Pyramid calls renderers, the renderer might not even have to call db.flush() if the transaction has already been committed by then. I suspect they get called before the transaction commits, though. As an example, we have a custom SVG renderer in h.

I think that'd be trying way too hard for this PR though
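
A renderer along those lines might be sketched as follows. Pyramid renderer factories are callables that take an `info` object and return a function of `(value, system)`; everything else here is an assumption, including the `to_json_api()` method name and the idea of setting the content type in the renderer. The sketch does not import Pyramid, so it can be read on its own.

```python
import json


class JSONAPIRenderer:
    """Hypothetical renderer turning a returned object into a JSON:API body."""

    def __init__(self, info=None):
        # Pyramid passes renderer metadata here; this sketch doesn't need it
        self.info = info

    def __call__(self, value, system):
        request = system.get("request")
        if request is not None:
            # The media type registered for JSON:API
            request.response.content_type = "application/vnd.api+json"

        # `to_json_api` is an assumed method returning the resource dict
        return json.dumps({"data": value.to_json_api()})
```

In a Pyramid app this would be registered with something like `config.add_renderer("json_api", JSONAPIRenderer)` and selected with `renderer="json_api"` on the view.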


return {
"type": self.__class__.__name__,
"id": self.id,
Contributor Author

This isn't flexible here, and I think it shouldn't be unless it has to be. Having every object have its primary key as id is a very handy convention.
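
The convention can be sketched with a small mixin. This is an illustration rather than the PR's model code: `JSONAPIMixin`, `to_json_api` and `attribute_names` are hypothetical names, and a plain class stands in for the SQLAlchemy model.

```python
class JSONAPIMixin:
    """Serialise an object as a JSON:API resource, assuming it has an `id`."""

    # Names of the fields to expose under "attributes" (assumed convention)
    attribute_names = ()

    def to_json_api(self):
        return {
            # The class name doubles as the JSON:API type
            "type": self.__class__.__name__,
            # Every object is expected to keep its primary key in `id`
            "id": self.id,
            "attributes": {name: getattr(self, name) for name in self.attribute_names},
        }


class AllowRule(JSONAPIMixin):
    """Plain stand-in for the real database model."""

    attribute_names = ("hash", "tags", "force", "rule")

    def __init__(self, id, rule, hash, tags, force=False):
        self.id, self.rule, self.hash = id, rule, hash
        self.tags, self.force = tags, force
```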

Contributor

@seanh seanh left a comment

As discussed on Slack I'd like us to give Marshmallow a try for the validation as I think it's our chosen validation library. If it turns out not to work very nicely for this then we can reject it.

Other than that just some minor suggestions.

I ran it locally and it seemed to work for me: the rules were loaded into the DB, and after running the command the sites from the CSV were allowed.

Comment thread checkmate/exceptions.py
Comment on lines +49 to +54
class MalformedJSONBody(JSONAPIException):
    """The JSON body is malformed in some way."""


Contributor

Just so it 400's instead of 500's for a malformed request:

Suggested change
- class MalformedJSONBody(JSONAPIException):
-     """The JSON body is malformed in some way."""
+ class MalformedJSONBody(JSONAPIException):
+     """The JSON body is malformed in some way."""
+     status_code = 400

Comment thread checkmate/routes.py
config.add_route("login_callback", "/ui/api/login_callback")
config.add_route("logout", "/ui/api/logout")

config.add_route("add_to_allow_list", "/ui/api/rule", request_method="POST")
Contributor

I think this should probably be /api/rule/, as I don't think we know that we'll ever add an admin page that calls this API. We may never add admin pages for managing the allow list at all if the priority isn't high enough (which seems likely, if we don't get many requests). Even if we do one day end up adding admin pages, we don't know for sure that they'll end up calling this API from JavaScript. So we might end up with an API that never is called by admin pages, but is just confusingly under /ui/api/. It might be better to just put it under /api/ for now and move it in the future if we decide to.

This'd also mean moving add_to_allow_list.py

effective_principals=[Principals.STAFF],
)
def add_to_allow_list(_context, request):
"""Render an HTML version of a blocked URL with explanation."""
Contributor

Suggested change
-     """Render an HTML version of a blocked URL with explanation."""
+     """Add a new rule to the allow list."""


Comment on lines +27 to +28
# Check this isn't something really dumb like 'co.uk' which will ruin the
# allow list
Contributor

This check still needs to be added?

Comment on lines +34 to +39
try:
    # We expect a detection from not being on the allow list, so we'll
    # remove it, which will trigger a ValueError if it wasn't there
    reasons.remove(_ALLOW_LIST_DETECTION)
except ValueError:
    raise ResourceConflict("Requested URL is already allowed") from None
Contributor

Just tweaked the comments to be (I think) a little clearer:

Suggested change
- try:
-     # We expect a detection from not being on the allow list, so we'll
-     # remove it, which will trigger a ValueError if it wasn't there
-     reasons.remove(_ALLOW_LIST_DETECTION)
- except ValueError:
-     raise ResourceConflict("Requested URL is already allowed") from None
+ try:
+     # We expect `reasons` to contain a NOT_ALLOWED detection because
+     # `url` shouldn't be on the allow list yet. Remove it.
+     reasons.remove(_ALLOW_LIST_DETECTION)
+ except ValueError:
+     # The expected NOT_ALLOWED detection wasn't found:
+     # `url` must already be on the allow list.
+     raise ResourceConflict("Requested URL is already allowed") from None
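
Put together, the check reads roughly like the following self-contained sketch. `ResourceConflict` and `_ALLOW_LIST_DETECTION` are stand-ins defined locally for illustration, the string value is invented, and `ensure_not_already_allowed` is a hypothetical wrapper that takes the detections directly instead of calling the checker service.

```python
class ResourceConflict(Exception):
    """Stand-in for the PR's exception of the same name."""


# Placeholder value; the real constant is defined elsewhere in the project
_ALLOW_LIST_DETECTION = "not-allowed"


def ensure_not_already_allowed(reasons):
    """Return the other detections, or raise if the URL is already allowed."""
    remaining = list(reasons)
    try:
        # list.remove raises ValueError when the item is missing
        remaining.remove(_ALLOW_LIST_DETECTION)
    except ValueError:
        raise ResourceConflict("Requested URL is already allowed") from None
    return remaining
```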

Comment on lines +30 to +32
# Don't fail fast, so we get all of the detections
checker = request.find_service(URLCheckerService)
reasons = list(checker.check_url(url, fail_fast=False))
Contributor

Making it a bit more explicit what this comment refers to:

Suggested change
- # Don't fail fast, so we get all of the detections
- checker = request.find_service(URLCheckerService)
- reasons = list(checker.check_url(url, fail_fast=False))
+ checker = request.find_service(URLCheckerService)
+ # Pass fail_fast=False to get *all* the reasons, not just the first one.
+ reasons = list(checker.check_url(url, fail_fast=False))



@jon-betts
Contributor Author

This has been superseded by: #224

@jon-betts jon-betts closed this Mar 17, 2021

2 participants