Skip to content

Serve error pages (as per ViaHTML) from Checkmate#148

Merged
jon-betts merged 5 commits into
mainfrom
central-error-pages
Jan 20, 2021
Merged

Serve error pages (as per ViaHTML) from Checkmate#148
jon-betts merged 5 commits into
mainfrom
central-error-pages

Conversation

@jon-betts
Copy link
Copy Markdown
Contributor

@jon-betts jon-betts commented Jan 15, 2021

For: #15

This adds a few things:

  • A view which will render an error page in HTML given a url and reason code
  • A service which signs params and creates URLs to the view, and checks that params are signed correctly
  • Updates the main checking code to add the signed URL in the feedback so the client can use it

Review notes:

  • The caching here is a fixed 3600 seconds (1h) without any fancy stuff like cache invalidation (i.e. whitenoise)
  • This means if we deploy changes, they might take 1h to show up in the wild
  • The templates are as they are in ViaHTML at the moment (without the new additions to Via)

Testing notes:

@jon-betts jon-betts requested a review from seanh January 15, 2021 17:08
@jon-betts jon-betts self-assigned this Jan 15, 2021
from checkmate.models import Reason
from checkmate.services.block_url import BlockURLService, factory

# pylint: disable=protected-access
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... so sue me. We need to create a weird situation for testing here where something is "signed" but not correctly. Also... it's convenient

Copy link
Copy Markdown
Contributor

@seanh seanh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting a partial review with some suggestions. I got part way through reviewing this and ran out of time, I'll continue next week

Comment thread tox.ini Outdated
dev: CELERY_BROKER_URL={env:CELERY_BROKER_URL:amqp://guest:guest@localhost:5673//}
dev: DATABASE_URL={env:DATABASE_URL:postgresql://postgres@localhost:5434/postgres}
dev: CHECKMATE_BLOCKLIST_URL = {env:CHECKMATE_BLOCKLIST_URL:http://dummy.example.com/blocklist}
dev: CHECKMATE_SECRET = 92384sdfsdfs87s9d87sdf
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 We sometimes use values like "not_a_secret" for these

Comment thread checkmate/app.py
Comment on lines +78 to +79
# Include the secret for crytographic functions
config.registry.settings["secret"] = os.environ["CHECKMATE_SECRET"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I like the simple approach to config settings. If the CHECKMATE_SECRET envvar is missing the app will simply crash on startup and it'll be pretty obvious why, and this keeps our code simple. h and lms support reading any setting from either the config file or the environment, but I don't think that's really necessary

Comment thread checkmate/routes.py
config.add_route("check_url", "/api/check")

# Serve content from the static/static directory at /ui/static
config.add_static_view("ui/static", "static/static", cache_max_age=3600)
Copy link
Copy Markdown
Contributor

@seanh seanh Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for the cache_max_age=3600?

I think 3600 is the default value.

Both h and lms do cache_max_age=None:

At first I thought None must disable caching, but actually I think it might allow caching forever, because both h and lms add what look like cache-busting query params onto static asset URLs.

We might want to do that for Checkmate too so that we can let assets be cached forever and (more importantly) so that we don't have trouble with old cached versions of assets taking time to expire. I'm not exactly sure how h and lms do it but Pyramid has a built-in cache-busting thing that changes a cache-busting query param each time the app server restarts: https://docs.pylonsproject.org/projects/pyramid/en/latest/narr/assets.html#cache-busting That's not going to be perfect for us because each app server will cache each asset separately, but it might be good enough? You can also customize the cache buster so we could put in a better mechanism.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we have some cache busting we can't serve the content with an infinite expiry date. And it's not trivial to add. We have delayed most of the fancy caching for checkmate as not being essential.

If 3600 is the default, it doesn't hurt to make it clear it's happening, and we do want some caching on this content, otherwise it's going to be served live every time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose, since these are just error pages with a mailto link rather than an interface that you actually use (like an admin page, or fancier error pages), and since the only assets are a CSV file and an SVG, a one-hour delay in someone seeing an updated version of the page or the page loading with different versions of different assets (e.g. new CSV but old SVG) is unlikely to cause many problems.

In more complex cases I think old cached versions can cause breakage. So we'll probably have to deal with cache-busting when we add static assets for the admin pages (unless we use a different static view for those and don't cache them) or if the error pages become much fancier.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we will want it, I just remember it not being trivial to add

Comment thread checkmate/services/block_url.py Outdated
Comment on lines +4 to +54
class BlockURLService:
"""A class for signing and checking arguments for a block page."""

TOKEN_ARG = "sec"
VERSION_ARG = "v"

def __init__(self, secret, route_url):
"""Create a new BlockURLService object.

:param secret: The secret to sign and check args with
:param route_url: The pyramid `request.route_url` function
"""
self._secret = secret
self._route_url = route_url

def get_url(self, url, reason):
"""Get the correct URL to show a block page.

:param url: URL to show on the page
:param reason: The Reason object describing the problem
:return: URL to display an HTML version of the page
"""
args = {
"url": url,
"reason": reason.value,
# Include this to allow us to cope with breaking changes in the
# future if we ever decide to make some.
self.VERSION_ARG: "1",
}

args[self.TOKEN_ARG] = self._hash_args(args)

return self._route_url("present_block", _query=args)

def signed_args(self, args):
"""Check if given parameters have been signed correctly.

:param args: Args to check
:return: Boolean
"""
version = args.get(self.VERSION_ARG)
if version != "1":
return False

without_token = dict(args)
if self.TOKEN_ARG not in args:
return False

token = without_token.pop(self.TOKEN_ARG)

return token == self._hash_args(without_token)
Copy link
Copy Markdown
Contributor

@seanh seanh Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:return: Boolean should be :rtype: bool.

I think verify_*() might be a better name than signed_*(), since verifying the args is what the method does. signed_args() sounds like a method that takes some args and returns a signed copy of them.

I think there's some confusion here because there are two different URLs in play: there's the URL that someone is trying to proxy that has been blocked. And then there's the URL of the block page. But both are called url in this code. Maybe consistently referring to the block page URL as "block page URL" would help (e.g. block_page_url etc). Or alternatively consistently refer to the blocked URL as "blocked URL" (blocked_url etc). Or both, maybe plain url is a confusing name in a class where two different URLs are in play.

I also wonder if, for symmetry, both methods could accept the entire URL rather than just the args? Then it'd be get_url(url, reason) and verify_url(url). (Or alternatively I guess both methods could deal with args only, which would remove the route_url dependency.)

class BlockURLService:
    """Methods for generating and verifying block page URLs."""

    ...

    def __init__(...):
        ...

    def get_block_page_url(self, url, reason):
        """Return the block page URL for the given blocked URL and reason.

        :param url: The URL that has been blocked
        :param reason: The Reason object describing the problem
        :return: The URL of the block page
        """
        ...

    def verify_block_page_url(self, block_page_url):
        """Verify that the given block page URL has been signed correctly.

        :param args: The URL to verify
        :rtype: bool
        """
        ...

Copy link
Copy Markdown
Contributor Author

@jon-betts jon-betts Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth pointing out you don't verify the URL, but the arguments which contain a URL. There's some confusion about the names here for sure though. The collision of the block list HTML view, this service and the URL being blocked is a bit unfortunate.

I'm fairly certain you've told me in the past it's verbotten to have get_ on the front of things, but I forget why.

Copy link
Copy Markdown
Contributor

@seanh seanh Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, it's verifying the args not the URL. Maybe it could just work with args then, and no route_url dependency?

class BlockURLArgsService:
    def get_args(self, url, reason):
        ...

    def verify_args(self, args):
        ...

Or maybe give up on the symmetry and have it generate URLs but verify args (same as you have currently but with slightly different names):

class BlockURLService:
    def get_url(self, blocked_url, reason):
        ...

    def verify_args(self, args):
        ...

(Or verify_args() could still take the entire block_page_url as argument but be called verify_args(), then you'd still have the symmetry.)

Copy link
Copy Markdown
Contributor

@seanh seanh Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding get_(): I think avoiding it is a common style in Python because it's considered "just noise" and because you're supposed to use simple attributes or (if necessary) @property's and you wouldn't call an @property something like get_foo it'd just be foo. When something needs arguments it has to be a method but following the same style you'd call it foo(bar) not get_foo(bar). I don't think get_() is forbidden though, it depends on the particular example, for example on whether the method name without get_() is going to be clear enough in a given example, or whether the get_() is just noise.

See for example Do not use get_ here: https://python-consistency.readthedocs.io/en/latest/conventions.html#properties

Or this answer: https://softwareengineering.stackexchange.com/questions/334135/how-should-i-name-functions-that-return-values-in-python

(But again I don't think it's forbidden.)

Copy link
Copy Markdown
Contributor

@seanh seanh Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlockPageURLService might be a clearer name since it generates URLs for block pages and verifies args from block page URLs. Just BlockURLService sounds like it might have something to do with the actual blocking of URLs (especially in Checkmate's context) but this actually just has to do with the block pages.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've totally reworked this now. I've separated all the knowledge of the block page out of the service and changed it into a SecureLinkService. This has two methods now route_url which works identically to the pyramid route_url but also signs the request (along with the route name now) and is_secure which accepts the request object and checks it.

I think the result is much nicer name-wise. It's also a bit cleaner as the job and the whole job of this class is signing and checking things, not packing params for the block page.

@jon-betts
Copy link
Copy Markdown
Contributor Author

Missed a bit of coverage

@jon-betts jon-betts requested a review from seanh January 19, 2021 13:49
Copy link
Copy Markdown
Contributor

@seanh seanh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much appreciated the testing instructions in the PR description. Tested and works for me.

The commit history looks like it might need a little tidying up before merge.

Lots of small comments and suggestions, but I don't think any of them are required changes so this is approved.

I love the idea of SecureLinkService.route_url() following Request.route_url(). I bet we could go one step further and add it as a request method, Request.secure_route_url() (and maybe also Request.is_secure_link as a property). And I suspect those could still call a SecureLinkService behind the scenes if they had to. And reify=True would handle caching. See: https://docs.pylonsproject.org/projects/pyramid/en/latest/api/config.html#pyramid.config.Configurator.add_request_method. That might be pretty nice but there's no real need: the service approach is just fine.

from checkmate.services.secure_link import SecureLinkService


def includeme(config): # pragma: no cover
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, yeah there seems little point in testing a method like this. My preference would probably be to add a test that simply calls the method but doesn't assert anything, so it at least tests that the method doesn't crash. But I think no cover on something like this is fair enough

Comment on lines +9 to +10
TOKEN_ARG = "sec"
VERSION_ARG = "v"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these actually TOKEN_KEY and VERSION_KEY? They're actually the keys not the values, and "arg" made met think of values rather than keys. They're awkward to name though, *_KEY doesn't seem wonderfully clear either. I think the code might actually be clearer and just as easy to maintain if it just duplicated the strings "sec" and "v" when needed instead of using constants, the indirection and constant names might just add confusion

Comment on lines +12 to +19
def __init__(self, secret, route_url):
"""Create a new BlockURLService object.

:param secret: The secret to sign and check args with
:param route_url: The pyramid `request.route_url` function
"""
self._secret = secret
self._route_url = route_url
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 It's really nice to inject things like secret directly and decouple SecureLinkService from reading a config setting or envvar itself

Comment on lines +21 to +27
def route_url(self, route_name, *elements, **kw):
"""Get a secure version of route to our own app.

This works identically to the Pyramid `Request.route_url()` function
with the exception that the parameters are modified in place to sign
them. This signature can be checked with `is_secure()`.
"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the idea of a generic secure links service and following request.route_url()'s interface


args = kw.get("_query")
if args is None:
raise ValueError("You must provide some parameters to sign")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError("You must provide some parameters to sign")
raise ValueError("You must provide some query parameters to sign")

(Just slightly clearer maybe)


# Include this to allow us to cope with breaking changes in the
# future if we ever decide to make some.
args[self.VERSION_ARG] = "1"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so it adds &v=1&sec=*** to the URL's query string. Sounds good 👍

We could get defensive and have it raise if either "v" or "sec" are already in args (could just be a couple of assert's). I think as written the method will behave strangely if args already contains "v" and "sec": it'll ovewrite them, but it'll also have signed the version of the args with the original "sec" so it'll return a secure link that actually won't pass the is_secure() test

Comment on lines +21 to +22
if not request.find_service(SecureLinkService).is_secure(request):
raise HTTPForbidden()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, you could totally add a resource and permission for this route that only grants the permission if the is_secure() test passes, then Pyramid would handle the error response, and this view (or any view with permission="secure_link") would only get called if is_secure() had passed.

I think doing it directly in the view like this (bypassing Pyramid's authorization system) is totally acceptable for this though


assert result == route_url.return_value

route_url.assert_called_once_with(self.ROUTE_NAME, _query=signed_args)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that route_url is a third-party thing, I think it'd be totally fine for these tests to use the real Request.route_url and assert about the URL strings returned. But I'm guessing the mock made the asserts easier to write?

route_url.assert_called_once_with(self.ROUTE_NAME, _query=signed_args)

def test_route_url_requires_a_query(self, service):
with pytest.raises(ValueError):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be any ValueError, the test doesn't actually test the "you must provide parameters" was the reason the error was raised:

Suggested change
with pytest.raises(ValueError):
with pytest.raises(ValueError, match="^You must provide some parameters to sign$")):

Comment on lines +41 to +42
# Check we don't mutate the args
assert before_call == signed_request.GET
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jon-betts jon-betts force-pushed the central-error-pages branch from 8458324 to ac2ba81 Compare January 20, 2021 13:16
@jon-betts jon-betts force-pushed the central-error-pages branch from ac2ba81 to 2a8e449 Compare January 20, 2021 13:28
@jon-betts
Copy link
Copy Markdown
Contributor Author

I've consolidated the commits. This is going to cause some almighty rebasing pain in all the other PRs, but hey ho.

@jon-betts
Copy link
Copy Markdown
Contributor Author

Secrets have been added to the live instances so we can deploy this.

@jon-betts jon-betts merged commit 463655a into main Jan 20, 2021
@jon-betts jon-betts deleted the central-error-pages branch January 20, 2021 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants