Serve error pages (as per ViaHTML) from Checkmate#148
Conversation
| from checkmate.models import Reason | ||
| from checkmate.services.block_url import BlockURLService, factory | ||
|
|
||
| # pylint: disable=protected-access |
There was a problem hiding this comment.
Ah... so sue me. We need to create a weird situation for testing here where something is "signed" but not correctly. Also... it's convenient
seanh
left a comment
There was a problem hiding this comment.
Posting a partial review with some suggestions. I got part way through reviewing this and ran out of time, I'll continue next week
| dev: CELERY_BROKER_URL={env:CELERY_BROKER_URL:amqp://guest:guest@localhost:5673//} | ||
| dev: DATABASE_URL={env:DATABASE_URL:postgresql://postgres@localhost:5434/postgres} | ||
| dev: CHECKMATE_BLOCKLIST_URL = {env:CHECKMATE_BLOCKLIST_URL:http://dummy.example.com/blocklist} | ||
| dev: CHECKMATE_SECRET = 92384sdfsdfs87s9d87sdf |
There was a problem hiding this comment.
👍 We sometimes use values like "not_a_secret" for these
| # Include the secret for crytographic functions | ||
| config.registry.settings["secret"] = os.environ["CHECKMATE_SECRET"] |
There was a problem hiding this comment.
👍 I like the simple approach to config settings. If the CHECKMATE_SECRET envvar is missing the app will simply crash on startup and it'll be pretty obvious why, and this keeps our code simple. h and lms support reading any setting from either the config file or the environment, but I don't think that's really necessary
| config.add_route("check_url", "/api/check") | ||
|
|
||
| # Serve content from the static/static directory at /ui/static | ||
| config.add_static_view("ui/static", "static/static", cache_max_age=3600) |
There was a problem hiding this comment.
What's the reasoning for the cache_max_age=3600?
I think 3600 is the default value.
Both h and lms do cache_max_age=None:
- https://github.com/hypothesis/lms/blob/7c10eb3ddae26434133b59b96d7874b851f61d94/lms/assets.py#L145
- https://github.com/hypothesis/h/blob/f46082b4a3f5fe1f7c70898ed6ba224cca5c35a8/h/assets.py#L135
At first I thought None must disable caching, but actually I think it might allow caching forever, because both h and lms add what look like cache-busting query params onto static asset URLs.
We might want to do that for Checkmate too so that we can let assets be cached forever and (more importantly) so that we don't have trouble with old cached versions of assets taking time to expire. I'm not exactly sure how h and lms do it but Pyramid has a built-in cache-busting thing that changes a cache-busting query param each time the app server restarts: https://docs.pylonsproject.org/projects/pyramid/en/latest/narr/assets.html#cache-busting That's not going to be perfect for us because each app server will cache each asset separately, but it might be good enough? You can also customize the cache buster so we could put in a better mechanism.
There was a problem hiding this comment.
Until we have some cache busting we can't serve the content with an infinite expiry date. And it's not trivial to add. We have delayed most of the fancy caching for checkmate as not being essential.
If 3600 is the default, it doesn't hurt to make it clear it's happening, and we do want some caching on this content, otherwise it's going to be served live every time.
There was a problem hiding this comment.
I suppose, since these are just error pages with a mailto link rather than an interface that you actually use (like an admin page, or fancier error pages), and since the only assets are a CSV file and an SVG, a one-hour delay in someone seeing an updated version of the page or the page loading with different versions of different assets (e.g. new CSV but old SVG) is unlikely to cause many problems.
In more complex cases I think old cached versions can cause breakage. So we'll probably have to deal with cache-busting when we add static assets for the admin pages (unless we use a different static view for those and don't cache them) or if the error pages become much fancier.
There was a problem hiding this comment.
Yeah, we will want it, I just remember it not being trivial to add
| class BlockURLService: | ||
| """A class for signing and checking arguments for a block page.""" | ||
|
|
||
| TOKEN_ARG = "sec" | ||
| VERSION_ARG = "v" | ||
|
|
||
| def __init__(self, secret, route_url): | ||
| """Create a new BlockURLService object. | ||
|
|
||
| :param secret: The secret to sign and check args with | ||
| :param route_url: The pyramid `request.route_url` function | ||
| """ | ||
| self._secret = secret | ||
| self._route_url = route_url | ||
|
|
||
| def get_url(self, url, reason): | ||
| """Get the correct URL to show a block page. | ||
|
|
||
| :param url: URL to show on the page | ||
| :param reason: The Reason object describing the problem | ||
| :return: URL to display an HTML version of the page | ||
| """ | ||
| args = { | ||
| "url": url, | ||
| "reason": reason.value, | ||
| # Include this to allow us to cope with breaking changes in the | ||
| # future if we ever decide to make some. | ||
| self.VERSION_ARG: "1", | ||
| } | ||
|
|
||
| args[self.TOKEN_ARG] = self._hash_args(args) | ||
|
|
||
| return self._route_url("present_block", _query=args) | ||
|
|
||
| def signed_args(self, args): | ||
| """Check if given parameters have been signed correctly. | ||
|
|
||
| :param args: Args to check | ||
| :return: Boolean | ||
| """ | ||
| version = args.get(self.VERSION_ARG) | ||
| if version != "1": | ||
| return False | ||
|
|
||
| without_token = dict(args) | ||
| if self.TOKEN_ARG not in args: | ||
| return False | ||
|
|
||
| token = without_token.pop(self.TOKEN_ARG) | ||
|
|
||
| return token == self._hash_args(without_token) |
There was a problem hiding this comment.
:return: Boolean should be :rtype: bool.
I think verify_*() might be a better name than signed_*(), since verifying the args is what the method does. signed_args() sounds like a method that takes some args and returns a signed copy of them.
I think there's some confusion here because there are two different URLs in play: there's the URL that someone is trying to proxy that has been blocked. And then there's the URL of the block page. But both are called url in this code. Maybe consistently referring to the block page URL as "block page URL" would help (e.g. block_page_url etc). Or alternatively consistently refer to the blocked URL as "blocked URL" (blocked_url etc). Or both, maybe plain url is a confusing name in a class where two different URLs are in play.
I also wonder if, for symmetry, both methods could accept the entire URL rather than just the args? Then it'd be get_url(url, reason) and verify_url(url). (Or alternatively I guess both methods could deal with args only, which would remove the route_url dependency.)
class BlockURLService:
"""Methods for generating and verifying block page URLs."""
...
def __init__(...):
...
def get_block_page_url(self, url, reason):
"""Return the block page URL for the given blocked URL and reason.
:param url: The URL that has been blocked
:param reason: The Reason object describing the problem
:return: The URL of the block page
"""
...
def verify_block_page_url(self, block_page_url):
"""Verify that the given block page URL has been signed correctly.
:param args: The URL to verify
:rtype: bool
"""
...There was a problem hiding this comment.
It's worth pointing out you don't verify the URL, but the arguments which contain a URL. There's some confusion about the names here for sure though. The collision of the block list HTML view, this service and the URL being blocked is a bit unfortunate.
I'm fairly certain you've told me in the past it's verbotten to have get_ on the front of things, but I forget why.
There was a problem hiding this comment.
Ah yeah, it's verifying the args not the URL. Maybe it could just work with args then, and no route_url dependency?
class BlockURLArgsService:
def get_args(self, url, reason):
...
def verify_args(self, args):
...Or maybe give up on the symmetry and have it generate URLs but verify args (same as you have currently but with slightly different names):
class BlockURLService:
def get_url(self, blocked_url, reason):
...
def verify_args(self, args):
...(Or verify_args() could still take the entire block_page_url as argument but be called verify_args(), then you'd still have the symmetry.)
There was a problem hiding this comment.
Regarding get_(): I think avoiding it is a common style in Python because it's considered "just noise" and because you're supposed to use simple attributes or (if necessary) @property's and you wouldn't call an @property something like get_foo it'd just be foo. When something needs arguments it has to be a method but following the same style you'd call it foo(bar) not get_foo(bar). I don't think get_() is forbidden though, it depends on the particular example, for example on whether the method name without get_() is going to be clear enough in a given example, or whether the get_() is just noise.
See for example Do not use get_ here: https://python-consistency.readthedocs.io/en/latest/conventions.html#properties
Or this answer: https://softwareengineering.stackexchange.com/questions/334135/how-should-i-name-functions-that-return-values-in-python
(But again I don't think it's forbidden.)
There was a problem hiding this comment.
BlockPageURLService might be a clearer name since it generates URLs for block pages and verifies args from block page URLs. Just BlockURLService sounds like it might have something to do with the actual blocking of URLs (especially in Checkmate's context) but this actually just has to do with the block pages.
There was a problem hiding this comment.
I've totally reworked this now. I've separated all the knowledge of the block page out of the service and changed it into a SecureLinkService. This has two methods now route_url which works identically to the pyramid route_url but also signs the request (along with the route name now) and is_secure which accepts the request object and checks it.
I think the result is much nicer name-wise. It's also a bit cleaner as the job and the whole job of this class is signing and checking things, not packing params for the block page.
|
Missed a bit of coverage |
seanh
left a comment
There was a problem hiding this comment.
Much appreciated the testing instructions in the PR description. Tested and works for me.
The commit history looks like it might need a little tidying up before merge.
Lots of small comments and suggestions, but I don't think any of them are required changes so this is approved.
I love the idea of SecureLinkService.route_url() following Request.route_url(). I bet we could go one step further and add it as a request method, Request.secure_route_url() (and maybe also Request.is_secure_link as a property). And I suspect those could still call a SecureLinkService behind the scenes if they had to. And reify=True would handle caching. See: https://docs.pylonsproject.org/projects/pyramid/en/latest/api/config.html#pyramid.config.Configurator.add_request_method. That might be pretty nice but there's no real need: the service approach is just fine.
| from checkmate.services.secure_link import SecureLinkService | ||
|
|
||
|
|
||
| def includeme(config): # pragma: no cover |
There was a problem hiding this comment.
👍, yeah there seems little point in testing a method like this. My preference would probably be to add a test that simply calls the method but doesn't assert anything, so it at least tests that the method doesn't crash. But I think no cover on something like this is fair enough
| TOKEN_ARG = "sec" | ||
| VERSION_ARG = "v" |
There was a problem hiding this comment.
Are these actually TOKEN_KEY and VERSION_KEY? They're actually the keys not the values, and "arg" made met think of values rather than keys. They're awkward to name though, *_KEY doesn't seem wonderfully clear either. I think the code might actually be clearer and just as easy to maintain if it just duplicated the strings "sec" and "v" when needed instead of using constants, the indirection and constant names might just add confusion
| def __init__(self, secret, route_url): | ||
| """Create a new BlockURLService object. | ||
|
|
||
| :param secret: The secret to sign and check args with | ||
| :param route_url: The pyramid `request.route_url` function | ||
| """ | ||
| self._secret = secret | ||
| self._route_url = route_url |
There was a problem hiding this comment.
👍 It's really nice to inject things like secret directly and decouple SecureLinkService from reading a config setting or envvar itself
| def route_url(self, route_name, *elements, **kw): | ||
| """Get a secure version of route to our own app. | ||
|
|
||
| This works identically to the Pyramid `Request.route_url()` function | ||
| with the exception that the parameters are modified in place to sign | ||
| them. This signature can be checked with `is_secure()`. | ||
| """ |
There was a problem hiding this comment.
I love the idea of a generic secure links service and following request.route_url()'s interface
|
|
||
| args = kw.get("_query") | ||
| if args is None: | ||
| raise ValueError("You must provide some parameters to sign") |
There was a problem hiding this comment.
| raise ValueError("You must provide some parameters to sign") | |
| raise ValueError("You must provide some query parameters to sign") |
(Just slightly clearer maybe)
|
|
||
| # Include this to allow us to cope with breaking changes in the | ||
| # future if we ever decide to make some. | ||
| args[self.VERSION_ARG] = "1" |
There was a problem hiding this comment.
Ok, so it adds &v=1&sec=*** to the URL's query string. Sounds good 👍
We could get defensive and have it raise if either "v" or "sec" are already in args (could just be a couple of assert's). I think as written the method will behave strangely if args already contains "v" and "sec": it'll ovewrite them, but it'll also have signed the version of the args with the original "sec" so it'll return a secure link that actually won't pass the is_secure() test
| if not request.find_service(SecureLinkService).is_secure(request): | ||
| raise HTTPForbidden() |
There was a problem hiding this comment.
Ooh, you could totally add a resource and permission for this route that only grants the permission if the is_secure() test passes, then Pyramid would handle the error response, and this view (or any view with permission="secure_link") would only get called if is_secure() had passed.
I think doing it directly in the view like this (bypassing Pyramid's authorization system) is totally acceptable for this though
|
|
||
| assert result == route_url.return_value | ||
|
|
||
| route_url.assert_called_once_with(self.ROUTE_NAME, _query=signed_args) |
There was a problem hiding this comment.
Given that route_url is a third-party thing, I think it'd be totally fine for these tests to use the real Request.route_url and assert about the URL strings returned. But I'm guessing the mock made the asserts easier to write?
| route_url.assert_called_once_with(self.ROUTE_NAME, _query=signed_args) | ||
|
|
||
| def test_route_url_requires_a_query(self, service): | ||
| with pytest.raises(ValueError): |
There was a problem hiding this comment.
This could be any ValueError, the test doesn't actually test the "you must provide parameters" was the reason the error was raised:
| with pytest.raises(ValueError): | |
| with pytest.raises(ValueError, match="^You must provide some parameters to sign$")): |
| # Check we don't mutate the args | ||
| assert before_call == signed_request.GET |
8458324 to
ac2ba81
Compare
ac2ba81 to
2a8e449
Compare
|
I've consolidated the commits. This is going to cause some almighty rebasing pain in all the other PRs, but hey ho. |
|
Secrets have been added to the live instances so we can deploy this. |
For: #15
This adds a few things:
urlandreasoncodeReview notes:
Testing notes:
make devlinks.htmlout of the JSON and look at it to view the error page