URL previewing support #688
…d, experimental, etc. just putting it here for safekeeping for now
…ngcache; loads of other fixes
| def get_url_cache_txn(txn): | ||
| # get the most recently cached result (relative to the given ts) | ||
| sql = ( | ||
| "SELECT response_code, etag, expires, og, media_id, max(download_ts)" |
You probably want to be doing ORDER BY download_ts DESC LIMIT 1 rather than max(download_ts)
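A minimal sketch of the suggested query shape, run against an in-memory SQLite database. The table name comes from the PR description and the selected columns from the diff; the `url` column, the `WHERE` clause, and the sample rows are assumptions made only for illustration.

```python
import sqlite3


def get_url_cache(conn, url, ts):
    """Return the most recently cached row for `url` at or before `ts`, or None."""
    return conn.execute(
        "SELECT response_code, etag, expires, og, media_id, download_ts"
        " FROM local_media_repository_url_cache"
        " WHERE url = ? AND download_ts <= ?"
        " ORDER BY download_ts DESC LIMIT 1",
        (url, ts),
    ).fetchone()


# Hypothetical schema and data, just enough to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE local_media_repository_url_cache ("
    " url TEXT, response_code INTEGER, etag TEXT, expires INTEGER,"
    " og TEXT, media_id TEXT, download_ts INTEGER)"
)
conn.executemany(
    "INSERT INTO local_media_repository_url_cache VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("https://example.com", 200, "a", 0, "{}", "m1", 100),
        ("https://example.com", 200, "b", 0, "{}", "m2", 200),
        ("https://example.com", 200, "c", 0, "{}", "m3", 300),
    ],
)

# Picks the whole row with download_ts = 200, the newest one not after ts = 250.
latest = get_url_cache(conn, "https://example.com", 250)
```

With `ORDER BY ... LIMIT 1` every selected column comes from the same row, and no row at all is returned when nothing matches. PostgreSQL rejects the `max(download_ts)` form outright unless the other columns are grouped or aggregated.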
|
I think you need to run |
|
Can we make the entire thing optional somehow? We probably can't run it by default anyway given that it needs an IP blacklist. |
| # first check the memory cache - good to handle all the clients on this | ||
| # HS thundering away to preview the same URL at the same time. | ||
| try: | ||
| og = self.cache[url] |
Use `cache.get()` rather than `try: except:`
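A small sketch of the difference, using a plain `dict` as a stand-in for the in-memory cache. It assumes the real cache object exposes a dict-style `get()`; the function names and sample entry are hypothetical.

```python
# A plain dict standing in for the homeserver's in-memory preview cache.
cache = {"https://example.com": {"og:title": "Example"}}


def lookup_with_try(url):
    """The pattern in the diff: index the cache and catch the miss."""
    try:
        return cache[url]
    except KeyError:
        return None


def lookup_with_get(url):
    """The suggested pattern: get() returns None on a miss."""
    return cache.get(url)
```

Both functions behave the same for hits and misses; `get()` just avoids the exception handling and cannot accidentally swallow an unrelated error.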
…oint. defaults to off.
Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered.
Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered.
Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist
Add commentary and generally address PR feedback
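To illustrate what an IP range blacklist check involves, here is a rough sketch using Python's standard `ipaddress` module. It is not the PR's implementation: the CIDR list is a hypothetical value for `url_preview_ip_range_blacklist`, and `is_blacklisted` is an illustrative helper name.

```python
import ipaddress

# Hypothetical setting for url_preview_ip_range_blacklist (assumption).
IP_RANGE_BLACKLIST = [
    "127.0.0.0/8",
    "10.0.0.0/8",
    "172.16.0.0/12",
    "192.168.0.0/16",
]
NETWORKS = [ipaddress.ip_network(cidr) for cidr in IP_RANGE_BLACKLIST]


def is_blacklisted(ip):
    """Return True if the resolved IP address falls in any blacklisted range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)
```

A check like this only helps if it is applied to the address a hostname actually resolves to, not to the hostname in the URL, which is presumably why the commit hooks it in at the endpoint level.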
|
incorporate all the PR feedback - @NegativeMjark PTAL |
| isLeaf = True | ||
|
|
||
| def __init__(self, hs, filepaths): | ||
| if not html: |
The `not html` check probably throws if lxml isn't installed.
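The concern is that a name which was never bound by a failed import raises `NameError` when it is later tested. One common way around this, sketched below, is to bind the name to `None` when the import fails and check for that explicitly. The helper names and the error message are hypothetical, not taken from the PR.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Always bound: either the module or None, so later checks cannot NameError.
lxml = optional_import("lxml")


def check_preview_dependencies(module=lxml):
    """Raise a clear error if the optional dependency is missing."""
    if module is None:
        raise RuntimeError("lxml must be installed for URL previewing to work")
```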
|
@NegativeMjark addressed these too, and now throwing sensible exceptions. PTAL |
| "blacklist in url_preview_ip_range_blacklist for url previewing " | ||
| "to work" | ||
| ) | ||
| raise RunTimeError( |
It's `RuntimeError`, not `RunTimeError`. This sort of typo can be picked up by running `flake8 synapse`, fwiw.
|
Other than fixing the typos and style warnings, it LGTM. I'm slightly concerned by the lack of tests for it though. |
- Adds `SpiderHttpClient` derived from `SimpleHttpClient`, which follows redirects and handles gzip CTE correctly
- Adds `get_file` support to `SimpleHttpClient`, knowingly duplicated for now from matrixfederationclient.
- Adds `preview_url_resource` to implement the new media/r0/preview_url API. This:
  - … `lxml`, returning the metadata as a JSON blob
- Adds `local_media_repository_url_cache` table to the DB for the on-disk URL cache
- Adds `get_url_cache` and `store_url_cache` to `media_repository.py` to wrap the new table

N.B. that following redirects will not work correctly until https://twistedmatrix.com/trac/ticket/8265 is merged. Unsure if it's worth maintaining our own Twisted fork until that happens.
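As a rough illustration of turning page metadata into a JSON blob, the sketch below collects OpenGraph `og:` meta tags. It assumes OpenGraph is the metadata in question (suggested by the `og` column in the diff), and it deliberately swaps in the standard library's `html.parser` for `lxml` so that it runs without extra dependencies. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and attrs.get("content") is not None:
            self.og[prop] = attrs["content"]


def extract_og(html_text):
    """Parse an HTML document and return its og: metadata as a dict."""
    parser = OpenGraphParser()
    parser.feed(html_text)
    return parser.og


page = (
    '<html><head>'
    '<meta property="og:title" content="Example">'
    '<meta property="og:description" content="A page">'
    '</head><body></body></html>'
)
og_json = json.dumps(extract_og(page), sort_keys=True)
```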
Given I'm hardly a python/twisted expert, review would be particularly appreciated on:
This is part of a set of PRs spanning vector-web, matrix-react-sdk, matrix-js-sdk and synapse.
See also element-hq/element-web#1343 and matrix-org/matrix-react-sdk#260 and matrix-org/matrix-js-sdk#122