
Needs work: tolerate chunked responses without a DOS vulnerability #8

Open
roscoemcinerney wants to merge 2 commits into master from bugfix/2023-01/tolerate-chunked-transfer

Conversation

@roscoemcinerney
Collaborator

Reiterating the code comments:

  • We encountered the file https://logo.clearbit.com/https%3A/duckduckgo.com/, which is a small, perfectly normal PNG but currently can't be served.
  • It's because logo.clearbit.com serves via chunked transfer and doesn't provide a Content-Length header, which we currently use to decide if a file is small enough for us to cache.
  • Accepting chunked transfers is fine (code in this branch already does it), but there are 2 wrinkles it seems wise to resolve before merging.
    1: This code will block until the entire file - which could be gigabytes, or even endless from a malicious server - has been transferred, rather than aborting the transfer once the size limit is passed. So we need to do something a little fiddlier than just using FileUtils.copy on the connection's InputStream (see the sketch at the end of this comment).
    2: This code will allow repeated requests for the same file, transferring up to the limit and then returning a 400 error each time. My instinct is that when a cache request is made for a too-large file, we should copy a placeholder image into the destination path with readable text saying "File too large to cache" - so subsequent requests become a cache hit, and (probably more importantly) legitimate users immediately know what the problem is.

We're not in a hurry to tie off this branch (it's due to a problem found in Aidan's testing, not live) and the changes above aren't hard to implement at all - but I'm raising this PR as a sanity check for when I come back to it.
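
To make wrinkle 1 concrete, this is roughly what I mean by "fiddlier" - just a sketch, and the names here (BoundedCopySketch, copyAtMost, the 8KB buffer) are illustrative rather than code that exists in this branch:

```java
import java.io.*;

class BoundedCopySketch {
    /** Copy at most maxBytes from in to dest, aborting as soon as the limit is
     *  passed so a chunked (or malicious) response can't tie us up indefinitely. */
    static void copyAtMost(InputStream in, File dest, long maxBytes) throws IOException {
        byte[] buf = new byte[8192];
        long copied = 0;
        try (OutputStream out = new FileOutputStream(dest)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                copied += n;
                if (copied > maxBytes) {
                    // Abort mid-transfer instead of draining the rest of the response
                    throw new IOException("Response exceeded cache limit of " + maxBytes + " bytes");
                }
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            dest.delete(); // don't leave a truncated file where a cache hit would find it
            throw e;
        }
    }
}
```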

@winterstein
Contributor

1: This code will block until the entire file - which could be gigabytes, or even endless from a malicious server - has been transferred, rather than aborting the transfer once the size limit is passed. So we need to do something a little fiddlier than just using FileUtils.copy on the connection's InputStream.

We have code for that :)

new LimitedInputStream(conn.getInputStream(), MAX_SIZE)

FYI: the same code is available in FakeBrowser.setMaxDownload()

Sticking with FakeBrowser might be best, as it can also handle HTTP redirects.
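
If the direct-connection route is kept, the wrapper could replace the hand-rolled loop along these lines - a sketch only, assuming LimitedInputStream errors out (rather than silently truncating) at the limit, and that a FileUtils.copy overload taking an InputStream and a destination file matches what the branch already calls:

```java
// Sketch: conn, MAX_SIZE and destFile are placeholder names, not code from this branch.
try (InputStream in = new LimitedInputStream(conn.getInputStream(), MAX_SIZE)) {
    FileUtils.copy(in, destFile); // bounded by the wrapper instead of a hand-rolled counter
}
```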

2: This code will allow repeated requests for the same file, transferring up to the limit and then returning a 400 error each time. My instinct is that when a cache request is made for a too-large file, we should copy a placeholder image into the destination path with readable text saying "File too large to cache" - so subsequent requests become a cache hit, and (probably more importantly) legitimate users immediately know what the problem is.

Good idea :)
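
If it helps, a rough sketch of that placeholder write, assuming a "too large" PNG is bundled as a classpath resource - the resource name and the class here are made up for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TooLargePlaceholderSketch {
    /** Write a bundled "File too large to cache" image to the cache path, so later
     *  requests for the same URL become cache hits instead of repeat downloads. */
    static void writeTooLargePlaceholder(Path cachePath) throws IOException {
        try (InputStream placeholder = TooLargePlaceholderSketch.class
                .getResourceAsStream("/too-large-placeholder.png")) {
            if (placeholder == null) {
                throw new IOException("too-large-placeholder.png missing from resources");
            }
            Files.copy(placeholder, cachePath, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```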

…ird failure modes around files without extension on source URL
