Latest pywb changes#4

Merged
omgoo merged 16 commits into mirrorweb:master from webrecorder:master
Apr 28, 2021

Conversation

@omgoo (Member) commented Apr 28, 2021

No description provided.

ikreymer and others added 16 commits April 26, 2021 18:22
update to latest wombat (3.1.4)
* Pass collection name to ACL checker to load ACL lists
for automatic collections

* Typo: file suffix must be `.aclj`
…ixes #628 (#629)

* Add unit test to verify whether ACL exact-match rules in a single-line
*.aclj file are found

* Fix AccessChecker to match exact rules in a single-line rule file
- add unit test to verify unknown output formats are handled
  if output fields param is in request
* FrontendApp: forward HTTP status of CDX backend to allow clients
to handle errors more easily

* WarcServer: keep the HTTP status lines short
- append the exception message only if the status isn't a string
  (WbException and inherited classes already have nice status string)
- avoid overlong status lines, e.g.
   HTTP/1.1 404 Not Found No Captures found for: https://very-long.url/...
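
The rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual WarcServer code: the function name `short_status_line` and its arguments are hypothetical.

```python
def short_status_line(status, exc):
    """Build a compact HTTP status string for an exception.

    Hypothetical helper, not pywb's API. `status` is either a ready-made
    status string (e.g. "404 Not Found") or a bare integer code.
    """
    if isinstance(status, str):
        # The status string is already descriptive; appending the exception
        # message (which may contain a very long URL) would bloat the line.
        return status
    # Only a numeric code is available, so append the exception message.
    return "{} {}".format(status, exc)
```

With a string status the message is dropped, so a long URL in the exception never reaches the status line.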
* FrontendApp: forward HTTP status of CDX backend to allow clients
to handle errors more easily

* Handle CDXExceptions properly, returning the exception status code
- make sure that CDXException is raised early so that it can be handled
  in the IndexHandler
The 'dedup_index_url' configuration option should be inside the
'recorder' section.
- do not apply any filters (param filter, from, to, closest)
  if counting pages (param showNumPages=true)
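
As a rough sketch of the behavior this commit message describes, the filter parameters could be dropped before the query runs whenever only the page count is requested. The function name `effective_params` and the plain-dict parameter format are assumptions for illustration, not pywb's internals.

```python
# Parameter names taken from the commit message above.
FILTER_PARAMS = ("filter", "from", "to", "closest")


def effective_params(params):
    """Return a copy of the CDX query params, minus filters when counting pages.

    Hypothetical helper: `params` is assumed to be a plain dict of query
    parameters with string values.
    """
    if str(params.get("showNumPages", "")).lower() != "true":
        # Normal query: keep every parameter as given.
        return dict(params)
    # Page counting: ignore filter, from, to and closest.
    return {k: v for k, v in params.items() if k not in FILTER_PARAMS}
```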
The field is unfortunately misnamed compressedendoffset in XML but OWB
actually uses this for the compressed length 'S' CDX field.

Without this field when WARC files are accessed over HTTP pywb will make
open byte range requests which results in a lot more data being read
from disk than necessary.
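
To illustrate why the compressed length matters, here is a minimal sketch of how an HTTP `Range` header might be built from a record's offset and length. The function name `range_header` is hypothetical and this is not pywb's loader code. Without a known length the range has to be left open-ended, which is the case the commit message describes.

```python
def range_header(offset, length=None):
    """Build an HTTP Range header value for one WARC record.

    Hypothetical helper. `offset` is the record's byte offset in the WARC
    file; `length` is its compressed length, if known.
    """
    if length is None or length <= 0:
        # Length unknown: open-ended range, the server may send everything
        # from `offset` to the end of the file.
        return "bytes={}-".format(offset)
    # Length known: bounded range (the end position is inclusive).
    return "bytes={}-{}".format(offset, offset + length - 1)
```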
This advertises the Python support that is already in place.
* post append improvements:
- parse json primitives for post query
- for text/plain, attempt to parse as json, then as binary
- standardize post append indexing
- include '__wb_method' in urlkey
- add 'requestBody' and 'method' to cdxj
- support unique dupe params for json-to-query conversion
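
One way to picture the json-to-query conversion with unique duplicate parameters is the sketch below. It is an illustration under stated assumptions, not the pywb implementation: the function name `json_to_query`, the `.N_` suffix used for repeated names, and the `_value` key used for a top-level primitive are all hypothetical choices.

```python
import json
from urllib.parse import urlencode


def json_to_query(text, root_name="_value"):
    """Flatten a JSON document into a query string with unique param names.

    Hypothetical sketch: repeated names get a ".N_" suffix and a top-level
    primitive is stored under `root_name`; both conventions are assumptions.
    """
    pairs = []
    seen = {}

    def unique(name):
        # First occurrence keeps the plain name, later ones get a counter.
        count = seen.get(name, 0) + 1
        seen[name] = count
        return name if count == 1 else "{}.{}_".format(name, count)

    def walk(value, name):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, key)
        elif isinstance(value, list):
            for child in value:
                walk(child, name)
        else:
            # JSON primitives: render them as they appear in JSON text.
            if isinstance(value, bool):
                rendered = "true" if value else "false"
            elif value is None:
                rendered = "null"
            else:
                rendered = str(value)
            pairs.append((unique(name), rendered))

    walk(json.loads(text), root_name)
    return urlencode(pairs)
```

For example, `json_to_query('{"a": 1, "b": {"a": 2}}')` yields `a=1&a.2_=2`, so the nested duplicate `a` does not collide with the first one.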

* test fixes:
- update tests for test_inputreq,
- update post-test.cdxj and post-test.cdx

* ci: fixes
- tox: run full test suite!
- disable appveyor

* inputrequest buffering fix:
- never truncate reading POST request, must read entire POST data to avoid hung request in live mode
- truncate final query string to 4096
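
The buffering fix above separates two concerns: the request body is always consumed in full, and only the derived query string is capped. A minimal sketch of that idea, with the hypothetical name `read_post_query` and an assumed chunk size of 8192 bytes:

```python
import io

MAX_QUERY_LENGTH = 4096  # limit named in the commit message


def read_post_query(stream, content_length, chunk_size=8192):
    """Read the whole POST body, then return a truncated query string.

    Hypothetical helper: the stream is drained completely so the connection
    is not left with unread data; only the returned value is truncated.
    """
    remaining = content_length
    chunks = []
    while remaining > 0:
        chunk = stream.read(min(remaining, chunk_size))
        if not chunk:
            break  # client sent less than Content-Length promised
        chunks.append(chunk)
        remaining -= len(chunk)
    body = b"".join(chunks)
    return body[:MAX_QUERY_LENGTH]


# Usage: a 10000-byte body is read fully, but only 4096 bytes are kept.
stream = io.BytesIO(b"a" * 10000)
query = read_post_query(stream, 10000)
```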
@omgoo omgoo merged commit 8445793 into mirrorweb:master Apr 28, 2021
@ikreymer ikreymer deleted the master branch April 28, 2021 17:45

5 participants