Skip to content

Use GzipFile python unpacker for speed, fall back on pigz if needed#472

Merged
whyitfor merged 2 commits into
redballoonsecurity:masterfrom
ANogin:feature/zip_in_python
Jun 6, 2024
Merged

Use GzipFile python unpacker for speed, fall back on pigz if needed#472
whyitfor merged 2 commits into
redballoonsecurity:masterfrom
ANogin:feature/zip_in_python

Conversation

@ANogin
Copy link
Copy Markdown
Contributor

@ANogin ANogin commented Jun 4, 2024

One sentence summary of this PR (This should go in the CHANGELOG!)
Use GzipFile python unpacker for speed, fall back on pigz if needed

Link to Related Issue(s)
N/A

Please describe the changes in your request.
Back before OFRAK became public we switched to using external gzip, and then later in #79 we switched to pigz. The reason the initial switch to gzip was made is because unlike python gzip, gzip and pigz binaries are willing to accept gzipped data with extra junk at the end as a "warning" (exit code 2, as opposed to exit code 1 for errors). Unfortunately, spawning external processes has nontrivial overhead, and asyncio is not enough to mitigate it. This commit tries a compromise - try gzip.GzipFile in python first, then fall back to pigz if BadGzipFile is raise.

Anyone you think should look at this, specifically?
@whyitfor

@whyitfor
Copy link
Copy Markdown
Contributor

whyitfor commented Jun 4, 2024

Would it make sense to add a benchmark test with this to record the speedup? Perhaps a recursive unpack of a filesystem pulled from http://tinycorelinux.net/15.x/x86/release/ or somewhere else?

We would probably want to find a pytest fixture that can record several runs of one test case, and/or perhaps fail if the execution time exceeds a certain threshold. Alternatively, we just add a fixture that records test duration

@whyitfor whyitfor requested a review from rbs-jacob June 5, 2024 11:15
Copy link
Copy Markdown
Contributor

@whyitfor whyitfor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the changelog

@ANogin
Copy link
Copy Markdown
Contributor Author

ANogin commented Jun 5, 2024

Please update the changelog

Done.

@ANogin ANogin requested a review from whyitfor June 5, 2024 17:48
@rbs-jacob
Copy link
Copy Markdown
Member

Since we're adding fallbacks like this, it probably also makes sense to fall back on gzip if pigz is not installed. We can be sure that it will be present in Docker images, but it likely will not for people using pip install OFRAK.

@whyitfor
Copy link
Copy Markdown
Contributor

whyitfor commented Jun 6, 2024

@rbs-jacob, interesting idea re:gzip/pigz. Implementing that cleanly with our ComponentExternalTool setup might take some additional thought. I'm in favor of merging this in for now to get the performance improvement -- if you see value in that as a further change please add an issue.

@whyitfor whyitfor merged commit 35aac89 into redballoonsecurity:master Jun 6, 2024
@ANogin ANogin deleted the feature/zip_in_python branch June 6, 2024 14:58
@ANogin ANogin mentioned this pull request Jun 6, 2024
1 task
ANogin pushed a commit to ANogin/ofrak that referenced this pull request Aug 20, 2024
…edballoonsecurity#472)

* Use GzipFile python unpacker for speed, fall back on pigz if needed

* Add a changelog entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants