as well as add stdin or URL support. Generating regex files out of `regexes.yaml` is a convenient first step that keeps subsequent scripts simpler (e.g. none of them has to read YAML). This script could be a `yq` command (plus an optional `curl` first step), but e.g. nix's `yq` is a Python wrapper around `jq` which depends on pyyaml, so the gain is limited.
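As a rough illustration of that first step, here is a minimal sketch of the extraction, not the script from this PR. It assumes the document has already been parsed (e.g. with pyyaml's `yaml.safe_load`) and uses the three top-level keys of uap-core's `regexes.yaml`; the output file names and the one-regex-per-line layout are hypothetical.

```python
from typing import Any, Dict, Mapping

# Top-level keys of regexes.yaml mapped to output file names.
# The file names are hypothetical, chosen only for this sketch.
DOMAINS = {
    "user_agent_parsers": "user_agent.re",
    "os_parsers": "os.re",
    "device_parsers": "device.re",
}


def split_regexes(doc: Mapping[str, Any]) -> Dict[str, str]:
    """Return {file name: file content}, one regex per line, in source order.

    Only the `regex` field of each entry is kept; replacement fields and
    flags are ignored, which is an assumption of this sketch.
    """
    out = {}
    for key, filename in DOMAINS.items():
        patterns = [entry["regex"] for entry in doc.get(key, [])]
        out[filename] = "".join(pattern + "\n" for pattern in patterns)
    return out
```

Order is preserved on purpose: the index of a line in the output file is the index of the regex within its domain.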
These three files / scripts are 3 different implementations (python, regex, regex-filtered) of the same thing: taking a regex set and a bunch of needles, finding the first matching regex for each needle, and outputting its index (0-indexed). This is the core loop of ua-parser, and it allows validating that regex-filtered matches a more naive version of the same process. Happily I couldn't find any divergence, although that means I did a fair amount of useless work. Also the python version is really slow compared to even the regex one, so probably don't use that... `paste` can be used to combine the index extraction of multiple domains, as well as the original needle, into TSV documents if that's of use. This could also be expanded to multi-index extraction if that's a need for anyone, and should be checked more extensively.

Note that only the python version supports stdin input at this point, I couldn't be arsed to do that with the Rust ones, but process substitution ought to work fine anyway? The needles are read on the go so they should not need to be an actual file. This may not be in a state fit for performance checking, as the output loop of the Rust versions is the worst (no buffering, no stdout locking).
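For reference, the naive form of that core loop can be sketched as below. This is an illustration using Python's `re` module, not the script from this PR; the function name is hypothetical, and yielding `-1` when nothing matches is an assumption, since the description does not say what the scripts output in that case.

```python
import re
from typing import Iterable, Iterator, Sequence


def first_match_indices(patterns: Sequence[str], needles: Iterable[str]) -> Iterator[int]:
    """Yield, for each needle, the 0-based index of the first matching regex.

    Yields -1 when no regex matches (an assumption of this sketch).
    Needles are consumed lazily, so they can come from a pipe or stdin.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    for needle in needles:
        yield next(
            (index for index, rx in enumerate(compiled) if rx.search(needle)),
            -1,
        )
```

Because every regex is tried in order until one matches, the cost grows with the size of the regex set, which is the work a prefiltering approach like regex-filtered aims to avoid.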
Try to add more validations / cross-impl checks to ensure regex-filtered yields the same result as more naive implementations even on complete / real-world datasets.
If you see bullying of unfortunate runtimes you should speak up; in that case re2 and FilteredRE2 implementations should be added.
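A cross-implementation check of this kind can be as simple as comparing the index streams of two implementations line by line. The helper below is a hypothetical sketch, not part of this PR; it assumes each implementation emits one index per line.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def divergences(
    a: Iterable[str], b: Iterable[str]
) -> Iterator[Tuple[int, Optional[str], Optional[str]]]:
    """Yield (line number, value from a, value from b) wherever outputs differ.

    Line numbers are 1-based. A missing line, when one output is shorter
    than the other, is reported as None.
    """
    for lineno, (x, y) in enumerate(zip_longest(a, b), start=1):
        # Ignore trailing newlines so that open file objects can be passed in.
        x = x.rstrip("\n") if x is not None else None
        y = y.rstrip("\n") if y is not None else None
        if x != y:
            yield lineno, x, y
```

An empty result means the two implementations agree on every needle of the dataset.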