Enable custom regex engine use. #7
Allow library users to customize the regular expression engine used during fingerprint matching. Custom regex engines can be configured by implementing the RecogPatternMatcher interface. This feature is desirable as Java's regex engine is susceptible to catastrophic backtracking.
|
For reference, here's an example of how a user would use the linear-time regex engine RE2/J:
Re2jPatternMatcher.java
package com.rapid7.recog.pattern;

import com.google.re2j.Matcher;
import com.google.re2j.Pattern;

public class Re2jPatternMatcher implements RecogPatternMatcher {

  // Exposes the groups of a successful RE2/J match through the recog result interface.
  private static class Re2jPatternMatchResult implements RecogPatternMatchResult {
    private final Matcher matcher;

    Re2jPatternMatchResult(Matcher matcher) {
      this.matcher = matcher;
    }

    @Override
    public int groupCount() {
      return matcher.groupCount();
    }

    @Override
    public String group(int group) {
      return matcher.group(group);
    }

    @Override
    public String group(String group) {
      return matcher.group(group);
    }
  }

  private final Pattern pattern;

  public Re2jPatternMatcher(String pattern, int flags) {
    this.pattern = Pattern.compile(pattern, flags);
  }

  @Override
  public String getPattern() {
    return pattern.pattern();
  }

  @Override
  public int getFlags() {
    return pattern.flags();
  }

  // Uses find(), so the pattern may match anywhere in the input unless it is anchored.
  @Override
  public boolean matches(String input) {
    return input != null && pattern.matcher(input).find();
  }

  // Returns null when the input is null or the pattern does not match.
  @Override
  public RecogPatternMatchResult match(String input) {
    if (input == null) {
      return null;
    }
    Matcher matcher = pattern.matcher(input);
    return matcher.find() ? new Re2jPatternMatchResult(matcher) : null;
  }
}
|
|
For a little context around why this is needed, in <fingerprint pattern="^(?:(?:\d+.){3}\d+):\d{1,4}$">
<description>A banner consisting of an IP address and port -- assert nothing.</description>
<example>192.168.0.4:9999</example>
</fingerprint>
When this fingerprint is matched against a Server header value found in the wild (pasted below), Java's regex implementation will not complete and will pin a CPU until cancelled.
Offending Input
While patching offending fingerprints so that they're not susceptible to backtracking is an option, my team has opted to instead use a regex engine that can guarantee linear-time execution. Tagging @gschneider-r7, @tsellers-r7, @dabdine-r7 to get the ball rolling here 😄 |
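For anyone who wants to see the backtracking cost for themselves, here is a rough sketch (not part of the PR). It wraps the input in a `CharSequence` that counts how many times `java.util.regex` reads a character, and runs the fingerprint pattern quoted above against digit-only inputs of growing length. The class and method names (`BacktrackingProbe`, `CountingSequence`, `digits`, `reads`) are made up for illustration, and the digit-only input is an assumption: it is not the offending header from the wild. The exact counts will vary by JDK version, so none are claimed here.

```java
import java.util.regex.Pattern;

final class BacktrackingProbe {

    /** CharSequence wrapper that counts every character read by the regex engine. */
    static final class CountingSequence implements CharSequence {
        private final CharSequence delegate;
        long count = 0;

        CountingSequence(CharSequence delegate) {
            this.delegate = delegate;
        }

        @Override
        public int length() {
            return delegate.length();
        }

        @Override
        public char charAt(int index) {
            count++;
            return delegate.charAt(index);
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return delegate.subSequence(start, end);
        }

        @Override
        public String toString() {
            return delegate.toString();
        }
    }

    /** Builds a string of n digits; it contains no ':' so the fingerprint can never match. */
    static String digits(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('1');
        }
        return sb.toString();
    }

    /** Runs find() once and returns how many characters the engine read while trying. */
    static long reads(Pattern pattern, String input) {
        CountingSequence seq = new CountingSequence(input);
        pattern.matcher(seq).find();
        return seq.count;
    }

    public static void main(String[] args) {
        // Pattern copied as quoted in this thread.
        Pattern p = Pattern.compile("^(?:(?:\\d+.){3}\\d+):\\d{1,4}$");
        for (int n : new int[] {50, 100, 200}) {
            System.out.println(n + " digits -> " + reads(p, digits(n)) + " character reads");
        }
    }
}
```

If the read count grows much faster than the input length, the pattern is backtracking heavily on that input. A linear-time engine would keep the work proportional to the input size.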
|
Thanks for the PR @hudclark I'll have to defer to @gschneider-r7 on this project. That being said, that regex explosion is a bit rough. Please let us know anytime you see something like that because it likely impacts other engines. |
|
This seems fine to me at a glance, though I haven't worked on this project in a while so I have to defer to those who would be impacted by changes to recog-java. @ihorbatiuk-r7 @rkirk-r7 @ekelly-rapid7 Note that this repo isn't building on travis-ci anymore so I don't know if it is currently releasable to maven central or if someone has already addressed that internally at R7. |
Update the README to include documentation about the default regular expression engine used in recog-java and provide an example for how users can override this behavior.
Previously, this interface was package-private and didn't support library users supplying their own factories.
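As a rough illustration of what overriding the engine involves, here is a self-contained sketch of a matcher backed by `java.util.regex`. The two interfaces are re-declared locally with the method shapes inferred from the Re2j example earlier in this thread; the real recog-java interfaces may differ. `JdkPatternMatcher` is a hypothetical name, not the library's `JavaRegexRecogMatcher`, and the pattern in `main` is only an illustrative one built around the example banner above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumed shape, inferred from the Re2j example in this thread.
interface RecogPatternMatchResult {
    int groupCount();
    String group(int group);
    String group(String group);
}

// Assumed shape, inferred from the Re2j example in this thread.
interface RecogPatternMatcher {
    String getPattern();
    int getFlags();
    boolean matches(String input);
    RecogPatternMatchResult match(String input);
}

// Hypothetical implementation that delegates to the JDK regex engine.
final class JdkPatternMatcher implements RecogPatternMatcher {
    private final Pattern pattern;

    JdkPatternMatcher(String pattern, int flags) {
        this.pattern = Pattern.compile(pattern, flags);
    }

    @Override
    public String getPattern() {
        return pattern.pattern();
    }

    @Override
    public int getFlags() {
        return pattern.flags();
    }

    @Override
    public boolean matches(String input) {
        return input != null && pattern.matcher(input).find();
    }

    @Override
    public RecogPatternMatchResult match(String input) {
        if (input == null) {
            return null;
        }
        final Matcher matcher = pattern.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        // Adapt the JDK Matcher to the result interface.
        return new RecogPatternMatchResult() {
            @Override
            public int groupCount() {
                return matcher.groupCount();
            }

            @Override
            public String group(int group) {
                return matcher.group(group);
            }

            @Override
            public String group(String group) {
                return matcher.group(group);
            }
        };
    }

    public static void main(String[] args) {
        RecogPatternMatcher m = new JdkPatternMatcher("^(\\d+\\.\\d+\\.\\d+\\.\\d+):(\\d{1,4})$", 0);
        RecogPatternMatchResult result = m.match("192.168.0.4:9999");
        System.out.println(result.group(1) + " on port " + result.group(2));
    }
}
```

Swapping engines then comes down to providing another class with the same four methods, as the Re2j example does. How the matcher is registered with the library is not shown here because that API is not quoted in this thread.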
ekelly-1898
left a comment
This looks like a good change to me.
Thanks very much for the contribution @hudclark. Apologies for the delay in reviewing.
I'll see about getting this landed and released.
|
Hi @hudclark, apologies again for the delay. We've got the CI sorted on our side. Could you please merge the latest master into your branch? Then I'll be able to land and release this. |
Thanks @ekelly-rapid7! I've just merged |
|
Just an FYI, as part of working on rapid7/recog#367 I used @hudclark 's example catastrophic use case ( #7 (comment) ) against every regex currently in recog to ensure that nothing else would choke on it. I've identified a few other regexes that could probably use some performance tuning but none of them are likely to be noticeable in normal use.
I also checked to ensure that every regex will compile with |
Description
RecogPatternMatcher interface.
RecogMatcher to use an underlying RecogPatternMatcher instance, rather than java.util.regex.* classes directly
JavaRegexRecogMatcher to use as a default and provide backwards compatibility.
How Has This Been Tested?
JavaRegexRecogMatcher has been exercised via existing unit tests. Since it is the default, it is used in all existing tests.
CustomPatternMatcherTest has been added to validate that custom RecogPatternMatchers may be used.
Types of changes
Checklist: