Performance improvement with string matcher #2

Open

fulcrum70 wants to merge 2 commits into nekosoftllc:main from fulcrum70:string_matcher

@fulcrum70

This PR introduces a performance enhancement for matching user agents against crawler patterns. In the current version, all matching is performed with regex Pattern.matcher. Since there are currently over 1,400 patterns, a single crawler check can create over 1,400 Matcher objects, one per pattern tried. On a busy website, this puts considerable pressure on the GC.
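
For context, here is a minimal sketch of the regex-only approach described above. The class and method names (RegexOnlyCheck, isCrawler) are hypothetical, and the sketch assumes each pattern is applied with Matcher.find(), so it may match anywhere in the user agent. The point is the loop: every Pattern.matcher call allocates a new Matcher, so a user agent that matches no pattern allocates one Matcher per pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a regex-only crawler check (not the project's code).
// Assumes find() semantics: a pattern may match anywhere in the user agent.
class RegexOnlyCheck {
    private final List<Pattern> patterns;

    RegexOnlyCheck(List<String> regexes) {
        // Compile once up front; Pattern instances are immutable and thread-safe.
        List<Pattern> compiled = new ArrayList<>();
        for (String r : regexes) {
            compiled.add(Pattern.compile(r));
        }
        this.patterns = compiled;
    }

    boolean isCrawler(String userAgent) {
        for (Pattern p : patterns) {
            // Pattern.matcher allocates a fresh Matcher on every call, so a
            // user agent that matches nothing costs one allocation per pattern.
            if (p.matcher(userAgent).find()) {
                return true;
            }
        }
        return false;
    }
}
```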

In this PR, crawler patterns are analyzed to determine whether they can be matched with simple string operations (equals, contains, starts with and ends with); patterns that cannot are still matched with regex. This creates fewer Matcher objects and improves execution speed, since simple string matching is faster than regex matching. Matching is done in a new Matcher class, which has a corresponding new JUnit test.
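
The classification step might look roughly like the following sketch. It is not the PR's Matcher class: StringOrRegexMatcher and Kind are hypothetical names, chosen here to avoid clashing with java.util.regex.Matcher. It assumes patterns are case-sensitive and use find() semantics, so a leading ^ maps to startsWith, a trailing $ to endsWith, both anchors to equals, and neither to contains. Any pattern with remaining regex metacharacters falls back to a compiled Pattern.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: use a plain String check when a crawler regex is a
// literal (optionally anchored), otherwise fall back to java.util.regex.
// Assumes case-sensitive patterns applied with find() semantics.
final class StringOrRegexMatcher {
    enum Kind { EQUALS, STARTS_WITH, ENDS_WITH, CONTAINS, REGEX }

    // Characters that give a pattern regex meaning; if none remain after
    // stripping the ^/$ anchors, the pattern is treated as a plain literal.
    private static final String META = "\\^$.|?*+()[]{}";

    final Kind kind;
    private final String literal;  // used by the string kinds
    private final Pattern pattern; // used only by REGEX

    StringOrRegexMatcher(String regex) {
        String body = regex;
        boolean anchoredStart = body.startsWith("^");
        if (anchoredStart) body = body.substring(1);
        boolean anchoredEnd = body.endsWith("$");
        if (anchoredEnd) body = body.substring(0, body.length() - 1);

        if (isLiteral(body)) {
            literal = body;
            pattern = null;
            if (anchoredStart && anchoredEnd) kind = Kind.EQUALS;
            else if (anchoredStart) kind = Kind.STARTS_WITH;
            else if (anchoredEnd) kind = Kind.ENDS_WITH;
            else kind = Kind.CONTAINS;
        } else {
            // Anything else (including escaped "\$") keeps the original regex.
            literal = null;
            pattern = Pattern.compile(regex);
            kind = Kind.REGEX;
        }
    }

    private static boolean isLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (META.indexOf(s.charAt(i)) >= 0) return false;
        }
        return true;
    }

    boolean matches(String userAgent) {
        switch (kind) {
            case EQUALS: return userAgent.equals(literal);
            case STARTS_WITH: return userAgent.startsWith(literal);
            case ENDS_WITH: return userAgent.endsWith(literal);
            case CONTAINS: return userAgent.contains(literal);
            default: return pattern.matcher(userAgent).find(); // only REGEX allocates a Matcher
        }
    }
}
```

The metacharacter check is deliberately conservative: an escaped literal such as Googlebot\/ contains a backslash and therefore stays on the regex path, even though it could be unescaped to a plain string. One subtlety: in Java, $ also matches just before a final line terminator, so the endsWith/equals shortcuts assume user agents carry no trailing newline.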

All existing tests pass with this approach.
