Optimize scan method #107
On CRuby, the benchmark shows that a String pattern is 1.23x faster than a Regexp pattern.
```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 9.300M i/s - 9.509M times in 1.022507s (107.53ns/i)
regexp_var 9.110M i/s - 9.262M times in 1.016682s (109.76ns/i)
string 9.051M i/s - 9.304M times in 1.028047s (110.49ns/i)
string_var 11.187M i/s - 11.722M times in 1.047826s (89.39ns/i)
Calculating -------------------------------------
regexp 10.197M i/s - 27.899M times in 2.735904s (98.06ns/i)
regexp_var 10.198M i/s - 27.331M times in 2.680120s (98.06ns/i)
string 10.089M i/s - 27.152M times in 2.691312s (99.12ns/i)
string_var 12.530M i/s - 33.562M times in 2.678533s (79.81ns/i)
Comparison:
string_var: 12529824.3 i/s
regexp_var: 10197773.2 i/s - 1.23x slower
regexp: 10197371.0 i/s - 1.23x slower
string: 10088701.3 i/s - 1.24x slower
```
See: https://github.com/ruby/ruby/blob/cf8388f76c4c2ff2f46d0d2aa2cf5186e05ff606/re.c#L251-L256
On JRuby, the benchmark shows that a String pattern is 2.43x faster than a Regexp pattern.
```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 7.371M i/s - 7.352M times in 0.997443s (135.67ns/i)
regexp_var 7.303M i/s - 7.262M times in 0.994284s (136.92ns/i)
string 13.596M i/s - 13.535M times in 0.995475s (73.55ns/i)
string_var 15.032M i/s - 14.942M times in 0.994038s (66.53ns/i)
Calculating -------------------------------------
regexp 9.120M i/s - 22.113M times in 2.424781s (109.65ns/i)
regexp_var 8.914M i/s - 21.910M times in 2.458050s (112.19ns/i)
string 22.174M i/s - 40.789M times in 1.839495s (45.10ns/i)
string_var 19.994M i/s - 45.095M times in 2.255454s (50.02ns/i)
Comparison:
string: 22174077.0 i/s
string_var: 19993967.8 i/s - 1.11x slower
regexp: 9119635.2 i/s - 2.43x slower
regexp_var: 8913743.3 i/s - 2.49x slower
```
See: https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720
kou
left a comment
Could you add how this is optimized to the description? Is avoiding rb_enc_get() the main optimization?
if (S_RESTLEN(p) < RSTRING_LEN(pattern)) {
    return Qnil;
}
Why do we need this move? Why is this needless for !headonly?
In the !headonly path, rb_memsearch() already performs a similar check.
https://github.com/ruby/ruby/blob/cf8388f76c4c2ff2f46d0d2aa2cf5186e05ff606/re.c#L251-L256
long
rb_memsearch(const void *x0, long m, const void *y0, long n, rb_encoding *enc)
{
    const unsigned char *x = x0, *y = y0;
    if (m > n) return -1;
- m = RSTRING_LEN(pattern)
- n = S_RESTLEN(p)
This means the following:
if (RSTRING_LEN(pattern) > S_RESTLEN(p)) return -1;
Sorry.
Thanks. It seems that this has 4 optimizations:
OK, I see.
CRuby
Why?
1. Remove duplicate `if (S_RESTLEN(p) < RSTRING_LEN(pattern)) return Qnil;` checks in `!headonly`.
   A similar check is made within `rb_memsearch()` within `!headonly`.
   https://github.com/ruby/ruby/blob/cf8388f76c4c2ff2f46d0d2aa2cf5186e05ff606/re.c#L251-L256
   - m = `RSTRING_LEN(pattern)`
   - n = `S_RESTLEN(p)`
   This means the following:
   `if (RSTRING_LEN(pattern) > S_RESTLEN(p)) return -1;`
   Both checks are the same.
2. Removed unnecessary use of `rb_enc_get()`.
   In `rb_strseq_index()`, the result of `rb_enc_check()` is used.
Benchmark
It shows String as a pattern is 1.23x faster than Regexp as a pattern.
JRuby
Why?
1. Remove duplicate `if (restLen() < pattern.size()) return context.nil;` checks in `!headonly`.
   strscan/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java
   Lines 371 to 373 in d31274f
   This means the following:
   `if (str.size() - curr < pattern.size()) return context.nil;`
   A similar check is made within `StringSupport#index()` within `!headonly`.
   https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720
   - `strBL`
   - `patternBL`
   - `strBeg + curr`
   This means the following:
   `if (strBL.realSize() - (strBeg + curr) < patternBL.realSize()) return -1;`
   Both checks are the same.
2. Use `currPtr()` instead of `strBeg + curr`. Because they are identical.
   strscan/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java
   Lines 267 to 268 in d31274f
   strscan/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java
   Lines 359 to 361 in d31274f
Benchmark
It shows String as a pattern is 2.43x faster than Regexp as a pattern.
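Both the CRuby and JRuby changes rest on the same observation: when the search routine already rejects a pattern longer than the remaining input, a length check in the caller is redundant. The sketch below models that in plain Ruby. It is not the strscan source; `memsearch_model`, `search_with_precheck`, and `search_without_precheck` are hypothetical names introduced only for illustration, and the byte search uses `String#index` on binary copies as a stand-in for `rb_memsearch()` / `StringSupport#index()`.

```ruby
# Hypothetical model of the search routine (stand-in for rb_memsearch).
# Returns the byte offset of pattern in rest, or -1 when not found.
def memsearch_model(pattern, rest)
  # Mirrors `if (m > n) return -1;` with m = pattern length, n = rest length.
  return -1 if pattern.bytesize > rest.bytesize
  rest.b.index(pattern.b) || -1
end

# Caller as it looked before the change: it checks the lengths itself
# and then calls the search routine, which checks them again.
def search_with_precheck(pattern, rest)
  return nil if rest.bytesize < pattern.bytesize
  pos = memsearch_model(pattern, rest)
  pos == -1 ? nil : pos
end

# Caller after the change: it relies on the check inside the search routine.
def search_without_precheck(pattern, rest)
  pos = memsearch_model(pattern, rest)
  pos == -1 ? nil : pos
end

# Illustrative inputs (assumed, not taken from the benchmark).
samples = [["abc", "xxabc"], ["abcdef", "abc"], ["", "abc"], ["zzz", "abc"]]
samples.each do |pattern, rest|
  same = search_with_precheck(pattern, rest) == search_without_precheck(pattern, rest)
  puts "#{pattern.inspect} in #{rest.inspect}: same result? #{same}"
end
```

Under these assumptions the two callers agree on every input, so dropping the outer check only removes one comparison per call from the `!headonly` path.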