headius commented Feb 26, 2024

A few simple changes on top of #89:

  • Load fields that are accessed more than once (curr, byteList, runtime) into local variables. Otherwise the JVM, in some modes, must re-load them from memory on each access because another thread could have updated them; locals allow better caching and constant propagation.
  • Use ByteList.get(index) to read the byte directly instead of str.getBytes, which copies the visible range of the underlying ByteList into a new byte[]. This reduces allocation and lets ByteList handle the begin offset and the 0xFF masking.
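
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the two changes. It is not the code from this PR: `PeekByteSketch`, `Bytes`, `copyBytes`, and the `-1` "no byte" return value are all hypothetical stand-ins, and `Bytes` only imitates the idea of a byte container with a begin offset.

```java
import java.util.Arrays;

public class PeekByteSketch {
    /** Hypothetical stand-in for a byte container with a begin offset (not JRuby's ByteList). */
    static final class Bytes {
        private final byte[] raw;
        private final int begin;
        private final int length;

        Bytes(byte[] raw, int begin, int length) {
            this.raw = raw;
            this.begin = begin;
            this.length = length;
        }

        int length() { return length; }

        // Direct read: applies the begin offset and masks the signed byte to 0..255.
        int get(int index) { return raw[begin + index] & 0xFF; }

        // Copying read: allocates a new byte[] holding only the visible range.
        byte[] copyBytes() { return Arrays.copyOfRange(raw, begin, begin + length); }
    }

    // Non-final instance fields, as in a mutable scanner object.
    private Bytes str;
    private int curr;

    PeekByteSketch(Bytes str, int curr) {
        this.str = str;
        this.curr = curr;
    }

    // Before: fields are read at each use, and every call allocates a copy.
    int peekByteCopying() {
        if (curr >= str.length()) return -1; // -1 is this sketch's "no byte" marker
        return str.copyBytes()[curr] & 0xFF;
    }

    // After: each field is read once into a local, and no array is allocated.
    int peekByteDirect() {
        final Bytes bytes = this.str;
        final int pos = this.curr;
        if (pos >= bytes.length()) return -1;
        return bytes.get(pos);
    }

    public static void main(String[] args) {
        // Visible range is {0xE3, 'x'}; the leading 0 sits before the begin offset.
        Bytes visible = new Bytes(new byte[] {0, (byte) 0xE3, 'x'}, 1, 2);
        PeekByteSketch scanner = new PeekByteSketch(visible, 0);
        System.out.println(scanner.peekByteCopying()); // 227
        System.out.println(scanner.peekByteDirect());  // 227
    }
}
```

Both methods return the same value; the second simply avoids the per-call array allocation and the repeated field reads, which is the kind of saving the benchmark below is measuring.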

Performance is nearly 2x better (steady-state wall time drops from about 1.2 s to about 0.7 s per 100M calls), mostly because we've eliminated the copy:

BEFORE:

[] strscan $ jruby -Xcompile.invokedynamic -Ilib -rstrscan -rbenchmark -e 'ss = StringScanner.new("x"); 10.times { puts Benchmark.measure { i = 0; while i < 100_000_000; i+=1; ss.peek_byte; end } }'
  1.870000   0.080000   1.950000 (  1.393004)
  1.300000   0.020000   1.320000 (  1.221120)
  1.210000   0.000000   1.210000 (  1.201006)
  1.220000   0.010000   1.230000 (  1.188845)
  1.240000   0.000000   1.240000 (  1.226052)
  1.230000   0.010000   1.240000 (  1.203179)
  1.490000   0.000000   1.490000 (  1.254489)
  1.230000   0.010000   1.240000 (  1.218346)
  1.230000   0.000000   1.230000 (  1.206789)
  1.300000   0.010000   1.310000 (  1.237078)

AFTER:

[] strscan $ jruby -Xcompile.invokedynamic -Ilib -rstrscan -rbenchmark -e 'ss = StringScanner.new("x"); 10.times { puts Benchmark.measure { i = 0; while i < 100_000_000; i+=1; ss.peek_byte; end } }'
  1.250000   0.080000   1.330000 (  0.937111)
  0.740000   0.000000   0.740000 (  0.707315)
  0.710000   0.010000   0.720000 (  0.701804)
  0.710000   0.000000   0.710000 (  0.697933)
  0.710000   0.000000   0.710000 (  0.704491)
  0.730000   0.010000   0.740000 (  0.712514)
  0.710000   0.000000   0.710000 (  0.701410)
  0.740000   0.000000   0.740000 (  0.715182)
  0.720000   0.010000   0.730000 (  0.703247)
  0.770000   0.000000   0.770000 (  0.768175)


kou commented Feb 27, 2024

Thanks!

kou merged commit fcaf4c1 into ruby:master Feb 27, 2024
headius deleted the jruby_getbyte_optz branch February 27, 2024 02:22