-
Notifications
You must be signed in to change notification settings - Fork 43
Add a method for peeking and scanning bytes as integers #89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that #read_byte and #peek_byte are better because #getbyte is obsoleted by #get_byte.
BTW, #read_byte may not a good name... I don't feel that it implies that integer not string... Do you have other candidate name? (I'll also consider better name.)
Adding an underscore seems fine. I chose
I'm not sure what to name it. I think Maybe |
55cde40 to
1e8ec46
Compare
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I chose
readbytebecause ofIO#readbyte.
Ah, OK. I didn't notice it. It makes sense. Let's use read (read_byte) for this feature.
We may want to deprecate get_byte and getbyte.
Could you update these lists?
Lines 1604 to 1619 in 338d870
| * === Advancing the Scan Pointer | |
| * | |
| * - #getch | |
| * - #get_byte | |
| * - #scan | |
| * - #scan_until | |
| * - #skip | |
| * - #skip_until | |
| * | |
| * === Looking Ahead | |
| * | |
| * - #check | |
| * - #check_until | |
| * - #exist? | |
| * - #match? | |
| * - #peek |
Could you also update the PR description because we use the PR description for commit message?
@headius Could you review JRuby part?
| @@ -1,5 +1,6 @@ | |||
| #!/usr/bin/env ruby | |||
|
|
|||
| gem 'strscan' | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! I didn't notice that we need this to use installed gem with Ruby 3.2 and 3.3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to use bundle exec? I'm not sure what the policy of bundled gems is. I thought you didn't need gem either, but it seems to fix tests in CI.
I kind of wish
Fixed!
Done! |
|
I can understand what you said. I will reconsider API. Please give me a few days... |
|
How about using
I can understand the same |
Yes! I like this a lot. I'll change the method name.
💯 |
This commit adds `scan_byte` and `peek_byte`. `scan_byte` will read the current byte, return it as an integer, and advance the cursor. `peek_byte` will return the current byte as an integer without advancing the cursor.
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
|
Let's merge this. @headius If you have any comment, please let us know later. |
integers (ruby/strscan#89) This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the current byte, return it as an integer, and advance the cursor. `peek_byte` will return the current byte as an integer without advancing the cursor. Currently `StringScanner#get_byte` returns a string, but I want to get the current byte without allocating a string. I think this will help with writing high performance lexers. --------- ruby/strscan@873aba2e5d Co-authored-by: Sutou Kouhei <kou@clear-code.com>
|
I was unable to review in time but there are some performance improvements I will submit as a separate PR. |
|
The PR in #90 optimizes the new methods and should be merged before release. The original was functionally correct but |
A few simple changes on top of #89: * Fields accessed more than once should be loaded into a local variable (curr, byteList, runtime). Otherwise, the JVM in some modes will re-load from memory due to the possibility of a concurrent update (better caching, constant propagation). * Use ByteList.get(index) to access directly rather than str.getBytes which makes a copy byte[] (reduce allocation, let ByteList calculate begin offset and 0xFF masking). Performance is nearly 2x better, mostly because we've eliminated the copy: BEFORE: ``` [] strscan $ jruby -Xcompile.invokedynamic -Ilib -rstrscan -rbenchmark -e 'ss = StringScanner.new("x"); 10.times { puts Benchmark.measure { i = 0; while i < 100_000_000; i+=1; ss.peek_byte; end } }' 1.870000 0.080000 1.950000 ( 1.393004) 1.300000 0.020000 1.320000 ( 1.221120) 1.210000 0.000000 1.210000 ( 1.201006) 1.220000 0.010000 1.230000 ( 1.188845) 1.240000 0.000000 1.240000 ( 1.226052) 1.230000 0.010000 1.240000 ( 1.203179) 1.490000 0.000000 1.490000 ( 1.254489) 1.230000 0.010000 1.240000 ( 1.218346) 1.230000 0.000000 1.230000 ( 1.206789) 1.300000 0.010000 1.310000 ( 1.237078) ``` AFTER: ``` [] strscan $ jruby -Xcompile.invokedynamic -Ilib -rstrscan -rbenchmark -e 'ss = StringScanner.new("x"); 10.times { puts Benchmark.measure { i = 0; while i < 100_000_000; i+=1; ss.peek_byte; end } }' 1.250000 0.080000 1.330000 ( 0.937111) 0.740000 0.000000 0.740000 ( 0.707315) 0.710000 0.010000 0.720000 ( 0.701804) 0.710000 0.000000 0.710000 ( 0.697933) 0.710000 0.000000 0.710000 ( 0.704491) 0.730000 0.010000 0.740000 ( 0.712514) 0.710000 0.000000 0.710000 ( 0.701410) 0.740000 0.000000 0.740000 ( 0.715182) 0.720000 0.010000 0.730000 ( 0.703247) 0.770000 0.000000 0.770000 ( 0.768175) ```
This commit adds
scan_byteandpeek_byte.scan_bytewill scan the current byte, return it as an integer, and advance the cursor.peek_bytewill return the current byte as an integer without advancing the cursor.Currently
StringScanner#get_bytereturns a string, but I want to get the current byte without allocating a string. I think this will help with writing high performance lexers.Thanks!