Member

@byroot byroot commented Nov 26, 2024

Followup: #115

scan_integer is now implemented in Ruby so that keyword arguments can be handled efficiently without allocating a Hash. Since the goal of scan_integer is to parse integers more efficiently, without allocating an intermediary object, using rb_scan_args would defeat the purpose.

Additionally, the C implementation now uses rb_isdigit and rb_isxdigit, because on Windows isdigit is locale-dependent.
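To illustrate the split described above, here is a minimal sketch of a Ruby-level method that accepts the `base:` keyword and dispatches to a per-base helper. `SketchScanner` and its private helpers are hypothetical names used only for illustration. They are not the actual strscan source, and the regexp-based helpers stand in for the C functions, so unlike the real implementation they do allocate an intermediate String. The explicit `[0-9]` and `[0-9a-fA-F]` classes mirror the locale-independent intent of `rb_isdigit` and `rb_isxdigit`.

```ruby
require "strscan"

# Hypothetical wrapper, for illustration only (not the strscan implementation).
class SketchScanner
  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # The keyword argument is handled in Ruby, then dispatched to a
  # fixed-arity helper, so no options Hash has to be unpacked in C.
  def scan_integer(base: 10)
    case base
    when 10 then scan_base10_integer
    when 16 then scan_base16_integer
    else
      raise ArgumentError, "Unsupported integer base: #{base.inspect}, expected 10 or 16"
    end
  end

  private

  # Stand-in for a C helper: ASCII digits only, optional sign.
  def scan_base10_integer
    @scanner.scan(/[+-]?[0-9]+/)&.to_i
  end

  # Stand-in for a C helper: ASCII hex digits, optional sign and 0x prefix.
  def scan_base16_integer
    @scanner.scan(/[+-]?(?:0x)?[0-9a-fA-F]+/)&.to_i(16)
  end
end

p SketchScanner.new("42 apples").scan_integer      # → 42
p SketchScanner.new("0x1F").scan_integer(base: 16) # → 31
p SketchScanner.new("apples").scan_integer         # → nil
```

Keeping the keyword handling in Ruby means the helpers themselves take no arguments, which is what lets the C side avoid `rb_scan_args`.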

cc @kou

@byroot byroot force-pushed the parse-integer-base branch 2 times, most recently from 958d211 to e68e38f Compare November 26, 2024 11:27
@byroot byroot force-pushed the parse-integer-base branch 3 times, most recently from ff7f16a to 89292ec Compare November 27, 2024 07:49
Member Author

byroot commented Nov 27, 2024

Hmm, TruffleRuby CI is broken, because lib/strscan.rb now shadows the lib/truffle/strscan.rb that's embedded in TruffleRuby.

@eregon any idea how to fix it? Is there even value in testing TruffleRuby in this repo if the implementation is in Truffle itself?
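The shadowing problem can be reproduced in isolation. The sketch below assumes the cause is plain `$LOAD_PATH` ordering: when two directories both contain a file with the same name, `require` loads the one found first. The directory names and the `shadow_demo` file are hypothetical stand-ins for the repository's `lib/` and the implementation's bundled library directory.

```ruby
require "tmpdir"

Dir.mktmpdir do |root|
  # Two hypothetical library directories that both ship "shadow_demo.rb".
  gem_lib  = File.join(root, "gem_lib")
  impl_lib = File.join(root, "impl_lib")
  [gem_lib, impl_lib].each { |dir| Dir.mkdir(dir) }

  File.write(File.join(gem_lib, "shadow_demo.rb"), "SHADOW_DEMO_ORIGIN = :gem_lib\n")
  File.write(File.join(impl_lib, "shadow_demo.rb"), "SHADOW_DEMO_ORIGIN = :impl_lib\n")

  # gem_lib ends up first on the load path, so its file wins.
  $LOAD_PATH.unshift(impl_lib)
  $LOAD_PATH.unshift(gem_lib)

  require "shadow_demo"
  puts SHADOW_DEMO_ORIGIN # → gem_lib

  # Leave the load path as we found it.
  $LOAD_PATH.delete(gem_lib)
  $LOAD_PATH.delete(impl_lib)
end
```

Giving the new file a path that does not collide, such as `strscan/strscan`, sidesteps the conflict without changing the load path order.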

Member Author

byroot commented Nov 27, 2024

I pushed caaaf94 to fix TruffleRuby CI, but it's quite ugly.

Member

kou commented Nov 27, 2024

I want to remove TruffleRuby-related code from this repository, but how about moving lib/strscan.rb to lib/strscan/strscan.rb and calling require "strscan/strscan.rb" in Init_strscan()?

Member Author

byroot commented Nov 27, 2024

> how about moving lib/strscan.rb to lib/strscan/strscan.rb and calling require "strscan/strscan.rb" in Init_strscan()?

If that's what you prefer, it's possible yes.

@byroot byroot force-pushed the parse-integer-base branch from caaaf94 to 5b950ca Compare November 27, 2024 08:09
@byroot byroot requested a review from kou November 27, 2024 08:12

rb_define_method(StringScanner, "named_captures", strscan_named_captures, 0);

rb_require("strscan/strscan");
Member

We may need to use rb_funcall(rb_mKernel, rb_intern("require"), 1, rb_str_new_cstr("strscan/strscan")) or something because RubyGems may replace rb_require().

Member Author

Are you sure? Other gems like psych, json, etc. use rb_require directly without known issues.

Member

Oh, I didn't know it.
OK. Let's use this. I hope that this is a temporary workaround...

@kou kou merged commit eb875ea into ruby:master Nov 27, 2024
37 checks passed
Member

kou commented Nov 27, 2024

Thanks.

@byroot byroot deleted the parse-integer-base branch November 27, 2024 08:32
Member Author

byroot commented Nov 27, 2024

Thank you. Let me know if there are any more features you'd like for the initial version, and if some use cases come up in the future I'm happy to help implement them.

Member

kou commented Nov 27, 2024

I don't have any more requests for the initial version.

I'll release a new version in a few weeks.

Member Author

byroot commented Nov 27, 2024

For the record:

# frozen_string_literal: true
require 'strscan'
require 'benchmark/ips'

source = 10_000.times.map { rand(9999999).to_s }.join(",").freeze


def scan_to_i(source)
  scanner = StringScanner.new(source)
  while number = scanner.scan(/\d+/)
    number.to_i
    scanner.skip(",")
  end
end

def scan_integer(source)
  scanner = StringScanner.new(source)
  while scanner.scan_integer
    scanner.skip(",")
  end
end


Benchmark.ips do |x|
  x.report("scan.to_i") { scan_to_i(source) }
  x.report("scan_integer") { scan_integer(source) }
  x.compare!
end
$ bundle exec ruby --yjit /tmp/bench-scan.rb 
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
           scan.to_i    90.000 i/100ms
        scan_integer   215.000 i/100ms
Calculating -------------------------------------
           scan.to_i    907.964 (± 1.1%) i/s    (1.10 ms/i) -      4.590k in   5.055944s
        scan_integer      2.144k (± 0.2%) i/s  (466.37 μs/i) -     10.750k in   5.013472s

Comparison:
        scan_integer:     2144.2 i/s
           scan.to_i:      908.0 i/s - 2.36x  slower

I was hoping for more, but 2.3x is already a nice gain, and the profile shows there are some optimizations we could do on the Ruby side: https://share.firefox.dev/49bOcTn
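As a rough complement to the timing numbers, the sketch below counts object allocations rather than iterations per second, to make the "intermediary object" cost visible. It assumes `GC.stat(:total_allocated_objects)` is an adequate proxy for allocation pressure. `count_allocations` and the two `sum_with_*` methods are hypothetical helpers written for this sketch, and the `scan_integer` branch only runs when the installed strscan provides that method. Exact counts will vary by Ruby version, so none are shown.

```ruby
require "strscan"

# Returns [block result, number of objects allocated while it ran].
def count_allocations
  before = GC.stat(:total_allocated_objects)
  result = yield
  [result, GC.stat(:total_allocated_objects) - before]
end

# scan returns a new String for every match, which is then converted.
def sum_with_scan(source)
  scanner = StringScanner.new(source)
  total = 0
  while (number = scanner.scan(/\d+/))
    total += number.to_i
    scanner.skip(/,/)
  end
  total
end

# scan_integer returns the Integer directly, with no intermediate String.
def sum_with_scan_integer(source)
  scanner = StringScanner.new(source)
  total = 0
  while (number = scanner.scan_integer)
    total += number
    scanner.skip(/,/)
  end
  total
end

SOURCE = (1..1_000).map(&:to_s).join(",").freeze

SCAN_TOTAL, SCAN_ALLOCATIONS = count_allocations { sum_with_scan(SOURCE) }
puts "scan + to_i:  total=#{SCAN_TOTAL} allocations=#{SCAN_ALLOCATIONS}"

if StringScanner.method_defined?(:scan_integer)
  total, allocations = count_allocations { sum_with_scan_integer(SOURCE) }
  puts "scan_integer: total=#{total} allocations=#{allocations}"
else
  puts "scan_integer is not available in this strscan version"
end
```

With 1,000 numbers in the input, the `scan` variant has to allocate at least one String per number, which is the overhead `scan_integer` is meant to avoid.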
