Parsing performance #2652

soutaro · 2025-09-02T08:17:39Z

This is related to the RBS file parsing performance degradation.

The RBS file parsing in rbs-4.0 (dev) is ~5x slower than rbs-3.9, based on the core/**/*.rbs files, especially for smaller files.

https://docs.google.com/spreadsheets/d/1dQBGIC1_zWco6c5OHH5VunP8ZuZLzwaMDB-U31OgUAc/edit?gid=0#gid=0

I implemented minor changes to improve the parsing performance by reusing empty array and hash objects. The changes improved the parsing performance slightly, but it is still clearly slower than rbs-3.9.

soutaro · 2025-09-12T14:33:24Z

Try another allocator API, not mmap with linked list of heap (try reverting "Use one massive slice of virtual memory per arena", Shopify/rbs#38)
Try introducing a batch loading API to reuse the allocators and reduce C-Ruby switching
Reuse hash object for keyword arguments

amomchilov · 2025-09-12T14:47:38Z

Another thought:

How much time do we spend computing assertions? Should we disable them in "release" mode?

Currently, rbs_assert() is just a function that's never disabled:

rbs/src/util/rbs_assert.c

Lines 8 to 19 in 4baf465

    void rbs_assert(bool condition, const char *fmt, ...) {
        if (condition) {
            return;
        }

        va_list args;
        va_start(args, fmt);
        vfprintf(stderr, fmt, args);
        va_end(args);
        fprintf(stderr, "\n");
        exit(EXIT_FAILURE);
    }

Perhaps we should rename the function to rbs_assert_impl() and create a #define rbs_assert() macro that calls it only if we're in debug mode.

For comparison, the C standard library's assert.h makes assert() a no-op if you #define NDEBUG.
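
A minimal sketch of that idea, assuming a hypothetical `RBS_ENABLE_ASSERTIONS` switch (the real build might key off `NDEBUG` or something else entirely). Only `rbs_assert` and the proposed `rbs_assert_impl` name come from this thread; `checked_div` is an illustrative caller.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* The existing function body, renamed as proposed above. */
void rbs_assert_impl(bool condition, const char *fmt, ...) {
    if (condition) {
        return;
    }

    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fprintf(stderr, "\n");
    exit(EXIT_FAILURE);
}

/* RBS_ENABLE_ASSERTIONS is a hypothetical switch, not an existing flag. */
#ifdef RBS_ENABLE_ASSERTIONS
#define rbs_assert(condition, ...) rbs_assert_impl((condition), __VA_ARGS__)
#else
/* Release mode: the call and its arguments compile to nothing. */
#define rbs_assert(condition, ...) ((void) 0)
#endif

/* Illustrative caller; not part of RBS. */
int checked_div(int a, int b) {
    rbs_assert(b != 0, "division by zero: %d / %d", a, b);
    return a / b;
}
```

One caveat with this shape: when the macro expands to nothing, the condition expression is not evaluated at all, so assertions must not carry side effects the parser depends on.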

amomchilov · 2025-09-12T14:49:01Z

Another thought: our extconf.rb doesn't specify an optimization level. We disable optimizations (-O0) if DEBUG is defined:

rbs/ext/rbs_extension/extconf.rb

Line 21 in 4baf465

append_cflags ['-O0', '-g'] if ENV['DEBUG']

But we never specify -O3 when we're in "release" mode.

soutaro · 2025-09-16T07:50:09Z

It looks like the Makefile has -O3.

...
cflags   = $(hardenflags) -fdeclspec  $(optflags) $(debugflags) $(warnflags)
optflags = -O3 -fno-fast-math
CFLAGS   = $(CCDLFLAGS) $(cflags) -fno-common -pipe -std=gnu99 -Wimplicit-fallthrough -Wunused-result -Wc++-compat $(ARCH_FLAG)
...

soutaro · 2025-09-16T08:01:55Z

It looks like ~3% of the time is spent in rbs_assert in the debug build. Removing the calls in the release build improved performance slightly.

Add prepare_bench task

`bundle exec ruby benchmarks.rb core/**/*.rbs`

This is a prototype which doesn't `munmap` the heap. Introducing a parser class to manage the allocators and changing the parsing methods to instance methods should allow releasing the heap.
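
To illustrate the reuse idea, here is a rough sketch of a malloc-backed bump allocator that is reset between files instead of being released and re-created. All names (`toy_arena_t`, `toy_arena_reset`, and so on) are hypothetical and are not the RBS allocator API; the real allocator's layout and growth strategy may differ.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump allocator; these names are not the RBS allocator API. */
typedef struct {
    unsigned char *base; /* Start of the malloc'ed buffer */
    size_t capacity;     /* Total bytes available */
    size_t used;         /* Bytes handed out so far */
} toy_arena_t;

/* Allocates the backing buffer once. Returns 0 on success, -1 on failure. */
int toy_arena_init(toy_arena_t *arena, size_t capacity) {
    arena->base = malloc(capacity);
    arena->capacity = capacity;
    arena->used = 0;
    return arena->base == NULL ? -1 : 0;
}

/* Hands out `size` bytes, rounded up to 8-byte alignment; NULL when full. */
void *toy_arena_alloc(toy_arena_t *arena, size_t size) {
    size_t aligned = (size + 7u) & ~(size_t) 7u;
    if (aligned < size || aligned > arena->capacity - arena->used) {
        return NULL;
    }
    void *ptr = arena->base + arena->used;
    arena->used += aligned;
    return ptr;
}

/* Makes the whole buffer available again without returning it to the OS. */
void toy_arena_reset(toy_arena_t *arena) {
    arena->used = 0;
}

/* Releases the buffer; a parser object could call this when it is freed. */
void toy_arena_free(toy_arena_t *arena) {
    free(arena->base);
    arena->base = NULL;
    arena->capacity = 0;
    arena->used = 0;
}
```

In this sketch, a batch parse would call `toy_arena_reset` after each file and `toy_arena_free` once at the end, which is the role the proposed parser class would play for the real heap.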

soutaro · 2025-09-19T08:28:43Z

Finally, the parser is about twice as fast as the 3.9 parser!

The benchmark parses all of the files given in ARGV and measures the run with benchmark-ips; higher numbers are better.

> BUNDLE_GEMFILE=Gemfile-39 bundle exec ruby benchmarks2.rb core/**/*.rbs ../../ruby/gem_rbs_collection/gems/activerecord/8.0/*.rbs sig/**/*.rbs
Benchmarking parsing 177 files...
ruby 3.4.4 (2025-05-14 revision a38531fd3f) +PRISM [arm64-darwin23]
Warming up --------------------------------------
             parsing     1.000 i/100ms
Calculating -------------------------------------
             parsing      9.749 (± 0.0%) i/s  (102.58 ms/i) -     49.000 in   5.030113s
> bundle exec ruby benchmarks2.rb core/**/*.rbs ../../ruby/gem_rbs_collection/gems/activerecord/8.0/*.rbs sig/**/*.rbs                          
Benchmarking parsing 177 files...
ruby 3.4.4 (2025-05-14 revision a38531fd3f) +PRISM [arm64-darwin23]
Warming up --------------------------------------
             parsing     1.000 i/100ms
Calculating -------------------------------------
             parsing     19.478 (± 0.0%) i/s   (51.34 ms/i) -     98.000 in   5.032890s
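
As a quick check of the "about twice" figure, the ratio can be worked out from the two runs above (the first with Gemfile-39, the second with this branch), using either iterations per second or time per iteration:

```latex
\[
\frac{19.478\ \text{i/s}}{9.749\ \text{i/s}} \approx 2.00,
\qquad
\frac{102.58\ \text{ms/i}}{51.34\ \text{ms/i}} \approx 2.00
\]
```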

soutaro · 2025-09-19T08:34:33Z

src/lexstate.c

-    if (lexer->current.char_pos == lexer->end_pos) {
-        lexer->last_char = '\0';
-        return 0;
+    return lexer->current_code_point;


rbs_peek now simply returns the current_code_point that was read in rbs_next_char, so we can skip calling the expensive rbs_utf8_string_to_codepoint function.
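
The caching pattern can be sketched roughly as follows. This is a simplified, hypothetical lexer (`toy_lexer_t`, `toy_peek`, `toy_next_char`), not the code in src/lexstate.c, and its decoder assumes well-formed UTF-8 rather than validating it.

```c
#include <stddef.h>

/* Hypothetical, simplified lexer state; not the actual RBS lexer struct. */
typedef struct {
    const unsigned char *bytes;      /* Input buffer */
    size_t byte_pos;                 /* Offset of the cached character */
    size_t end_pos;                  /* Length of the input in bytes */
    unsigned int current_code_point; /* Cached code point, 0 at end of input */
    size_t current_width;            /* Byte width of the cached character */
} toy_lexer_t;

/* Decodes the character at byte_pos into the cache (assumes valid UTF-8). */
static void toy_decode_current(toy_lexer_t *lexer) {
    if (lexer->byte_pos >= lexer->end_pos) {
        lexer->current_code_point = 0;
        lexer->current_width = 0;
        return;
    }
    const unsigned char *p = lexer->bytes + lexer->byte_pos;
    size_t remaining = lexer->end_pos - lexer->byte_pos;
    size_t width = 1;
    unsigned int cp = p[0];
    if (p[0] >= 0xF0) {
        width = 4;
        cp = p[0] & 0x07;
    } else if (p[0] >= 0xE0) {
        width = 3;
        cp = p[0] & 0x0F;
    } else if (p[0] >= 0xC0) {
        width = 2;
        cp = p[0] & 0x1F;
    }
    if (width > remaining) {
        /* Truncated sequence: fall back to treating it as a single byte. */
        width = 1;
        cp = p[0];
    }
    for (size_t i = 1; i < width; i++) {
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    lexer->current_code_point = cp;
    lexer->current_width = width;
}

void toy_lexer_init(toy_lexer_t *lexer, const char *input, size_t length) {
    lexer->bytes = (const unsigned char *) input;
    lexer->byte_pos = 0;
    lexer->end_pos = length;
    toy_decode_current(lexer);
}

/* Peeking is a plain field read: no decoding happens here. */
unsigned int toy_peek(const toy_lexer_t *lexer) {
    return lexer->current_code_point;
}

/* Consumes the cached character and decodes the next one exactly once. */
unsigned int toy_next_char(toy_lexer_t *lexer) {
    unsigned int consumed = lexer->current_code_point;
    lexer->byte_pos += lexer->current_width;
    toy_decode_current(lexer);
    return consumed;
}
```

However many times the lexer peeks at the same position, each character is decoded only once, when the lexer advances onto it.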

soutaro · 2025-09-19T08:35:16Z

include/rbs/lexer.h

-    rbs_position_t start;     /* The start position of the current token */
+    int start_pos;          /* The character position that defines the start of the input */
+    int end_pos;            /* The character position that defines the end of the input */
+    rbs_position_t current; /* The current position: just before the current_character */


The lexer struct now stores the character that has been read: its code point and byte width.

soutaro · 2025-09-26T08:32:14Z

Merged the pull requests that were extracted from this PR. 🎉

soutaro added 19 commits September 19, 2025 15:42

Add benchmark script and task

b9cbad1

Add prepare_bench task

Add -p option if DEBUG

4cae175

clang-format@21

82446c9

Add Gemfile-39 for rbs-3.9

529d2e9

Add benchmarks.rb

08adc23

`bundle exec ruby benchmarks.rb core/**/*.rbs`

Reuse empty array

fbe5ae0

Use EMPTY_ARRAY for location list too

e76a753

rb_ary_cat

c55e09c

Define functions for each translation

4c2914b

Reuse empty hash

b82b0ac

Reuse memory allocator

59db993

This is a prototype which doesn't `munmap` the heap. Introducing a parser class to manage the allocators and changing the parsing methods to instance methods should allow releasing the heap.

Update benchmark.rb to print progress

4e9ef1c

Add parse.rb for profiling

99c1fd7

format?

8f149ec

Delete rbs_assert in the *release* build.

dc99f39

Add RBS_LIKELY and RBS_UNLIKELY macros

d3205f3

Optimize lexer

6ada4aa

Add loading multiple files benchmark

3acbb70

Add batch parsing API

bec4a54

soutaro force-pushed the parsing-performance branch from 4bb28e4 to bec4a54 Compare September 19, 2025 08:26

This was referenced Sep 26, 2025

Faster lexical analyzer #2665

Merged

Use malloc based allocator #2666

Merged

soutaro closed this Sep 26, 2025

amomchilov mentioned this pull request Oct 24, 2025

Degradation of parsing performance with v4.0.0.dev.4 #2563

Closed
