Parsing performance #2652
Another thought: How much time do we spend computing assertions? Should we disable them in "release" mode? Currently, Lines 8 to 19 in 4baf465
Perhaps we should rename the function to For comparison, C's standard library's |
|
Another thought: our rbs/ext/rbs_extension/extconf.rb Line 21 in 4baf465
But we never specify |
|
It looks like |
|
It looks like ~3% are spent for |
Add prepare_bench task
`bundle exec ruby benchmarks.rb core/**/*.rbs`
This is a prototype which doesn't `munmap` the heap. Introducing a parser class to manage the allocators, and changing the parsing methods to instance methods, should allow the heap to be released.
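The ownership idea described above can be sketched as follows. This is only an illustration under stated assumptions: every name here (`sketch_arena_t`, `sketch_parser_t`, `sketch_parser_new`, `sketch_parser_alloc`, `sketch_parser_free`) is hypothetical and is not the allocator API used in this PR. It assumes a POSIX system with anonymous `mmap`, and a simple bump allocator standing in for the real one. The point is that once a parser object owns the heap, freeing the parser is a natural place to call `munmap`.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical names throughout; not the actual RBS allocator API. */
typedef struct {
    uint8_t *heap; /* start of the mmap'ed region */
    size_t size;   /* total bytes reserved */
    size_t used;   /* bump offset of the next free byte */
} sketch_arena_t;

typedef struct {
    sketch_arena_t arena; /* everything allocated during one parse lives here */
} sketch_parser_t;

/* Reserve the heap once per parser. Returns NULL on failure. */
sketch_parser_t *sketch_parser_new(size_t heap_size) {
    sketch_parser_t *parser = malloc(sizeof(*parser));
    if (parser == NULL) return NULL;

    void *heap = mmap(NULL, heap_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (heap == MAP_FAILED) {
        free(parser);
        return NULL;
    }

    parser->arena.heap = heap;
    parser->arena.size = heap_size;
    parser->arena.used = 0;
    return parser;
}

/* Bump-allocate `size` bytes, rounded up to a multiple of 8.
 * Returns NULL when the arena is exhausted. */
void *sketch_parser_alloc(sketch_parser_t *parser, size_t size) {
    assert(parser != NULL);
    size_t aligned = (size + 7) & ~(size_t)7;
    if (aligned < size) return NULL; /* rounding overflowed */
    if (aligned > parser->arena.size - parser->arena.used) return NULL;

    void *ptr = parser->arena.heap + parser->arena.used;
    parser->arena.used += aligned;
    return ptr;
}

/* Release the whole heap together with the parser.
 * Returns the result of munmap (0 on success). */
int sketch_parser_free(sketch_parser_t *parser) {
    assert(parser != NULL);
    int result = munmap(parser->arena.heap, parser->arena.size);
    free(parser);
    return result;
}
```

With this shape, a caller creates one parser per parse, allocates nodes through `sketch_parser_alloc`, and calls `sketch_parser_free` once the results have been copied out, so no individual node ever needs to be freed.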
4bb28e4 to bec4a54
|
Finally, the parser is about twice as fast as the 3.9 parser! The benchmark parses all of the files given as |
    if (lexer->current.char_pos == lexer->end_pos) {
    lexer->last_char = '\0';
    return 0;
    return lexer->current_code_point;
`rbs_peek` now simply returns the `current_code_point` that was read in `rbs_next_char`, so we can skip calling the expensive `rbs_utf8_string_to_codepoint` function.
    rbs_position_t start; /* The start position of the current token */
    int start_pos; /* The character position that defines the start of the input */
    int end_pos; /* The character position that defines the end of the input */
    rbs_position_t current; /* The current position: just before the current_character */
The lexer struct now stores the character that has been read: its code point and byte width.
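A minimal sketch of the caching idea from the two review comments above: decode each character once when the lexer advances, store its code point and byte width, and let peek return the stored value. The names below are hypothetical, not the PR's actual code. `sketch_lexer_t`, `sketch_decode_utf8`, `sketch_advance`, `sketch_peek` and the `current_character_bytes` field are illustrative, and the decoder is a simplified stand-in for the real UTF-8 routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer; field names are illustrative only. */
typedef struct {
    const unsigned char *input;
    size_t byte_pos;                 /* offset of the cached character */
    size_t end_pos;                  /* offset one past the last input byte */
    unsigned int current_code_point; /* cached code point, 0 at end of input */
    size_t current_character_bytes;  /* cached byte width, 0 at end of input */
} sketch_lexer_t;

/* Simplified UTF-8 decoder: returns the byte width and writes the code point.
 * Malformed sequences yield U+FFFD with a width of one byte. */
static size_t sketch_decode_utf8(const unsigned char *s, size_t remaining,
                                 unsigned int *code_point) {
    size_t width;
    unsigned int cp;
    if (s[0] < 0x80) { width = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { width = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { width = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { width = 4; cp = s[0] & 0x07; }
    else { *code_point = 0xFFFD; return 1; }

    if (width > remaining) { *code_point = 0xFFFD; return 1; }
    for (size_t i = 1; i < width; i++) {
        if ((s[i] & 0xC0) != 0x80) { *code_point = 0xFFFD; return 1; }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *code_point = cp;
    return width;
}

/* Decode the character at byte_pos into the cache (the only decode site). */
static void sketch_read_current(sketch_lexer_t *lexer) {
    if (lexer->byte_pos >= lexer->end_pos) {
        lexer->current_code_point = 0;
        lexer->current_character_bytes = 0;
        return;
    }
    lexer->current_character_bytes = sketch_decode_utf8(
        lexer->input + lexer->byte_pos,
        lexer->end_pos - lexer->byte_pos,
        &lexer->current_code_point);
}

void sketch_lexer_init(sketch_lexer_t *lexer, const char *input, size_t length) {
    assert(lexer != NULL && input != NULL);
    lexer->input = (const unsigned char *)input;
    lexer->byte_pos = 0;
    lexer->end_pos = length;
    sketch_read_current(lexer);
}

/* Peek is a plain field read: no decoding, however often it is called. */
unsigned int sketch_peek(const sketch_lexer_t *lexer) {
    return lexer->current_code_point;
}

/* Step over the cached character using its stored width, then decode the next. */
void sketch_advance(sketch_lexer_t *lexer) {
    lexer->byte_pos += lexer->current_character_bytes;
    sketch_read_current(lexer);
}
```

Because a lexer typically peeks at the same character several times before consuming it, moving the decode into the advance step means each character is decoded exactly once, and the stored width makes advancing a single addition.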
|
Merged pull requests that were extracted from this PR. 🎉
This is related to the RBS file parsing performance degradation.
The RBS file parsing in rbs-4.0 (dev) is ~5x slower than rbs-3.9, based on the
`core/**/*.rbs` files, especially for smaller files. https://docs.google.com/spreadsheets/d/1dQBGIC1_zWco6c5OHH5VunP8ZuZLzwaMDB-U31OgUAc/edit?gid=0#gid=0
I implemented minor changes to improve the parsing performance by reusing empty array and hash objects. The changes improved the parsing performance slightly, but it is still clearly slower than rbs-3.9.