Add lex based banner #96
Merged
Described in #68 (comment). This implements a method of checking the source code for logically missing component pairs:

- {}
- []
- |<value>|
- ()
- keyword/end

It seems to be much more robust than using a regex on the parser errors. It also seems to provide more confident results than the parser errors.

For cases where we are not confident about why code is invalid, we still fall back on the parser error messages. That allows this functionality to act as a progressive enhancement rather than a wholesale replacement.

We may be able to specialize other checks in the future as well. The most common one for me that comes up is missing a trailing comma in a hash/array/method-call:

```
query = Cutlass::FunctionQuery.new(
  port: port # <== here
  body: body
).call
```
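To illustrate the idea of finding missing pairs from lexer output instead of parser error text, here is a minimal sketch. It is not the code from this PR: the `OPENERS`/`CLOSERS` constants and the `unbalanced_pairs` helper are hypothetical names, and it only covers `{}`, `[]` and `()` (not `|<value>|` or keyword/end). It relies on `Ripper.lex`, which returns tokens tagged with event names such as `:on_lbracket`.

```ruby
require "ripper"

# Hypothetical lookup tables: lexer event => the opening delimiter it belongs to.
# String contents and "#{...}" interpolation use other events, so they are ignored.
OPENERS = { on_lbrace: "{", on_tlambeg: "{", on_lbracket: "[", on_lparen: "(" }.freeze
CLOSERS = { on_rbrace: "{", on_rbracket: "[", on_rparen: "(" }.freeze

# Returns a hash of opening delimiter => (opens - closes) for every
# delimiter whose counts do not match. An empty hash means balanced.
def unbalanced_pairs(source)
  balance = Hash.new(0)

  Ripper.lex(source).each do |_position, type, _token|
    balance[OPENERS[type]] += 1 if OPENERS.key?(type)
    balance[CLOSERS[type]] -= 1 if CLOSERS.key?(type)
  end

  balance.reject { |_delimiter, count| count.zero? }
end

p unbalanced_pairs("puts [1, 2\n") # a "[" with no matching "]"
p unbalanced_pairs("puts '['\n")   # the "[" is string content, not a delimiter
```

Because the counts come from tokens rather than raw characters, a bracket inside a string literal does not register as a missing pair, which is one reason this kind of check can be more robust than a regex.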
The integration test was poorly named. It's really integration cases for when we're not using a different process.
As listed in #95 there are other cases where we have valid code nested in a way we didn't previously handle (via `CleanDocument`). The specific case is:

```ruby
def ruby_install_binstub_path(ruby_layer_path = ".")
  @ruby_install_binstub_path ||=
    if ruby_version.build?
      "#{build_ruby_path}/bin"
    elsif ruby_version
      "#{ruby_layer_path}/#{slug_vendor_ruby}/bin"
    else
      ""
    end
end
```

As it goes through being validated/hidden, the inside is removed first, which generates an invalid code block.

This fix is to re-check all code in a block even if it is "hidden".

This also drastically improves the results of #88 beyond just that small case: the whole output becomes very focused.

close #95
close #88
Benchmark:

```
require 'ripper'
require 'benchmark/ips'

@array = 20.times.map do
  obj = "puts a + b"
  def obj.empty?
    true
  end

  def obj.hidden?
    true
  end

  obj
end

def invalid?(source)
  source = source.join if source.is_a?(Array)
  source = source.to_s

  Ripper.new(source).tap(&:parse).error?
end

def valid?(source)
  !invalid?(source)
end

b = Benchmark.ips do |b|
  b.report("empty") { @array.all? {|x| x.empty? || x.hidden? } }
  b.report("ripper") { valid?(@array) }
  b.compare!
end
```

Gives us:

```
Warming up --------------------------------------
               empty    59.015k i/100ms
              ripper     2.064k i/100ms
Calculating -------------------------------------
               empty    594.981k (± 3.6%) i/s -      3.010M in   5.065759s
              ripper     20.694k (± 3.3%) i/s -    105.264k in   5.092576s

Comparison:
               empty:   594981.2 i/s
              ripper:    20693.8 i/s - 28.75x  (± 0.00) slower
```

This reduces overall test run time from ~5.3 seconds to 5.1 seconds. It's not a huge win, but it is something.

We could try to also use this same strategy in the CodeFrontier where we're likely doing a bunch of redundant parsing (i.e. adding empty lines won't ever change the parse results, but it is expensive to re-parse everything).
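As a rough sketch of how the cheap check measured above could sit in front of the parser, the example below skips Ripper whenever every line in a block is empty or hidden. The `Line` struct and `block_valid?` method are hypothetical stand-ins, not the project's actual `CodeBlock` API, and treating hidden lines as contributing no source text is an assumption.

```ruby
require "ripper"

# Hypothetical stand-in for a source line; the real line class is not shown here.
Line = Struct.new(:text, :hidden) do
  def empty?
    text.strip.empty?
  end

  def hidden?
    hidden
  end
end

def block_valid?(lines)
  # Cheap check first: nothing visible to parse, so skip Ripper entirely.
  return true if lines.all? { |line| line.empty? || line.hidden? }

  # Assumption: hidden lines contribute no text to the parsed source.
  source = lines.reject(&:hidden?).map(&:text).join
  !Ripper.new(source).tap(&:parse).error?
end

p block_valid?([Line.new("\n", false), Line.new("puts 1\n", true)])
p block_valid?([Line.new("puts [1\n", false)])
```

The ordering matters more than the check itself: the `all?` pass is a plain iteration over objects already in memory, so it costs little even when it fails and the parse still has to run.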
Before:

```
$ git co 903d434
$ time be rspec ./spec/ --format=f
bundle exec rspec ./spec/ --format=f  4.52s user 1.38s system 97% cpu 6.032 total
```

After:

```
$ git co -
$ time be rspec ./spec/ --format=f
bundle exec rspec ./spec/ --format=f  4.07s user 1.39s system 97% cpu 5.593 total
```

Benchmarking just that test:

Before:

```
0.899206 0.011891 0.911097 ( 0.911433)
```

After:

```
0.485635 0.008435 0.494070 ( 0.495051)
```
Before:

```
0.518002 0.007275 0.525277 ( 0.526825)
```

After:

```
0.277188 0.004640 0.281828 ( 0.281940)
```

Instead of checking if the frontier holds all syntax errors any time there's an `invalid?` block in the frontier, only check when an invalid block is added. This prevents redundant checks as invalid blocks that are already on the frontier won't change and therefore won't give different results.

This also prevents a pathological performance case where there are two syntax errors, one at the furthest indentation and one at the lowest indentation.
The "banner" is now based on lexical analysis rather than parser regex (fix #68, fix #87)
As described in #68 (comment). This implements a method of checking the source code for logically missing component pairs:

- {}
- []
- |<value>|
- ()
- keyword/end
It seems to be much more robust than using a regex on the parser errors. It also seems to provide more confident results than the parser errors.
Here's example output:
For cases where we are not confident about why code is invalid, we still fall back on the parser error messages. That allows this functionality to act as a progressive enhancement rather than a wholesale replacement.
We may be able to specialize other checks in the future as well. The most common one for me that comes up is missing a trailing comma in a hash/array/method-call, as in the `Cutlass::FunctionQuery.new` example in the commit message above.
Fix bug causing poor results (fix #95, fix #88)
As listed in #95 there are other cases where we have valid code nested in a way we didn't previously handle (via `CleanDocument`). The specific case is the `ruby_install_binstub_path` method shown in the commit message above. As it goes through being validated/hidden, the inside is removed first, which generates an invalid code block.
This fix is to re-check all code in a block even if it is "hidden".
This also drastically improves the results of #88 beyond just that small case: the whole output becomes very focused.
close #95
close #88
CodeBlock Performance
28x faster empty line checks on CodeBlock
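The benchmark script and its output are in the commit message above: the empty/hidden check ran at about 595k i/s against about 20.7k i/s for the Ripper parse, which makes Ripper 28.75x slower.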
This reduces overall test run time from ~5.3 seconds to 5.1 seconds. It's not a huge win, but it is something.
We could try to also use this same strategy in the CodeFrontier where we're likely doing a bunch of redundant parsing (i.e. adding empty lines won't ever change the parse results, but it is expensive to re-parse everything).
Frontier check performance
The error case with the ruby buildpack source code was added for "Fix bug causing poor results" and is a pathological case: the syntax error sits at the lowest level of indentation and the file is over a thousand lines long. This ran with no problem on my machine, but when running on shared hardware with CircleCI it hit the timeout, causing tests to fail.
That failure prompted me to apply similar logic from the CodeBlock performance improvement to the CodeFrontier, because I know that's where the bulk of the time is spent. I had the insight that we cannot ever detect a fixed document unless we also have one or more blocks that are "invalid". With that info I was able to write a short-circuit check that skips Ripper when there are no invalid blocks on the frontier.
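The short-circuit described here can be sketched roughly as follows. This is an illustration under assumptions rather than the CodeFrontier implementation: `FrontierBlock`, `parses?` and `holds_all_syntax_errors?` are hypothetical names, and a block is modelled as nothing more than the line indexes it covers plus a cached validity flag.

```ruby
require "ripper"

# Hypothetical frontier block: the line indexes it covers and a cached flag.
FrontierBlock = Struct.new(:line_indexes, :invalid) do
  def invalid?
    invalid
  end
end

def parses?(source)
  !Ripper.new(source).tap(&:parse).error?
end

# True when removing every invalid frontier block leaves a document that parses.
def holds_all_syntax_errors?(lines, frontier)
  invalid_blocks = frontier.select(&:invalid?)

  # Short-circuit: with no invalid blocks nothing would be removed, and the
  # document is already known to be broken, so skip the expensive parse.
  return false if invalid_blocks.empty?

  removed = invalid_blocks.flat_map(&:line_indexes)
  remaining = lines.each_with_index.reject { |_line, index| removed.include?(index) }
  parses?(remaining.map(&:first).join)
end

lines = ["def foo\n", "  puts [1\n", "end\n"]
p holds_all_syntax_errors?(lines, [FrontierBlock.new([0, 2], false)]) # no parse needed
p holds_all_syntax_errors?(lines, [FrontierBlock.new([1], true)])     # parses the rest
```

On a long file most frontier expansions add only valid blocks, so the early return avoids re-parsing the whole document on each of those steps.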
That shaved off a whole second from the test suite 🏎!!