From eb693e61f8b6d695dab95ca5cbd804f7c8223143 Mon Sep 17 00:00:00 2001
From: schneems
Date: Thu, 5 Nov 2020 11:11:22 -0600
Subject: [PATCH 1/4] Don't gate adding to the frontier

Previously there was a bug in which the frontier couldn't find the syntax error because the error had not been introduced to the frontier yet, but the frontier wasn't introducing new blocks anymore. The frontier already sorts itself so it's fine if we add a new block to the frontier each time. This helps us guarantee that every line is eventually added to the search space. Since an empty string is parsable, it guarantees that we find a solution even if that solution says "the syntax error is somewhere in this entire document."
---
 lib/syntax_error_search/code_frontier.rb | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/syntax_error_search/code_frontier.rb b/lib/syntax_error_search/code_frontier.rb
index 7cd6670..f026cc8 100644
--- a/lib/syntax_error_search/code_frontier.rb
+++ b/lib/syntax_error_search/code_frontier.rb
@@ -35,9 +35,7 @@ def holds_all_syntax_errors?(block_array = @frontier)
 def pop
 return nil if empty?
- if generate_new_block?
- self << next_block
- end
+ self << next_block unless @indent_hash.empty?
 return @frontier.pop
 end

From e8a26a16fa77037e7d52ba0e40082631a611cb12 Mon Sep 17 00:00:00 2001
From: schneems
Date: Thu, 5 Nov 2020 11:12:02 -0600
Subject: [PATCH 2/4] Sort code blocks so that the first one comes first.

---
 lib/syntax_error_search/code_search.rb | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/syntax_error_search/code_search.rb b/lib/syntax_error_search/code_search.rb
index a08a785..cb39430 100644
--- a/lib/syntax_error_search/code_search.rb
+++ b/lib/syntax_error_search/code_search.rb
@@ -24,6 +24,7 @@ def call
 end
 @invalid_blocks.concat(frontier.detect_invalid_blocks )
+ @invalid_blocks.sort_by! {|block| block.starts_at }
 self
 end
 end

From 1ca53ceedec9bc395717c69de6c18d51480ad419 Mon Sep 17 00:00:00 2001
From: schneems
Date: Thu, 5 Nov 2020 11:12:24 -0600
Subject: [PATCH 3/4] Start tracking meh searched results

---
 spec/unit/code_search_spec.rb | 90 ++++++++++++++++++++++-------------
 1 file changed, 57 insertions(+), 33 deletions(-)

diff --git a/spec/unit/code_search_spec.rb b/spec/unit/code_search_spec.rb
index 4d58b40..4d3c779 100644
--- a/spec/unit/code_search_spec.rb
+++ b/spec/unit/code_search_spec.rb
@@ -1,40 +1,64 @@
-
 require_relative "../spec_helper.rb"
 module SyntaxErrorSearch
 RSpec.describe CodeSearch do
- it "does not go into an infinite loop" do
- skip("infinite loop")
- search = CodeSearch.new(<<~EOM)
- Foo.call
- def foo
- puts "lol"
- puts "lol"
- end
- end
- EOM
- search.call
-
- expect(search.invalid_blocks.join).to eq(<<~EOM)
- end
- EOM
- end
-
- it "handles mis-matched-indentation-but-maybe-not-so-well" do
- skip("wip")
- search = CodeSearch.new(<<~EOM)
- Foo.call
- def foo
- puts "lol"
- puts "lol"
- end
- end
- EOM
- search.call
-
- expect(search.invalid_blocks.join).to eq(<<~EOM)
- end
- EOM
+ # For code that's not perfectly formatted, we ideally want to do our best
+ # These examples represent the results that exist today, but I would like to improve upon them
+ describe "needs improvement" do
+ describe "mis-matched-indentation" do
+ it "stacked ends " do
+ search = CodeSearch.new(<<~EOM)
+ Foo.call
+ def foo
+ puts "lol"
+ puts "lol"
+ end
+ end
+ EOM
+ search.call
+
+ # Does not include the line with the error Foo.call
+ expect(search.invalid_blocks.join).to eq(<<~EOM)
+ def foo
+ end
+ end
+ EOM
+ end
+
+ it "extra space before end" do
+ search = CodeSearch.new(<<~EOM)
+ Foo.call
+ def foo
+ puts "lol"
+ puts "lol"
+ end
+ end
+ EOM
+ search.call
+
+ # Does not include the line with the error Foo.call
+ expect(search.invalid_blocks.join).to eq(<<~EOM.indent(3))
+ end
+ EOM
+ end
+
+ it "missing space before end" do
+ search = CodeSearch.new(<<~EOM)
+ Foo.call
+ def foo
+ puts "lol"
+ puts "lol"
+ end
+ end
+ EOM
+ search.call
+
+ # Does not include the line with the error Foo.call
+ expect(search.invalid_blocks.join).to eq(<<~EOM)
+ end
+ EOM
+ end
+ end
 end
 it "returns syntax error in outer block without inner block" do

From 46d8150d1c38424fa71083a60fbb45823826fb18 Mon Sep 17 00:00:00 2001
From: schneems
Date: Thu, 5 Nov 2020 11:12:48 -0600
Subject: [PATCH 4/4] Update readme

---
 README.md | 54 +++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 3a2a31e..040f1ee 100644
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@ Or install it yourself as:
 ## What does it do?
-When your code triggers a SyntaxError due to an "expecting end-of-input" in a file, this library fires to narrow down your search to the most likely offending locations.
+When your code triggers a SyntaxError due to an "unexpected `end'" in a file, this library fires to narrow down your search to the most likely offending locations.
 ## Sounds cool, but why isn't this baked into Ruby directly?
@@ -45,20 +45,56 @@ I would love to get something like this directly in Ruby, but I first need to pr
 ## How does it detect syntax error locations?
-Source code with a syntax error in it can be thought of valid code with one or more invalid chunks in it. With this in mind we can "search" for both invalid and valid chunks of code. This library uses a parser to tell if a given chunk of code is valid in which case it's certainly not the cause of our problem. If it's invalid, then we can test to see if removing that chunk from our file would make the whole thing valid. When that happens, we've narrowed down our search. But...things aren't always so easy.
+We know that source code that does not contain a syntax error can be parsed. We also know that code with a syntax error contains both valid code and invalid code. If we remove the invalid code and the remainder parses, then we can programmatically determine that the code we removed contained a syntax error. We can do this detection by generating small code blocks and searching for which blocks need to be removed to generate valid source code.
+
+Since there can be multiple syntax errors in a document, it's not good enough to check individual code blocks; we've got to check multiple at the same time. We will keep creating and adding new blocks to our search until we detect that our "frontier" (which contains all of our blocks) contains the syntax error. After this, we can stop our search and instead focus on filtering to find the smallest subset of blocks that contain the syntax error.
+
+## How is source code broken up into smaller blocks?
 By definition source code with a syntax error in it cannot be parsed, so we have to guess how to chunk up the file into smaller pieces. Once we've split up the file we can safely rule out or zoom into a specific piece of code to determine the location of the syntax error. This libary uses indentation and empty lines to make guesses about what might be a "block" of code. Once we've got a chunk of code, we can test it.
-- If the code parses, it cannot be the cause of our syntax error. We can remove it from our search
-- If the code does not parse, it may be the cause of the error, but we also might have made a bad guess in splitting up the source
- - If we remove that chunk of code from the document and that allows the whole thing to parse, it means the syntax error was for sure in that location.
- - Otherwise, it could mean that either there are multiple syntax errors or that we have a bad guess and need to expand our search.
+At the end of the day we can't say where the syntax error is FOR SURE, but we can get pretty close. It sounds simple when spelled out like this, but it's a very complicated problem. Even when code is not correctly indented/formatted, we can still likely tell you where to start searching, even if we can't point at the exact problem line or location.
+
+## Complicating concerns
+
+The biggest issue with searching for syntax errors stemming from "unexpected end" is that while the `end` in the code triggered the error, the problem actually came from somewhere else. Effectively these syntax errors always involve 2 or more lines of code, but one of those lines (without the end) may be syntactically valid on its own. For example:
+
+```
+1 Foo.call
+2
+3 puts "lol"
+4 end
+```
+
+Here there's a missing `do` after `Foo.call`; however, `Foo.call` by itself is perfectly valid Ruby syntax. We don't find the error until we remove the `end` even though the problem is caused on the first line. This means that if our code blocks aren't sliced totally correctly, the error output might just point at:
+
+```
+4 end
+```
+
+Instead of:
+
+```
+1 Foo.call
+4 end
+```
+
+Here's a similar issue, but with more `end` lines in the code to demonstrate. The same line of code causes the issue:
+
+```
+1 it "foo" do
+2 Foo.call
+3
+4 puts "lol"
+5 end
+6 end
+```
+
+In this example we could make this code valid by removing either the `end` on line 5 or the one on line 6. As far as the program is concerned it's effectively got one too many ends and it won't care which you remove. The "correct" line to remove would be for the inner block, but it's hard to know this programmatically. Whitespace can help guide us, but it's still a guess.
-At the end of the day we can't say where the syntax error is FOR SURE, but we can get pretty close. It sounds simple when spelled out like this, but it's a very complicated problem.
-This one person on twitter told me it's "not possible".
+One of the biggest challenges then is not just finding code that can be removed to make the program syntactically correct (just remove an `end` and it works) but also providing a reasonable guess as to the "pair" line that would have otherwise required an end (such as a `do` or a `def`).
-## How does this gem know when a syntax error occured?
+## How does this gem know when a syntax error occurred in my code?
 While I wish you hadn't asked: If you must know, we're monkey-patching require. It sounds scary, but bootsnap does essentially the same thing and we're way less invasive.