From a21f4011d697cc50126409098bf60d91b526b7e0 Mon Sep 17 00:00:00 2001 From: schneems Date: Wed, 12 Jan 2022 16:59:04 -0600 Subject: [PATCH] Add BalanceHeuristicExpand, 1.7x faster search. TLDR; I've been trying to optimize a "worst-case" perf scenario (9,000 line file) to get under my 1-second threshold. When I started, it was at ~2 seconds. After this last optimization, it's well under the threshold! ``` Before [main]: 1.22 seconds After [this commit]: 0.67386 seconds ``` > Cumulatively, this is 2.6x faster than 3.1.1 and 47x faster than 1.2.0. ## Profiling before/after Expansion before (called 2,291 times, 42.04% of overall time) Parse before (called 3,315 for 31.16% of overall time) ![](https://www.dropbox.com/s/brw7ix5b0mhwy1z/Screen%20Shot%202022-01-14%20at%208.54.31%20PM.png?raw=1) ![](https://www.dropbox.com/s/8mx20auvod5wb8t/Screen%20Shot%202022-01-15%20at%201.10.41%20PM.png?raw=1) Expansion after (called 654 times, 29.42% of overall time) Parse after (called 1,507 times for 29.35% of overall time) ![](https://www.dropbox.com/s/3rmtpfk315ge7e6/Screen%20Shot%202022-01-14%20at%208.55.45%20PM.png?raw=1) > Note that parsing happens within the expansion method call, so while it seems "cheaper" to parse than expand based on this profiling data, it's the parsing that is making expansion slow. ## Goal Make the algorithm faster, so it doesn't timeout even when given a file with 9,000 lines. ## Strategy (background) There are two general strategies for improving dead_end perf: - Reduce/remove calls to Ripper (syntax error detection), as it is slow. For example, not calling Ripper if all code lines have been previously "hidden" (indicating valid code). - Improve computational complexity for non-ripper operations. For example, using a priority heap over sorting an array We call Ripper for 2 cases. Both for individual code blocks to see if it contains a syntax error. We also call Ripper later on the whole document to see if that particular syntax error was making the document unparsable. Ripper is slower to parse more lines, so we only re-parse the entire document when we detect a new invalid block as a prior optimization. If we can build "better" valid blocks, then we call Ripper fewer times on the overall document. If we can build larger blocks, we reduce the overall number of iterations for the algorithm. This property reduces Ripper calls and speeds up general "computational complexity" as there are N fewer operations to perform overall. ## Approach This approach builds on the concept that dead_end is a uniform cost search by adding a "heuristic" (think a-star search) when building blocks. At a high level, a heuristic is a quality measure that can also be incomplete. For a great explanation, see https://www.redblobgames.com/pathfinding/a-star/introduction.html. What heuristic can we add? We know that if the code has an unbalanced pair of special characters, it cannot be valid. For example, this code `{ hello: "world"` is unbalanced. It is missing a `}`. It contains a syntax error due to this imbalance. In the dead_end code, we can count known special pairs using `Ripper.lex` output (which we already have). Here are the currently tracked pairs: - `{}` - `()` - `[]` - keyword/end This information can be used as a heuristic. Code with unbalanced pairs always contains a syntax error, but balanced code may have a different (harder to detect) syntax error. The code's balance hints at how we should expand an existing code block. For example, the code `{ hello: "world"` must expand towards the right/down to be valid. This code would be known as "left-leaning" as it is heavier on the left side. Rather than searching in an arbitrary direction, the heuristic determines the most sensible direction to expand. Previously, we expanded blocks by scanning up and down, counting keyword/end pairs (outside of the block), and looking at indentation. This process was adequate but required that we take small steps and produce small blocks. It also has no concept of if the code it holds is syntactically valid or not until a full Ripper parse is performed. That combination means it can produce invalid code blocks (which can be desirable when hunting for invalid code). But when we want blocks to be valid, we can make more efficient moves. ## Implementation: LexDiffPair helper class I introduced a new class, `LexDiffPair`, to accommodate a heuristic expansion. A `LexDiffPair` can be in one of several states: - Balanced (:equal) - Leaning left (:left) - Leaning right (:right) - Leaning both (:both) > An example of code line leaning both ways would be `) do |x|`. Here the `)` is leaning right (needs to expand up) while the `do` is leaning left (needs to expand down). Each code line generates its own `LexDiffPair`. Internally the information is stored as a diff of a count of each of the above pairs. A positive number indicates left-leaning, a negative number indicates right-leaning, a zero is balanced. This format allows for fast concatenation and comparison. ## BalanceHeuristicExpand When a block is passed to `BalanceHeuristicExpand`, a `LexPairDiff` is created from all of its lines. The code's balance is then used to determine which direction to expand. If the code is leaning left or right, the main goal is to get it back into balance. Continue to expand it until balance is achieved. If the code is already balanced, try to expand it while maintaining balance. For example, add already balanced code and any above/below pair of lines that cancel out. Finally, intentionally knock it off-balance by expanding up and then recurse the process (to ideally re-balance it and then exit). If the code is leaning both ways, try to grab any extra "free" (already balanced) lines. Then expand up and down since it must expand both ways to become valid. As this process is repeated, it should eventually find a match in one direction and then be leaning left or right, which will expand faster on the next iteration (after it is put back on the frontier and popped again later). Example: An invalid code block might come in like this: ``` start_line: keyword.location.start_line, start_char: keyword.location.end_char + 1, end_line: last_node.location.end_line, end_char: last_node.location.end_char ``` And it could expand it to: ``` if exceptions || variable RescueEx.new( exceptions: exceptions, variable: variable, location: Location.new( start_line: keyword.location.start_line, start_char: keyword.location.end_char + 1, end_line: last_node.location.end_line, end_char: last_node.location.end_char ) ) end ``` > Note it would expand it quite a bit more, but I'm abbreviating here. Here's the complete process across several expansions https://gist.github.com/schneems/e62216b610a36a81a98d9d8146b0611a ## When to use heuristic expand After experimentation, the best place to use this new expansion technique was when an existing invalid block comes off of the frontier. This algorithm tends to correct such blocks and balance them out. When a block is "corrected" in this way, it reduces the overall number of times the document must be passed to Ripper (extremely slow). Also, since larger blocks reduce the overall iteration, we try to expand several times (while preserving balance) and take the largest valid expansion. We use the original indentation-based expansion for code blocks that are already valid. The downside of using a heuristic that preserves balance is that ultimately we want the algorithm to generate invalid blocks. The original expansion can produce invalid expansions, which is useful. There is a separate process for adding unvisited lines to the frontier (rather than expanding existing blocks). Unvisited lines are not a good candidate for heuristic expansion as it works a little too well. If we only add "unbalanced" code in an added block, we lose some of the context we desire (see comments for more details in `parse_blocks_from_indent_line`). ## Concerns One concern with this implementation is that calling the heuristic expansion three times was the only way to produce valid results. I'm not sure why. It might be an undiscovered property of the system, or perhaps all of my code examples to date are somehow biased in a specific way. The way to get more information is to put it out into the world and get feedback. Another concern is that there are now two expansion classes. Or three if you count `parse_blocks_from_indent_line`. It's not always clear why you would choose one over another except that "it provides the best outcome". It might be possible to simplify some of this logic or remove or combine competing expansion methods. Hopefully, patterns will emerge pointing to opportunities to guide that usage. --- CHANGELOG.md | 4 + Gemfile.lock | 2 +- lib/dead_end/api.rb | 6 +- lib/dead_end/balance_heuristic_expand.rb | 281 ++++++++++++++++++ lib/dead_end/code_frontier.rb | 2 + lib/dead_end/code_line.rb | 12 +- lib/dead_end/code_search.rb | 41 ++- ...block_expand.rb => indent_block_expand.rb} | 6 +- lib/dead_end/left_right_lex_count.rb | 32 ++ lib/dead_end/lex_pair_diff.rb | 104 +++++++ lib/dead_end/parse_blocks_from_indent_line.rb | 17 ++ spec/integration/dead_end_spec.rb | 6 +- spec/integration/exe_cli_spec.rb | 2 +- spec/integration/ruby_command_line_spec.rb | 4 +- spec/unit/balance_heuristic_expand_spec.rb | 230 ++++++++++++++ ...nd_spec.rb => indent_block_expand_spec.rb} | 14 +- spec/unit/left_right_lex_count_spec.rb | 8 + spec/unit/lex_pair_diff_spec.rb | 43 +++ 18 files changed, 791 insertions(+), 23 deletions(-) create mode 100644 lib/dead_end/balance_heuristic_expand.rb rename lib/dead_end/{block_expand.rb => indent_block_expand.rb} (92%) create mode 100644 lib/dead_end/lex_pair_diff.rb create mode 100644 spec/unit/balance_heuristic_expand_spec.rb rename spec/unit/{block_expand_spec.rb => indent_block_expand_spec.rb} (89%) create mode 100644 spec/unit/left_right_lex_count_spec.rb create mode 100644 spec/unit/lex_pair_diff_spec.rb diff --git a/CHANGELOG.md b/CHANGELOG.md index a77e2b4..8bc8875 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,9 @@ ## HEAD (unreleased) +## 4.0.0 + +- Introduce experimental algorithm for block expansion, 1.7x faster overall search (https://github.com/zombocom/dead_end/pull/129) + ## 3.1.1 - Fix case where Ripper lexing identified incorrect code as a keyword (https://github.com/zombocom/dead_end/pull/122) diff --git a/Gemfile.lock b/Gemfile.lock index a9ac0d4..590c561 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -45,7 +45,7 @@ GEM rubocop-ast (>= 0.4.0) ruby-prof (1.4.3) ruby-progressbar (1.11.0) - stackprof (0.2.16) + stackprof (0.2.17) standard (1.3.0) rubocop (= 1.20.0) rubocop-performance (= 1.11.5) diff --git a/lib/dead_end/api.rb b/lib/dead_end/api.rb index 0a52d28..f497057 100644 --- a/lib/dead_end/api.rb +++ b/lib/dead_end/api.rb @@ -16,7 +16,7 @@ class Error < StandardError; end # DeadEnd.handle_error [Public] # - # Takes a `SyntaxError`` exception, uses the + # Takes a `SyntaxError` exception, uses the # error message to locate the file. Then the file # will be analyzed to find the location of the syntax # error and emit that location to stderr. @@ -187,12 +187,14 @@ def self.valid?(source) require_relative "lex_all" require_relative "code_line" require_relative "code_block" -require_relative "block_expand" +require_relative "lex_pair_diff" require_relative "ripper_errors" require_relative "priority_queue" require_relative "unvisited_lines" require_relative "around_block_scan" +require_relative "indent_block_expand" require_relative "priority_engulf_queue" require_relative "pathname_from_message" require_relative "display_invalid_blocks" +require_relative "balance_heuristic_expand" require_relative "parse_blocks_from_indent_line" diff --git a/lib/dead_end/balance_heuristic_expand.rb b/lib/dead_end/balance_heuristic_expand.rb new file mode 100644 index 0000000..b1fe96b --- /dev/null +++ b/lib/dead_end/balance_heuristic_expand.rb @@ -0,0 +1,281 @@ +# frozen_string_literal: true + +module DeadEnd + # Expand code based on lexical heuristic + # + # Code that has unbalanced pairs cannot be valid + # i.e. `{` must always be matched with a `}`. + # + # This expansion class exploits that knowledge to + # expand a logical block towards equal pairs. + # + # For example: if code is missing a `]` it cannot + # be on a line above, so it must expand down + # + # This heuristic allows us to make larger and more + # accurate expansions which means fewer invalid + # blocks to check which means overall faster search. + # + # This class depends on another class LexPairDiff can be + # accesssed per-line. It holds the delta of tracked directional + # pairs: curly brackets, square brackets, parens, and kw/end + # with positive count (leaning left), 0 (balanced), or negative + # count (leaning right). + # + # With this lexical diff information we can look around a given + # block and move with inteligently. For instance if the current + # block has a miss matched `end` and the line above it holds + # `def foo` then the block will be expanded up to capture that line. + # + # An unbalanced block can never be valid (this provides info to + # the overall search). However a balanced block may contain other syntax + # error and so must be re-checked using Ripper (slow). + # + # Example + # + # lines = CodeLines.from_source(<~'EOM') + # if bark? + # end + # EOM + # block = CodeBlock.new(lines: lines[0]) + # + # expand = BalanceHeuristicExpand.new( + # code_lines: lines, + # block: block + # ) + # expand.direction # => :down + # expand.call + # expand.direction # => :equal + # + # expect(expand.to_s).to eq(lines.join) + class BalanceHeuristicExpand + attr_reader :start_index, :end_index + + def initialize(code_lines:, block:) + @block = block + @iterations = 0 + @code_lines = code_lines + @last_index = @code_lines.length - 1 + @max_iterations = @code_lines.length * 2 + @start_index = block.lines.first.index + @end_index = block.lines.last.index + @last_equal_range = nil + + set_lex_diff_from(block) + end + + private def set_lex_diff_from(block) + @lex_diff = LexPairDiff.new( + curly: 0, + square: 0, + parens: 0, + kw_end: 0 + ) + block.lines.each do |line| + @lex_diff.concat(line.lex_diff) + end + end + + # Converts the searched lines into a source string + def to_s + @code_lines[start_index..end_index].join + end + + # Converts the searched lines into a code block + def to_block + CodeBlock.new(lines: @code_lines[start_index..end_index]) + end + + # Returns true if all lines are equal + def balanced? + @lex_diff.balanced? + end + + # Returns false if captured lines are "leaning" + # one direction + def unbalanced? + !balanced? + end + + # Main search entrypoint + # + # Essentially a state machine, determine the leaning + # of the given block, then figure out how to either + # move it towards balanced, or expand it while keeping + # it balanced. + def call + case direction + when :up + # the goal is to become balanced + while keep_going? && direction == :up && try_expand_up + end + when :down + # the goal is to become balanced + while keep_going? && direction == :down && try_expand_down + end + when :equal + while keep_going? && grab_equal_or { + # Cannot create a balanced expansion, choose to be unbalanced + try_expand_up + } + end + + call # Recurse + when :both + while keep_going? && grab_equal_or { + try_expand_up + try_expand_down + } + end + when :stop + return self + end + + self + end + + # Convert a lex diff to a direction to search + # + # leaning left -> down + # leaning right -> up + # + def direction + leaning = @lex_diff.leaning + case leaning + when :left # go down + stop_bottom? ? :stop : :down + when :right # go up + stop_top? ? :stop : :up + when :equal, :both + if stop_top? && stop_bottom? + :stop + elsif stop_top? && !stop_bottom? + :down + elsif !stop_top? && stop_bottom? + :up + else + leaning + end + end + end + + # Limit rspec failure output + def inspect + "#" + end + + # Upper bound on iterations + private def keep_going? + if @iterations < @max_iterations + @iterations += 1 + true + else + warn <<~EOM + DeadEnd: Internal problem detected, possible infinite loop in #{self.class} + + Please open a ticket with the following information. Max: #{@max_iterations}, actual: #{@iterations} + + Original block: + + ``` + #{@block.lines.map(&:original).join}``` + + Stuck at: + + ``` + #{to_block.lines.map(&:original).join}``` + EOM + + false + end + end + + # Attempt to grab "free" lines + # + # if either above, below or both are + # balanced, take them, return true. + # + # If above is leaning left and below + # is leaning right and they cancel out + # take them, return true. + # + # If we couldn't grab any balanced lines + # then call the block and return false. + private def grab_equal_or + did_expand = false + if above&.balanced? + did_expand = true + try_expand_up + end + + if below&.balanced? + did_expand = true + try_expand_down + end + + return true if did_expand + + if make_balanced_from_up_down? + try_expand_up + try_expand_down + true + else + yield + false + end + end + + # If up is leaning left and down is leaning right + # they might cancel out, to make a complete + # and balanced block + private def make_balanced_from_up_down? + return false if above.nil? || below.nil? + return false if above.lex_diff.leaning != :left + return false if below.lex_diff.leaning != :right + + @lex_diff.dup.concat(above.lex_diff).concat(below.lex_diff).balanced? + end + + # The line above the current location + private def above + @code_lines[@start_index - 1] unless stop_top? + end + + # The line below the current location + private def below + @code_lines[@end_index + 1] unless stop_bottom? + end + + # Mutates the start index and applies the new line's + # lex diff + private def expand_up + @start_index -= 1 + @lex_diff.concat(@code_lines[@start_index].lex_diff) + end + + private def try_expand_up + stop_top? ? false : expand_up + end + + private def try_expand_down + stop_bottom? ? false : expand_down + end + + # Mutates the end index and applies the new line's + # lex diff + private def expand_down + @end_index += 1 + @lex_diff.concat(@code_lines[@end_index].lex_diff) + end + + # Returns true when we can no longer expand up + private def stop_top? + @start_index == 0 + end + + # Returns true when we can no longer expand down + private def stop_bottom? + @end_index == @last_index + end + end +end diff --git a/lib/dead_end/code_frontier.rb b/lib/dead_end/code_frontier.rb index f9e6920..71af0b5 100644 --- a/lib/dead_end/code_frontier.rb +++ b/lib/dead_end/code_frontier.rb @@ -50,6 +50,8 @@ module DeadEnd # CodeFrontier#detect_invalid_blocks # class CodeFrontier + attr_reader :queue + def initialize(code_lines:, unvisited: UnvisitedLines.new(code_lines: code_lines)) @code_lines = code_lines @unvisited = unvisited diff --git a/lib/dead_end/code_line.rb b/lib/dead_end/code_line.rb index 6520518..43cead8 100644 --- a/lib/dead_end/code_line.rb +++ b/lib/dead_end/code_line.rb @@ -38,7 +38,7 @@ def self.from_source(source, lines: nil) end end - attr_reader :line, :index, :lex, :line_number, :indent + attr_reader :line, :index, :lex, :line_number, :indent, :lex_diff def initialize(line:, index:, lex:) @lex = lex @line = line @@ -57,6 +57,16 @@ def initialize(line:, index:, lex:) end set_kw_end + + @lex_diff = LexPairDiff.from_lex( + lex: @lex, + is_kw: is_kw?, + is_end: is_end? + ) + end + + def balanced? + @lex_diff.balanced? end # Used for stable sort via indentation level diff --git a/lib/dead_end/code_search.rb b/lib/dead_end/code_search.rb index 19b5bc8..12072c2 100644 --- a/lib/dead_end/code_search.rb +++ b/lib/dead_end/code_search.rb @@ -15,7 +15,7 @@ module DeadEnd # # - CodeFrontier (Holds information for generating blocks and determining if we can stop searching) # - ParseBlocksFromLine (Creates blocks into the frontier) - # - BlockExpand (Expands existing blocks to search more code) + # - IndentBlockExpand (Expands existing blocks to search more code) # # ## Syntax error detection # @@ -61,7 +61,7 @@ def initialize(source, record_dir: DEFAULT_VALUE) @code_lines = CleanDocument.new(source: source).call.lines @frontier = CodeFrontier.new(code_lines: @code_lines) - @block_expand = BlockExpand.new(code_lines: @code_lines) + @indent_block_expand = IndentBlockExpand.new(code_lines: @code_lines) @parse_blocks_from_indent_line = ParseBlocksFromIndentLine.new(code_lines: @code_lines) end @@ -88,6 +88,7 @@ def record(block:, name: "record") end end + # Add a block back onto the frontier def push(block, name:) record(block: block, name: name) @@ -100,6 +101,10 @@ def push(block, name:) def create_blocks_from_untracked_lines max_indent = frontier.next_indent_line&.indent + # Expand an unvisited line into a block and put it on the frontier + # This registers all lines and removes "univisted" lines from the + # frontier. The process continues until all unvisited lines at a given + # indentation are added while (line = frontier.next_indent_line) && (line.indent == max_indent) @parse_blocks_from_indent_line.each_neighbor_block(frontier.next_indent_line) do |block| push(block, name: "add") @@ -115,7 +120,37 @@ def expand_existing record(block: block, name: "before-expand") - block = @block_expand.call(block) + if block.invalid? + # When a block is invalid the BalanceHeuristicExpand class tends to make it valid + # again. This property reduces the number of Ripper calls to + # `frontier.holds_all_syntax_errors?`. + # + # This class tends to produce larger expansions meaning fewer + # total expansion steps. + blocks = [] + expand = BalanceHeuristicExpand.new(code_lines: code_lines, block: block) + + # Expand magic number 3 times + # + # There's likely a hidden property that explains why. I + # guessed it accidentally and it works really well. Reducing or increasing + # call count produces awful results. I'm not entirely sure why. + blocks << expand.call.to_block + blocks << expand.to_block if expand.call.balanced? + blocks << expand.to_block if expand.call.balanced? + + # Take the largest generated, valid block + block = blocks.reverse_each.detect(&:valid?) || blocks.first + else + # The original block expansion process works well when it starts + # with good i.e. "valid" input. Unlike BalanceHeuristicExpand, it does not self-correct + # towards a valid state. This naive property is desireable since + # we want to generate invalid code blocks (that make logical sense) + # or the algorithm will tend towards matching incorrect pairs + # at the expense of an incorrect result. + block = @indent_block_expand.call(block) + end + push(block, name: "expand") end diff --git a/lib/dead_end/block_expand.rb b/lib/dead_end/indent_block_expand.rb similarity index 92% rename from lib/dead_end/block_expand.rb rename to lib/dead_end/indent_block_expand.rb index de47e6a..ae7d61b 100644 --- a/lib/dead_end/block_expand.rb +++ b/lib/dead_end/indent_block_expand.rb @@ -10,7 +10,7 @@ module DeadEnd # puts "wow" # end # - # block = BlockExpand.new(code_lines: code_lines) + # block = IndentBlockExpand.new(code_lines: code_lines) # .call(CodeBlock.new(lines: code_lines[1])) # # puts block.to_s @@ -21,7 +21,7 @@ module DeadEnd # Once a code block has captured everything at a given indentation level # then it will expand to capture surrounding indentation. # - # block = BlockExpand.new(code_lines: code_lines) + # block = IndentBlockExpand.new(code_lines: code_lines) # .call(block) # # block.to_s @@ -30,7 +30,7 @@ module DeadEnd # puts "wow" # end # - class BlockExpand + class IndentBlockExpand def initialize(code_lines:) @code_lines = code_lines end diff --git a/lib/dead_end/left_right_lex_count.rb b/lib/dead_end/left_right_lex_count.rb index 3b71ade..6ddf731 100644 --- a/lib/dead_end/left_right_lex_count.rb +++ b/lib/dead_end/left_right_lex_count.rb @@ -22,6 +22,8 @@ module DeadEnd # left_right.missing.first # # => "}" class LeftRightLexCount + attr_reader :kw_count, :end_count + def initialize @kw_count = 0 @end_count = 0 @@ -37,6 +39,16 @@ def initialize } end + def concat(other) + @count_for_char.each do |(k, _)| + @count_for_char[k] += other[k] + end + + @kw_count += other.kw_count + @end_count += other.end_count + self + end + def count_kw @kw_count += 1 end @@ -45,6 +57,14 @@ def count_end @end_count += 1 end + def count_lines(lines) + lines.each do |line| + line.lex.each do |lex| + count_lex(lex) + end + end + end + # Count source code characters # # Example: @@ -121,6 +141,18 @@ def missing "(" => ")" }.freeze + def curly_diff + @count_for_char["{"] - @count_for_char["}"] + end + + def square_diff + @count_for_char["["] - @count_for_char["]"] + end + + def parens_diff + @count_for_char["("] - @count_for_char[")"] + end + # Opening characters like `{` need closing characters # like `}`. # # When a mis-match count is detected, suggest the diff --git a/lib/dead_end/lex_pair_diff.rb b/lib/dead_end/lex_pair_diff.rb new file mode 100644 index 0000000..bc02510 --- /dev/null +++ b/lib/dead_end/lex_pair_diff.rb @@ -0,0 +1,104 @@ +module DeadEnd + # Holds a diff of lexical pairs + # + # Example: + # + # diff = LexPairDiff.from_lex(LexAll.new("}"), is_kw: false, is_end: false) + # diff.curly # => 1 + # diff.balanced? # => false + # diff.leaning # => :right + # + # two = LexPairDiff.from_lex(LexAll.new("{"), is_kw: false, is_end: false) + # two.curly => -1 + # + # diff.concat(two) + # diff.curly # => 0 + # diff.balanced? # => true + # diff.leaning # => :equal + # + # Internally a pair is stored as a single value + # positive indicates more left elements, negative + # indicates more right elements, and zero indicates + # balanced pairs. + class LexPairDiff + # Convienece constructor + def self.from_lex(lex:, is_kw:, is_end:) + left_right = LeftRightLexCount.new + lex.each do |l| + left_right.count_lex(l) + end + + kw_end = 0 + kw_end += 1 if is_kw + kw_end -= 1 if is_end + + LexPairDiff.new( + curly: left_right.curly_diff, + square: left_right.square_diff, + parens: left_right.parens_diff, + kw_end: kw_end + ) + end + + attr_reader :curly, :square, :parens, :kw_end + + def initialize(curly:, square:, parens:, kw_end:) + @curly = curly + @square = square + @parens = parens + @kw_end = kw_end + end + + def each + yield @curly + yield @square + yield @parens + yield @kw_end + end + + # Returns :left if all there are more unmatched pairs to + # left i.e. "{" + # Returns :right if all there are more unmatched pairs to + # left i.e. "}" + # + # If pairs are unmatched like "(]" returns `:both` + # + # If everything is balanced returns :equal + def leaning + dir = 0 + each do |v| + case v <=> 0 + when 1 + return :both if dir == -1 + dir = 1 + when -1 + return :both if dir == 1 + dir = -1 + end + end + + case dir + when 1 + :left + when 0 + :equal + when -1 + :right + end + end + + # Returns true if all pairs are equal + def balanced? + @curly == 0 && @square == 0 && @parens == 0 && @kw_end == 0 + end + + # Mutates the existing diff with contents of another diff + def concat(other) + @curly += other.curly + @square += other.square + @parens += other.parens + @kw_end += other.kw_end + self + end + end +end diff --git a/lib/dead_end/parse_blocks_from_indent_line.rb b/lib/dead_end/parse_blocks_from_indent_line.rb index 11fa2b8..ec2dc98 100644 --- a/lib/dead_end/parse_blocks_from_indent_line.rb +++ b/lib/dead_end/parse_blocks_from_indent_line.rb @@ -26,6 +26,10 @@ module DeadEnd # # At this point it has no where else to expand, and it will yield this inner # code as a block + # + # The other major concern is eliminating all lines that do not contain + # an end. In the above example, if we started from the top and moved + # down we might accidentally eliminate everything but `end` class ParseBlocksFromIndentLine attr_reader :code_lines @@ -42,6 +46,19 @@ def each_neighbor_block(target_line) neighbors = scan.code_block.lines + # Block production here greatly affects quality and performance. + # + # Larger blocks produce a faster search as the frontier must go + # through fewer iterations. However too large of a block, will + # degrade output quality if too many unrelated lines are caught + # in an invalid block. + # + # Another concern is being too clever with block production. + # Quality of the end result depends on sometimes including unrelated + # lines. For example in code like `deffoo; end` we want to match + # both lines as the programmer's mistake was missing a space in the + # `def` even though technically we could make it valid by simply + # removing the "extra" `end`. block = CodeBlock.new(lines: neighbors) if neighbors.length <= 2 || block.valid? yield block diff --git a/spec/integration/dead_end_spec.rb b/spec/integration/dead_end_spec.rb index 8ad9913..281345f 100644 --- a/spec/integration/dead_end_spec.rb +++ b/spec/integration/dead_end_spec.rb @@ -4,9 +4,7 @@ module DeadEnd RSpec.describe "Integration tests that don't spawn a process (like using the cli)" do - it "does not timeout on massive files" do - next unless ENV["DEAD_END_TIMEOUT"] - + it "does not timeout on massive files", slow: true do file = fixtures_dir.join("syntax_tree.rb.txt") lines = file.read.lines lines.delete_at(768 - 1) @@ -142,6 +140,8 @@ module DeadEnd expect(out).to include(<<~EOM) 16 class Rexe + 18 VERSION = '1.5.1' + 20 PROJECT_URL = 'https://github.com/keithrbennett/rexe' ❯ 77 class Lookups ❯ 78 def input_modes ❯ 148 end diff --git a/spec/integration/exe_cli_spec.rb b/spec/integration/exe_cli_spec.rb index 5a49d9a..75e3bbf 100644 --- a/spec/integration/exe_cli_spec.rb +++ b/spec/integration/exe_cli_spec.rb @@ -14,7 +14,7 @@ def exe(cmd) out end - it "prints the version" do + it "prints the version", slow: true do out = exe("-v") expect(out.strip).to include(DeadEnd::VERSION) end diff --git a/spec/integration/ruby_command_line_spec.rb b/spec/integration/ruby_command_line_spec.rb index e124287..35a2ded 100644 --- a/spec/integration/ruby_command_line_spec.rb +++ b/spec/integration/ruby_command_line_spec.rb @@ -4,7 +4,7 @@ module DeadEnd RSpec.describe "Requires with ruby cli" do - it "namespaces all monkeypatched methods" do + it "namespaces all monkeypatched methods", slow: true do Dir.mktmpdir do |dir| tmpdir = Pathname(dir) script = tmpdir.join("script.rb") @@ -43,7 +43,7 @@ module DeadEnd end end - it "detects require error and adds a message with auto mode" do + it "detects require error and adds a message with auto mode", slow: true do Dir.mktmpdir do |dir| tmpdir = Pathname(dir) script = tmpdir.join("script.rb") diff --git a/spec/unit/balance_heuristic_expand_spec.rb b/spec/unit/balance_heuristic_expand_spec.rb new file mode 100644 index 0000000..cc3066a --- /dev/null +++ b/spec/unit/balance_heuristic_expand_spec.rb @@ -0,0 +1,230 @@ +# frozen_string_literal: true + +require_relative "../spec_helper" + +module DeadEnd + RSpec.describe BalanceHeuristicExpand do + it "can handle 'unknown' direction code" do + source = <<~'EOM' + parser.on('-r', '--require REQUIRE(S)', + 'Gems and built-in libraries (e.g. shellwords, yaml) to require, comma separated, or ! to clear') do |v| + if v == '!' + options.requires.clear + else + v.split(',').map(&:strip).each do |r| + if r[0] == '-' + options.requires -= [r[1..-1]] + else + options.requires << r + end + end + end + end + EOM + + lines = CleanDocument.new(source: source).call.lines + expand = BalanceHeuristicExpand.new( + code_lines: lines, + block: CodeBlock.new(lines: lines[1]) + ) + + expect(expand.direction).to eq(:both) + expand.call + expect(expand.to_s).to eq(<<~'EOM') + parser.on('-r', '--require REQUIRE(S)', + 'Gems and built-in libraries (e.g. shellwords, yaml) to require, comma separated, or ! to clear') do |v| + if v == '!' + EOM + + expand.call + expect(expand.to_s).to eq(<<~'EOM') + parser.on('-r', '--require REQUIRE(S)', + 'Gems and built-in libraries (e.g. shellwords, yaml) to require, comma separated, or ! to clear') do |v| + if v == '!' + options.requires.clear + else + v.split(',').map(&:strip).each do |r| + if r[0] == '-' + options.requires -= [r[1..-1]] + else + options.requires << r + end + end + end + end + EOM + end + + it "does not generate (known) invalid blocks when started at different positions" do + source = <<~EOM + Foo.call do |a + # inner + end # one + + print lol + class Foo + end # two + EOM + lines = CodeLine.from_source(source) + expand = BalanceHeuristicExpand.new( + code_lines: lines, + block: CodeBlock.new(lines: lines[1]) + ) + expect(expand.direction).to eq(:equal) + expand.call + expect(expand.to_s).to eq(<<~'EOM') + Foo.call do |a + # inner + end # one + + print lol + class Foo + end # two + EOM + + expand = BalanceHeuristicExpand.new( + code_lines: lines, + block: CodeBlock.new(lines: lines[0]) + ) + expect(expand.call.to_s).to eq(<<~'EOM') + Foo.call do |a + # inner + end # one + + print lol + class Foo + end # two + EOM + + expand = BalanceHeuristicExpand.new( + code_lines: lines, + block: CodeBlock.new(lines: lines[2]) + ) + expect(expand.direction).to eq(:up) + + expand.call + + expect(expand.to_s).to eq(<<~'EOM') + Foo.call do |a + # inner + end # one + EOM + + expand = BalanceHeuristicExpand.new( + code_lines: lines, + block: CodeBlock.new(lines: lines[3]) + ) + expect(expand.direction).to eq(:equal) + expand.call + expect(expand.to_s).to eq(<<~'EOM') + Foo.call do |a + # inner + end # one + + print lol + EOM + + expand = BalanceHeuristicExpand.new( + code_lines: lines, + block: CodeBlock.new(lines: lines[4]) + ) + expect(expand.direction).to eq(:equal) + expand.call + expect(expand.to_s).to eq(<<~'EOM') + Foo.call do |a + # inner + end # one + + print lol + EOM + + expand = BalanceHeuristicExpand.new( + code_lines: lines, + block: CodeBlock.new(lines: lines[5]) + ) + expect(expand.direction).to eq(:down) + expand.call + expect(expand.to_s).to eq(<<~'EOM') + class Foo + end # two + EOM + end + + it "expands" do + source = <<~EOM + class Blerg + Foo.call do |a + end # one + + print lol + class Foo + end # two + end # three + EOM + lines = CodeLine.from_source(source) + expand = BalanceHeuristicExpand.new( + code_lines: lines, + block: CodeBlock.new(lines: lines[5]) + ) + expect(expand.call.to_s).to eq(<<~'EOM'.indent(2)) + class Foo + end # two + EOM + expect(expand.call.to_s).to eq(<<~'EOM'.indent(2)) + Foo.call do |a + end # one + + print lol + class Foo + end # two + EOM + + expect(expand.call.to_s).to eq(<<~'EOM') + class Blerg + Foo.call do |a + end # one + + print lol + class Foo + end # two + end # three + EOM + end + + it "expands up when on an end" do + lines = CodeLine.from_source(<<~'EOM') + Foo.new do + end + EOM + expand = BalanceHeuristicExpand.new( + code_lines: lines, + block: CodeBlock.new(lines: lines[1]) + ) + expect(expand.direction).to eq(:up) + expand.call + expect(expand.direction).to eq(:stop) + + expect(expand.start_index).to eq(0) + expect(expand.end_index).to eq(1) + expect(expand.to_s).to eq(lines.join) + end + + it "expands down when on a keyword" do + lines = CodeLine.from_source(<<~'EOM') + Foo.new do + end + EOM + expand = BalanceHeuristicExpand.new( + code_lines: lines, + block: CodeBlock.new(lines: lines[0]) + ) + expect(expand.direction).to eq(:down) + expand.call + expect(expand.direction).to eq(:stop) + + expect(expand.start_index).to eq(0) + expect(expand.end_index).to eq(1) + expect(expand.to_s).to eq(lines.join) + end + end +end diff --git a/spec/unit/block_expand_spec.rb b/spec/unit/indent_block_expand_spec.rb similarity index 89% rename from spec/unit/block_expand_spec.rb rename to spec/unit/indent_block_expand_spec.rb index dc4dade..ca9afcc 100644 --- a/spec/unit/block_expand_spec.rb +++ b/spec/unit/indent_block_expand_spec.rb @@ -3,7 +3,7 @@ require_relative "../spec_helper" module DeadEnd - RSpec.describe BlockExpand do + RSpec.describe IndentBlockExpand do it "captures multiple empty and hidden lines" do source_string = <<~EOM def foo @@ -22,7 +22,7 @@ def foo code_lines[6].mark_invisible block = CodeBlock.new(lines: [code_lines[3]]) - expansion = BlockExpand.new(code_lines: code_lines) + expansion = IndentBlockExpand.new(code_lines: code_lines) block = expansion.call(block) expect(block.to_s).to eq(<<~EOM.indent(4)) @@ -47,7 +47,7 @@ def foo code_lines = code_line_array(source_string) block = CodeBlock.new(lines: [code_lines[3]]) - expansion = BlockExpand.new(code_lines: code_lines) + expansion = IndentBlockExpand.new(code_lines: code_lines) block = expansion.call(block) expect(block.to_s).to eq(<<~EOM.indent(4)) @@ -71,7 +71,7 @@ def foo code_lines = code_line_array(source_string) block = CodeBlock.new(lines: [code_lines[3]]) - expansion = BlockExpand.new(code_lines: code_lines) + expansion = IndentBlockExpand.new(code_lines: code_lines) block = expansion.call(block) expect(block.to_s).to eq(<<~EOM.indent(4)) @@ -104,7 +104,7 @@ def foo code_lines = code_line_array(source_string) block = CodeBlock.new(lines: [code_lines[2]]) - expansion = BlockExpand.new(code_lines: code_lines) + expansion = IndentBlockExpand.new(code_lines: code_lines) block = expansion.call(block) expect(block.to_s).to eq(<<~EOM.indent(2)) @@ -138,7 +138,7 @@ def foo lines: code_lines[6] ) - expansion = BlockExpand.new(code_lines: code_lines) + expansion = IndentBlockExpand.new(code_lines: code_lines) block = expansion.call(block) expect(block.to_s).to eq(<<~EOM.indent(2)) @@ -171,7 +171,7 @@ def foo EOM code_lines = code_line_array(source_string) - expansion = BlockExpand.new(code_lines: code_lines) + expansion = IndentBlockExpand.new(code_lines: code_lines) block = CodeBlock.new(lines: code_lines[3]) block = expansion.call(block) diff --git a/spec/unit/left_right_lex_count_spec.rb b/spec/unit/left_right_lex_count_spec.rb new file mode 100644 index 0000000..ce6ee51 --- /dev/null +++ b/spec/unit/left_right_lex_count_spec.rb @@ -0,0 +1,8 @@ +# frozen_string_literal: true + +require_relative "../spec_helper" + +module DeadEnd + RSpec.describe LeftRightLexCount do + end +end diff --git a/spec/unit/lex_pair_diff_spec.rb b/spec/unit/lex_pair_diff_spec.rb new file mode 100644 index 0000000..bceb2f6 --- /dev/null +++ b/spec/unit/lex_pair_diff_spec.rb @@ -0,0 +1,43 @@ +# frozen_string_literal: true + +require_relative "../spec_helper" + +module DeadEnd + RSpec.describe "LexPairDiff" do + it "leans unknown" do + diff = LexPairDiff.from_lex( + lex: LexAll.new(source: "[}").to_a, + is_kw: false, + is_end: false + ) + expect(diff.leaning).to eq(:both) + end + + it "leans right" do + diff = LexPairDiff.from_lex( + lex: LexAll.new(source: "}").to_a, + is_kw: false, + is_end: false + ) + expect(diff.leaning).to eq(:right) + end + + it "leans left" do + diff = LexPairDiff.from_lex( + lex: LexAll.new(source: "{").to_a, + is_kw: false, + is_end: false + ) + expect(diff.leaning).to eq(:left) + end + + it "leans equal" do + diff = LexPairDiff.from_lex( + lex: LexAll.new(source: "{}").to_a, + is_kw: false, + is_end: false + ) + expect(diff.leaning).to eq(:equal) + end + end +end