From 7b6f5199e167f256b9536cbdd19e275d50d7ecab Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Thu, 12 Sep 2024 10:52:27 +0200 Subject: [PATCH 01/76] experimental HiFi tree diff algorithm for use with quick-fixes and refactoring commands in the IDE --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 235 ++++++++++++++++++ 1 file changed, 235 insertions(+) create mode 100644 src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc new file mode 100644 index 00000000000..8708f61a192 --- /dev/null +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -0,0 +1,235 @@ +@license{ +Copyright (c) 2018-2023, NWO-I Centrum Wiskunde & Informatica +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, +this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, +this list of conditions and the following disclaimer in the documentation +and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +} +@synopsis{Infer ((TextEdit)) from the differences between two parse ((ParseTree::Tree))s} +@description{ +This module will move to the Rascal standard library. +} +module analysis::diff::edits::HiFiTreeDiff + +extend analysis::diff::edits::TextEdits; +import ParseTree; +import List; +import String; + +@synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.} +@description{ +This is a "diff" algorithm of two parse trees to generate a ((TextEdit)) script that applies the differences on +the textual level, _with minimal collateral damage in whitespace_. This is why it is called "HiFi": minimal unnecessary +noise introduction to the original file. + +The resulting ((TextEdit))s are an intermediate representation for making changes in source code text files. They can be executed independently via ((ExecuteTextEdits)), or interactively via ((IDEServices)), or LanguageServer features. + +This top-down diff algorithm takes two arguments: +1. an _original_ parse tree for a text file, +2. and a _derived_ parse tree that is mostly equal to the original but has pieces of it substituted or rewritten. + +From the tree node differences between these two trees, ((TextEdit))s are derived such that: +* when the edited source text is parsed again, the resulting tree would match the derived tree.
+However, the parsed tree could be different from the derived tree in terms of whitespace, indentation and case-insensitive literals (see below). +* when tree nodes (grammar rules) are equal, smaller edits are searched by pair-wise comparison of the children +* differences between respective layout or (case insensitive) literal nodes are always ignored +* when lists have changed, careful editing of possible separators ensures syntactic correctness +* when new sub-trees are inserted, the replacement will be at the same indentation level as the original. (((TODO this is a todo))) +* when case-insensitive literals have been changed under a grammar rule that remained the same, no edits are produced. + +The function comes in handy when we use Rascal to rewrite parse trees, and then need to communicate the effect +back to the IDE (for example using ((util::IDEServices)) or ((util::LanguageServer)) interfaces). We use +((ExecuteTextEdits)) to _test_ the effect of ((TextEdits)) while developing a source-to-source transformation. +} +@benefits{ +* This function allows the language engineer to work in terms of abstract and concrete syntax trees while manipulating source text. The +((TextEdit))s intermediate representation bridges the gap to the minute details of IDE interaction such as "undo" and "preview" features. +* Text editing is fraught with details of whitespace, comments, list separators; all of which are handled here by +the exactness of syntactic and semantic knowledge of the parse trees. +* Where possible the algorithm also retains the capitalization of case-insensitive literals. +* The algorithm retrieves and retains indentation levels from the original tree, even if sub-trees in the +derived tree have mangled indentation. This allows us to ignore the indentation concern while thinking of rewrite +rules for source-to-source transformation, and focus on the semantic effect.
+} +@pitfalls{ +* If the first argument is not an original parse tree, then basic assumptions of the algorithm fail and it may produce erroneous text edits. +* If the second argument is not derived from the original, then the algorithm will produce a single text edit to replace the entire source text. +* If the parse tree of the original does not reflect the current state of the text in the file, then the generated text edits will do harm. +* If the original tree is not annotated with source locations, the algorithm fails. +* Both parse trees must be type correct, i.e., the number of symbols in a production rule must be equal to the number of elements of the argument list of ((Tree::appl)). +* This algorithm does not work with ambiguous (sub)trees. +} +@examples{ +If we rewrite parse trees, this can be done with concrete syntax matching. +The following example swaps the if-branch with the else-branch in Pico: + +```rascal-shell +import lang::pico::\syntax::Main; +import IO; +import analysis::diff::edits::ExecuteTextEdits; +import analysis::diff::edits::TextEdits; +import analysis::diff::edits::HiFiTreeDiff; +// an example Pico program: +writeFile(|tmp://example.pico|, + "begin + ' declare + ' a : natural, + ' b : natural; + ' if a then + ' a := b + ' else + ' b := a + ' fi + 'end"); +original = parse(#start[Program], |tmp://example.pico|); +// match and replace all conditionals +rewritten = visit(original) { + case (Statement) `if then <{Statement ";"}* ifBranch> else <{Statement ";"}* elseBranch> fi` + => (Statement) `if then + ' <{Statement ";"}* elseBranch> + 'else + ' <{Statement ";"}* ifBranch> + 'fi` +} +// Check the result as a string. It worked, but we see some collateral damage in whitespace (indentation).
+"<rewritten>" +// Now derive text edits from the two parse trees: +edits = treeDiff(original, rewritten); +// Wrap them in a single document edit +edit = changed(original@\loc.top, edits); +// Apply the document edit on disk: +executeDocumentEdit(edit); +// and when we read the result back, we see the transformation succeeded, and indentation was not lost: +readFile(|tmp://example.pico|); +``` +} +// equal trees generate empty diffs (note this already ignores whitespace differences) +default list[TextEdit] treeDiff(Tree a, a) = []; + +// When the productions are different, we've found an edit, and there is no need to recurse deeper. +list[TextEdit] treeDiff( + t:appl(Production p:prod(_,_,_), list[Tree] _), + r:appl(Production q:!p , list[Tree] _)) + = t@\loc? + ? [replace(t@\loc, learnIndentation("<r>", "<t>")] + : /* literals and layout (without @\loc) are ignored */ []; + +// If a first element is removed and there are elements left, skip the separator too +list[TextEdit] treeDiff( + t:appl(Production p:regular(Symbol reg), list[Tree] aElems), + appl(p, list[Tree] bElems)) + = listDiff(t@\loc, prepareSeparators(aElems, seps(reg)), prepareSeparators(bElems, seps(reg))); + +// When the productions are equal, but the trees may be different, we dig deeper for differences +list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) + = [*treeDiff(a, b) | <a, b> <- zip2(argsA, argsB)]; + +@synopsis{decide how many separators we have} +int seps(\iter-seps(_,list[Symbol] s)) = size(s); +int seps(\iter-star-seps(_,list[Symbol] s)) = size(s); +default int seps(Symbol _) = 0; + +@synopsis{Finds minimal edits to list elements, taking extra care of removing separators when so required.} +@description{ +To make this easy, we add source location information to each original separator first, and then +reuse the rest of the algorithm which normally ignores separators.
+} +list[TextEdit] listDiff(loc _span, [], []) = []; + +// equal length, we assume only specific elements have changed. +list[TextEdit] listDiff(loc _span, list[Tree] elemsA, list[Tree] elemsB) = equalLengthDiff(elemsA, elemsB) + when size(elemsA) == size(elemsB); + +// additional elements, and possibly other elements have changed. +list[TextEdit] listDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) = longerLengthDiff(span, elemsA, elemsB) + when size(elemsA) < size(elemsB); + +// fewer elements, and possibly other elements have changed. +list[TextEdit] listDiff(list[Tree] elemsA, list[Tree] elemsB) = shorterLengthDiff(elemsA, elemsB) + when size(elemsA) > size(elemsB); + +// this works only because we annotated the separators. +list[TextEdit] equalLengthDiff(list[Tree] elemsA, list[Tree] elemsB) + = [*treeDiff(a,b) | <a,b> <- zip2(elemsA, elemsB)]; + +// added things to an empty list. this is also the final stage of a deep recursion +list[TextEdit] longerLengthDiff(loc span, [], list[Tree] elemsB) = [replace(span, yield(elemsB))]; + +// equal length lists can be forwarded (this happens when we already found the extra elements) +list[TextEdit] longerLengthDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) + = equalLengthDiff(elemsA, elemsB) when size(elemsA) == size(elemsB); + +// always ignore identical trees, and continue with the rest +list[TextEdit] longerLengthDiff(loc span, [Tree a, *Tree elemsA], [a, *Tree elemsB]) + = longerLengthDiff(span[offset=a@\loc.offset][length=span.length-a@\loc.length], elemsA, elemsB); + +// a single elem is different and also new by definition because ("longerLengthDiff") +list[TextEdit] longerLengthDiff(loc span, [Tree a, *Tree elemsA], [Tree b:!a, *Tree elemsB]) + = [replace(span[length=0], "<b>")] // we put b in front of a + + (size(elemsA) + 1 == size(elemsB) // and continue with the rest + ?
equalLengthDiff([a, *elemsA], elemsB) // this could have been the last additional element + : longerLengthDiff(span, [a, *elemsA], elemsB)) // or we still have more to add + ; + +// we have to remove the elements that are replaced by an empty list +list[TextEdit] shorterLengthDiff(loc span, list[Tree] _, []) + = [replace(span, "")]; + +// always ignore identical trees, and continue with the rest +list[TextEdit] shorterLengthDiff(loc span, [Tree a, *Tree elemsA], [a, *Tree elemsB]) + = shorterLengthDiff(span[offset=a@\loc.offset][length=span.length-a@\loc.length], elemsA, elemsB); + +// a single elem is different and also superfluous by definition because ("shorterLengthDiff") +list[TextEdit] shorterLengthDiff(loc span, [Tree a, *Tree elemsA], [Tree b:!a, *Tree elemsB]) + = [replace(a@\loc, "")] // we replace a by b + + shorterLengthDiff(span, elemsA, elemsB) // and continue with the rest + // TODO: the lists could have become of equal length. Deal with that case. + ; + +private Production sepProd = prod(layouts("*separators*"),[],{}); + +@synopsis{yield a consecutive list of trees} +private str yield(list[Tree] elems) = "<for (e <- elems) {><e><}>"; + +@synopsis{Separator literals need location annotations because they have to be edited.} +private list[Tree] prepareSeparators([], int _) = []; + +private list[Tree] prepareSeparators([Tree t], int _) = [t]; + +// we group the 3 separators into a single tree with accurate position information.
+private list[Tree] prepareSeparators([Tree head, Tree l1, Tree sep, Tree l2, *Tree rest], 3) + = [head, appl(sepProd, [l1, newSep, l2])[@\loc=span], *prepareSeparators(rest)] + when + span := head@\loc.top(end(head@\loc), size("<l1><sep><l2>")); + +// single separators get accurate position information (even if they are layout) +private list[Tree] prepareSeparators([Tree head, Tree sep, *Tree rest], 1) + = [head, sep[\loc=span], *prepareSeparators(rest)] + when + span := head@\loc.top(end(head@\loc), size("<sep>")); + +// unseparated lists are ready +private list[Tree] prepareSeparators(list[Tree] elems, 0) = elems; + +private int end(loc src) = src.offset + src.length; + +private str learnIndentation(str replacement, str original) = replacement; // TODO: learn minimal indentation from original From 374a8a295362bc296741f81e11b2178aa6ec1d2b Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Tue, 1 Oct 2024 11:32:03 +0200 Subject: [PATCH 02/76] developing the list diff algorithms with inspiration from the diff tool --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 101 ++++++++++++++++-- 1 file changed, 93 insertions(+), 8 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 8708f61a192..88b4a8d3699 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -34,6 +34,7 @@ extend analysis::diff::edits::TextEdits; import ParseTree; import List; import String; +import Locations; @synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.} @description{ @@ -123,24 +124,63 @@ readFile(|tmp://example.pico|); ``` } // equal trees generate empty diffs (note this already ignores whitespace differences) -default list[TextEdit] treeDiff(Tree a, a) = []; +list[TextEdit] treeDiff(Tree a, a) = []; + +// skip production labels of original rules when diffing
+list[TextEdit] treeDiff( + appl(prod(label(_, Symbol s), syms, attrs), list[Tree] args), + Tree u) + = treeDiff(appl(prod(s, syms, attrs), args), u); + +// skip production labels of replacement rules when diffing +list[TextEdit] treeDiff( + Tree t, + appl(prod(label(_, Symbol s), syms, attrs), list[Tree] args)) + = treeDiff(t, appl(prod(s, syms, attrs), args)); + +// matched layout trees generate empty diffs such that the original is maintained +list[TextEdit] treeDiff( + appl(prod(layouts(_), _, _), list[Tree] _), + appl(prod(layouts(_), _, _), list[Tree] _)) + = []; + +// matched literal trees generate empty diffs +list[TextEdit] treeDiff( + appl(prod(lit(str l), _, _), list[Tree] _), + appl(prod(lit(l) , _, _), list[Tree] _)) + = []; + +// matched case-insensitive literal trees generate empty diffs such that the original is maintained +list[TextEdit] treeDiff( + appl(prod(cilit(str l), _, _), list[Tree] _), + appl(prod(cilit(l) , _, _), list[Tree] _)) + = []; + +// different lexicals generate small diffs even if the parent is equal +list[TextEdit] treeDiff( + t:appl(prod(lex(str l), _, _), list[Tree] _), + r:appl(prod(lex(l) , _, _), list[Tree] _)) + = [replace(t@\loc, learnIndentation("<r>", "<t>"))] + when t != r; // When the productions are different, we've found an edit, and there is no need to recurse deeper. list[TextEdit] treeDiff( t:appl(Production p:prod(_,_,_), list[Tree] _), r:appl(Production q:!p , list[Tree] _)) = t@\loc? - ? [replace(t@\loc, learnIndentation("<r>", "<t>")] + ? [replace(t@\loc, learnIndentation("<r>", "<t>"))] : /* literals and layout (without @\loc) are ignored */ []; -// If a first element is removed and there are elements left, skip the separator too + +// If list productions are the same, then the element lists can still be of different length +// and we switch to listDiff which has different heuristics than normal trees.
list[TextEdit] treeDiff( - t:appl(Production p:regular(Symbol reg), list[Tree] aElems), + Tree t:appl(Production p:regular(Symbol reg), list[Tree] aElems), appl(p, list[Tree] bElems)) - = listDiff(t@\loc, prepareSeparators(aElems, seps(reg)), prepareSeparators(bElems, seps(reg))); + = listDiff(t@\loc, seps(reg), aElems, bElems); -// When the productions are equal, but the trees may be different, we dig deeper for differences -list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) +// When the productions are equal, but the children may be different, we dig deeper for differences +default list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) = [*treeDiff(a, b) | <a, b> <- zip2(argsA, argsB)]; @synopsis{decide how many separators we have} @@ -148,6 +188,51 @@ int seps(\iter-seps(_,list[Symbol] s)) = size(s); int seps(\iter-star-seps(_,list[Symbol] s)) = size(s); default int seps(Symbol _) = 0; +@synopsis{List diff is like text diff on lines; complex and easy to make slow} +list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] replacements) { + assert originals != replacements && originals == []; + <originals, replacements> = trimEqualElements(originals, replacements); + span = cover([orig@\loc | orig <- originals, orig@\loc?]); + + assert originals != replacements && originals != []; + <edits, originals, replacements> = commonSpecialCases(span, seps, originals, replacements); + + return [*edits, *genericListDiff(span, originals, replacements)]; +} + +@synopsis{trims equal elements from the front and the back of both lists, if any.} +tuple[list[Tree], list[Tree]] trimEqualElements([Tree a, *Tree aTail], [ a, *Tree bTail]) + = trimEqualElements(aTail, bTail); + +tuple[list[Tree], list[Tree]] trimEqualElements([*Tree aHead, Tree a], [*Tree bHead, a]) + = trimEqualElements(aHead, bHead); + +default tuple[list[Tree], list[Tree]] trimEqualElements(list[Tree] a, list[Tree] b) + = <a, b>; + +// only one element removed in front, then we are done +tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc
span, 0, [Tree a, *Tree tail], [*tail]) + = <[replace(a@\loc, "", "")], [], []>; + +// only one element removed in front, plus 1 separator, then we are done because everything is the same +tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 1, + [Tree a, Tree _sep, Tree tHead, *Tree tail], [tHead, *tail]) + = <[replace(fromUntil(a, tHead), "", "")], [], []>; + +@synopsis{Compute location span that is common between an element and a succeeding element} +@description{ +The resulting loc includes `from` but excludes `until`. It goes right +up to `until`. +```ascii-art + [from] gap [until] + <---------> +```` +} +private loc fromUntil(loc from, loc until) = from.top(from.offset, until.offset - from.offset); + +@synopsis{convenience overload for shorter code} +private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc); + @synopsis{Finds minimal edits to list elements, taking extra care of removing separators when so required.} @description{ To make this easy, we add source location information to each original separator first, and then @@ -164,7 +249,7 @@ list[TextEdit] listDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) = longer when size(elemsA) < size(elemsB); // fewer elements, and possibly other elements have changed. -list[TextEdit] listDiff(list[Tree] elemsA, list[Tree] elemsB) = shorterLengthDiff(elemsA, elemsB) +list[TextEdit] listDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) = shorterLengthDiff(span, elemsA, elemsB) when size(elemsA) > size(elemsB); // this works only because we annotated the separators. From c623d2b37bfcf0cb1335e0eecdd3966ba662fbb4 Mon Sep 17 00:00:00 2001 From: "Jurgen J.
Vinju" Date: Mon, 7 Oct 2024 16:14:33 +0200 Subject: [PATCH 03/76] made some progress with the list algorithm --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 232 ++++++++++-------- 1 file changed, 129 insertions(+), 103 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 88b4a8d3699..0b77130ecec 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -26,7 +26,59 @@ POSSIBILITY OF SUCH DAMAGE. } @synopsis{Infer ((TextEdit)) from the differences between two parse ((ParseTree::Tree))s} @description{ -This module will move to the Rascal standard library. +This module provides an essential building block for creating high-fidelity source-to-source code transformations. +It is common for industrial use cases of source-to-source transformation to extract +a list of text edits programmatically using parse tree pattern matching. This way the +changes are made on the textual level, with less introduction of noise and fewer removals +of valuable layout (indentation) and source code comments. + +The construction of such high-fidelity edit lists can be rather involved because it tangles +and scatters a number of concerns: +1. syntax-directed pattern matching +2. string substitution; construction of the rewritten text + * retention of layout and in particular indentation + * retention of source code comments + * retention of specific case-insensitive keyword style + * syntactic correctness of the result; especially in relation to list separators there are many corner-cases to think of + +On the other hand, ParseTree to ParseTree rewrites are much easier to write and get correct. +They are "syntax directed" via the shape of the tree that follows the grammar of the language. +Some if not all of the above aspects are tackled by the rewriting mechanism with concrete patterns.
+Especially the corner cases w.r.t. list separators are all handled by the rewriting mechanisms. +Also the rules are in "concrete syntax", on both the matching and the substitution side. So they are +readable for all who know the object language. The rules guarantee syntactic correctness of the +rewritten source code. However, rewrite rules do quite a lot of noisy damage to the layout, indentation, +and comments of the result. + +With this module we bring these two modalities of source-to-source transformations together: +1. The language engineer uses concrete syntax rewrite rules to derive a new ParseTree from the original; +2. We run ((treeDiff)) to obtain a set of minimal text edits; +3. We apply the text edits to the editor contents or the file system. +} +@benefits{ +* Because the derived text edits change fewer characters, the end result is more "hifi" than simply +unparsing the rewritten ParseTree. More comments are retained and more indentation is kept the same. More +case-insensitive keywords retain their original shape. +* At the same time the rewrite rules are easier to maintain as they remain "syntax directed". +* Changes to the grammar will be picked up when checking all source and target patterns. +* The diff algorithm uses cross-cutting information from the parse tree (what is layout and what not, + what is case-insensitive, etc.) which would otherwise have to be managed by the language engineer in _every rewrite rule_. +* The diff algorithm understands what indentation is and brings new sub-trees to the original level +of indentation (same as the sub-trees they are replacing) +* Typically the algorithm's run-time is linear in the size of the tree, or better. Same for memory usage. +} +@pitfalls{ +* ((treeDiff)) only works under the assumption that the second tree was derived from the first +by applying concrete syntax rewrite rules in Rascal. If there is no origin relation between the two +then its heuristics will not work.
The algorithm could degenerate to substituting the entire file, +or worse, it could degenerate to an exponential search for commonalities in long lists. +* ((treeDiff))'s efficiency is predicated on the two trees being derived from each other in main memory of the currently running JVM. +This way both trees will share pointers where they are the same, which leads to very efficient equality +testing. If the trees are first independently serialized to disk and then deserialized again, and then ((treeDiff)) is called, +this optimization is not present and the algorithm will perform (very) poorly. +* Substitution patterns should be formatted as well as possible. The algorithm will not infer +spacing or relative indentation inside of the substituted subtree. It will only infer indentation +for the entire subtree. } module analysis::diff::edits::HiFiTreeDiff @@ -34,7 +86,7 @@ extend analysis::diff::edits::TextEdits; import ParseTree; import List; import String; -import Locations; +import Location; @synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.} @description{ @@ -42,7 +94,8 @@ This is a "diff" algorithm of two parse trees to generate a ((TextEdit)) script the textual level, _with minimal collateral damage in whitespace_. This is why it is called "HiFi": minimal unnecessary noise introduction to the original file. -The resulting ((TextEdit))s are an intermediate representation for making changes in source code text files. They can be executed independently via ((ExecuteTextEdits)), or interactively via ((IDEServices)), or LanguageServer features. +The resulting ((TextEdit))s are an intermediate representation for making changes in source code text files. +They can be executed independently via ((ExecuteTextEdits)), or interactively via ((IDEServices)), or LanguageServer features. This top-down diff algorithm takes two arguments: 1.
an _original_ parse tree for a text file, @@ -74,6 +127,8 @@ rules for source-to-source transformation, and focus on the semantic effect. @pitfalls{ * If the first argument is not an original parse tree, then basic assumptions of the algorithm fail and it may produce erroneous text edits. * If the second argument is not derived from the original, then the algorithm will produce a single text edit to replace the entire source text. +* If the second argument was not produced from the first in the same JVM memory, it will not share many pointers to equal sub-trees +and the performance of the algorithm will degenerate quickly. * If the parse tree of the original does not reflect the current state of the text in the file, then the generated text edits will do harm. * If the original tree is not annotated with source locations, the algorithm fails. * Both parse trees must be type correct, i.e., the number of symbols in a production rule must be equal to the number of elements of the argument list of ((Tree::appl)). @@ -164,13 +219,10 @@ list[TextEdit] treeDiff( when t != r; // When the productions are different, we've found an edit, and there is no need to recurse deeper. -list[TextEdit] treeDiff( +default list[TextEdit] treeDiff( t:appl(Production p:prod(_,_,_), list[Tree] _), r:appl(Production q:!p , list[Tree] _)) - = t@\loc? - ? [replace(t@\loc, learnIndentation("<r>", "<t>"))] - : /* literals and layout (without @\loc) are ignored */ []; - + = [replace(t@\loc, learnIndentation("<r>", "<t>"))]; // If list productions are the same, then the element lists can still be of different length // and we switch to listDiff which has different heuristics than normal trees.
@@ -191,33 +243,88 @@ default int seps(Symbol _) = 0; @synopsis{List diff is like text diff on lines; complex and easy to make slow} list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] replacements) { assert originals != replacements && originals == []; - <originals, replacements> = trimEqualElements(originals, replacements); - span = cover([orig@\loc | orig <- originals, orig@\loc?]); - - assert originals != replacements && originals != []; - <edits, originals, replacements> = commonSpecialCases(span, seps, originals, replacements); + edits = []; + + // this algorithm isolates commonalities between the two lists + // by handling different special cases. It always continues with + // what is left to be different. By maximizing commonalities, + // the edits are minimized. Note that we float on source location parameters + // not only for the edit locations but also for sub-tree identity. + solve (originals, replacements) { + <originals, replacements> = trimEqualElements(originals, replacements); + span = cover([orig@\loc | orig <- originals, orig@\loc?]); + + <specialEdits, originals, replacements> = commonSpecialCases(span, seps, originals, replacements); + edits += specialEdits; + + equalSubList = largestEqualSubList(originals, replacements); + + if (equalSubList != [], + [*preO, *equalSubList, *postO] := originals, + [*preR, *equalSubList, *postR] := replacements) { + // TODO: what about the separators? + // we align the prefixes and the postfixes and + // continue recursively. + return edits + + listDiff(cover(preO), seps, preO, preR) + + listDiff(cover(postO), seps, postO, postR) + ; + } + } + + return edits; +} - return [*edits, *genericListDiff(span, originals, replacements)]; +@synopsis{Finds the largest sublist that occurs in both lists} +@description{ +Using list matching and backtracking, this algorithm detects which common +sublist is the largest. It assumes ((trimEqualElements)) has happened already, +and thus there are interesting differences left, even if we remove any equal +sublist.
+} +list[Tree] largestEqualSubList(list[Tree] originals, list[Tree] replacements) { + assert <originals, replacements> := trimEqualElements(originals, replacements) : "both lists begin and end with unique elements"; + + bool largerList(list[Tree] a, list[Tree] b) = size(a) > size(b); + + equals = [eq | + [*_, pre, *eq, post, *_] := originals, size(eq) > 0, + [*_, !pre, *eq, !post, *_] := replacements + ]; + + return [largest, *_] := sort(equals, largerList) + ? largest + : [] // no equal sublists detected + ; } @synopsis{trims equal elements from the front and the back of both lists, if any.} -tuple[list[Tree], list[Tree]] trimEqualElements([Tree a, *Tree aTail], [ a, *Tree bTail]) - = trimEqualElements(aTail, bTail); +tuple[list[Tree], list[Tree]] trimEqualElements([Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) + = trimEqualElements(aPostfix, bPostfix); -tuple[list[Tree], list[Tree]] trimEqualElements([*Tree aHead, Tree a], [*Tree bHead, a]) - = trimEqualElements(aHead, bHead); +tuple[list[Tree], list[Tree]] trimEqualElements([*Tree aPrefix, Tree a], [*Tree bPrefix, a]) + = trimEqualElements(aPrefix, bPrefix); default tuple[list[Tree], list[Tree]] trimEqualElements(list[Tree] a, list[Tree] b) = <a, b>; // only one element removed in front, then we are done tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 0, [Tree a, *Tree tail], [*tail]) - = <[replace(a@\loc, "", "")], [], []>; + = <[replace(a@\loc, "")], [], []>; // only one element removed in front, plus 1 separator, then we are done because everything is the same tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 1, [Tree a, Tree _sep, Tree tHead, *Tree tail], [tHead, *tail]) - = <[replace(fromUntil(a, tHead), "", "")], [], []>; + = <[replace(fromUntil(a, tHead), "")], [], []>; + +// only one element removed in front, plus 1 separator and its surrounding layout, then we are done because everything is the same +tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 3, + [Tree a, Tree _l1, Tree _sep, Tree _l2, Tree tHead, *Tree tail], [tHead, *tail]) + = <[replace(fromUntil(a, tHead), "")], [], []>; + + +@synopsis{convenience
overload for shorter code} +private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc); @synopsis{Compute location span that is common between an element and a succeeding element} @description{ @@ -229,92 +336,11 @@ up to `until`. ```` } private loc fromUntil(loc from, loc until) = from.top(from.offset, until.offset - from.offset); +private int end(loc src) = src.offset + src.length; -@synopsis{convenience overload for shorter code} -private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc); - -@synopsis{Finds minimal edits to list elements, taking extra care of removing separators when so required.} -@description{ -To make this easy, we add source location information to each original separator first, and then -reuse the rest of the algorithm which normally ignores separators. -} -list[TextEdit] listDiff(loc _span, [], []) = []; - -// equal length, we assume only specific elements have changed. -list[TextEdit] listDiff(loc _span, list[Tree] elemsA, list[Tree] elemsB) = equalLengthDiff(elemsA, elemsB) - when size(elemsA) == size(elemsB); - -// additional elements, and possibly other elements have changed. -list[TextEdit] listDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) = longerLengthDiff(span, elemsA, elemsB) - when size(elemsA) < size(elemsB); - -// fewer elements, and possibly other elements have changed. -list[TextEdit] listDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) = shorterLengthDiff(span, elemsA, elemsB) - when size(elemsA) > size(elemsB); - -// this works only because we annotated the separators. -list[TextEdit] equalLengthDiff(list[Tree] elemsA, list[Tree] elemsB) - = [*treeDiff(a,b) | <a,b> <- zip2(elemsA, elemsB)]; - -// added things to an empty list.
this is also the final stage of a deep recursion -list[TextEdit] longerLengthDiff(loc span, [], list[Tree] elemsB) = [replace(span, yield(elemsB))]; - -// equal length lists can be forwarded (this happens when we already found the extra elements) -list[TextEdit] longerLengthDiff(loc span, list[Tree] elemsA, list[Tree] elemsB) - = equalLengthDiff(elemsA, elemsB) when size(elemsA) == size(elemsB); - -// always ignore identical trees, and continue with the rest -list[TextEdit] longerLengthDiff(loc span, [Tree a, *Tree elemsA], [a, *Tree elemsB]) - = longerLengthDiff(span[offset=a@\loc.offset][length=span.length-a@\loc.length], elemsA, elemsB); - -// a single elem is different and also new by definition because ("longerLengthDiff") -list[TextEdit] longerLengthDiff(loc span, [Tree a, *Tree elemsA], [Tree b:!a, *Tree elemsB]) - = [replace(span[length=0], "")] // we put b in front of a - + (size(elemsA) + 1 == size(elemsB) // and continue with the rest - ? equalLengthDiff([a, *elemsA], elemsB) // this could have been the last additional element - : longerLengthDiff(span, [a, *elemsA], elemsB)) // or we still have more to add - ; - -// we have to remove the elements that are replaced by an empty list -list[TextEdit] shorterLengthDiff(loc span, list[Tree] _, []) - = [replace(span, "")]; - -// always ignore identical trees, and continue with the rest -list[TextEdit] shorterLengthDiff(loc span, [Tree a, *Tree elemsA], [a, *Tree elemsB]) - = shorterLengthDiff(span[offset=a@\loc.offset][length=span.length-a@\loc.length], elemsA, elemsB); - -// a single elem is different and also superfluous by definition because ("shorterLengthDiff") -list[TextEdit] shorterLengthDiff(loc span, [Tree a, *Tree elemsA], [Tree b:!a, *Tree elemsB]) - = [replace(a@\loc, "")] // we replace a by b - + shorterLengthDiff(span, elemsA, elemsB) // and continue with the rest - // TODO: the lists could have become of equal length. Deal with that case. 
- ; - -private Production sepProd = prod(layouts("*separators*"),[],{}); +private loc cover(list[Tree] elems) = cover([e@\loc | e <- elems, e@\loc?]); @synopsis{yield a consecutive list of trees} private str yield(list[Tree] elems) = "<}>"; -@synopsis{Separator literals need location annotations because they have to be edited.} -private list[Tree] prepareSeparators([], int _) = []; - -private list[Tree] prepareSeparators([Tree t], int _) = [t]; - -// we group the 3 separators into a single tree with accurate position information. -private list[Tree] prepareSeparators([Tree head, Tree l1, Tree sep, Tree l2, *Tree rest], 3) - = [head, appl(sepProd, [l1, newSep, l2])[@\loc=span], *prepareSeparators(rest)] - when - span := head@\loc.top(end(head@\loc), size("")); - -// single separators get accurate position informaiton (even if they are layout) -private list[Tree] prepareSeparators([Tree head, Tree sep, *Tree rest], 1) - = [head, sep[\loc=span], *prepareSeparators(rest)] - when - span := head@\loc.top(end(head@\loc), size("")); - -// unseparated lists are ready -private list[Tree] prepareSeparators(list[Tree] elems, 0) = elems; - -private int end(loc src) = src.offset + src.length; - private str learnIndentation(str replacement, str original) = replacement; // TODO: learn minimal indentaton from original From 1525e738c9bc5f915fa484dbe3e616d5bf5ca6d6 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 7 Oct 2024 16:43:25 +0200 Subject: [PATCH 04/76] minor improvements. 
this is not finished yet --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 45 ++++++++++--------- 1 file changed, 24 insertions(+), 21 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 0b77130ecec..0e3292cd618 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -250,29 +250,32 @@ list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] re // what is left to be different. By maximizing commonalities, // the edits are minimized. Note that we float on source location parameters // not only for the edit locations but also for sub-tree identity. - solve (originals, replacements) { - = trimEqualElements(originals, replacements); - span = cover([orig@\loc | orig <- originals, orig@\loc?]); + + = trimEqualElements(originals, replacements); + span = cover([orig@\loc | orig <- originals, orig@\loc?]); - = commonSpecialCases(span, seps, originals, replacements); - edits += specialEdits; + = commonSpecialCases(span, seps, originals, replacements); + edits += specialEdits; - equalSubList = largestEqualSubList(originals, replacements); - - if (equalSubList != [], - [*preO, *equalSubList, *postO] := originals, - [*preR, *equalSubList, *postR] := replacements) { - // TODO: what about the separators? - // we align the prefixes and the postfixes and - // continue recursively. - return edits - + listDiff(cover(preO), seps, preO, preR) - + listDiff(cover(postO), seps, postO, postR) - ; - } + equalSubList = largestEqualSubList(originals, replacements); + + // by using the (or "a") largest common sublist as a pivot to divide-and-conquer + // to the left and right of it, we minimize the number of necessary + // edit actions for the entire list. 
+ if (equalSubList != [], + [*preO, *equalSubList, *postO] := originals, + [*preR, *equalSubList, *postR] := replacements) { + // TODO: what about the separators? + // we align the prefixes and the postfixes and + // continue recursively. + return edits + + listDiff(cover(preO), seps, preO, preR) + + listDiff(cover(postO), seps, postO, postR) + ; + } + else { // nothing in common means we can replace the entire list + return edits + replace(span, learnIndentation(yield(replacements), yield(originals))); } - - return edits; } @synopsis{Finds the largest sublist that occurs in both lists} @@ -324,7 +327,7 @@ tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 3, @synopsis{convenience overload for shorter code} -private loc fromUntil(Tree from, Tree until) = fromUntil(fro@\loc, until@\loc); +private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc); @synopsis{Compute location span that is common between an element and a succeeding element} @description{ From 3196433fe647415695d302f41f8e48a27c54760d Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Thu, 10 Oct 2024 09:46:53 +0200 Subject: [PATCH 05/76] slow progress --- .../analysis/diff/edits/ExecuteTextEdits.rsc | 12 +++++-- .../analysis/diff/edits/HiFiTreeDiff.rsc | 34 +++++++++++++------ 2 files changed, 32 insertions(+), 14 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc b/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc index e3417aea87d..90dcc727937 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc @@ -24,16 +24,22 @@ void executeDocumentEdit(renamed(loc from, loc to)) { } void executeDocumentEdit(changed(loc file, list[TextEdit] edits)) { + str content = readFile(file); + + content = executeTextEdits(content, edits); + + writeFile(file.top, content); +} + +str executeTextEdits(str content, list[TextEdit] edits) { assert isSorted(edits, less=bool (TextEdit e1, TextEdit e2) { return e1.range.offset < e2.range.offset; }); - str content = readFile(file); - for (replace(loc range, str repl) <- reverse(edits)) { assert range.top == file.top; content = ""; } - writeFile(file.top, content); + return content; } diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 0e3292cd618..16a5b402926 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -242,7 +242,6 @@ default int seps(Symbol _) = 0; @synsopis{List diff is like text diff on lines; complex and easy to make slow} list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] replacements) { - assert originals != replacements && originals == []; edits = []; // this algorithm isolates commonalities between the two lists @@ -257,7 +256,7 @@ list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] re = 
commonSpecialCases(span, seps, originals, replacements); edits += specialEdits; - equalSubList = largestEqualSubList(originals, replacements); + equalSubList = largestEqualSubList(span, originals, replacements); // by using the (or "a") largest common sublist as a pivot to divide-and-conquer // to the left and right of it, we minimize the number of necessary @@ -284,21 +283,27 @@ Using list matching and backtracking, this algorithm detects which common sublist is the largest. It assumes ((trimEqualElements)) has happened already, and thus there are interesting differences left, even if we remove any equal sublist. + +Note that this is not a general algorithm for Largest Common Subsequence (LCS), since it +uses particular properties of the relation between the original and the replacement list. +* New elements are never equal to old elements (due to source locations) +* Equal prefixes and postfixes may be assumed to be maximal sublists as well (see above). +* Candidate equal sublists always have consecutive source locations from the origin. +* etc. } -list[Tree] largestEqualSubList(list[Tree] originals, list[Tree] replacements) { - assert := trimEqualElements(originals, replacements) : "both lists begin and end with unique elements"; +list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replacements) { + // assert := trimEqualElements(originals, replacements) : "both lists begin and end with unique elements"; bool largerList(list[Tree] a, list[Tree] b) = size(a) > size(b); + + bool fromOriginalFile(loc span, Tree last) = span.top == (last@\loc?|unknown:///|).top; - equals = [eq | - [*_, pre, *eq, post, *_] := originals, size(eq) > 0, - [*_, !pre, *eq, !post, *_] := replacements + equals = [[*eq,q] | + [*_, pre, *eq, q, post, *_] := replacements, fromOriginalFile(span, q), + [*_, !pre, *eq, q, !post, *_] := originals ]; - return [largest, *_] := sort(equals, largerList) - ? 
largest - : [] // no equal sublists detected - ; + return sort(equals, largerList)[0] ? []; } @synopsis{trips equal elements from the front and the back of both lists, if any.} @@ -325,6 +330,13 @@ tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 3, [Tree a, Tree _l1, Tree _sep, Tree _l2, Tree tHead, *Tree tail], [tHead, *tail]) = <[replace(fromUntil(a, tHead), "")], [], []>; +// singleton replacement +tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, int _, + [Tree a], [Tree b]) + = ; + +default tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, int _, list[Tree] a, list[Tree] b) + = <[], a, b>; @synopsis{convenience overload for shorter code} private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc); From 3f05df428847e9c9195558674d4f3bbfc3539dd7 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Thu, 10 Oct 2024 11:40:32 +0200 Subject: [PATCH 06/76] added demo --- .../analysis/diff/edits/ExecuteTextEdits.rsc | 1 - .../analysis/diff/edits/HiFiTreeDiff.rsc | 10 ++--- .../rascalmpl/library/lang/pico/HiFiDemo.rsc | 44 +++++++++++++++++++ .../library/lang/pico/examples/flip.pico | 14 ++++++ 4 files changed, 63 insertions(+), 6 deletions(-) create mode 100644 src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc create mode 100644 src/org/rascalmpl/library/lang/pico/examples/flip.pico diff --git a/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc b/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc index 90dcc727937..0d5388ce802 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc @@ -37,7 +37,6 @@ str executeTextEdits(str content, list[TextEdit] edits) { }); for (replace(loc range, str repl) <- reverse(edits)) { - assert range.top == file.top; content = ""; } diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc 
b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 16a5b402926..6e1e718fca7 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -298,12 +298,12 @@ list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replac bool fromOriginalFile(loc span, Tree last) = span.top == (last@\loc?|unknown:///|).top; - equals = [[*eq,q] | - [*_, pre, *eq, q, post, *_] := replacements, fromOriginalFile(span, q), - [*_, !pre, *eq, q, !post, *_] := originals - ]; + if ([*_, pre, *Tree eq, post, *_] := replacements, + [*_, !pre, *eq, !post, *_] := originals) { + return eq; + } - return sort(equals, largerList)[0] ? []; + return []; } @synopsis{trips equal elements from the front and the back of both lists, if any.} diff --git a/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc new file mode 100644 index 00000000000..0720c95ea1a --- /dev/null +++ b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc @@ -0,0 +1,44 @@ +@synopsis{Demonstrates HiFi source-to-source transformations through concrete syntax rewrites and text edits.} +module lang::pico::HiFiDemo + +import lang::pico::\syntax::Main; +import IO; +import ParseTree; +import analysis::diff::edits::HiFiTreeDiff; +import analysis::diff::edits::ExecuteTextEdits; + +@synopsis{Blindly swaps the branches of all the conditionals in a program} +@description{ +This rule is syntactically correct and has a clear semantics. The +layout of the resulting if-then-else-fi statement is also clear. 
+} +start[Program] flipConditionals(start[Program] program) = visit(program) { + case (Statement) `if then + ' <{Statement ";"}* ifBranch> + 'else + ' <{Statement ";"}* elseBranch> + 'fi` => + (Statement) `if then + ' <{Statement ";"}* elseBranch> + 'else + ' <{Statement ";"}* ifBranch> + 'fi` +}; + +void main() { + t = parse(#start[Program], |project://rascal/src/org/rascalmpl/library/lang/pico/examples/flip.pico|); + println("The original: + '"); + + u = flipConditionals(t); + println("Branches swapped, comments and indentation lost: + '"); + + edits = treeDiff(t, u); + println("Smaller text edits: + ' "); + + newContent = executeTextEdits("", edits); + println("Better output after executeTextEdits: + '"); +} diff --git a/src/org/rascalmpl/library/lang/pico/examples/flip.pico b/src/org/rascalmpl/library/lang/pico/examples/flip.pico new file mode 100644 index 00000000000..f235085ebcc --- /dev/null +++ b/src/org/rascalmpl/library/lang/pico/examples/flip.pico @@ -0,0 +1,14 @@ +begin + declare + a : natural, + b : natural; + a := 0; + b := 1; + if a then + % comment 1 % + b := a + else + % comment 2 % + a := b + fi +end \ No newline at end of file From ed091f7115db224724eca8232372610bf51b6b4c Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Sun, 13 Oct 2024 16:06:43 +0200 Subject: [PATCH 07/76] exposed IString.indent to String library module to allow users to reuse indentation in O(1) --- src/org/rascalmpl/library/Prelude.java | 4 ++++ src/org/rascalmpl/library/String.rsc | 16 ++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/src/org/rascalmpl/library/Prelude.java b/src/org/rascalmpl/library/Prelude.java index 61bcf1889c4..e13d0167704 100644 --- a/src/org/rascalmpl/library/Prelude.java +++ b/src/org/rascalmpl/library/Prelude.java @@ -3047,6 +3047,10 @@ public IValue stringChars(IList lst){ return values.string(chars); } + + public IString indent(IString indentation, IString content, IBool indentFirstLine) { + return content.indent(indentation, indentFirstLine.getValue()); + } public IValue charAt(IString s, IInteger i) throws IndexOutOfBoundsException //@doc{charAt -- return the character at position i in string s.} diff --git a/src/org/rascalmpl/library/String.rsc b/src/org/rascalmpl/library/String.rsc index de466de5272..8ce2acfedf8 100644 --- a/src/org/rascalmpl/library/String.rsc +++ b/src/org/rascalmpl/library/String.rsc @@ -627,3 +627,19 @@ str substitute(str src, map[loc,str] s) { order = sort([ k | k <- s ], bool(loc a, loc b) { return a.offset < b.offset; }); return ( src | subst1(it, x, s[x]) | x <- order ); } + +@synopsis{Indent a block of text} +@description{ +Every line in `content` will be indented using the characters +of `indentation`. +} +@benefits{ +* This operation executes in constant time, independent of the size of the content +or the indentation. +* Indent is the identity function if `indentation == ""` +} +@pitfalls{ +* This function works fine if `indentation` is not spaces or tabs; but it does not make much sense. +} +@javaClass{org.rascalmpl.library.Prelude} +java str indent(str indentation, str content, bool indentFirstLine=false); \ No newline at end of file From 2462eeb16a03a1d806fcf770a839895c3074d140 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Sun, 13 Oct 2024 16:06:57 +0200 Subject: [PATCH 08/76] slow progress on the diff algorithm --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 18 +++++++++++++----- .../rascalmpl/library/lang/pico/HiFiDemo.rsc | 4 ++-- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 6e1e718fca7..39bda42f896 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -272,8 +272,10 @@ list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] re + listDiff(cover(postO), seps, postO, postR) ; } - else { // nothing in common means we can replace the entire list - return edits + replace(span, learnIndentation(yield(replacements), yield(originals))); + else { + // covered all the cases + assert originals := replacements; + return edits; } } @@ -308,10 +310,10 @@ list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replac @synopsis{trips equal elements from the front and the back of both lists, if any.} tuple[list[Tree], list[Tree]] trimEqualElements([Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) - = ; + = trimEqualElements(aPostfix, bPostfix); tuple[list[Tree], list[Tree]] trimEqualElements([*Tree aPrefix, Tree a], [*Tree bPrefix, a]) - = ; + = trimEqualElements(aPrefix, bPrefix); default tuple[list[Tree], list[Tree]] trimEqualElements(list[Tree] a, list[Tree] b) = ; @@ -358,4 +360,10 @@ private loc cover(list[Tree] elems) = cover([e@\loc | e <- elems, e@\loc?]); @synopsis{yield a consecutive list of trees} private str yield(list[Tree] elems) = "<}>"; -private str learnIndentation(str replacement, str original) = replacement; // TODO: learn minimal indentaton from original +private str learnIndentation(str replacement, str original) { + list[str] indents(str text) = [indent | // <- split(text, "\n")]; + + 
str minIndent = sort(indents(original)[1..])[0]? ""; + + return indent(minIndent, replacement); +} diff --git a/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc index 0720c95ea1a..7be0812f81d 100644 --- a/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc +++ b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc @@ -35,8 +35,8 @@ void main() { '"); edits = treeDiff(t, u); - println("Smaller text edits: - ' "); + println("Smaller text edits:"); + iprintln(edits); newContent = executeTextEdits("", edits); println("Better output after executeTextEdits: From 8abdbd66dba0dcff47eef9d055178a3a77c88acc Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Sun, 13 Oct 2024 16:29:41 +0200 Subject: [PATCH 09/76] more complex example, and debug prints --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 19 ++++++++++++++----- .../library/lang/pico/examples/flip.pico | 11 +++++++++-- 2 files changed, 23 insertions(+), 7 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 39bda42f896..9a3d3239853 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -87,6 +87,7 @@ import ParseTree; import List; import String; import Location; +import IO; @synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.} @description{ @@ -241,7 +242,10 @@ int seps(\iter-star-seps(_,list[Symbol] s)) = size(s); default int seps(Symbol _) = 0; @synsopis{List diff is like text diff on lines; complex and easy to make slow} -list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] replacements) { +list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] replacements) { + println(" listDiff: + ' + ' "); edits = []; // this algorithm isolates commonalities between the two lists @@ 
-272,11 +276,12 @@ list[TextEdit] listDiff(loc _span, int seps, list[Tree] originals, list[Tree] re + listDiff(cover(postO), seps, postO, postR) ; } - else { - // covered all the cases - assert originals := replacements; + else if (originals := replacements) { return edits; } + else { + return edits + [replace(span, learnIndentation(yield(replacements), yield(originals)))]; + } } @synopsis{Finds the largest sublist that occurs in both lists} @@ -361,9 +366,13 @@ private loc cover(list[Tree] elems) = cover([e@\loc | e <- elems, e@\loc?]); private str yield(list[Tree] elems) = "<}>"; private str learnIndentation(str replacement, str original) { - list[str] indents(str text) = [indent | // <- split(text, "\n")]; + println("learning: + ' + ' "); + list[str] indents(str text) = [indent | /^[^\ \t]/ <- split(text, "\n")]; str minIndent = sort(indents(original)[1..])[0]? ""; + println("minIndent []"); return indent(minIndent, replacement); } diff --git a/src/org/rascalmpl/library/lang/pico/examples/flip.pico b/src/org/rascalmpl/library/lang/pico/examples/flip.pico index f235085ebcc..2bd7685a354 100644 --- a/src/org/rascalmpl/library/lang/pico/examples/flip.pico +++ b/src/org/rascalmpl/library/lang/pico/examples/flip.pico @@ -6,9 +6,16 @@ begin b := 1; if a then % comment 1 % - b := a + b := a; + x := 1 else % comment 2 % - a := b + a := b; + if b then + z := a + else + z := b + fi; + z := z fi end \ No newline at end of file From 4a55110f95f685b33db7cd0c4e39671fda10d61d Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Sun, 13 Oct 2024 17:48:42 +0200 Subject: [PATCH 10/76] finetunes stuff in indentation learner --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 9a3d3239853..555287f7225 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -369,10 +369,21 @@ private str learnIndentation(str replacement, str original) { println("learning: ' ' "); - list[str] indents(str text) = [indent | /^[^\ \t]/ <- split(text, "\n")]; + list[str] indents(str text) = [indent | /^[^\ \t]/ <- split("\n", text)]; - str minIndent = sort(indents(original)[1..])[0]? ""; + origIndents = indents(original); + replLines = split("\n", replacement); - println("minIndent []"); - return indent(minIndent, replacement); + if (replLines == []) { + return ""; + } + + minIndent = sort(origIndents[1..])[0]? ""; + + stripped = [ /^$/ := line ? rest : line | line <- replLines[1..]]; + + indented = [replLines[0], *[ indent(minIndent, line, indentFirstLine=true) | line <- stripped]]; + + return " + '<}>"[..-1]; } From 97eb3529a634c66d1203e2822a9dd01b48a10519 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Tue, 15 Oct 2024 09:47:30 +0200 Subject: [PATCH 11/76] testing --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 47 ++++++++++++------- .../rascalmpl/library/lang/pico/HiFiDemo.rsc | 4 ++ .../library/lang/pico/examples/flip.pico | 2 +- 3 files changed, 35 insertions(+), 18 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 555287f7225..badc4333c34 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -216,14 +216,14 @@ list[TextEdit] treeDiff( list[TextEdit] treeDiff( t:appl(prod(lex(str l), _, _), list[Tree] _), r:appl(prod(lex(l) , _, _), list[Tree] _)) - = [replace(t@\loc, learnIndentation("", ""))] + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))] when t != r; // When the productions are different, we've found an edit, and there is no need to recurse deeper. default list[TextEdit] treeDiff( t:appl(Production p:prod(_,_,_), list[Tree] _), r:appl(Production q:!p , list[Tree] _)) - = [replace(t@\loc, learnIndentation("", ""))]; + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; // If list production are the same, then the element lists can still be of different length // and we switch to listDiff which has different heuristics than normal trees. 
@@ -243,9 +243,9 @@ default int seps(Symbol _) = 0; @synsopis{List diff is like text diff on lines; complex and easy to make slow} list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] replacements) { - println(" listDiff: - ' - ' "); + // println(" listDiff: + // ' + // ' "); edits = []; // this algorithm isolates commonalities between the two lists @@ -280,7 +280,7 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep return edits; } else { - return edits + [replace(span, learnIndentation(yield(replacements), yield(originals)))]; + return edits + [replace(span, learnIndentation(span, yield(replacements), yield(originals)))]; } } @@ -365,10 +365,15 @@ private loc cover(list[Tree] elems) = cover([e@\loc | e <- elems, e@\loc?]); @synopsis{yield a consecutive list of trees} private str yield(list[Tree] elems) = "<}>"; -private str learnIndentation(str replacement, str original) { - println("learning: - ' - ' "); +@synopsis{Make sure the subtitution is at least as far indented as the original} +@description{ +This algorithm ignores the first line, since the first line is always preceeded by the layout of a parent node. + +Then it measures the depth of indentation of every line in the original, and takes the minimum. +That minimum indentation is stripped off every line that already has that much indentation in the replacement, +and then _all_ lines are re-indented with the discovered minimum. +} +private str learnIndentation(loc span, str replacement, str original) { list[str] indents(str text) = [indent | /^[^\ \t]/ <- split("\n", text)]; origIndents = indents(original); @@ -378,12 +383,20 @@ private str learnIndentation(str replacement, str original) { return ""; } - minIndent = sort(origIndents[1..])[0]? ""; - - stripped = [ /^$/ := line ? 
rest : line | line <- replLines[1..]]; - - indented = [replLines[0], *[ indent(minIndent, line, indentFirstLine=true) | line <- stripped]]; + minIndent = ""; + if ([_] := origIndents) { + // only one line. have to invent indentation from span + minIndent = " <}>"; + } + else { + minIndent = sort(origIndents[1..])[0]? ""; + } + + println("min: []"); + stripped = [ /^$/ := line ? rest : line | line <- replLines]; - return " - '<}>"[..-1]; + println("stripped:"); + iprintln(stripped); + return indent(minIndent, " + '<}>"[..-1]); } diff --git a/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc index 7be0812f81d..3decb0f5c00 100644 --- a/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc +++ b/src/org/rascalmpl/library/lang/pico/HiFiDemo.rsc @@ -41,4 +41,8 @@ void main() { newContent = executeTextEdits("", edits); println("Better output after executeTextEdits: '"); + + newU = parse(#start[Program], newContent); + + assert u := newU : "the rewritten tree matches the newly parsed"; } diff --git a/src/org/rascalmpl/library/lang/pico/examples/flip.pico b/src/org/rascalmpl/library/lang/pico/examples/flip.pico index 2bd7685a354..63f58c62b40 100644 --- a/src/org/rascalmpl/library/lang/pico/examples/flip.pico +++ b/src/org/rascalmpl/library/lang/pico/examples/flip.pico @@ -7,7 +7,7 @@ begin if a then % comment 1 % b := a; - x := 1 + z := z else % comment 2 % a := b; From fd6ccbb59278498d2fb0857653d8b8135c380bfa Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Thu, 9 Jan 2025 11:56:51 +0100 Subject: [PATCH 12/76] fixed nasty bug in Type.intersection w.r.t. 
parameter types --- src/org/rascalmpl/types/NonTerminalType.java | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/org/rascalmpl/types/NonTerminalType.java b/src/org/rascalmpl/types/NonTerminalType.java index 20094a487ef..8ff2293f257 100644 --- a/src/org/rascalmpl/types/NonTerminalType.java +++ b/src/org/rascalmpl/types/NonTerminalType.java @@ -346,6 +346,9 @@ public boolean intersects(Type other) { if (other == RascalValueFactory.Tree) { return true; } + else if (other.isParameter()) { + return other.intersects(this); + } else if (other instanceof NonTerminalType) { return ((NonTerminalType) other).intersectsWithNonTerminal(this); } From 1c0a81d263438faf387c0340d8a4d32fc7903239 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Thu, 9 Jan 2025 12:00:52 +0100 Subject: [PATCH 13/76] started on testing HiFiTreeDiff --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 3 - .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 56 +++++++++++++++++++ 2 files changed, 56 insertions(+), 3 deletions(-) create mode 100644 src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index badc4333c34..248514b6a90 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -392,11 +392,8 @@ private str learnIndentation(loc span, str replacement, str original) { minIndent = sort(origIndents[1..])[0]? ""; } - println("min: []"); stripped = [ /^$/ := line ? 
rest : line | line <- replLines]; - println("stripped:"); - iprintln(stripped); return indent(minIndent, " '<}>"[..-1]); } diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc new file mode 100644 index 00000000000..191fb877b9a --- /dev/null +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -0,0 +1,56 @@ +module lang::rascal::tests::library::analysis::diff::edits::HiFiTreeDiffTests + +extend analysis::diff::edits::ExecuteTextEdits; +extend analysis::diff::edits::HiFiTreeDiff; +extend lang::pico::\syntax::Main; + +import ParseTree; +import IO; + +public str simpleExample + = "begin + ' declare + ' a : natural, + ' b : natural; + ' a := a + b; + ' b := a - b; + ' a := a - b + 'end + '"; + +@synopsis{Specification of what it means for `treeDiff` to be syntactically correct} +@description{ +TreeDiff is syntactically correct if: +* The tree after rewriting _matches_ the tree after applying the edits tot the source text and parsing that. 
+* Note that _matching_ ignores case-insensitive literals and layout, indentation and comments +} +bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree)(&T<:Tree) transform) { + println("Transforming: + '"); + orig = parse(grammar, example); + transformed = transform(orig); + println("Transformed: + '"); + edits = treeDiff(orig, transformed); + println("Edits: + '"); + edited = executeTextEdits(example, edits); + println("Edited: + '"); + + // the edited text should produce a tree that matches the rewritten tree + return transformed := parse(grammar, edited); +} + +(&X<:Tree) identity(&X<:Tree x) = x; + +start[Program] swapAB(start[Program] p) = visit(p) { + case (Id) `a` => (Id) `b` + case (Id) `b` => (Id) `a` +}; + +test bool nulTestWithId() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); + +test bool simpleSwapper() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, swapAB); From 9c644589b66ba15fc9dfaac46153079d82b432a6 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Thu, 9 Jan 2025 12:13:22 +0100 Subject: [PATCH 14/76] minor improvements --- .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 47 +++++++++++++------ 1 file changed, 32 insertions(+), 15 deletions(-) diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 191fb877b9a..6ed4f8f9172 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -4,8 +4,9 @@ extend analysis::diff::edits::ExecuteTextEdits; extend analysis::diff::edits::HiFiTreeDiff; extend lang::pico::\syntax::Main; -import ParseTree; import IO; +import ParseTree; +import String; public str simpleExample = "begin @@ -25,23 +26,38 @@ TreeDiff is syntactically correct if: * Note that _matching_ ignores case-insensitive literals and layout, indentation and comments } bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree)(&T<:Tree) transform) { - println("Transforming: - '"); - orig = parse(grammar, example); + orig = parse(grammar, example); transformed = transform(orig); - println("Transformed: - '"); - edits = treeDiff(orig, transformed); - println("Edits: - '"); - edited = executeTextEdits(example, edits); - println("Edited: - '"); - - // the edited text should produce a tree that matches the rewritten tree + edits = treeDiff(orig, transformed); + edited = executeTextEdits(example, edits); + return transformed := parse(grammar, edited); } +@synopsis{Extract the leading spaces of each line of code} +list[str] indentationLevels(str example) + = [ i | /^[^\ ]*/ <- split("\n", example)]; + +@synopsis{In many cases, but not always, treeDiff maintains the indentation levels} +@description{ +Typically when a rewrite does not change the lines of code count, +and when the 
structure of the statements remains comparable, treeDiff +can guarantee that the indentation of a file remains unchanged, even if +significant changes to the code have been made. +} +@pitfalls{ +* This specification is not true for any transformation. Only apply it to +a test case if you can expect indentation-preservation for _the entire file_. +} +bool editsMaintainIndentationLevels(type[&T<:Tree] grammar, str example, (&T<:Tree)(&T<:Tree) transform) { + orig = parse(grammar, example); + transformed = transform(orig); + edits = treeDiff(orig, transformed); + edited = executeTextEdits(example, edits); + + return indentationLevels(example) == indentationLevels(edited); +} + (&X<:Tree) identity(&X<:Tree x) = x; start[Program] swapAB(start[Program] p) = visit(p) { @@ -53,4 +69,5 @@ test bool nulTestWithId() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); test bool simpleSwapper() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, swapAB); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, swapAB) + && editsMaintainIndentationLevels(#start[Program], simpleExample, swapAB); From 71a1c00338a69c5af97807c55fc0556c95191cc0 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Thu, 9 Jan 2025 15:37:30 +0100 Subject: [PATCH 15/76] fixed bug in list diff --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 32 ++++++++++------ .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 37 ++++++++++++++++++- 2 files changed, 56 insertions(+), 13 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 248514b6a90..22eca29d3e4 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -230,11 +230,11 @@ default list[TextEdit] treeDiff( list[TextEdit] treeDiff( Tree t:appl(Production p:regular(Symbol reg), list[Tree] aElems), appl(p, list[Tree] bElems)) - = listDiff(t@\loc, seps(reg), aElems, bElems); + = listDiff(t@\loc, seps(reg), aElems, bElems) when bprintln("diving into

"); // When the productions are equal, but the children may be different, we dig deeper for differences default list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) - = [*treeDiff(a, b) | <- zip2(argsA, argsB)]; + = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("diving into

"); @synopsis{decide how many separators we have} int seps(\iter-seps(_,list[Symbol] s)) = size(s); @@ -254,9 +254,9 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep // the edits are minimized. Note that we float on source location parameters // not only for the edit locations but also for sub-tree identity. - = trimEqualElements(originals, replacements); - span = cover([orig@\loc | orig <- originals, orig@\loc?]); - + println("span before trim: , size originals "); + = trimEqualElements(span, originals, replacements); + println("span after trim: , size originals "); = commonSpecialCases(span, seps, originals, replacements); edits += specialEdits; @@ -314,14 +314,14 @@ list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replac } @synopsis{trips equal elements from the front and the back of both lists, if any.} -tuple[list[Tree], list[Tree]] trimEqualElements([Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) - = trimEqualElements(aPostfix, bPostfix); +tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) + = trimEqualElements(endCover(span, aPostfix), aPostfix, bPostfix); -tuple[list[Tree], list[Tree]] trimEqualElements([*Tree aPrefix, Tree a], [*Tree bPrefix, a]) - = trimEqualElements(aPrefix, bPrefix); +tuple[loc, list[Tree], list[Tree]] trimEqualElements([*Tree aPrefix, Tree a], [*Tree bPrefix, a]) + = trimEqualElements(beginCover(span, aPrefix), aPrefix, bPrefix); -default tuple[list[Tree], list[Tree]] trimEqualElements(list[Tree] a, list[Tree] b) - = ; +default tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, list[Tree] a, list[Tree] b) + = ; // only one element removed in front, then we are done tuple[list[TextEdit], list[Tree], list[Tree]] commonSpecialCases(loc span, 0, [Tree a, *Tree tail], [*tail]) @@ -360,7 +360,15 @@ up to `until`. 
private loc fromUntil(loc from, loc until) = from.top(from.offset, until.offset - from.offset); private int end(loc src) = src.offset + src.length; -private loc cover(list[Tree] elems) = cover([e@\loc | e <- elems, e@\loc?]); +private loc endCover(loc span, []) = span(span.offset + span.length, 0); +private loc endCover(loc span, [Tree x]) = x@\loc; +private default loc endCover(loc span, list[Tree] l) = cover(l); + +private loc beginCover(loc span, []) = span(span.offset, 0); +private loc beginCover(loc span, [Tree x]) = x@\loc; +private default loc beginCover(loc span, list[Tree] l) = cover(l); + +private loc cover(list[Tree] elems:[_, *_]) = cover([e@\loc | Tree e <- elems, e@\loc?]); @synopsis{yield a consecutive list of trees} private str yield(list[Tree] elems) = "<}>"; diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 6ed4f8f9172..0a073e6dd62 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -29,9 +29,18 @@ bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree orig = parse(grammar, example); transformed = transform(orig); edits = treeDiff(orig, transformed); + println("derived edits:"); + iprintln(edits); edited = executeTextEdits(example, edits); - return transformed := parse(grammar, edited); + try { + return transformed := parse(grammar, edited); + } + catch ParseError(loc l): { + println("Parse error in:"); + println(edited); + return false; + } } @synopsis{Extract the leading spaces of each line of code} @@ -65,9 +74,35 @@ start[Program] swapAB(start[Program] p) = visit(p) { case (Id) `b` => (Id) `a` }; +start[Program] addDeclarationToEnd(start[Program] p) = visit(p) { + case (Program) `begin declare <{IdType 
","}* decls>; <{Statement ";"}* body> end` + => (Program) `begin + ' declare + ' <{IdType ","}* decls>, + ' c : natural; + ' <{Statement ";"}* body> + 'end` +}; + +start[Program] addDeclarationToStart(start[Program] p) = visit(p) { + case (Program) `begin declare <{IdType ","}* decls>; <{Statement ";"}* body> end` + => (Program) `begin + ' declare + ' c : natural, + ' <{IdType ","}* decls>; + ' <{Statement ";"}* body> + 'end` +}; + test bool nulTestWithId() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); test bool simpleSwapper() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, swapAB) && editsMaintainIndentationLevels(#start[Program], simpleExample, swapAB); + +test bool addDeclarationToEndTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToEnd); + +test bool addDeclarationToStartTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart); From 26777955b4652b8a585f0b1f285c5e4dc448692c Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Thu, 9 Jan 2025 15:40:45 +0100 Subject: [PATCH 16/76] oops --- src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 22eca29d3e4..8f423b4d073 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -317,7 +317,7 @@ list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replac tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) = trimEqualElements(endCover(span, aPostfix), aPostfix, bPostfix); -tuple[loc, list[Tree], list[Tree]] trimEqualElements([*Tree aPrefix, Tree a], [*Tree bPrefix, a]) +tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [*Tree aPrefix, Tree a], [*Tree bPrefix, a]) = trimEqualElements(beginCover(span, aPrefix), aPrefix, bPrefix); default tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, list[Tree] a, list[Tree] b) From 091b0b942c8f7e14dcd63c3383682205f5cff2e0 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Thu, 9 Jan 2025 19:39:43 +0100 Subject: [PATCH 17/76] simplified and repaired equal sublist detection --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 28 ++++++++----------- .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 14 ++++++++++ 2 files changed, 25 insertions(+), 17 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 8f423b4d073..e10ad351b27 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -259,8 +259,12 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep println("span after trim: , size originals "); = commonSpecialCases(span, seps, originals, replacements); edits += specialEdits; + println("special edits:"); + iprintln(edits); - equalSubList = largestEqualSubList(span, originals, replacements); + equalSubList = largestEqualSubList(originals, replacements); + println("equal sublist:"); + println(yield(equalSubList)); // by using the (or "a") largest common sublist as a pivot to divide-and-conquer // to the left and right of it, we minimize the number of necessary @@ -272,8 +276,8 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep // we align the prefixes and the postfixes and // continue recursively. return edits - + listDiff(cover(preO), seps, preO, preR) - + listDiff(cover(postO), seps, postO, postR) + + listDiff(beginCover(span, preO), seps, preO, preR) + + listDiff(endCover(span, postO), seps, postO, postR) ; } else if (originals := replacements) { @@ -298,20 +302,10 @@ uses particular properties of the relation between the original and the replacem * Candidate equal sublists always have consecutive source locations from the origin. * etc. 
} -list[Tree] largestEqualSubList(loc span, list[Tree] originals, list[Tree] replacements) { - // assert := trimEqualElements(originals, replacements) : "both lists begin and end with unique elements"; - - bool largerList(list[Tree] a, list[Tree] b) = size(a) > size(b); - - bool fromOriginalFile(loc span, Tree last) = span.top == (last@\loc?|unknown:///|).top; - - if ([*_, pre, *Tree eq, post, *_] := replacements, - [*_, !pre, *eq, !post, *_] := originals) { - return eq; - } - - return []; -} +list[Tree] largestEqualSubList([*Tree sub], [*_, *sub, *_]) = sub; +list[Tree] largestEqualSubList([*_, *sub, *_], [*Tree sub]) = sub; +list[Tree] largestEqualSubList([*_, *sub, *_], [*_, *Tree sub, *_]) = sub; +default list[Tree] largestEqualSubList(list[Tree] _orig, list[Tree] _repl) = []; @synopsis{trips equal elements from the front and the back of both lists, if any.} tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 0a073e6dd62..80060cf59ae 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -94,6 +94,17 @@ start[Program] addDeclarationToStart(start[Program] p) = visit(p) { 'end` }; +start[Program] addDeclarationToStartAndEnd(start[Program] p) = visit(p) { + case (Program) `begin declare <{IdType ","}* decls>; <{Statement ";"}* body> end` + => (Program) `begin + ' declare + ' x : natural, + ' <{IdType ","}* decls>, + ' y : natural; + ' <{Statement ";"}* body> + 'end` +}; + test bool nulTestWithId() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); @@ -106,3 +117,6 @@ test bool addDeclarationToEndTest() test bool 
addDeclarationToStartTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart); + +test bool addDeclarationToStartAndEndTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStartAndEnd); From ed1ad0335794659d9ece5ecceef121b5253a2c23 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Fri, 10 Jan 2025 16:29:32 +0100 Subject: [PATCH 18/76] finding more nested similarity under list elements --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 22 ++++++++++++++----- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index e10ad351b27..d882fbfb209 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -283,8 +283,15 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep else if (originals := replacements) { return edits; } - else { - return edits + [replace(span, learnIndentation(span, yield(replacements), yield(originals)))]; + else if (size(originals) == size(replacements)) { + return edits + + [*treeDiff(a, b) | <- zip2(originals, replacements)]; + ; + } else { + // TODO: make cases for shortering or lenghtening a list but + // mixing the common prefix with `treeDiff` to find more nested sharing + return edits + + [replace(span, learnIndentation(span, yield(replacements), yield(originals)))]; } } @@ -304,17 +311,20 @@ uses particular properties of the relation between the original and the replacem } list[Tree] largestEqualSubList([*Tree sub], [*_, *sub, *_]) = sub; list[Tree] largestEqualSubList([*_, *sub, *_], [*Tree sub]) = sub; -list[Tree] largestEqualSubList([*_, *sub, *_], [*_, *Tree sub, *_]) = sub; +list[Tree] largestEqualSubList([*_, p, *sub, q, *_], [*_, !p, *Tree sub, !q, *_]) = sub; default list[Tree] largestEqualSubList(list[Tree] 
_orig, list[Tree] _repl) = []; @synopsis{trips equal elements from the front and the back of both lists, if any.} -tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) +tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, + [Tree a, *Tree aPostfix], [ a, *Tree bPostfix]) = trimEqualElements(endCover(span, aPostfix), aPostfix, bPostfix); -tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, [*Tree aPrefix, Tree a], [*Tree bPrefix, a]) +tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, + [*Tree aPrefix, Tree a], [*Tree bPrefix, a]) = trimEqualElements(beginCover(span, aPrefix), aPrefix, bPrefix); -default tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, list[Tree] a, list[Tree] b) +default tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, + list[Tree] a, list[Tree] b) = ; // only one element removed in front, then we are done From cf798ec8a0e5372e05b459d59c057a1105436f3a Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Sat, 11 Jan 2025 11:40:08 +0100 Subject: [PATCH 19/76] Finishes HiFiTreeDiff algorithm This completes the algorithm for lists for the first time. The algorithm works in these steps: * Trim equal elements from the head and the tail of both lists. * Detect common edits to lists with fast list patterns; this is an optional optimization. * Find the largest common sublist and split both lists into three parts: two different prefixes, two equal middle parts and two different postfixes. Recurse on the prefixes and the postfixes and concatenate their edit lists. * Finally we end up with two empty lists or two lists without common elements; we collect the differences of each element position pairwise. Lists that became shorter get one additional edit to cut off the tail, while lists that became longer get one additional edit to add the new elements. The new elements inherit indentation from the pre-existing elements.
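
The steps listed above can be illustrated outside of Rascal with a small sketch. This is a hypothetical Java illustration on plain lists of strings, not the Rascal implementation: `ListDiffSketch`, `Edit`, `diff` and `apply` are invented names, edits are index ranges rather than source locations, the optional fast-pattern step is omitted, and the pairwise step replaces elements wholesale where the Rascal code recurses with `treeDiff`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the list-diff steps on lists of strings.
public class ListDiffSketch {

    // Hypothetical edit: replace original[from, to) with `replacement`.
    record Edit(int from, int to, List<String> replacement) {}

    static List<Edit> diff(List<String> orig, List<String> repl) {
        List<Edit> edits = new ArrayList<>();
        diff(orig, 0, orig.size(), repl, 0, repl.size(), edits);
        return edits;
    }

    private static void diff(List<String> a, int aLo, int aHi,
                             List<String> b, int bLo, int bHi,
                             List<Edit> edits) {
        // 1. Trim equal elements from the head and the tail of both lists.
        while (aLo < aHi && bLo < bHi && a.get(aLo).equals(b.get(bLo))) { aLo++; bLo++; }
        while (aLo < aHi && bLo < bHi && a.get(aHi - 1).equals(b.get(bHi - 1))) { aHi--; bHi--; }

        // 2. Find the largest common contiguous sublist (naive search).
        int bestLen = 0, bestA = aLo, bestB = bLo;
        for (int i = aLo; i < aHi; i++) {
            for (int j = bLo; j < bHi; j++) {
                int k = 0;
                while (i + k < aHi && j + k < bHi && a.get(i + k).equals(b.get(j + k))) k++;
                if (k > bestLen) { bestLen = k; bestA = i; bestB = j; }
            }
        }
        if (bestLen > 0) {
            // Use it as a pivot: recurse on the prefixes, then on the postfixes.
            diff(a, aLo, bestA, b, bLo, bestB, edits);
            diff(a, bestA + bestLen, aHi, b, bestB + bestLen, bHi, edits);
            return;
        }

        // 3. No common elements left: pair up positions one by one.
        int common = Math.min(aHi - aLo, bHi - bLo);
        for (int k = 0; k < common; k++) {
            edits.add(new Edit(aLo + k, aLo + k + 1, List.of(b.get(bLo + k))));
        }
        if (aHi - aLo > common) {
            // The list became shorter: cut off the remaining tail.
            edits.add(new Edit(aLo + common, aHi, List.of()));
        } else if (bHi - bLo > common) {
            // The list became longer: insert the new elements at the end.
            edits.add(new Edit(aHi, aHi, List.copyOf(b.subList(bLo + common, bHi))));
        }
    }

    // Apply edits that are sorted by position and do not overlap.
    static List<String> apply(List<String> orig, List<Edit> edits) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        for (Edit e : edits) {
            out.addAll(orig.subList(pos, e.from()));
            out.addAll(e.replacement());
            pos = e.to();
        }
        out.addAll(orig.subList(pos, orig.size()));
        return out;
    }

    public static void main(String[] args) {
        List<String> orig = List.of("x", "a", "b", "c", "y");
        List<String> repl = List.of("z", "a", "b", "c");
        System.out.println(apply(orig, diff(orig, repl))); // [z, a, b, c]
    }
}
```

The naive pivot search is cubic and only shows the divide-and-conquer shape; the Rascal code finds the pivot with list patterns in `largestEqualSubList` and tracks source locations via `beginCover` and `endCover`.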
For these changes additional tests still must be added later. --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index d882fbfb209..f9117ca720e 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -280,18 +280,21 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep + listDiff(endCover(span, postO), seps, postO, postR) ; } - else if (originals := replacements) { + else if (originals == [], replacements == []) { return edits; } - else if (size(originals) == size(replacements)) { + else { + // here we know there are no common elements anymore, only a common amount of different elements + common = min(size(originals), size(replacements)); + return edits - + [*treeDiff(a, b) | <- zip2(originals, replacements)]; + // first the minimal length pairwise replacements, essential for finding accidental commonalities + + [*treeDiff(a, b) | <- zip2(originals[..common], replacements[..common])]; + // then we either remove the tail that became shorter: + + [replace(cover(end(last), cover(originals[cover+1..])), "") | size(originals) > size(replacements), [*_, last] := originals[..common]] + // or we add new elements to the end, while inheriting indentation from the originals: + + [replace(end(last), learnIndentation(span, yield(replacements[common+1..]), yield(originals))) | size(originals) < size(replacements)] ; - } else { - // TODO: make cases for shortering or lenghtening a list but - // mixing the common prefix with `treeDiff` to find more nested sharing - return edits - + [replace(span, learnIndentation(span, yield(replacements), yield(originals)))]; } } From c84a9e2e62f93deaf98364e6fdba5f4d08a61424 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Sat, 11 Jan 2025 15:46:48 +0100 Subject: [PATCH 20/76] fixed omission in ResultFactory for ComposedFunctions --- src/org/rascalmpl/interpreter/result/ResultFactory.java | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/org/rascalmpl/interpreter/result/ResultFactory.java b/src/org/rascalmpl/interpreter/result/ResultFactory.java index 4607b274ce0..d30ecc94ef5 100644 --- a/src/org/rascalmpl/interpreter/result/ResultFactory.java +++ b/src/org/rascalmpl/interpreter/result/ResultFactory.java @@ -209,6 +209,9 @@ else if (value instanceof OverloadedFunction) { return (OverloadedFunction) value; } } + else if (value instanceof ComposedFunctionResult) { + return (Result) value; + } else { // otherwise this is an abstract ICalleableValue // for which no further operations are defined? From af081f597e811b78f178224cd96b417a9d85b910 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Sat, 11 Jan 2025 16:22:20 +0100 Subject: [PATCH 21/76] added more tests, fixed some issues --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 34 ++++++++----------- .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 26 ++++++-------- 2 files changed, 26 insertions(+), 34 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index f9117ca720e..742326421ee 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -88,6 +88,7 @@ import List; import String; import Location; import IO; +import util::Math; @synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.} @description{ @@ -230,11 +231,11 @@ default list[TextEdit] treeDiff( list[TextEdit] treeDiff( Tree t:appl(Production p:regular(Symbol reg), list[Tree] aElems), appl(p, list[Tree] bElems)) - = listDiff(t@\loc, seps(reg), aElems, bElems) when bprintln("diving into

"); + = listDiff(t@\loc, seps(reg), aElems, bElems); // When the productions are equal, but the children may be different, we dig deeper for differences default list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) - = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("diving into

"); + = [*treeDiff(a, b) | <- zip2(argsA, argsB)]; @synopsis{decide how many separators we have} int seps(\iter-seps(_,list[Symbol] s)) = size(s); @@ -243,9 +244,6 @@ default int seps(Symbol _) = 0; @synsopis{List diff is like text diff on lines; complex and easy to make slow} list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] replacements) { - // println(" listDiff: - // ' - // ' "); edits = []; // this algorithm isolates commonalities between the two lists @@ -254,18 +252,13 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep // the edits are minimized. Note that we float on source location parameters // not only for the edit locations but also for sub-tree identity. - println("span before trim: , size originals "); = trimEqualElements(span, originals, replacements); - println("span after trim: , size originals "); + = commonSpecialCases(span, seps, originals, replacements); edits += specialEdits; - println("special edits:"); - iprintln(edits); equalSubList = largestEqualSubList(originals, replacements); - println("equal sublist:"); - println(yield(equalSubList)); - + // by using the (or "a") largest common sublist as a pivot to divide-and-conquer // to the left and right of it, we minimize the number of necessary // edit actions for the entire list. @@ -275,6 +268,7 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep // TODO: what about the separators? // we align the prefixes and the postfixes and // continue recursively. 
+ return edits + listDiff(beginCover(span, preO), seps, preO, preR) + listDiff(endCover(span, postO), seps, postO, postR) @@ -285,15 +279,15 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep } else { // here we know there are no common elements anymore, only a common amount of different elements - common = min(size(originals), size(replacements)); - + common = min([size(originals), size(replacements)]); + return edits // first the minimal length pairwise replacements, essential for finding accidental commonalities - + [*treeDiff(a, b) | <- zip2(originals[..common], replacements[..common])]; + + [*treeDiff(a, b) | <- zip2(originals[..common], replacements[..common])] // then we either remove the tail that became shorter: - + [replace(cover(end(last), cover(originals[cover+1..])), "") | size(originals) > size(replacements), [*_, last] := originals[..common]] + + [replace(cover([after(last@\loc), cover(originals[common+1..])]), "") | size(originals) > size(replacements), [*_, last] := originals[..common]] // or we add new elements to the end, while inheriting indentation from the originals: - + [replace(end(last), learnIndentation(span, yield(replacements[common+1..]), yield(originals))) | size(originals) < size(replacements)] + + [replace(after(span), learnIndentation(span, yield(replacements[common..]), yield(originals))) | size(originals) < size(replacements)] ; } } @@ -313,8 +307,8 @@ uses particular properties of the relation between the original and the replacem * etc. 
} list[Tree] largestEqualSubList([*Tree sub], [*_, *sub, *_]) = sub; -list[Tree] largestEqualSubList([*_, *sub, *_], [*Tree sub]) = sub; -list[Tree] largestEqualSubList([*_, p, *sub, q, *_], [*_, !p, *Tree sub, !q, *_]) = sub; +list[Tree] largestEqualSubList([*_, *Tree sub, *_], [*sub]) = sub; +list[Tree] largestEqualSubList([*_, p, *Tree sub, q, *_], [*_, !p, *sub, !q, *_]) = sub; default list[Tree] largestEqualSubList(list[Tree] _orig, list[Tree] _repl) = []; @synopsis{trips equal elements from the front and the back of both lists, if any.} @@ -367,6 +361,8 @@ up to `until`. private loc fromUntil(loc from, loc until) = from.top(from.offset, until.offset - from.offset); private int end(loc src) = src.offset + src.length; +private loc after(loc src) = src(end(src), 0); + private loc endCover(loc span, []) = span(span.offset + span.length, 0); private loc endCover(loc span, [Tree x]) = x@\loc; private default loc endCover(loc span, list[Tree] l) = cover(l); diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 80060cf59ae..07475915d7a 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -29,15 +29,13 @@ bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree orig = parse(grammar, example); transformed = transform(orig); edits = treeDiff(orig, transformed); - println("derived edits:"); - iprintln(edits); edited = executeTextEdits(example, edits); try { return transformed := parse(grammar, edited); } catch ParseError(loc l): { - println("Parse error in:"); + println(" caused a parse error in:"); println(edited); return false; } @@ -94,17 +92,6 @@ start[Program] addDeclarationToStart(start[Program] p) = visit(p) { 'end` }; -start[Program] 
addDeclarationToStartAndEnd(start[Program] p) = visit(p) { - case (Program) `begin declare <{IdType ","}* decls>; <{Statement ";"}* body> end` - => (Program) `begin - ' declare - ' x : natural, - ' <{IdType ","}* decls>, - ' y : natural; - ' <{Statement ";"}* body> - 'end` -}; - test bool nulTestWithId() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); @@ -119,4 +106,13 @@ test bool addDeclarationToStartTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart); test bool addDeclarationToStartAndEndTest() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStartAndEnd); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd); + +test bool addDeclarationToEndAndSwapABTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToEnd o swapAB); + +test bool addDeclarationToStartAndSwapABTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o swapAB); + +test bool addDeclarationToStartAndEndAndSwapABTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd o swapAB); From 94c9adce1df6ebf27e46ed8d02f82494ce13c993 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Sat, 11 Jan 2025 16:34:19 +0100 Subject: [PATCH 22/76] added failing test --- .../rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc | 4 ++-- .../library/analysis/diff/edits/HiFiTreeDiffTests.rsc | 7 +++++++ 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 742326421ee..3697180ef0c 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -224,7 +224,7 @@ list[TextEdit] treeDiff( default list[TextEdit] treeDiff( t:appl(Production p:prod(_,_,_), list[Tree] _), r:appl(Production q:!p , list[Tree] _)) - = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))] when bprintln(t); // If list production are the same, then the element lists can still be of different length // and we switch to listDiff which has different heuristics than normal trees. @@ -235,7 +235,7 @@ list[TextEdit] treeDiff( // When the productions are equal, but the children may be different, we dig deeper for differences default list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) - = [*treeDiff(a, b) | <- zip2(argsA, argsB)]; + = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("into

on both sides"); @synopsis{decide how many separators we have} int seps(\iter-seps(_,list[Symbol] s)) = size(s); diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 07475915d7a..0c7dd7c826b 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -72,6 +72,10 @@ start[Program] swapAB(start[Program] p) = visit(p) { case (Id) `b` => (Id) `a` }; +start[Program] naturalToString(start[Program] p) = visit(p) { + case (Type) `natural` => (Type) `string` +}; + start[Program] addDeclarationToEnd(start[Program] p) = visit(p) { case (Program) `begin declare <{IdType ","}* decls>; <{Statement ";"}* body> end` => (Program) `begin @@ -116,3 +120,6 @@ test bool addDeclarationToStartAndSwapABTest() test bool addDeclarationToStartAndEndAndSwapABTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd o swapAB); + +test bool naturalToStringTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString); From 9856f1cf99a0f06fc357ec96cd77ac47e8fcc0e0 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 13 Jan 2025 09:35:02 +0100 Subject: [PATCH 23/76] debugging --- .../library/analysis/diff/edits/HiFiTreeDiff.rsc | 11 +++++++---- .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 12 +++++++++++- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 3697180ef0c..4666bf585a2 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -185,14 +185,14 @@ list[TextEdit] treeDiff(Tree a, a) = []; // skip production labels of original rules when diffing list[TextEdit] treeDiff( - appl(prod(label(_, Symbol s), syms, attrs), list[Tree] args), + appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args), Tree u) = treeDiff(appl(prod(s, syms, attrs), args), u); // skip production labels of replacement rules when diffing list[TextEdit] treeDiff( Tree t, - appl(prod(label(_, Symbol s), syms, attrs), list[Tree] args)) + appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args)) = treeDiff(t, appl(prod(s, syms, attrs), args)); // matched layout trees generate empty diffs such that the original is maintained @@ -224,7 +224,10 @@ list[TextEdit] treeDiff( default list[TextEdit] treeDiff( t:appl(Production p:prod(_,_,_), list[Tree] _), r:appl(Production q:!p , list[Tree] _)) - = [replace(t@\loc, learnIndentation(t@\loc, "", ""))] when bprintln(t); + { + rprintln(t); + return [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; + } // If list production are the same, then the element lists can still be of different length // and we switch to listDiff which has different heuristics than normal trees. 
@@ -234,7 +237,7 @@ list[TextEdit] treeDiff( = listDiff(t@\loc, seps(reg), aElems, bElems); // When the productions are equal, but the children may be different, we dig deeper for differences -default list[TextEdit] treeDiff(appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) +default list[TextEdit] treeDiff(t:appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("into

on both sides"); @synopsis{decide how many separators we have} diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 0c7dd7c826b..9f8a4e4ca3b 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -30,9 +30,19 @@ bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree transformed = transform(orig); edits = treeDiff(orig, transformed); edited = executeTextEdits(example, edits); + println(" leads to:"); + iprintln(edits); try { - return transformed := parse(grammar, edited); + if (transformed := parse(grammar, edited)) { + return true; + } + else { + println("The edited result is not the same:"); + println(edited); + println("As the transformed:"); + println(transformed); + } } catch ParseError(loc l): { println(" caused a parse error in:"); From cbfaa68fbcc0de8690bed9b6dbac44ff81c21c40 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 13 Jan 2025 09:50:03 +0100 Subject: [PATCH 24/76] more debugging --- .../library/analysis/diff/edits/HiFiTreeDiff.rsc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 4666bf585a2..fff32bf5c0b 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -185,15 +185,15 @@ list[TextEdit] treeDiff(Tree a, a) = []; // skip production labels of original rules when diffing list[TextEdit] treeDiff( - appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args), + Tree t:appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args), Tree u) - = treeDiff(appl(prod(s, syms, attrs), args), u); + = treeDiff(appl(prod(s, syms, attrs), args)[@\loc=t@\loc?|bla:///|], u); // skip production labels of replacement rules when diffing list[TextEdit] treeDiff( Tree t, - appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args)) - = treeDiff(t, appl(prod(s, syms, attrs), args)); + Tree u:appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args)) + = treeDiff(t, appl(prod(s, syms, attrs), args)[@\loc=u@\loc?|bla:///|]); // matched layout trees generate empty diffs such that the original is maintained list[TextEdit] treeDiff( From ec718d1cb89d9f77f2fcd17299048ceae7ec4ac0 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 13 Jan 2025 09:51:19 +0100 Subject: [PATCH 25/76] one more test --- .../tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 9f8a4e4ca3b..4812d45156d 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -133,3 +133,6 @@ test bool addDeclarationToStartAndEndAndSwapABTest() test bool naturalToStringTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString); + +test bool naturalToStringAndAtoBTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString o swapAB); From c8c267d4f5700f4a33b19e9279bf513273411939 Mon Sep 17 00:00:00 2001 From: Toine Hartman Date: Tue, 15 Jul 2025 13:06:54 +0200 Subject: [PATCH 26/76] Add missing return. 
--- .../tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc | 1 + 1 file changed, 1 insertion(+) diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 4812d45156d..7c7f5d41f42 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -42,6 +42,7 @@ bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree println(edited); println("As the transformed:"); println(transformed); + return false; } } catch ParseError(loc l): { From 5e8992492fc7006540ae0cca3e9c1753f0d98398 Mon Sep 17 00:00:00 2001 From: Toine Hartman Date: Tue, 15 Jul 2025 14:57:01 +0200 Subject: [PATCH 27/76] Implement layoutDiff. --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 45 ++++++++++++++++ .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 53 +++++++++++++------ 2 files changed, 83 insertions(+), 15 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index fff32bf5c0b..535addeaca7 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -240,6 +240,51 @@ list[TextEdit] treeDiff( default list[TextEdit] treeDiff(t:appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("into

on both sides"); + +// Equal trees +list[TextEdit] layoutDiff(Tree a, Tree b, bool copyComments = false) + = [] when a == b; + +// layout difference +list[TextEdit] layoutDiff( + t:appl(prod(layouts(str l), _, _), list[Tree] _), + r:appl(prod(layouts(l), _, _), list[Tree] _), + bool copyComments = false) + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; + +// matched layout trees generate empty diffs such that the original is maintained +default list[TextEdit] layoutDiff( + appl(prod(layouts(_), _, _), list[Tree] _), + appl(prod(layouts(_), _, _), list[Tree] _), + bool copyComments = false) + = []; + +// matched literal trees generate empty diffs +list[TextEdit] layoutDiff( + appl(prod(lit(str l), _, _), list[Tree] _), + appl(prod(lit(l) , _, _), list[Tree] _), + bool copyComments = false) + = []; + +// matched case-insensitive literal trees generate empty diffs such that the original is maintained +list[TextEdit] layoutDiff( + appl(prod(cilit(str l), _, _), list[Tree] _), + appl(prod(cilit(l) , _, _), list[Tree] _), + bool copyComments = false) + = []; + +list[TextEdit] layoutDiff( + t:appl(prod(lex(str l), _, _), list[Tree] _), + r:appl(prod(lex(l) , _, _), list[Tree] _), + bool copyComments = false) + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; + +default list[TextEdit] layoutDiff( + appl(Production p, list[Tree] argsA), + appl(p, list[Tree] argsB), + bool copyComments = false) + = [*layoutDiff(a, b, copyComments=copyComments) | <- zip2(argsA, argsB)]; + @synopsis{decide how many separators we have} int seps(\iter-seps(_,list[Symbol] s)) = size(s); int seps(\iter-star-seps(_,list[Symbol] s)) = size(s); diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 7c7f5d41f42..12cad41f768 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ 
b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -25,10 +25,10 @@ TreeDiff is syntactically correct if: * The tree after rewriting _matches_ the tree after applying the edits tot the source text and parsing that. * Note that _matching_ ignores case-insensitive literals and layout, indentation and comments } -bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree)(&T<:Tree) transform) { +bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree)(&T<:Tree) transform, list[TextEdit](Tree, Tree) diff) { orig = parse(grammar, example); transformed = transform(orig); - edits = treeDiff(orig, transformed); + edits = diff(orig, transformed); edited = executeTextEdits(example, edits); println(" leads to:"); iprintln(edits); @@ -67,10 +67,10 @@ significant changes to the code have been made. * This specification is not true for any transformation. Only apply it to a test case if you can expect indentation-preservation for _the entire file_. } -bool editsMaintainIndentationLevels(type[&T<:Tree] grammar, str example, (&T<:Tree)(&T<:Tree) transform) { +bool editsMaintainIndentationLevels(type[&T<:Tree] grammar, str example, (&T<:Tree)(&T<:Tree) transform, list[TextEdit](Tree, Tree) diff) { orig = parse(grammar, example); transformed = transform(orig); - edits = treeDiff(orig, transformed); + edits = diff(orig, transformed); edited = executeTextEdits(example, edits); return indentationLevels(example) == indentationLevels(edited); @@ -107,33 +107,56 @@ start[Program] addDeclarationToStart(start[Program] p) = visit(p) { 'end` }; +start[Program](start[Program]) indent(str indentation = " ", bool indentFirstLine = true) { + return start[Program](start[Program] p) { + return parse(#start[Program], indent(indentation, "

", indentFirstLine=indentFirstLine)); + }; +} + +start[Program] insertSpacesInDeclaration(start[Program] p) = visit(p) { + case (IdType) ` : ` + => (IdType) ` : ` +}; + test bool nulTestWithId() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity, treeDiff); test bool simpleSwapper() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, swapAB) - && editsMaintainIndentationLevels(#start[Program], simpleExample, swapAB); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, swapAB, treeDiff) + && editsMaintainIndentationLevels(#start[Program], simpleExample, swapAB, treeDiff); test bool addDeclarationToEndTest() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToEnd); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToEnd, treeDiff); test bool addDeclarationToStartTest() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart, treeDiff); test bool addDeclarationToStartAndEndTest() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd, treeDiff); test bool addDeclarationToEndAndSwapABTest() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToEnd o swapAB); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToEnd o swapAB, treeDiff); test bool addDeclarationToStartAndSwapABTest() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o swapAB); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o swapAB, treeDiff); test bool addDeclarationToStartAndEndAndSwapABTest() - = 
editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd o swapAB); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd o swapAB, treeDiff); test bool naturalToStringTest() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString, treeDiff); test bool naturalToStringAndAtoBTest() - = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString o swapAB); + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString o swapAB, treeDiff); + +test bool nulTestWithIdLayout() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity, layoutDiff) + && editsMaintainIndentationLevels(#start[Program], simpleExample, indent(), layoutDiff); + +test bool indentAllLayout() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, indent(), layoutDiff) + && !editsMaintainIndentationLevels(#start[Program], simpleExample, indent(), layoutDiff); + +test bool insertSpacesInDeclarationLayout() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, insertSpacesInDeclaration, layoutDiff) + && editsMaintainIndentationLevels(#start[Program], simpleExample, indent(), layoutDiff); From 5437b6313a6f6cc2cc43f07e9da422f7ed2aaabc Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Wed, 6 Aug 2025 10:41:45 +0200 Subject: [PATCH 28/76] factored layoutDiff into its own module --- .../analysis/diff/edits/HiFiLayoutDiff.rsc | 70 ++++++++++++++++ .../analysis/diff/edits/HiFiTreeDiff.rsc | 84 +++++++------------ 2 files changed, 98 insertions(+), 56 deletions(-) create mode 100644 src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc new file mode 100644 index 00000000000..1b71ccdcf56 --- /dev/null +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -0,0 +1,70 @@ +@synopsis{Compare equal-modulo-layout parse trees and extract the exact whitespace text edits that will format the original file.} +@description{ +This algorithm is the final component of a declarative high fidelity source code formatting pipeline. + +We have the following assumptions: +1. One original text file exists. +2. One ((ParseTree)) of the original file to be formatted, containing all orginal layout and source code comments and case-insensitive literals in the exact order of the original text file. In other words, +nothing may have happened to the parse tree after parsing. +3. One ((ParseTree)) of the _same_ file, but formatted (using a formatting algorithm like ((Tree2Box)) `|` ((Box2Text)), or string templates, and then re-parsing). This is typically obtained by +translating the tree to a `str` using some formatting tools, and then reparsing the file. +4. Typically comments and specific capitalization of case-insensitive literals have been lost in step 3. +5. We use ((analysis::diff::edits::TextEdits)) to communicate the effect of formatting to the IDE context. +} +@pitfalls{ +* if `originalTree !:= formattedTree` the algorithm will produce junk. It will break the syntactical correctness of the source code and forget source code comments. 
+} +@benefits{ +* Recovers source code comments which have been lost during earlier steps in the formatting pipeline. This makes losing source code comments an independent concern of a declarative formatter. +* Recovers the original capitalization of case-insensitive literals which may have been lost during earlier steps in the formatting pipeline. +* Is agnostic towards the design of earlier steps in the formatting pipeline, so long as `formattedTree := originalTree`. This means that +the pipeline may change layout (whitespace and comments and capitalization of case-insensitive literals), but nothing else. +} +module analysis::diff::edits::HiFiLayoutDiff + +extend analysis::diff::edits::HiFiTreeDiff; + +@synopsis{Extract TextEdits for the differences in whitespace between two otherwise identical ((ParseTree))s.} +// Equal trees +list[TextEdit] layoutDiff(Tree a, Tree b, bool copyComments = false) + = [] when a == b; + +// layout difference +list[TextEdit] layoutDiff( + t:appl(prod(layouts(str l), _, _), list[Tree] _), + r:appl(prod(layouts(l), _, _), list[Tree] _), + bool copyComments = false) + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; + +// matched layout trees generate empty diffs such that the original is maintained +default list[TextEdit] layoutDiff( + appl(prod(layouts(_), _, _), list[Tree] _), + appl(prod(layouts(_), _, _), list[Tree] _), + bool copyComments = false) + = []; + +// matched literal trees generate empty diffs +list[TextEdit] layoutDiff( + appl(prod(lit(str l), _, _), list[Tree] _), + appl(prod(lit(l) , _, _), list[Tree] _), + bool copyComments = false) + = []; + +// matched case-insensitive literal trees generate empty diffs such that the original is maintained +list[TextEdit] layoutDiff( + appl(prod(cilit(str l), _, _), list[Tree] _), + appl(prod(cilit(l) , _, _), list[Tree] _), + bool copyComments = false) + = []; + +list[TextEdit] layoutDiff( + t:appl(prod(lex(str l), _, _), list[Tree] _), + r:appl(prod(lex(l) , _, _),
list[Tree] _), + bool copyComments = false) + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; + +default list[TextEdit] layoutDiff( + appl(Production p, list[Tree] argsA), + appl(p, list[Tree] argsB), + bool copyComments = false) + = [*layoutDiff(a, b, copyComments=copyComments) | <- zip2(argsA, argsB)]; \ No newline at end of file diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 535addeaca7..8ab3faaf021 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -180,16 +180,16 @@ executeDocumentEdit(edit); readFile(tmp://example.pico|); ``` } -// equal trees generate empty diffs (note this already ignores whitespace differences) +// equal trees generate empty diffs (note this already ignores whitespace differences because non-linear matching ignores layout nodes) list[TextEdit] treeDiff(Tree a, a) = []; -// skip production labels of original rules when diffing +// skip production labels of original rules when diffing, to be able to focus on the Symbol constructor for downstream case-distinction list[TextEdit] treeDiff( Tree t:appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args), Tree u) = treeDiff(appl(prod(s, syms, attrs), args)[@\loc=t@\loc?|bla:///|], u); -// skip production labels of replacement rules when diffing +// skip production labels of original rules when diffing, to be able to focus on the Symbol constructor for downstream case-distinction list[TextEdit] treeDiff( Tree t, Tree u:appl(prod(label(_, Symbol s), list[Symbol] syms, set[Attr] attrs), list[Tree] args)) @@ -213,7 +213,7 @@ list[TextEdit] treeDiff( appl(prod(cilit(l) , _, _), list[Tree] _)) = []; -// different lexicals generate small diffs even if the parent is equal +// different lexicals generate small diffs even if the parent is equal. 
This avoids extremely small edits within the boundaries of single identifiers. list[TextEdit] treeDiff( t:appl(prod(lex(str l), _, _), list[Tree] _), r:appl(prod(lex(l) , _, _), list[Tree] _)) @@ -225,12 +225,12 @@ default list[TextEdit] treeDiff( t:appl(Production p:prod(_,_,_), list[Tree] _), r:appl(Production q:!p , list[Tree] _)) { - rprintln(t); + rprintln(t); // TODO remove debug statement return [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; } // If list production are the same, then the element lists can still be of different length -// and we switch to listDiff which has different heuristics than normal trees. +// and we switch to listDiff which has different heuristics than normal trees to detect large identical sublists. list[TextEdit] treeDiff( Tree t:appl(Production p:regular(Symbol reg), list[Tree] aElems), appl(p, list[Tree] bElems)) @@ -238,59 +238,32 @@ list[TextEdit] treeDiff( // When the productions are equal, but the children may be different, we dig deeper for differences default list[TextEdit] treeDiff(t:appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) - = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("into

on both sides"); + = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("into

on both sides"); // TODO remove debug print -// Equal trees -list[TextEdit] layoutDiff(Tree a, Tree b, bool copyComments = false) - = [] when a == b; -// layout difference -list[TextEdit] layoutDiff( - t:appl(prod(layouts(str l), _, _), list[Tree] _), - r:appl(prod(layouts(l), _, _), list[Tree] _), - bool copyComments = false) - = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; - -// matched layout trees generate empty diffs such that the original is maintained -default list[TextEdit] layoutDiff( - appl(prod(layouts(_), _, _), list[Tree] _), - appl(prod(layouts(_), _, _), list[Tree] _), - bool copyComments = false) - = []; - -// matched literal trees generate empty diffs -list[TextEdit] layoutDiff( - appl(prod(lit(str l), _, _), list[Tree] _), - appl(prod(lit(l) , _, _), list[Tree] _), - bool copyComments = false) - = []; - -// matched case-insensitive literal trees generate empty diffs such that the original is maintained -list[TextEdit] layoutDiff( - appl(prod(cilit(str l), _, _), list[Tree] _), - appl(prod(cilit(l) , _, _), list[Tree] _), - bool copyComments = false) - = []; - -list[TextEdit] layoutDiff( - t:appl(prod(lex(str l), _, _), list[Tree] _), - r:appl(prod(lex(l) , _, _), list[Tree] _), - bool copyComments = false) - = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; - -default list[TextEdit] layoutDiff( - appl(Production p, list[Tree] argsA), - appl(p, list[Tree] argsB), - bool copyComments = false) - = [*layoutDiff(a, b, copyComments=copyComments) | <- zip2(argsA, argsB)]; @synopsis{decide how many separators we have} -int seps(\iter-seps(_,list[Symbol] s)) = size(s); -int seps(\iter-star-seps(_,list[Symbol] s)) = size(s); -default int seps(Symbol _) = 0; +int seps(\iter-seps(_, list[Symbol] s)) = size(s); +int seps(\iter-star-seps(_, list[Symbol] s)) = size(s); +default int seps(Symbol _) = 0; -@synsopis{List diff is like text diff on lines; complex and easy to make slow} +@synopsis{List diff is like text diff on lines; complex and easy to 
make slow} +@description{ +This algorithm uses heuristics to avoid searching for the largest common sublist all too often. + +1. Since many patches to parse tree lists typically only change a prefix or a postfix, and we +can detect this quickly, we first extract patches for those instances. +2. However, it is also very fast to detect unchanged prefixes and postfixes, so by focusing +on the changed parts in the middle we generate more instances of case 1. +3. What we are left with is either an empty list and we are done, or a more complex situation +where we apply the "largestEqualSubList" algorithm, which splits both lists into three parts: + * two unequal prefixes + * two equal sublists in the middle + * two unequal postfixes +4. The algorithm then concatenates the diffs obtained by recursing to step 1 on the prefixes with the diffs obtained by recursing to step 1 on the postfixes. +5. Two empty lists terminate the recursion. +} list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] replacements) { edits = []; @@ -348,11 +321,10 @@ and thus there are interesting differences left, even if we remove any equal sublist. Note that this is not a general algorithm for Largest Common Subsequence (LCS), since it -uses particular properties of the relation between the original and the replacement list. +uses particular properties of the relation between the original and the replacement list: * New elements are never equal to old elements (due to source locations) * Equal prefixes and postfixes may be assumed to be maximal sublists as well (see above). * Candidate equal sublists always have consecutive source locations from the origin. -* etc.
} list[Tree] largestEqualSubList([*Tree sub], [*_, *sub, *_]) = sub; list[Tree] largestEqualSubList([*_, *Tree sub, *_], [*sub]) = sub; @@ -399,7 +371,7 @@ private loc fromUntil(Tree from, Tree until) = fromUntil(from@\loc, until@\loc); @synopsis{Compute location span that is common between an element and a succeeding element} @description{ -The resulting loc is including the `from` but exclusing the `until`. It goes right +The resulting loc is including the `from` but excluding the `until`. It goes right up to `until`. ```ascii-art [from] gap [until] From 51821f5ea0756a05b8fab13e72950ca0384d01ab Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Wed, 6 Aug 2025 10:43:09 +0200 Subject: [PATCH 29/76] added TODO --- src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc | 1 + 1 file changed, 1 insertion(+) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 1b71ccdcf56..b1a78382b7a 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -30,6 +30,7 @@ list[TextEdit] layoutDiff(Tree a, Tree b, bool copyComments = false) = [] when a == b; // layout difference +// TODO: layout nodes typically do not have @\loc annotations, so we have to get them from somewhere list[TextEdit] layoutDiff( t:appl(prod(layouts(str l), _, _), list[Tree] _), r:appl(prod(layouts(l), _, _), list[Tree] _), From bfa49472dd0f3230d03282e9933f651c12728a3c Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Wed, 6 Aug 2025 10:49:37 +0200 Subject: [PATCH 30/76] simplifying and fixing and commenting layoutDiff --- .../analysis/diff/edits/HiFiLayoutDiff.rsc | 25 ++++++------------- 1 file changed, 8 insertions(+), 17 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index b1a78382b7a..5359d9ad876 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -35,14 +35,7 @@ list[TextEdit] layoutDiff( t:appl(prod(layouts(str l), _, _), list[Tree] _), r:appl(prod(layouts(l), _, _), list[Tree] _), bool copyComments = false) - = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; - -// matched layout trees generate empty diffs such that the original is maintained -default list[TextEdit] layoutDiff( - appl(prod(layouts(_), _, _), list[Tree] _), - appl(prod(layouts(_), _, _), list[Tree] _), - bool copyComments = false) - = []; + = [replace(t@\loc, learnComments(t@\loc, "", ""))]; // matched literal trees generate empty diffs list[TextEdit] layoutDiff( @@ -58,14 +51,12 @@ list[TextEdit] layoutDiff( bool copyComments = false) = []; -list[TextEdit] layoutDiff( - t:appl(prod(lex(str l), _, _), list[Tree] _), - r:appl(prod(lex(l) , _, _), list[Tree] _), - bool copyComments = false) - = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; - +// recurse through the parse tree in the right order to collect layout edits +// this default fails when the two compared trees are unequal-modulo-layout, such that +// this precondition is checked and failure to comply is detected as early (high) as possible. 
default list[TextEdit] layoutDiff( - appl(Production p, list[Tree] argsA), - appl(p, list[Tree] argsB), + Tree t:appl(Production p, list[Tree] argsA), + t:appl(p, list[Tree] argsB), // note the non-linear equality-modulo-layout check here bool copyComments = false) - = [*layoutDiff(a, b, copyComments=copyComments) | <- zip2(argsA, argsB)]; \ No newline at end of file + = [*layoutDiff(a, b, copyComments=copyComments) | <- zip2(argsA, argsB)]; + \ No newline at end of file From ee4eec29f9499ab6392da2d4d5d34c20dc027c92 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Wed, 6 Aug 2025 11:15:44 +0200 Subject: [PATCH 31/76] minor steps --- .../analysis/diff/edits/HiFiLayoutDiff.rsc | 55 +++++++++++++++---- 1 file changed, 44 insertions(+), 11 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 5359d9ad876..f4bbe1850ff 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -23,29 +23,48 @@ the pipeline may change layout (whitespace and comments and capitalization of ca module analysis::diff::edits::HiFiLayoutDiff extend analysis::diff::edits::HiFiTreeDiff; +import ParseTree; // this should not be necessary because imported by HiFiTreeDiff +import String; // this should not be be necessary because imported by HiFiTreeDiff + @synopsis{Extract TextEdits for the differences in whitespace between two otherwise identical ((ParseTree))s.} -// Equal trees +@description{ +This is the top-level wrapper that starts a recursion over the entire parse tree. +We need to keep the span of the current node in order to fill in the possible gaps +where sub-trees are not annotated with source locations. 
+} list[TextEdit] layoutDiff(Tree a, Tree b, bool copyComments = false) + = layoutDiff(a@\loc, a, b, copyComments=copyComments); + + + +// Equal trees +list[TextEdit] layoutDiff(loc _span, Tree a, Tree b, bool copyComments = false) = [] when a == b; -// layout difference -// TODO: layout nodes typically do not have @\loc annotations, so we have to get them from somewhere -list[TextEdit] layoutDiff( +// layout differences are detected, so here we produce a `replace` node: +list[TextEdit] layoutDiff(loc span, + t:appl(prod(layouts(str l), _, _), list[Tree] _), + u:appl(prod(layouts(l), _, _), list[Tree] _), + bool copyComments = false) + = [replace(span, learnComments(t@\loc, "", ""))] when t != u; + +// the layout was the same as before +list[TextEdit] layoutDiff(loc span, t:appl(prod(layouts(str l), _, _), list[Tree] _), - r:appl(prod(layouts(l), _, _), list[Tree] _), + t, bool copyComments = false) - = [replace(t@\loc, learnComments(t@\loc, "", ""))]; + = []; // matched literal trees generate empty diffs -list[TextEdit] layoutDiff( +list[TextEdit] layoutDiff(loc _span, appl(prod(lit(str l), _, _), list[Tree] _), appl(prod(lit(l) , _, _), list[Tree] _), bool copyComments = false) = []; // matched case-insensitive literal trees generate empty diffs such that the original is maintained -list[TextEdit] layoutDiff( +list[TextEdit] layoutDiff(loc _span, appl(prod(cilit(str l), _, _), list[Tree] _), appl(prod(cilit(l) , _, _), list[Tree] _), bool copyComments = false) @@ -54,9 +73,23 @@ list[TextEdit] layoutDiff( // recurse through the parse tree in the right order to collect layout edits // this default fails when the two compared trees are unequal-modulo-layout, such that // this precondition is checked and failure to comply is detected as early (high) as possible. 
-default list[TextEdit] layoutDiff( +default list[TextEdit] layoutDiff(loc span, Tree t:appl(Production p, list[Tree] argsA), t:appl(p, list[Tree] argsB), // note the non-linear equality-modulo-layout check here bool copyComments = false) - = [*layoutDiff(a, b, copyComments=copyComments) | <- zip2(argsA, argsB)]; - \ No newline at end of file + = [*layoutDiff(|todo:///|, a, b, copyComments=copyComments) | <- zip2(argsA, argsB)]; // TODO: here we have to recover the loc of the outermost layout node + +@synopsis{Make sure the new layout still contains all the source code comments of the original layout} +@description{ +This algorithm uses a heuristic to detect source code comments inside layout substrings. If the original +layout contains comments, but the replacement layout does not, we re-introduce the comments at the +expected level of indentation. +} +private str learnComments(loc span, str replacement, str original, bool copyComments = false) { + if (!copyComments) { + return replacement; + } + else { + throw "not yet implemented"; + } +} \ No newline at end of file From 45b3e90c913d720150614bac0b3dfa6c1b53e5e5 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Wed, 6 Aug 2025 12:26:49 +0200 Subject: [PATCH 32/76] added TODO --- .../library/analysis/diff/edits/HiFiLayoutDiff.rsc | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index f4bbe1850ff..2423ce1f757 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -36,8 +36,6 @@ where sub-trees are not annotated with source locations. 
list[TextEdit] layoutDiff(Tree a, Tree b, bool copyComments = false) = layoutDiff(a@\loc, a, b, copyComments=copyComments); - - // Equal trees list[TextEdit] layoutDiff(loc _span, Tree a, Tree b, bool copyComments = false) = [] when a == b; @@ -47,7 +45,7 @@ list[TextEdit] layoutDiff(loc span, t:appl(prod(layouts(str l), _, _), list[Tree] _), u:appl(prod(layouts(l), _, _), list[Tree] _), bool copyComments = false) - = [replace(span, learnComments(t@\loc, "", ""))] when t != u; + = [replace(span, learnComments(t@\loc, "", ""))] when eq(t, u); // the layout was the same as before list[TextEdit] layoutDiff(loc span, @@ -90,6 +88,10 @@ private str learnComments(loc span, str replacement, str original, bool copyComm return replacement; } else { + // TODO: 1. detect "non-whitespace" in `original` + // 2. strip leading indentation from the non-whitespace if multiple lines are detected + // 3. re-indent the multiple lines + // 4. integrate the new comments with the new whitespace in a smart manner throw "not yet implemented"; } } \ No newline at end of file From e88496186040a0cee171f1df80536476f03c642f Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Fri, 8 Aug 2025 12:14:19 +0200 Subject: [PATCH 33/76] removed unused import --- src/org/rascalmpl/values/parsetrees/TreeAdapter.java | 1 - 1 file changed, 1 deletion(-) diff --git a/src/org/rascalmpl/values/parsetrees/TreeAdapter.java b/src/org/rascalmpl/values/parsetrees/TreeAdapter.java index fb53274cb7c..5f633d0dc68 100644 --- a/src/org/rascalmpl/values/parsetrees/TreeAdapter.java +++ b/src/org/rascalmpl/values/parsetrees/TreeAdapter.java @@ -25,7 +25,6 @@ import org.jline.jansi.Ansi.Color; import org.rascalmpl.exceptions.ImplementationError; import org.rascalmpl.interpreter.utils.LimitedResultWriter; -import org.rascalmpl.values.IRascalValueFactory; import org.rascalmpl.values.RascalValueFactory; import org.rascalmpl.values.ValueFactoryFactory; import org.rascalmpl.values.parsetrees.visitors.TreeVisitor; From 7b16301c8297862a3bf06756fbaa429371497883 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Fri, 8 Aug 2025 12:14:43 +0200 Subject: [PATCH 34/76] added reposition function to ParseTree --- src/org/rascalmpl/library/ParseTree.rsc | 166 +++++++++++++++++++++++- 1 file changed, 164 insertions(+), 2 deletions(-) diff --git a/src/org/rascalmpl/library/ParseTree.rsc b/src/org/rascalmpl/library/ParseTree.rsc index 7ab0a0da50b..6b2e1651d16 100644 --- a/src/org/rascalmpl/library/ParseTree.rsc +++ b/src/org/rascalmpl/library/ParseTree.rsc @@ -139,9 +139,11 @@ run-time already uses `.src` while the source code still uses `@\loc`. 
module ParseTree -extend Type; -extend Message; extend List; +extend Message; +extend Type; + +import Node; @synopsis{The Tree data type as produced by the parser.} @description{ @@ -811,3 +813,163 @@ bool isNonTerminalType(Symbol::\parameterized-sort(str _, list[Symbol] _)) = tru bool isNonTerminalType(Symbol::\parameterized-lex(str _, list[Symbol] _)) = true; bool isNonTerminalType(Symbol::\start(Symbol s)) = isNonTerminalType(s); default bool isNonTerminalType(Symbol s) = false; + +@synopsis{Re-compute and overwrite origin locations for all sub-trees of a ((Tree))} +@description{ +This function takes a ((Tree)) and overwrites the old \loc annotations of every subtree +with fresh locations. The new locations are as if the file were parsed again from the unparsed result: +the locations describe the left-to-right order of the sub-trees again exactly, and they all +derive from the same top-level location (read "file"). + +Typically, with the default options, this algorithm changes _nothing_ in a ((Tree)) which +has just been produced by the parser. It will rebuild the tree and recompute the exact +locations as they were originally. However, there are many reasons why the (location) fields +in a ((Tree)) may no longer be what they were just after parsing: +1. subtrees may have been removed; +2. subtrees may have been relocated to different parts of the tree; +3. subtrees may have been introduced from other source files; +4. subtrees may have been introduced from concrete syntax expressions in Rascal code; +5. other algorithms may have added more keyword fields, for example fully resolved qualified names, +resolved types, error messages or future computations (closures); +6. location fields themselves may have been lost accidentally when rewriting trees with ((Statement-Visit)); +7. etc. + +Some downstream algorithms (e.g. ((HiFiLayoutDiff))) require source locations to be consistent with the actual current position +of every source tree.
((reposition)) provides this contract. Even if any of the above transformations has happened, +after ((reposition)) every node has an accurate position with respect to the hypothetical file contents that would be generated +if the tree is unparsed (written to a string or a file). + +In addition, ((reposition)) may add locations to ((Tree)) nodes which were not annotated +initially by the ((parser)): layout nodes, literal nodes, and sub-lexical nodes. Some algorithms on +parse trees (like formatting) require more detailed location information than the ((parser)) provides: +* markSubLexical=true ensures the sub-structure of lexicals is annotated as well. +* markLayout=true ensures layout nodes are annotated; markSubLayout=true also annotates their sub-structure. +* markLit=true ensures literal trees and case-insensitive literal trees are annotated as well. +* markAmb=true ensures ambiguity nodes are annotated. NB: the sub-structure of a cluster is always annotated according to the other flags. +* etc.: every kind of node has a "mark" flag, for completeness' sake. + +Finally, ((reposition)) can be used to remove superfluous locations from ((Tree)) nodes. Every node which +originally had a position will lose it unless ((reposition)) is configured to recompute it. + +By default, ((reposition)) simulates the behavior of a ((parser)) exactly. Reparsing the +yield of a tree should always produce the exact same locations as ((reposition)) does. +} +@benefits{ +* Unlike reparsing, ((reposition)) will maintain all other keyword parameters of ((Tree)) nodes, like resolved qualified names and type attributes. +* Can be used to erase superfluous annotations for memory efficiency, while keeping the essential ones.
+* +} +&T <: Tree reposition( + &T <: Tree tree, + loc file = tree@\loc.top, + bool \markSyntax = true, + bool \markLexical = true, + bool \markSubLexical = false, + bool \markRegular = true, + bool \markLayout = false, + bool \markSubLayout = false, + bool \markLit = false, + bool \markSubLit = false, + bool \markAmb = false, + bool \markCycle = false, + bool \markChar = false + ) { + // the cur variables are state shared by the `rec` local functions that recurse over the entire tree + int curOffset = 0; + int curLine = 1; + int curColumn = 0; + + @synopsis{Check if this rule is configured to be annotated} + default bool doAnno(Production _) = false; + bool doAnno(prod(\lex(_), _, _)) = markLexical; + bool doAnno(prod(\label(_, \lex(_)), _, _)) = markLexical; + bool doAnno(prod(\layouts(_), _, _)) = markLayout; + bool doAnno(prod(\label(_, \layouts(_)), _, _)) = markLayout; + bool doAnno(prod(\sort(_), _, _)) = markSyntax; + bool doAnno(prod(\label(_, \sort(_)), _, _)) = markSyntax; + bool doAnno(\regular(_)) = markRegular; + bool doAnno(prod(\lit(_), _, _)) = markLit; + bool doAnno(prod(\cilit(_), _, _)) = markLit; + + @synopsis{Check if sub-structure of this rule is configured to be annotated} + default bool doSub(Production _) = true; + bool doSub(prod(\lex(_), _, _)) = \markSubLexical; + bool doSub(prod(\label(_, lex(_)), _, _)) = \markSubLexical; + bool doSub(prod(\layouts(_), _, _)) = \markSubLayout; + bool doSub(prod(\label(_, \layouts(_)), _, _)) = \markSubLayout; + bool doSub(prod(\lit(_), _, _)) = \markSubLit; + bool doSub(prod(\cilit(_), _, _)) = \markSubLit; + + // the character nodes drive the actual current position: offset, line and column + Tree rec(Tree t:char(int ch), bool _sub) { + beginOffset = curOffset; + beginLine = curLine; + beginColumn = curColumn; + curColumn += 1; // every character advances the column; \r and \n reset it below + curOffset += 1; + + switch (t) { + case [\r] _: { + curColumn = 0; + } + + case [\n] _: { + curLine += 1; + curColumn = 0; + } + } + + return markChar + ?
char(ch)[@\loc=file(beginOffset, 1, <beginLine, beginColumn>, <curLine, curColumn>)] + : char(ch) + ; + } + + // cycles take no space + Tree rec(cycle(Symbol s, int up), bool _sub) = markCycle + ? cycle(s, up)[@\loc=file(curOffset, 0, <curLine, curColumn>, <curLine, curColumn>)] + : cycle(s, up) + ; + + // application nodes always have children to traverse, to get to the individual characters eventually + // different types of nodes lead to annotation, or not, depending on the parameters of ((reposition)) + Tree rec(appl(Production prod, list[Tree] args), bool sub) { + beginOffset = curOffset; + beginLine = curLine; + beginColumn = curColumn; + + // once `sub` is false, going down, we can never turn it on again + newArgs = [mergeRec(a, sub && doSub(prod)) | a <- args]; + + return sub && doAnno(prod) + ? appl(prod, newArgs)[@\loc=file(beginOffset, curOffset - beginOffset, <beginLine, beginColumn>, <curLine, curColumn>)] + : appl(prod, newArgs) + ; + } + + // ambiguity nodes are simply choices between alternatives which each receive their own positions. + Tree rec(t:amb(set[Tree] alts), bool sub) { + if (newAlts:{Tree x, *_} := {mergeRec(a, sub) | a <- alts}) { + // inherit the outermost positions from one of the alternatives, since they are all the same by definition. + return markAmb && x@\loc? + ? amb(newAlts)[@\loc=x@\loc] + : amb(newAlts) + ; + } + + // this never happens because there are always at least two alternatives in a cluster + fail; + } + + @synopsis{Recurse, while recovering all other keyword parameters (except "src", a.k.a. @\loc) from the original.} + Tree mergeRec(Tree t, bool sub) { + oldParams = getKeywordParameters(t); + t = rec(t, sub); + newParams = getKeywordParameters(t); + mergedParams = (oldParams - ("src" : |unknown:///|)) + newParams; + return setKeywordParameters(t, mergedParams); + } + + // we start recursion at the top, not forgetting to merge its other keyword fields + return mergeRec(tree, true); +} \ No newline at end of file From 373179ffd4c051051eee983e2815cc3f75147fc7 Mon Sep 17 00:00:00 2001 From: "Jurgen J.
Vinju" Date: Fri, 8 Aug 2025 12:51:14 +0200 Subject: [PATCH 35/76] finished first reasonable version of layoutDiff --- .../analysis/diff/edits/HiFiLayoutDiff.rsc | 86 ++++++++++--------- 1 file changed, 44 insertions(+), 42 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 2423ce1f757..9a4bf2afa00 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -26,56 +26,55 @@ extend analysis::diff::edits::HiFiTreeDiff; import ParseTree; // this should not be necessary because imported by HiFiTreeDiff import String; // this should not be be necessary because imported by HiFiTreeDiff - @synopsis{Extract TextEdits for the differences in whitespace between two otherwise identical ((ParseTree))s.} @description{ -This is the top-level wrapper that starts a recursion over the entire parse tree. -We need to keep the span of the current node in order to fill in the possible gaps -where sub-trees are not annotated with source locations. +See ((HiFiLayoutDiff)). } -list[TextEdit] layoutDiff(Tree a, Tree b, bool copyComments = false) - = layoutDiff(a@\loc, a, b, copyComments=copyComments); +list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = false) { + assert original := formatted : "nothing except layout and keyword fields may be different for layoutDiff to work correctly."; + + @synopsis{rec is the recursive workhorse, doing a pairwise recursion over the original and the formatted tree} + @description{ + We recursively skip over every "equal" pairs of nodes, until we detect two different _layout_ nodes. The original location + of that node is used to construct a replace ((TextEdit)), and optionally the original layout is inspected for + source code comments which may have been lost. 
Literals are skipped explicitly to avoid arbitrary edits for + case insensitive literals, and to safe some time. + } -// Equal trees -list[TextEdit] layoutDiff(loc _span, Tree a, Tree b, bool copyComments = false) - = [] when a == b; + // if layout differences are detected, here we produce a `replace` node: + list[TextEdit] rec( + t:appl(prod(Symbol tS, _, _), list[Tree] tArgs), // layout is not necessarily parsed with the same rules (i.e. comments are lost!) + u:appl(prod(Symbol uS, _, _), list[Tree] uArgs)) + = [replace(t@\loc, copyComments ? learnComments(t@\loc, "", "") : "") | tArgs != uArgs] + when + delabel(tS) is layouts, + delabel(uS) is layouts, + tArgs != uArgs; -// layout differences are detected, so here we produce a `replace` node: -list[TextEdit] layoutDiff(loc span, - t:appl(prod(layouts(str l), _, _), list[Tree] _), - u:appl(prod(layouts(l), _, _), list[Tree] _), - bool copyComments = false) - = [replace(span, learnComments(t@\loc, "", ""))] when eq(t, u); + // matched literal trees generate empty diffs + list[TextEdit] rec( + appl(prod(lit(_), _, _), list[Tree] _), + appl(prod(lit(_), _, _), list[Tree] _)) + = []; -// the layout was the same as before -list[TextEdit] layoutDiff(loc span, - t:appl(prod(layouts(str l), _, _), list[Tree] _), - t, - bool copyComments = false) - = []; + // matched case-insensitive literal trees generate empty diffs such that the original is maintained + list[TextEdit] rec( + appl(prod(cilit(_), _, _), list[Tree] _), + appl(prod(cilit(_), _, _), list[Tree] _)) + = []; -// matched literal trees generate empty diffs -list[TextEdit] layoutDiff(loc _span, - appl(prod(lit(str l), _, _), list[Tree] _), - appl(prod(lit(l) , _, _), list[Tree] _), - bool copyComments = false) - = []; + // recurse through the entire parse tree to collect layout edits: + default list[TextEdit] rec( + Tree t:appl(Production p, list[Tree] argsA), + appl(p /* must be the same by the above assert */, list[Tree] argsB)) + = [*rec(a, b) | <- zip2(argsA, 
argsB)]; -// matched case-insensitive literal trees generate empty diffs such that the original is maintained -list[TextEdit] layoutDiff(loc _span, - appl(prod(cilit(str l), _, _), list[Tree] _), - appl(prod(cilit(l) , _, _), list[Tree] _), - bool copyComments = false) - = []; + // first add required locations to layout nodes + original = reposition(original, markLayout=true, markSubLayout=false); + + return rec(original, formatted); +} -// recurse through the parse tree in the right order to collect layout edits -// this default fails when the two compared trees are unequal-modulo-layout, such that -// this precondition is checked and failure to comply is detected as early (high) as possible. -default list[TextEdit] layoutDiff(loc span, - Tree t:appl(Production p, list[Tree] argsA), - t:appl(p, list[Tree] argsB), // note the non-linear equality-modulo-layout check here - bool copyComments = false) - = [*layoutDiff(|todo:///|, a, b, copyComments=copyComments) | <- zip2(argsA, argsB)]; // TODO: here we have to recover the loc of the outermost layout node @synopsis{Make sure the new layout still contains all the source code comments of the original layout} @description{ @@ -94,4 +93,7 @@ private str learnComments(loc span, str replacement, str original, bool copyComm // 4. integrate the new comments with the new whitespace in a smart manner throw "not yet implemented"; } -} \ No newline at end of file +} + +private Symbol delabel(label(_, Symbol t)) = t; +private default Symbol delabel(Symbol x) = x; \ No newline at end of file From d9b9c770d42bae2d5d4a5dab522b66c0f44c37ff Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Fri, 8 Aug 2025 13:03:59 +0200 Subject: [PATCH 36/76] initial comment learning tricks for layoutDiff --- .../analysis/diff/edits/HiFiLayoutDiff.rsc | 32 ++++++++++++------- 1 file changed, 20 insertions(+), 12 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 9a4bf2afa00..047d20bd5e4 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -45,7 +45,7 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = fal list[TextEdit] rec( t:appl(prod(Symbol tS, _, _), list[Tree] tArgs), // layout is not necessarily parsed with the same rules (i.e. comments are lost!) u:appl(prod(Symbol uS, _, _), list[Tree] uArgs)) - = [replace(t@\loc, copyComments ? learnComments(t@\loc, "", "") : "") | tArgs != uArgs] + = [replace(t@\loc, copyComments ? learnComments(t, "") : "") | tArgs != uArgs] when delabel(tS) is layouts, delabel(uS) is layouts, @@ -78,20 +78,28 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = fal @synopsis{Make sure the new layout still contains all the source code comments of the original layout} @description{ -This algorithm uses a heuristic to detect source code comments inside layout substrings. If the original -layout contains comments, but the replacement layout does not, we re-introduce the comments at the -expected level of indentation. +This algorithm uses the @category("Comment") tag to detect source code comments inside layout substrings. If the original +layout contains comments, we re-introduce the comments at the expected level of indentation. New comments present in the +replacement are also kept. + +This trick is complicated by the syntax of multiline comments and single line comments that have +to end with a newline.
} -private str learnComments(loc span, str replacement, str original, bool copyComments = false) { - if (!copyComments) { +private str learnComments(Tree original, str replacement) { + commentStrings = ["" | /c:appl(prod(_,_,{\tag("category"("Comment")), *_}), _) := original]; + + if (commentStrings == []) { return replacement; } - else { - // TODO: 1. detect "non-whitespace" in `original` - // 2. strip leading indentation from the non-whitespace if multiple lines are detected - // 3. re-indent the multiple lines - // 4. integrate the new comments with the new whitespace in a smart manner - throw "not yet implemented"; + + // TODO this is still a w.i.p. + if (/\n/ <- commentStrings) { // multiline + return " + '<}> + '"; + } + else { // single line + return "<}>"; } } From 2ae0e4e9646f9fe8779706a2a5f18b46eb1bc621 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Fri, 8 Aug 2025 13:11:41 +0200 Subject: [PATCH 37/76] comments --- .../library/analysis/diff/edits/HiFiLayoutDiff.rsc | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 047d20bd5e4..217aa41d3b1 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -30,15 +30,15 @@ import String; // this should not be be necessary because imported by HiFiTreeDi @description{ See ((HiFiLayoutDiff)). 
} -list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = false) { +list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = true) { assert original := formatted : "nothing except layout and keyword fields may be different for layoutDiff to work correctly."; @synopsis{rec is the recursive workhorse, doing a pairwise recursion over the original and the formatted tree} @description{ We recursively skip over every "equal" pairs of nodes, until we detect two different _layout_ nodes. The original location - of that node is used to construct a replace ((TextEdit)), and optionally the original layout is inspected for - source code comments which may have been lost. Literals are skipped explicitly to avoid arbitrary edits for - case insensitive literals, and to safe some time. + of that node and the new contents of the formatted node are used to construct a replace ((TextEdit)), and + optionally the original layout is inspected for source code comments which may have been lost. Literals are skipped + explicitly to avoid arbitrary edits for case-insensitive literals, and to save some time. } // if layout differences are detected, here we produce a `replace` node: @@ -92,7 +92,8 @@ private str learnComments(Tree original, str replacement) { return replacement; } - // TODO this is still a w.i.p. + // TODO this is still a w.i.p. + // TODO: can we guarantee that these changes are grammatically correct? Probably not. if (/\n/ <- commentStrings) { // multiline return " '<}> From a38434400943de678684899947b7cb89dae74ad1 Mon Sep 17 00:00:00 2001 From: "Jurgen J.
Vinju" Date: Fri, 8 Aug 2025 13:19:49 +0200 Subject: [PATCH 38/76] minor --- .../library/analysis/diff/edits/HiFiTreeDiff.rsc | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 8ab3faaf021..40e1c799f7e 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -52,7 +52,7 @@ and comments, of the result. With this module we bring these two modalities of source-to-source transformations together: 1. The language engineer uses concrete syntax rewrite rules to derive a new ParseTree from the original; -2. We run ((treeDiff)) to obtain a set of minimal text edit; +2. We run ((treeDiff)) to obtain a set of minimal text edits; 3. We apply the text edits to the editor contents or the file system. } @benefits{ @@ -78,7 +78,7 @@ testing. If the trees are first independently serialized to disk and then deseri this optimization is not present and the algorithm will perform (very) poorly. * Substitution patterns should be formatted as best as possible. The algorithm will not infer spacing or relative indentation inside of the substituted subtree. It will only infer indentation -for the entire subtree. +for the entire subtree. Another way to resolve this is to run a code formatter on the substituted patterns.
} module analysis::diff::edits::HiFiTreeDiff @@ -109,7 +109,7 @@ However, the parsed tree could be different from the derived tree in terms of wh * when tree nodes (grammar rules) are equal, smaller edits are searched by pair-wise comparison of the children * differences between respective layout or (case insensitve) literal nodes are always ignored * when lists have changed, careful editing of possible separators ensures syntactic correctness -* when new sub-trees are inserted, the replacement will be at the same indentation level as the original. (((TODO this is a todo))) +* when new sub-trees are inserted, the replacement will be at the same indentation level as the original. * when case-insensitive literals have been changed under a grammar rule that remained the same, no edits are produced. The function comes in handy when we use Rascal to rewrite parse trees, and then need to communicate the effect @@ -240,13 +240,10 @@ list[TextEdit] treeDiff( default list[TextEdit] treeDiff(t:appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) = [*treeDiff(a, b) | <- zip2(argsA, argsB)] when bprintln("into

on both sides"); // TODO remove debug print - - - @synopsis{decide how many separators we have} -int seps(\iter-seps(_, list[Symbol] s)) = size(s); -int seps(\iter-star-seps(_, list[Symbol] s)) = size(s); -default int seps(Symbol _) = 0; +private int seps(\iter-seps(_, list[Symbol] s)) = size(s); +private int seps(\iter-star-seps(_, list[Symbol] s)) = size(s); +private default int seps(Symbol _) = 0; @synopsis{List diff is like text diff on lines; complex and easy to make slow} @description{ From 6112a814362ffcea7e33c8095f1683e4b948ab3d Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Fri, 8 Aug 2025 13:28:44 +0200 Subject: [PATCH 39/76] worked around ambiguities with character class types using an alias --- src/org/rascalmpl/library/ParseTree.rsc | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/org/rascalmpl/library/ParseTree.rsc b/src/org/rascalmpl/library/ParseTree.rsc index 6b2e1651d16..b460c94133a 100644 --- a/src/org/rascalmpl/library/ParseTree.rsc +++ b/src/org/rascalmpl/library/ParseTree.rsc @@ -814,6 +814,9 @@ bool isNonTerminalType(Symbol::\parameterized-lex(str _, list[Symbol] _)) = true bool isNonTerminalType(Symbol::\start(Symbol s)) = isNonTerminalType(s); default bool isNonTerminalType(Symbol s) = false; +private alias NewLineChar = [\n]; +private alias ReturnChar = [\r]; + @synopsis{Re-compute and overwrite origin locations for all sub-trees of a ((Tree))} @description{ This function takes a ((Tree)) and overwrites the old \loc annotations of every subtree @@ -909,11 +912,11 @@ yield of a tree should always produce the exact same locations as ((reposition)) curOffset += 1; switch (t) { - case [\r] _: { curColumn = 0; } - - case [\n] _: { + case ReturnChar _: { curColumn = 0; } - + + case NewLineChar _ : { curLine += 1; curColumn = 0; } From 3fbe31dbf37dad09726a411b71c299199a05e21c Mon Sep 17 00:00:00 2001 From: "Jurgen J.
Vinju" Date: Sun, 10 Aug 2025 14:38:29 +0200 Subject: [PATCH 40/76] reposition by defaults reproduces parse --- src/org/rascalmpl/library/ParseTree.rsc | 4 ++-- .../rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc | 2 +- .../rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc | 2 +- src/org/rascalmpl/library/util/Highlight.rsc | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/org/rascalmpl/library/ParseTree.rsc b/src/org/rascalmpl/library/ParseTree.rsc index b460c94133a..aae4603d16e 100644 --- a/src/org/rascalmpl/library/ParseTree.rsc +++ b/src/org/rascalmpl/library/ParseTree.rsc @@ -869,8 +869,8 @@ yield of a tree should always produce the exact same locations as ((reposition)) bool \markLexical = true, bool \markSubLexical = false, bool \markRegular = true, - bool \markLayout = false, - bool \markSubLayout = false, + bool \markLayout = true, + bool \markSubLayout = true, bool \markLit = false, bool \markSubLit = false, bool \markAmb = false, diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 217aa41d3b1..924a646751c 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -70,7 +70,7 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = tru = [*rec(a, b) | <- zip2(argsA, argsB)]; // first add required locations to layout nodes - original = reposition(original, markLayout=true, markSubLayout=false); + original = reposition(original, markLit=true, markLayout=true, markSubLayout=true); return rec(original, formatted); } diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 40e1c799f7e..dfca8f36aed 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ 
b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -6,7 +6,7 @@ Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, -this list of conditions and the following disclaimer. +this litst of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation diff --git a/src/org/rascalmpl/library/util/Highlight.rsc b/src/org/rascalmpl/library/util/Highlight.rsc index 65f8d530690..022fab074a0 100644 --- a/src/org/rascalmpl/library/util/Highlight.rsc +++ b/src/org/rascalmpl/library/util/Highlight.rsc @@ -1,4 +1,4 @@ - + = @license{ Copyright (c) 2013-2024 CWI All rights reserved. This program and the accompanying materials From 89efd7a3fbc9e73848bc107f5787ff3935fe7e27 Mon Sep 17 00:00:00 2001 From: Toine Hartman Date: Mon, 11 Aug 2025 11:38:48 +0200 Subject: [PATCH 41/76] Fix failing tests. 
--- .../tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 12cad41f768..cb19dd66197 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -1,6 +1,7 @@ module lang::rascal::tests::library::analysis::diff::edits::HiFiTreeDiffTests extend analysis::diff::edits::ExecuteTextEdits; +extend analysis::diff::edits::HiFiLayoutDiff; extend analysis::diff::edits::HiFiTreeDiff; extend lang::pico::\syntax::Main; @@ -151,7 +152,7 @@ test bool naturalToStringAndAtoBTest() test bool nulTestWithIdLayout() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity, layoutDiff) - && editsMaintainIndentationLevels(#start[Program], simpleExample, indent(), layoutDiff); + && editsMaintainIndentationLevels(#start[Program], simpleExample, identity, layoutDiff); test bool indentAllLayout() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, indent(), layoutDiff) @@ -159,4 +160,4 @@ test bool indentAllLayout() test bool insertSpacesInDeclarationLayout() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, insertSpacesInDeclaration, layoutDiff) - && editsMaintainIndentationLevels(#start[Program], simpleExample, indent(), layoutDiff); + && editsMaintainIndentationLevels(#start[Program], simpleExample, insertSpacesInDeclaration, layoutDiff); From a9f675a515ae801ad76ed14d53012960ff66e578 Mon Sep 17 00:00:00 2001 From: Toine Hartman Date: Mon, 11 Aug 2025 11:40:58 +0200 Subject: [PATCH 42/76] Add base case for cycles and chars. 
--- .../library/analysis/diff/edits/HiFiLayoutDiff.rsc | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 924a646751c..365cb84deb0 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -69,6 +69,13 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = tru appl(p /* must be the same by the above assert */, list[Tree] argsB)) = [*rec(a, b) | <- zip2(argsA, argsB)]; + default list[TextEdit] rec( + Tree t, + t + ) = [] + when t is char + || t is cycle; + // first add required locations to layout nodes original = reposition(original, markLit=true, markLayout=true, markSubLayout=true); From 7f733934e2201bd1e3ed06eb27a144ad14b84edd Mon Sep 17 00:00:00 2001 From: Toine Hartman Date: Mon, 11 Aug 2025 11:41:22 +0200 Subject: [PATCH 43/76] Remove accidental character. --- src/org/rascalmpl/library/util/Highlight.rsc | 1 - 1 file changed, 1 deletion(-) diff --git a/src/org/rascalmpl/library/util/Highlight.rsc b/src/org/rascalmpl/library/util/Highlight.rsc index 022fab074a0..54ebdc4c04a 100644 --- a/src/org/rascalmpl/library/util/Highlight.rsc +++ b/src/org/rascalmpl/library/util/Highlight.rsc @@ -1,4 +1,3 @@ - = @license{ Copyright (c) 2013-2024 CWI All rights reserved. This program and the accompanying materials From 78654482e8199d54d7ae3e9f6a36f6f8b7552764 Mon Sep 17 00:00:00 2001 From: Toine Hartman Date: Mon, 11 Aug 2025 11:57:04 +0200 Subject: [PATCH 44/76] Write base case with explicit patterns. 
--- .../library/analysis/diff/edits/HiFiLayoutDiff.rsc | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 365cb84deb0..9a28b1b1b51 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -70,11 +70,14 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = tru = [*rec(a, b) | <a, b> <- zip2(argsA, argsB)]; default list[TextEdit] rec( - Tree t, - t - ) = [] - when t is char - || t is cycle; + char(int c), + char(c) + ) = []; + + default list[TextEdit] rec( + cycle(Symbol s, int l), + cycle(s, l) + ) = []; // first add required locations to layout nodes original = reposition(original, markLit=true, markLayout=true, markSubLayout=true); From 21d08bbfd99a5bf8c80d5dd01457d345e716ca14 Mon Sep 17 00:00:00 2001 From: "Jurgen J.
Vinju" Date: Mon, 11 Aug 2025 11:23:46 +0200 Subject: [PATCH 45/76] added first test for reposition --- .../lang/rascal/tests/basic/RepositionTree.rsc | 15 +++++++++++++++ 1 file changed, 15 insertions(+) create mode 100644 src/org/rascalmpl/library/lang/rascal/tests/basic/RepositionTree.rsc diff --git a/src/org/rascalmpl/library/lang/rascal/tests/basic/RepositionTree.rsc b/src/org/rascalmpl/library/lang/rascal/tests/basic/RepositionTree.rsc new file mode 100644 index 00000000000..1f89ed4ebc5 --- /dev/null +++ b/src/org/rascalmpl/library/lang/rascal/tests/basic/RepositionTree.rsc @@ -0,0 +1,15 @@ +module lang::rascal::tests::basic::RepositionTree + +import ParseTree; +import lang::pico::\syntax::Main; + +loc facPico = |project://rascal/src/org/rascalmpl/library/lang/pico/examples/fac.pico|; + +private list[loc] collect(Tree t) = [s@\loc | /Tree s := t, s@\loc?]; + +test bool repositionSimulatesReparse() { + t1 = parse(#start[Program], facPico); + t2 = reposition(t1); // defaults set + assert t1 := t2; // but that skips keyword parameters and layout + return collect(t1) == collect(t2); +} \ No newline at end of file From 70b9140a7c2c9f182f5e4e69500795bc8bf478d2 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 12:02:21 +0200 Subject: [PATCH 46/76] added missing documentation on Tree@\loc --- src/org/rascalmpl/library/ParseTree.rsc | 74 +++++++++++++++++++------ 1 file changed, 58 insertions(+), 16 deletions(-) diff --git a/src/org/rascalmpl/library/ParseTree.rsc b/src/org/rascalmpl/library/ParseTree.rsc index aae4603d16e..b9245ac1509 100644 --- a/src/org/rascalmpl/library/ParseTree.rsc +++ b/src/org/rascalmpl/library/ParseTree.rsc @@ -354,9 +354,52 @@ Production associativity(Symbol s, Associativity as, {*Production a, priority(Sy @synopsis{Annotate a parse tree node with a source location.} +@description{ +A generated ((parser)) will produce ((Tree)) instances annotated with @\loc. 
In this way every node knows its own precise +range in the file, _and_ the file it originally came from. The ((reposition)) function +can simulate the same behavior without erasing other information (keyword parameters) that was produced after parsing. + +It is here, with ((parser)), ((parsers)) and ((reposition)), that location information is given its exact semantics for parse ((Tree))s: +* The URI points to a single file location that is the source (or target) for the current parse tree. +* Right after parsing and repositioning, the URI is the same for all \@loc annotation in a single ((Tree)) instance. +However, after tree rewriting this is not the case anymore. +* The `offset` is _zero-based_, inclusive, and is increasing from left to right, as long as the tree has not changed yet. +The offset of the very first character in a file is `0`. +* The `length` is always zero or positive. The length of a character (Unicode codepoint) is always 1, even if it is a control +code like `\n` or `\r`. Even `\t` has length `1`! +* The `begin.line` is _one-based_, inclusive, and increasing from top to bottom, as long as the tree has not changed yet. This follows the +POSIX convention that the first line on a screen or a punch card is labeled with `1`. +* The `begin.column` is _zero-based_, inclusive, and increasing from left-to-right, as long as the tree has not changed yet. +The column is reset to `0` on `\r` and `\n` characters. Zero based columns are also a POSIX convention. It is sometimes motivated +by the `|` bar cursor being _before_ the first character initially. +* The `end.line` is _one-based_ and inclusive, always larger or equal than `begin.line`. +* The `end.column` is _zero-based_ and inclusive, and _not_ always larger or equal than `begin.column`. That's true only if `begin.line == end.line`. 
+} +@benefits{ +* @\loc can be used to point to the origins of trees, even if rewritten parse trees are composed of values +from different sources, their @\loc value will explain where they come from. This can be used to construct +debugging interfaces for DSLs and PLs, for example. +* @\loc contains offset/length and line/column information to cater for all kinds of different ways that editors work. +* @\loc follows POSIX conventions to help in minimizing off-by-one errors when mapping to editor APIs +* @\loc indexes work on the basic concept of an "abstract character", namely Unicode codepoints. The character is +what most easily relates to what a users sees as a character on the screen. +} +@pitfalls{ +* @\loc is based on Unicode's abstract characters, a.k.a. codepoints. If your editor is byte-based or follows another character +encoding than the 24-bit integer codepoints (e.g. java/javascript 16-bit characters), then you need smart just-in-time bidirectional +conversion methods to make sure selection and highlighting ranges (for example) are always exact. +* If a concrete character ("grapheme") on screen is composed of several abstract characters ("codepoints"), then the @\loc +character metaphor breaks. It depends on how the editor internally handles graphemes and on the way it is connected to Rascal +what the effect for the user is. +* @\loc annotations make ((Tree)) instances _unique_ ,where otherwise they could be semantically and syntactically equivalent. +Therefor if you want to test for ((Tree)) (in)equality, always use `t1 := t2` and `t1 !:= t2`. Pattern matching already automatically +ignores @\loc annotations and whitespace and comments. +* Annotated trees are strictly too big for optimal memory usage. Often `@\loc` is the first and only annotation, so it introduces a map for keyword parameters +for every node. Also more nodes are different, impeding in optimal reference sharing. 
If you require long time storage of many +parse trees it may be useful to strip them of annotations for selected categories of nodes, using ((reposition)). +} anno loc Tree@\loc; - @synopsis{Parse input text (from a string or a location) and return a parse tree.} @description{ * Parse a string and return a parse tree. @@ -740,38 +783,30 @@ data Exp = add(Exp, Exp); } java &T<:value implode(type[&T<:value] t, Tree tree); - @synopsis{Annotate a parse tree node with an (error) message.} anno Message Tree@message; - @synopsis{Annotate a parse tree node with a list of (error) messages.} anno set[Message] Tree@messages; - @synopsis{Annotate a parse tree node with a documentation string.} anno str Tree@doc; - @synopsis{Annotate a parse tree node with documentation strings for several locations.} anno map[loc,str] Tree@docs; - - @synopsis{Annotate a parse tree node with the target of a reference.} anno loc Tree@link; - @synopsis{Annotate a parse tree node with multiple targets for a reference.} anno set[loc] Tree@links; - -@synopsis{Annotate the top of the tree with hyperlinks between entities in the tree (or other trees) - -This is similar to link and links annotations, except that you can put it as one set at the top of the tree.} +@synopsis{Annotate the top of the tree with hyperlinks between entities in the tree (or other trees)} +@description{ +This is similar to link and links annotations, except that you can put it as one set at the top of the tree. +} anno rel[loc,loc] Tree@hyperlinks; - @synopsis{Tree search result type for ((treeAt)).} data TreeSearchResult[&T<:Tree] = treeFound(&T tree) | treeNotFound(); @@ -860,14 +895,15 @@ yield of a tree should always produce the exact same locations as ((reposition)) @benefits{ * Unlike reparsing, ((reposition)) will maintain all other keyword parameters of ((Tree)) nodes, like resolved qualified names and type attributes. 
* Can be used to erase superfluous annotations for memory efficiency, while keeping the essential ones. -* +* The default mark options simulatete the behavior of ((parser)) functions. } &T <: Tree reposition( &T <: Tree tree, loc file = tree@\loc.top, + bool \markStart = true, bool \markSyntax = true, bool \markLexical = true, - bool \markSubLexical = false, + bool \markSubLexical = true, bool \markRegular = true, bool \markLayout = true, bool \markSubLayout = true, @@ -886,13 +922,18 @@ yield of a tree should always produce the exact same locations as ((reposition)) default bool doAnno(Production _) = false; bool doAnno(prod(\lex(_), _, _)) = markLexical; bool doAnno(prod(\label(_, \lex(_)), _, _)) = markLexical; + bool doAnno(prod(\parameterized-lex(_, _), _, _)) = markLexical; + bool doAnno(prod(\label(_, \parameterized-lex(_, _)), _, _)) = markLexical; bool doAnno(prod(\layouts(_), _, _)) = markLayout; bool doAnno(prod(\label(_, \layouts(_)), _, _)) = markLayout; bool doAnno(prod(\sort(_), _, _)) = markSyntax; bool doAnno(prod(\label(_, \sort(_)), _, _)) = markSyntax; + bool doAnno(prod(\parameterized-sort(_, _), _, _)) = markSyntax; + bool doAnno(prod(\label(_, \parameterized-sort(_, _)), _, _)) = markSyntax; bool doAnno(\regular(_)) = markRegular; bool doAnno(prod(\lit(_), _, _)) = markLit; bool doAnno(prod(\cilit(_), _, _)) = markLit; + bool doAnno(prod(\start(_), _, _)) = markStart; @synopsis{Check if sub-structure of this rule is configured to be annotated} default bool doSub(Production _) = true; @@ -910,6 +951,7 @@ yield of a tree should always produce the exact same locations as ((reposition)) beginColumn = curColumn; curOffset += 1; + curColumn += 1; switch (t) { case ReturnChar _: { @@ -944,7 +986,7 @@ yield of a tree should always produce the exact same locations as ((reposition)) // once `sub` is false, going down, we can never turn it on again newArgs = [mergeRec(a, sub && doSub(prod)) | a <- args]; - return sub && doAnno(prod) + return (sub && 
doAnno(prod)) ? appl(prod, newArgs)[@\loc=file(beginOffset, curOffset - beginOffset, , )] : appl(prod, newArgs) ; From 7f87c8a4ca3cf9e1bb046930f0ff68fe4dc77978 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 12:02:36 +0200 Subject: [PATCH 47/76] added tests --- .../rascal/tests/basic/RepositionTree.rsc | 28 +++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/src/org/rascalmpl/library/lang/rascal/tests/basic/RepositionTree.rsc b/src/org/rascalmpl/library/lang/rascal/tests/basic/RepositionTree.rsc index 1f89ed4ebc5..66618f19506 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/basic/RepositionTree.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/basic/RepositionTree.rsc @@ -1,5 +1,6 @@ module lang::rascal::tests::basic::RepositionTree +import List; import ParseTree; import lang::pico::\syntax::Main; @@ -12,4 +13,31 @@ test bool repositionSimulatesReparse() { t2 = reposition(t1); // defaults set assert t1 := t2; // but that skips keyword parameters and layout return collect(t1) == collect(t2); +} + +test bool removeAllAnnotations() { + t1 = parse(#start[Program], facPico); + t2 = reposition(t1, + markSyntax=false, + markLexical=false, + markSubLexical=false, + markAmb=false, + markChar=false, + markLayout=false, + markLit=false, + markStart=false, + markSubLit=false, + markSubLayout=false, + markRegular=false); + assert t1 := t2; // but that skips keyword parameters and layout + return collect(t2) == []; +} + +test bool charsFromLeftToRight() { + t1 = parse(#start[Program], facPico); + t2 = reposition(t1, markChar=true); + allChars = [ch | /ch:char(_) := t2]; + sortedChars = sort(allChars, bool (c1, c2) { return c1@\loc.offset < c2@\loc.offset;}); + + return allChars == sortedChars; } \ No newline at end of file From 4f27ec997c84efe6b772d2d485b039ea21f7c7b1 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 11 Aug 2025 12:08:14 +0200 Subject: [PATCH 48/76] removed debug println --- .../rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc | 7 ++----- .../library/analysis/diff/edits/HiFiTreeDiffTests.rsc | 2 -- 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index dfca8f36aed..c6e852ebc05 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -224,10 +224,7 @@ list[TextEdit] treeDiff( default list[TextEdit] treeDiff( t:appl(Production p:prod(_,_,_), list[Tree] _), r:appl(Production q:!p , list[Tree] _)) - { - rprintln(t); // TODO remove debug statement - return [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; - } + = [replace(t@\loc, learnIndentation(t@\loc, "", ""))]; // If list production are the same, then the element lists can still be of different length // and we switch to listDiff which has different heuristics than normal trees to detect large identical sublists. 
@@ -238,7 +235,7 @@ list[TextEdit] treeDiff( // When the productions are equal, but the children may be different, we dig deeper for differences default list[TextEdit] treeDiff(t:appl(Production p, list[Tree] argsA), appl(p, list[Tree] argsB)) - = [*treeDiff(a, b) | <a, b> <- zip2(argsA, argsB)] when bprintln("into <p> on both sides"); // TODO remove debug print + = [*treeDiff(a, b) | <a, b> <- zip2(argsA, argsB)]; @synopsis{decide how many separators we have} private int seps(\iter-seps(_, list[Symbol] s)) = size(s); diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index cb19dd66197..aab839c6a23 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -31,8 +31,6 @@ bool editsAreSyntacticallyCorrect(type[&T<:Tree] grammar, str example, (&T<:Tree transformed = transform(orig); edits = diff(orig, transformed); edited = executeTextEdits(example, edits); - println(" leads to:"); - iprintln(edits); try { if (transformed := parse(grammar, edited)) { From 6e3d0cd56e0aafae398411beb3c5cdd6673773a9 Mon Sep 17 00:00:00 2001 From: "Jurgen J.
Vinju" Date: Mon, 11 Aug 2025 12:53:10 +0200 Subject: [PATCH 49/76] optimized base cases --- .../analysis/diff/edits/HiFiLayoutDiff.rsc | 20 ++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 9a28b1b1b51..6c8b68f11ba 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -63,21 +63,23 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = tru appl(prod(cilit(_), _, _), list[Tree] _)) = []; + list[TextEdit] rec( + char(_), + char(_) + ) = []; + + list[TextEdit] rec( + cycle(Symbol _, int _), + cycle(_, _) + ) = []; + // recurse through the entire parse tree to collect layout edits: default list[TextEdit] rec( Tree t:appl(Production p, list[Tree] argsA), appl(p /* must be the same by the above assert */, list[Tree] argsB)) = [*rec(a, b) | <a, b> <- zip2(argsA, argsB)]; - default list[TextEdit] rec( - char(int c), - char(c) - ) = []; - - default list[TextEdit] rec( - cycle(Symbol s, int l), - cycle(s, l) - ) = []; + // first add required locations to layout nodes original = reposition(original, markLit=true, markLayout=true, markSubLayout=true); From 69e3e1e2c925d204496aa1383f7a8a5ee6814f51 Mon Sep 17 00:00:00 2001 From: "Jurgen J.
Vinju" Date: Mon, 11 Aug 2025 14:31:13 +0200 Subject: [PATCH 50/76] added 5 modes of case-insensitive formatting to layoutDiff --- .../analysis/diff/edits/HiFiLayoutDiff.rsc | 86 ++++++++++++++----- 1 file changed, 65 insertions(+), 21 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 6c8b68f11ba..652bcbd4af2 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -13,6 +13,7 @@ translating the tree to a `str` using some formatting tools, and then reparsing } @pitfalls{ * if `originalTree !:= formattedTree` the algorithm will produce junk. It will break the syntactical correctness of the source code and forget source code comments. +* if comments are not marked with `@category("Comment")` in the original grammar, then this algorithm can not recover them. } @benefits{ * Recovers source code comments which have been lost during earlier steps in the formatting pipeline. This makes losing source code comments an independent concern of a declarative formatter. @@ -26,11 +27,20 @@ extend analysis::diff::edits::HiFiTreeDiff; import ParseTree; // this should not be necessary because imported by HiFiTreeDiff import String; // this should not be be necessary because imported by HiFiTreeDiff +@synopsis{Normalization choices for case-insensitive literals.} +data CaseInsensitivity + = toLower() + | toUpper() + | toCapitalized() + | asOriginal() + | asFormatted() + ; + @synopsis{Extract TextEdits for the differences in whitespace between two otherwise identical ((ParseTree))s.} @description{ See ((HiFiLayoutDiff)). 
} -list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = true) { +list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = true, CaseInsensitivity ci = asOriginal()) { assert original := formatted : "nothing except layout and keyword fields may be different for layoutDiff to work correctly."; @synopsis{rec is the recursive workhorse, doing a pairwise recursion over the original and the formatted tree} @@ -45,7 +55,7 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = tru list[TextEdit] rec( t:appl(prod(Symbol tS, _, _), list[Tree] tArgs), // layout is not necessarily parsed with the same rules (i.e. comments are lost!) u:appl(prod(Symbol uS, _, _), list[Tree] uArgs)) - = [replace(t@\loc, copyComments ? learnComments(t, "") : "") | tArgs != uArgs] + = [replace(t@\loc, recoverComments ? learnComments(t, u) : "") | tArgs != uArgs] when delabel(tS) is layouts, delabel(uS) is layouts, @@ -57,11 +67,27 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = tru appl(prod(lit(_), _, _), list[Tree] _)) = []; - // matched case-insensitive literal trees generate empty diffs such that the original is maintained + // matched case-insensitive literal trees generate empty diffs such that the original is maintained. + // however, we also offer some convenience functionality to standardize their formatting right here. 
list[TextEdit] rec( - appl(prod(cilit(_), _, _), list[Tree] _), - appl(prod(cilit(_), _, _), list[Tree] _)) - = []; + t:appl(prod(cilit(_), _, _), list[Tree] _), + appl(prod(cilit(_), _, _), list[Tree] _)) { + + str yield = ""; + + switch (ci) { + case asOriginal(): + return []; + case asFormatted(): + return [replace(t@\loc, "") | "" != yield]; + case toUpper(): + return [replace(t@loc, result) | str result := toUpperCase(yield), result != yield]; + case toLower(): + return [replace(t@loc, result) | str result := toLowerCase(yield), result != yield]; + case toCapitalized(): + return [replace(t@loc, result) | str result := capitalize(yield), result != yield] + } + } list[TextEdit] rec( char(_), @@ -87,33 +113,51 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool copyComments = tru return rec(original, formatted); } - @synopsis{Make sure the new layout still contains all the source code comments of the original layout} @description{ This algorithm uses the @category("Comments") tag to detect source code comments inside layout substrings. If the original layout contains comments, we re-introduce the comments at the expected level of indentation. New comments present in the -replacement are also kept. +replacement are kept and will overwrite any original comments. This trick is complicated by the syntax of multiline comments and single line comments that have to end with a newline. } -private str learnComments(Tree original, str replacement) { - commentStrings = ["" | /c:appl(prod(_,_,{\tag("category"("Comment")), *_}), _) := original]; +@benefits{ +* if comments are kepts and formatted by tools like Tree2Box, then this algorithm does not overwrite these. +* if comments were completely lost, then this algorithm _always_ puts them back (under assumptions of ((layoutDiff))) +* recovered comments are indented according to the indentation discovered in the _formatted_ replacement tree. 
+} +@pitfalls{ +* if comments are not marked with `@category("Comment")` in the original grammar, then this algorithm recovers nothing. +} +private str learnComments(Tree original, Tree replacement) { + originalComments = ["" | /c:appl(prod(_,_,{\tag("category"("Comment")), *_}), _) := original]; - if (commentStrings == []) { - return replacement; + if (originalComments == []) { + // if the original did not contain comments, stick with the replacements + return ""; } - // TODO this is still a w.i.p. - // TODO: can we guarantee that these changes are grammatically correct? probably not.. - if (/\n/ <- commentStrings) { // multiline - return " - '<}> - '"; - } - else { // single line - return "<}>"; + replacementComments = ["" | /c:appl(prod(_,_,{\tag("category"("Comment")), *_}), _) := replacement]; + + if (replacementComments != []) { + // if the replacement contains comments, we assume they've been accurately retained by a previous stage (like Tree2Box): + return ""; } + + // At this point, we know that: (a) comments are not present in the replacement and (b) they used to be there in the original. + // So the old comments are going to be the new output. however, we want to learn indentation from the replacement. + + // Drop the last newline of single-line comments, because we don't want two newlines in the output for every comment: + str dropEndNl(str line:/^.*\n$/) = (line[..-1]); + default str dropEndNl(str line) = line; + + // the first line of the replacement is the indentation to use. + str replacementIndent = split("\n", "")[0]; + + // trimming each line makes sure we forget about the original indentation, and drop accidental spaces after comment lines + return " + '<}>"; } private Symbol delabel(label(_, Symbol t)) = t; From 0e0b252edf764434cbdb13c9f6982d8dec6592c4 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 11 Aug 2025 14:32:03 +0200 Subject: [PATCH 51/76] fixed parse error --- .../rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 652bcbd4af2..efb54494066 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -72,7 +72,7 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = list[TextEdit] rec( t:appl(prod(cilit(_), _, _), list[Tree] _), appl(prod(cilit(_), _, _), list[Tree] _)) { - + str yield = ""; switch (ci) { @@ -85,7 +85,7 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = case toLower(): return [replace(t@loc, result) | str result := toLowerCase(yield), result != yield]; case toCapitalized(): - return [replace(t@loc, result) | str result := capitalize(yield), result != yield] + return [replace(t@loc, result) | str result := capitalize(yield), result != yield]; } } From 028b42828fae98e07d3cab5a6fb3cd4d8fc53391 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 14:33:42 +0200 Subject: [PATCH 52/76] more parse errors. 
still learning how to work with error recovery in the editor --- .../library/analysis/diff/edits/HiFiLayoutDiff.rsc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index efb54494066..a3f44e6cc53 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -79,13 +79,13 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = case asOriginal(): return []; case asFormatted(): - return [replace(t@\loc, "") | "" != yield]; + return [replace(t@\loc, result) | str result := "", result != yield]; case toUpper(): - return [replace(t@loc, result) | str result := toUpperCase(yield), result != yield]; + return [replace(t@\loc, result) | str result := toUpperCase(yield), result != yield]; case toLower(): - return [replace(t@loc, result) | str result := toLowerCase(yield), result != yield]; + return [replace(t@\loc, result) | str result := toLowerCase(yield), result != yield]; case toCapitalized(): - return [replace(t@loc, result) | str result := capitalize(yield), result != yield]; + return [replace(t@\loc, result) | str result := capitalize(yield), result != yield]; } } From 3beb15f125d606e0e0b325b88fdc70eb53194233 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 11 Aug 2025 14:35:58 +0200 Subject: [PATCH 53/76] added default option --- .../rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index a3f44e6cc53..a426a01ed78 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -86,6 +86,8 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = return [replace(t@\loc, result) | str result := toLowerCase(yield), result != yield]; case toCapitalized(): return [replace(t@\loc, result) | str result := capitalize(yield), result != yield]; + default: + throw "unexpected option: "; } } From a136daffc7164d92a452a442730a36b6a543609f Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 14:44:01 +0200 Subject: [PATCH 54/76] fixed comment --- src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc | 1 + 1 file changed, 1 insertion(+) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index a426a01ed78..aff753d6d28 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -18,6 +18,7 @@ translating the tree to a `str` using some formatting tools, and then reparsing @benefits{ * Recovers source code comments which have been lost during earlier steps in the formatting pipeline. This makes losing source code comments an independent concern of a declarative formatter. * Recovers the original capitalization of case-insensitive literals which may have been lost during earlier steps in the formatting pipeline. 
+* Can standardize the layout of case insensitive literals to ALLCAPS, all lowercase, or capitalized. Or can leave the literal as it was formatted by an earlier stage. * Is agnostic towards the design of earlier steps in the formatting pipeline, so lang as `formattedTree := originalTree`. This means that the pipeline may change layout (whitespace and comments and capitalization of case-insensitive literals), but nothing else. } From 64b4b1a8ddc2ea348410e9ebd3a242bb7309adcc Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 14:46:47 +0200 Subject: [PATCH 55/76] aligned CI options of Tree2Box with layoutDiff --- .../library/analysis/diff/edits/HiFiLayoutDiff.rsc | 6 +++--- src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc | 2 ++ 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index aff753d6d28..692b95941c3 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -33,7 +33,7 @@ data CaseInsensitivity = toLower() | toUpper() | toCapitalized() - | asOriginal() + | asIs() | asFormatted() ; @@ -41,7 +41,7 @@ data CaseInsensitivity @description{ See ((HiFiLayoutDiff)). 
} -list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = true, CaseInsensitivity ci = asOriginal()) { +list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = true, CaseInsensitivity ci = asIs()) { assert original := formatted : "nothing except layout and keyword fields may be different for layoutDiff to work correctly."; @synopsis{rec is the recursive workhorse, doing a pairwise recursion over the original and the formatted tree} @@ -77,7 +77,7 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = str yield = ""; switch (ci) { - case asOriginal(): + case asIs(): return []; case asFormatted(): return [replace(t@\loc, result) | str result := "", result != yield]; diff --git a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc index 0c0f3f13596..c8fe25b56b7 100644 --- a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc +++ b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc @@ -77,6 +77,7 @@ data FormatOptions = formatOptions( data CaseInsensitivity = toLower() | toUpper() + | toCapitalized() | asIs() ; @@ -234,6 +235,7 @@ private FO fo() = formatOptions(); @synopsis{Implements normalization of case-insensitive literals} private str ci(str word, toLower()) = toLowerCase(word); private str ci(str word, toUpper()) = toUpperCase(word); +private str ci(str word, toCapitalized()) = capitalize(word); private str ci(str word, asIs()) = word; @synopsis{Split a text by the supported whitespace characters} From a9fb177e680a664775d2ce4029473f6adedb47e1 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 11 Aug 2025 15:20:46 +0200 Subject: [PATCH 56/76] default indentation size set to 4 --- src/org/rascalmpl/library/lang/box/syntax/Box.rsc | 2 +- src/org/rascalmpl/library/lang/box/util/Box2Text.rsc | 2 +- src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc | 8 ++++---- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/org/rascalmpl/library/lang/box/syntax/Box.rsc b/src/org/rascalmpl/library/lang/box/syntax/Box.rsc index f63eff82a72..6763aa434c3 100644 --- a/src/org/rascalmpl/library/lang/box/syntax/Box.rsc +++ b/src/org/rascalmpl/library/lang/box/syntax/Box.rsc @@ -38,7 +38,7 @@ set on every `I` Box according to the current preferences of the user. @pitfalls{ * `U(boxes)` is rendered as `H(boxes)` if it's the outermost Box. } -data Box(int hs=1, int vs=0, int is=2) +data Box(int hs=1, int vs=0, int is=4) = H(list[Box] boxes) | V(list[Box] boxes) | HOV(list[Box] boxes) diff --git a/src/org/rascalmpl/library/lang/box/util/Box2Text.rsc b/src/org/rascalmpl/library/lang/box/util/Box2Text.rsc index b2f113d8c7f..06afbf9e9b6 100644 --- a/src/org/rascalmpl/library/lang/box/util/Box2Text.rsc +++ b/src/org/rascalmpl/library/lang/box/util/Box2Text.rsc @@ -120,7 +120,7 @@ between horizontal and vertical for HOV boxes. data Options = options( int hs = 1, int vs = 0, - int is = 2, + int is = 4, int maxWidth=80, int wrapAfter=70 ); diff --git a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc index c8fe25b56b7..247c2adda68 100644 --- a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc +++ b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc @@ -178,19 +178,19 @@ default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo()) { // operators. The effect will be somewhat like a separated list of expressions where // the operators are the separators. 
case : - return U([toBox(e) | e <- elements]); + return U([toBox(e, opts=opts) | e <- elements]); // postfix operators stick case : - return H([toBox(e) | e <- elements], hs=0); + return H([toBox(e, opts=opts) | e <- elements], hs=0); // prefix operators stick case : - return H([toBox(e) | e <- elements], hs=0); + return H([toBox(e, opts=opts) | e <- elements], hs=0); // brackets stick case : - return H([toBox(e) | e <- elements], hs=0); + return H([toBox(e, opts=opts) | e <- elements], hs=0); // if the sort name is statement-like and the structure block-like, we go for // vertical with indentation From cb7dedfecd4a6e3addef561a9054e55947e511d3 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 16:02:26 +0200 Subject: [PATCH 57/76] added more complex tests for sublist equality --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 8 ++++--- .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 24 +++++++++++++++++++ 2 files changed, 29 insertions(+), 3 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index c6e852ebc05..5ff99aa8f2b 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -242,14 +242,16 @@ private int seps(\iter-seps(_, list[Symbol] s)) = size(s); private int seps(\iter-star-seps(_, list[Symbol] s)) = size(s); private default int seps(Symbol _) = 0; -@synopsis{List diff is like text diff on lines; complex and easy to make slow} +@synopsis{List diff finds minimal differences between the elements of two lists.} @description{ This algorithm uses heuristics to avoid searching for the largest common sublist all too often. +Also it minimized the sublists that largest common sublist is executed on. 1. 
Since many patches to parse tree lists typically only change a prefix or a postfix, and we can detect this quickly, we first extract patches for those instances. -2. However, it is also very fast to detect unchanged prefixes and postfixes, so by focusing +2. It is also fast and easy to detect unchanged prefixes and postfixes, so by focusing on the changes parts in the middle we generate more instances of case 1. +3. Another simple and quick case is when simply all elements are different (the prefix==the list==the postfix) 3. What we are left with is either an empty list and we are done, or a more complex situation where we apply the "largestEqualSubList" algorithm, which splits the list in three parts: * two unequal prefixes @@ -283,7 +285,7 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep // TODO: what about the separators? // we align the prefixes and the postfixes and // continue recursively. - + println("largestEqualSubList was used!"); return edits + listDiff(beginCover(span, preO), seps, preO, preR) + listDiff(endCover(span, postO), seps, postO, postR) diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index aab839c6a23..233b340e560 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -106,6 +106,18 @@ start[Program] addDeclarationToStart(start[Program] p) = visit(p) { 'end` }; +start[Program] addDeclarationToMiddle(start[Program] p) = visit(p) { + case (Program) `begin declare <{IdType ","}* pre>, , <{IdType ","}* post>; <{Statement ";"}* body> end` + => (Program) `begin + ' declare + ' <{IdType ","}* pre>, + ' , + ' middle : natural, + ' <{IdType ","}* post>; + ' <{Statement ";"}* body> + 'end` +}; + 
start[Program](start[Program]) indent(str indentation = " ", bool indentFirstLine = true) { return start[Program](start[Program] p) { return parse(#start[Program], indent(indentation, "

", indentFirstLine=indentFirstLine)); @@ -130,9 +142,21 @@ test bool addDeclarationToEndTest() test bool addDeclarationToStartTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart, treeDiff); +test bool addDeclarationToMiddleTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToMiddle, treeDiff); + +test bool addDeclarationToStartAndMiddleTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToMiddle, treeDiff); + +test bool addDeclarationToMiddleAndEndTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToMiddle o addDeclarationToEnd, treeDiff); + test bool addDeclarationToStartAndEndTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToEnd, treeDiff); +test bool addDeclarationToStartMiddleAndEndTest() + = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToStart o addDeclarationToMiddle o addDeclarationToEnd, treeDiff); + test bool addDeclarationToEndAndSwapABTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, addDeclarationToEnd o swapAB, treeDiff); From bd9cc834116fc2cdd36a0ad17aee4202ba1a50af Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 11 Aug 2025 16:12:52 +0200 Subject: [PATCH 58/76] fixed asFormatted literal option --- .../rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 692b95941c3..409bea1f8b5 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -72,7 +72,7 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = // however, we also offer some convenience functionality to standardize their formatting right here. list[TextEdit] rec( t:appl(prod(cilit(_), _, _), list[Tree] _), - appl(prod(cilit(_), _, _), list[Tree] _)) { + u:appl(prod(cilit(_), _, _), list[Tree] _)) { str yield = ""; From 6f2dceb11178d1b4414f68e5c4f031c74946ed4f Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 17:25:26 +0200 Subject: [PATCH 59/76] added TODO --- src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc | 1 + 1 file changed, 1 insertion(+) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 5ff99aa8f2b..54a8189aadc 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -419,6 +419,7 @@ private str learnIndentation(loc span, str replacement, str original) { minIndent = sort(origIndents[1..])[0]? ""; } + // TODO: if the minIndent is larger than the current line indent, it should still be stripped up to the max stripped = [ /^$/ := line ? rest : line | line <- replLines]; return indent(minIndent, " From 37334ab28d7c488aa8bea1414561094525c009ca Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 11 Aug 2025 20:07:20 +0200 Subject: [PATCH 60/76] added test --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 11 +++++-- .../analysis/diff/edits/HiFiTreeDiffTests.rsc | 29 +++++++++++++++++++ 2 files changed, 37 insertions(+), 3 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 54a8189aadc..ccb2221db0b 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -400,7 +400,7 @@ Then it measures the depth of indentation of every line in the original, and tak That minimum indentation is stripped off every line that already has that much indentation in the replacement, and then _all_ lines are re-indented with the discovered minimum. } -private str learnIndentation(loc span, str replacement, str original) { +private str learnIndentation(loc span, str replacement, str original, bool useReplacementIndent=true) { list[str] indents(str text) = [indent | /^[^\ \t]/ <- split("\n", text)]; origIndents = indents(original); @@ -422,6 +422,11 @@ private str learnIndentation(loc span, str replacement, str original) { // TODO: if the minIndent is larger than the current line indent, it should still be stripped up to the max stripped = [ /^$/ := line ? 
rest : line | line <- replLines]; - return indent(minIndent, " - '<}>"[..-1]); + // return indent(minIndent, " + // '<}>"[..-1]); + return indent( + minIndent, + "$/ <- replLines) {> + '<}>"[..-1]) + ; } diff --git a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc index 233b340e560..a71610f1357 100644 --- a/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc +++ b/src/org/rascalmpl/library/lang/rascal/tests/library/analysis/diff/edits/HiFiTreeDiffTests.rsc @@ -20,6 +20,22 @@ public str simpleExample 'end '"; +public str ifThenElseExample + = "begin + ' declare + ' a : natural; + ' if a then + ' a := 10 + ' else + ' if a then + ' a := 11 + ' else + ' a := 12 + ' fi + ' fi + 'end + '"; + @synopsis{Specification of what it means for `treeDiff` to be syntactically correct} @description{ TreeDiff is syntactically correct if: @@ -82,6 +98,15 @@ start[Program] swapAB(start[Program] p) = visit(p) { case (Id) `b` => (Id) `a` }; +start[Program] swapIfBranches(start[Program] p) = visit(p) { + case (Statement) `if then <{Statement ";"}* thenBranch> else <{Statement ";"}* elseBranch> fi` + => (Statement) `if then + ' <{Statement ";"}* elseBranch> + 'else + ' <{Statement ";"}* thenBranch> + 'fi` +}; + start[Program] naturalToString(start[Program] p) = visit(p) { case (Type) `natural` => (Type) `string` }; @@ -172,6 +197,9 @@ test bool naturalToStringTest() test bool naturalToStringAndAtoBTest() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, naturalToString o swapAB, treeDiff); +test bool swapBranchesTest() + = editsAreSyntacticallyCorrect(#start[Program], ifThenElseExample, swapIfBranches, treeDiff); + test bool nulTestWithIdLayout() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, identity, layoutDiff) && editsMaintainIndentationLevels(#start[Program], simpleExample, 
identity, layoutDiff); @@ -183,3 +211,4 @@ test bool indentAllLayout() test bool insertSpacesInDeclarationLayout() = editsAreSyntacticallyCorrect(#start[Program], simpleExample, insertSpacesInDeclaration, layoutDiff) && editsMaintainIndentationLevels(#start[Program], simpleExample, insertSpacesInDeclaration, layoutDiff); + From 5d12f498b2b39447a0d27d1b095e2a86204c8d60 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 20:43:53 +0200 Subject: [PATCH 61/76] improved indentation inheritance --- .../library/analysis/diff/edits/HiFiTreeDiff.rsc | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index ccb2221db0b..417e75ec552 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -411,19 +411,19 @@ private str learnIndentation(loc span, str replacement, str original, bool useRe } minIndent = ""; + if ([_] := origIndents) { // only one line. have to invent indentation from span - minIndent = " <}>"; + minIndent = " <}>"; } else { + // we skip the first line for learning indentation, because that one would typically be embedded in a previous line. minIndent = sort(origIndents[1..])[0]? ""; } - // TODO: if the minIndent is larger than the current line indent, it should still be stripped up to the max - stripped = [ /^$/ := line ? rest : line | line <- replLines]; - - // return indent(minIndent, " - // '<}>"[..-1]); + // we remove the leading spaces _up to_ the minimal indentation of the original, + // keep the rest of the indentation from the replacement (if any is left), and then the actual content. + // that entire multiline result is then lazily indented with the minimal indentation we learned from the original. 
return indent( minIndent, "$/ <- replLines) {> From 5d60bdfa041ca1349363c3d1d6a35b65ad1df224 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 20:46:35 +0200 Subject: [PATCH 62/76] refactored private function name --- .../analysis/diff/edits/HiFiTreeDiff.rsc | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 417e75ec552..aa5f58c8c9b 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -83,11 +83,11 @@ for the entire subtree. Another way of resolving this is using a code formatter module analysis::diff::edits::HiFiTreeDiff extend analysis::diff::edits::TextEdits; -import ParseTree; + import List; -import String; import Location; -import IO; +import ParseTree; +import String; import util::Math; @synopsis{Detects minimal differences between parse trees and makes them explicit as ((TextEdit)) instructions.} @@ -253,7 +253,7 @@ can detect this quickly, we first extract patches for those instances. on the changes parts in the middle we generate more instances of case 1. 3. Another simple and quick case is when simply all elements are different (the prefix==the list==the postfix) 3. 
What we are left with is either an empty list and we are done, or a more complex situation -where we apply the "largestEqualSubList" algorithm, which splits the list in three parts: +where we apply the "findEqualSubList" algorithm, which splits the list in three parts: * two unequal prefixes * two equal sublists in the middle * two unequal postfixes @@ -274,7 +274,7 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep = commonSpecialCases(span, seps, originals, replacements); edits += specialEdits; - equalSubList = largestEqualSubList(originals, replacements); + equalSubList = findEqualSubList(originals, replacements); // by using the (or "a") largest common sublist as a pivot to divide-and-conquer // to the left and right of it, we minimize the number of necessary @@ -285,7 +285,7 @@ list[TextEdit] listDiff(loc span, int seps, list[Tree] originals, list[Tree] rep // TODO: what about the separators? // we align the prefixes and the postfixes and // continue recursively. - println("largestEqualSubList was used!"); + return edits + listDiff(beginCover(span, preO), seps, preO, preR) + listDiff(endCover(span, postO), seps, postO, postR) @@ -322,10 +322,10 @@ uses particular properties of the relation between the original and the replacem * Equal prefixes and postfixes may be assumed to be maximal sublists as well (see above). * Candidate equal sublists always have consecutive source locations from the origin. 
} -list[Tree] largestEqualSubList([*Tree sub], [*_, *sub, *_]) = sub; -list[Tree] largestEqualSubList([*_, *Tree sub, *_], [*sub]) = sub; -list[Tree] largestEqualSubList([*_, p, *Tree sub, q, *_], [*_, !p, *sub, !q, *_]) = sub; -default list[Tree] largestEqualSubList(list[Tree] _orig, list[Tree] _repl) = []; +list[Tree] findEqualSubList([*Tree sub], [*_, *sub, *_]) = sub; +list[Tree] findEqualSubList([*_, *Tree sub, *_], [*sub]) = sub; +list[Tree] findEqualSubList([*_, p, *Tree sub, q, *_], [*_, !p, *sub, !q, *_]) = sub; +default list[Tree] findEqualSubList(list[Tree] _orig, list[Tree] _repl) = []; @synopsis{trips equal elements from the front and the back of both lists, if any.} tuple[loc, list[Tree], list[Tree]] trimEqualElements(loc span, @@ -400,7 +400,7 @@ Then it measures the depth of indentation of every line in the original, and tak That minimum indentation is stripped off every line that already has that much indentation in the replacement, and then _all_ lines are re-indented with the discovered minimum. } -private str learnIndentation(loc span, str replacement, str original, bool useReplacementIndent=true) { +private str learnIndentation(loc span, str replacement, str original) { list[str] indents(str text) = [indent | /^[^\ \t]/ <- split("\n", text)]; origIndents = indents(original); From f99ee230195ee2cf2528a75d87ac8920b11de528 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 21:36:12 +0200 Subject: [PATCH 63/76] fixed documentation --- src/org/rascalmpl/library/ParseTree.rsc | 2 +- .../analysis/diff/edits/HiFiTreeDiff.rsc | 18 ++++++++++++------ 2 files changed, 13 insertions(+), 7 deletions(-) diff --git a/src/org/rascalmpl/library/ParseTree.rsc b/src/org/rascalmpl/library/ParseTree.rsc index b9245ac1509..e865dd42580 100644 --- a/src/org/rascalmpl/library/ParseTree.rsc +++ b/src/org/rascalmpl/library/ParseTree.rsc @@ -869,7 +869,7 @@ in a ((Tree)) are not at all anymore what they were just after parsing: 3. 
subtrees may have been introduced from concrete syntax expressions in Rascal code. 4. other algorithms may have added more keyword fields, for example fully resolved qualified names, resolved types, error messages or future computations (closures). -5. location fields themselves may have been lost accidentally when rewriting trees with ((Statement-Visit)) +5. location fields themselves may have been lost accidentally when rewriting trees with `visit` 6. etc. Some downstream algorithms (e.g. ((HiFiLayoutDiff)) ) require source locations to be consistent with the current actual position diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index aa5f58c8c9b..609b57ad6f3 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -94,7 +94,7 @@ import util::Math; @description{ This is a "diff" algorithm of two parse trees to generate a ((TextEdit)) script that applies the differences on the textual level, _with minimal collatoral damage in whitespace_. This is why it is called "HiFi": minimal unnecessary -noise introduction to the original file. +noise introduction to the original file. It also tries to conserve source code comments; where still possible. The resulting ((TextEdit))s are an intermediate representation for making changes in source code text files. They can be executed independently via ((ExecuteTextEdits)), or interactively via ((IDEServices)), or LanguageServer features. @@ -113,7 +113,7 @@ However, the parsed tree could be different from the derived tree in terms of wh * when case-insensitive literals have been changed under a grammar rule that remained the same, no edits are produced. 
The function comes in handy when we use Rascal to rewrite parse trees, and then need to communicate the effect -back to the IDE (for example using ((util::IDEServices)) or ((util::LanguageServer)) interfaces). We use +back to the IDE (for example using ((util::IDEServices)) or `util::LanguageServer` interfaces). We use ((ExecuteTextEdits)) to _test_ the effect of ((TextEdits)) while developing a source-to-source transformation. } @benefits{ @@ -125,6 +125,8 @@ the exactness of syntactic and semantic knowledge of the parse trees. * The algorithm retrieves and retains indentation levels from the original tree, even if sub-trees in the derived tree have mangled indentation. This allows us to ignore the indentation concern while thinking of rewrite rules for source-to-souce transformation, and focus on the semantic effect. +* The algorithm inherits source code comments from the original, wherever sub-trees of the original and the +rewritten tree still line up. } @pitfalls{ * If the first argument is not an original parse tree, then basic assumptions of the algorithm fail and it may produce erroneous text edits. @@ -133,8 +135,9 @@ rules for source-to-souce transformation, and focus on the semantic effect. and the performance of the algorithm will degenerate quickly. * If the parse tree of the original does not reflect the current state of the text in the file, then the generated text edits will do harm. * If the original tree is not annotated with source locations, the algorithm fails. -* Both parse trees must be type correct, e.g. the number of symbols in a production rule, must be equal to the number of elements of the argument list of ((Tree::appl)). +* Both parse trees must be type correct, e.g. the number of symbols in a production rule, must be equal to the number of elements of the argument list of ((appl)). * This algorithm does not work with ambiguous (sub)trees. 
+* When large sub-trees or sub-lists are moved to other parts of the tree, comment inheritance is not possible anymore. } @examples{ If we rewrite parse trees, this can be done with concrete syntax matching. @@ -145,9 +148,9 @@ import lang::pico::\syntax::Main; import IO; import analysis::diff::edits::ExecuteTextEdits; import analysis::diff::edits::TextEdits; -import analysis::diff::edits::TreeDiff; +import analysis::diff::edits::HiFiTreeDiff; // an example Pico program: -writeFile(|tmp://example.pico|, +writeFile(|tmp:///example.pico|, "begin ' declare ' a : natural, @@ -158,7 +161,8 @@ writeFile(|tmp://example.pico|, ' b := a ' fi 'end"); -original = parse(#start[Program], |tmp://example.pico|); +import ParseTree; +original = parse(#start[Program], |tmp:///example.pico|); // match and replace all conditionals rewritten = visit(original) { case (Statement) `if then <{Statement ";"}* ifBranch> else <{Statement ";"}* elseBranch> fi` @@ -178,6 +182,8 @@ edit = changed(original@\loc.top, edits); executeDocumentEdit(edit); // and when we read the result back, we see the transformation succeeded, and indentation was not lost: readFile(tmp://example.pico|); +// It's also possible to directly rewrite the original string, for debugging purposes: +executeTextEdits("", treeDiff(original, rewritten)) ``` } // equal trees generate empty diffs (note this already ignores whitespace differences because non-linear matching ignores layout nodes) From 5b24d7da2163852e5d42b8503ae9fe1af8b0ba75 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 11 Aug 2025 21:40:36 +0200 Subject: [PATCH 64/76] fixed minor issue in tutor compiler --- src/org/rascalmpl/tutor/lang/rascal/tutor/Compiler.rsc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/org/rascalmpl/tutor/lang/rascal/tutor/Compiler.rsc b/src/org/rascalmpl/tutor/lang/rascal/tutor/Compiler.rsc index 63b4c70d295..42cca05e888 100644 --- a/src/org/rascalmpl/tutor/lang/rascal/tutor/Compiler.rsc +++ b/src/org/rascalmpl/tutor/lang/rascal/tutor/Compiler.rsc @@ -101,7 +101,7 @@ int main(PathConfig pcfg = getProjectPathConfig(|cwd:///|), messages = compile(pcfg); - return mainMessageHandler(messages, srcs=pcfg.srcs, errorsAsWarnings=errorsAsWarnings, warningsAsErrors=warningsAsErrors); + return mainMessageHandler(messages, projectRoot=pcfg.projectRoot, errorsAsWarnings=errorsAsWarnings, warningsAsErrors=warningsAsErrors); } @synopsis{compiles each pcfg.srcs folder as a course root} From 62689abe82f8350701645353c2a9ad7eabf6796d Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 22:09:31 +0200 Subject: [PATCH 65/76] apply suggesting by @toinehartman for improved readability --- src/org/rascalmpl/library/ParseTree.rsc | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/src/org/rascalmpl/library/ParseTree.rsc b/src/org/rascalmpl/library/ParseTree.rsc index 45bab479dde..b00de31f9ac 100644 --- a/src/org/rascalmpl/library/ParseTree.rsc +++ b/src/org/rascalmpl/library/ParseTree.rsc @@ -147,6 +147,7 @@ extend Message; extend Type; import Node; +import Set; @synopsis{The Tree data type as produced by the parser.} @description{ @@ -996,17 +997,14 @@ yield of a tree should always produce the exact same locations as ((reposition)) } // ambiguity nodes are simply choices between alternatives which each receive their own positions. 
- Tree rec(t:amb(set[Tree] alts), bool sub) { - if (newAlts:{Tree x, *_} := {mergeRec(a, sub) | a <- alts}) { - // inherit the outermost positions from one of the alternatives, since they are all the same by definition. - return markAmb && x@\loc? - ? amb(newAlts)[@\loc=x@\loc] - : amb(newAlts) - ; - } - - // this never happens because there is always at least two alternatives in a cluster - fail; + Tree rec(amb(set[Tree] alts), bool sub) { + newAlts = {mergeRec(a, sub) | a <- alts}; + // inherit the outermost positions from one of the alternatives, since they are all the same by definition. + Tree x = getFirstFrom(newAlts); + return markAmb && x@\loc? + ? amb(newAlts)[@\loc=x@\loc] + : amb(newAlts) + ; } @synopsis{Recurse, but not without recovering all other keyword parameters except "src" a.k.a. @\loc from the original.} From f197f4a897f25bb73527331a4869380f50c41570 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Mon, 11 Aug 2025 23:02:50 +0200 Subject: [PATCH 66/76] workaround for #2342 --- src/org/rascalmpl/library/ParseTree.rsc | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/org/rascalmpl/library/ParseTree.rsc b/src/org/rascalmpl/library/ParseTree.rsc index b00de31f9ac..be867360468 100644 --- a/src/org/rascalmpl/library/ParseTree.rsc +++ b/src/org/rascalmpl/library/ParseTree.rsc @@ -968,9 +968,11 @@ yield of a tree should always produce the exact same locations as ((reposition)) } } + Tree washCC(Tree x) = x; + return markChar - ? char(ch)[@\loc=file(beginOffset, 1, , )] - : char(ch) + ? washCC(char(ch))[@\loc=file(beginOffset, 1, , )] + : washCC(char(ch)) ; } @@ -1018,4 +1020,5 @@ yield of a tree should always produce the exact same locations as ((reposition)) // we start recursion at the top, not forgetting to merge its other keyword fields return mergeRec(tree, true); -} \ No newline at end of file +} + \ No newline at end of file From a893ed1f99b145ea7cfe01b3141d8578380ea486 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Mon, 11 Aug 2025 23:03:52 +0200 Subject: [PATCH 67/76] added comment --- src/org/rascalmpl/library/ParseTree.rsc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/org/rascalmpl/library/ParseTree.rsc b/src/org/rascalmpl/library/ParseTree.rsc index be867360468..6a8cee8c19e 100644 --- a/src/org/rascalmpl/library/ParseTree.rsc +++ b/src/org/rascalmpl/library/ParseTree.rsc @@ -968,7 +968,7 @@ yield of a tree should always produce the exact same locations as ((reposition)) } } - Tree washCC(Tree x) = x; + Tree washCC(Tree x) = x; // workaround for issue #2342 return markChar ? washCC(char(ch))[@\loc=file(beginOffset, 1, , )] From d1ff2bbe3e16250529e4ae1a5e7b24434eebeb96 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Tue, 12 Aug 2025 14:20:53 +0200 Subject: [PATCH 68/76] bug in execute text edits solved --- .../rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc b/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc index a738eb94035..f1c41d5c695 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/ExecuteTextEdits.rsc @@ -37,7 +37,7 @@ void executeFileSystemChange(changed(loc file)) { } @synopsis{Edit a file according to the given ((TextEdit)) instructions} -void executeDocumentEdit(changed(loc file, list[TextEdit] edits)) { +void executeFileSystemChange(changed(loc file, list[TextEdit] edits)) { str content = readFile(file); content = executeTextEdits(content, edits); From 7bff86ddcc3fd1dd63a6dd354379639b5bca387c Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Tue, 12 Aug 2025 15:22:10 +0200 Subject: [PATCH 69/76] finetune comma and semicolon separated lists --- .../library/lang/box/util/Tree2Box.rsc | 32 ++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc index 247c2adda68..9a672e67640 100644 --- a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc +++ b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc @@ -67,6 +67,7 @@ module lang::box::util::Tree2Box import ParseTree; import lang::box::\syntax::Box; import String; +import IO; @synopsis{Configuration options for toBox} data FormatOptions = formatOptions( @@ -118,8 +119,37 @@ default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo()) { case : return H([toBox(e, opts=opts) | e <- elements], hs=0); + // comma's are usually for parameters separation case : - return HV([G([toBox(e, opts=opts) | e <- elements], gs=4, hs=0, op=H)], hs=1); + return HOV([ + H([ + toBox(elements[i], opts=opts), + H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=1) + ], hs=0) | int i <- [0,4..size(elements)]]); + + // comma's are usually for parameters separation + case : + return HOV([ + H([ + toBox(elements[i], opts=opts), + H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=1) + ], hs=0) | int i <- [0,4..size(elements)]]); + + // semi-colons are usually for statement separation + case : + return V([ + H([ + toBox(elements[i], opts=opts), + H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=1) + ], hs=0) | int i <- [0,4..size(elements)]]); + + // semi-colons are usually for parameters separation + case : + return V([ + H([ + toBox(elements[i], opts=opts), + H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=1) + ], hs=0) | int i <- [0,4..size(elements)]]); case : return V([G([toBox(e, opts=opts) | e <- elements], gs=4, hs=0, op=H)], hs=1); From 122932fcd6f445977c9aa0b5010610c1d1bc5333 Mon Sep 17 00:00:00 2001 
From: "Jurgen J. Vinju" Date: Tue, 12 Aug 2025 15:44:26 +0200 Subject: [PATCH 70/76] better defaults for binary expressions --- .../library/lang/box/util/Tree2Box.rsc | 11 ++-- .../library/lang/pico/format/Formatting.rsc | 61 +++++++++++++++++++ 2 files changed, 65 insertions(+), 7 deletions(-) create mode 100644 src/org/rascalmpl/library/lang/pico/format/Formatting.rsc diff --git a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc index 9a672e67640..22cedf08c1f 100644 --- a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc +++ b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc @@ -204,11 +204,8 @@ default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo()) { // Those kinds of structures appear again and again as many languages share inspiration // from their pre-decessors. Watching out not to loose any comments... - // we flatten binary operators into their context for better flow of deeply nested - // operators. The effect will be somewhat like a separated list of expressions where - // the operators are the separators. 
case : - return U([toBox(e, opts=opts) | e <- elements]); + return HOV([toBox(elements[0], opts=opts), H([toBox(e, opts=opts) | e <- elements[1..]])]); // postfix operators stick case : @@ -225,10 +222,10 @@ default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo()) { // if the sort name is statement-like and the structure block-like, we go for // vertical with indentation // program: "begin" Declarations decls {Statement ";"}* body "end" ; - case : + case : return V([ - toBox(elements[0], opts=opts), - I([V([toBox(e, opts=opts) | e <- elements[1..-1]])]), + H([*[toBox(p, opts=opts) | Tree p <- elements[0..size(pre)]], toBox(elements[size(pre)], opts=opts)]), + I([V([toBox(e, opts=opts) | Tree e <- elements[size(pre)+1..-1]])]), toBox(elements[-1], opts=opts) ]); } diff --git a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc new file mode 100644 index 00000000000..e32660084aa --- /dev/null +++ b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc @@ -0,0 +1,61 @@ +@synopsis{Demonstrates ((Tree2Box)), ((Box2Text)) and ((HiFiLayoutDiff)) for constructing a declarative and HiFi Pico formatting pipeline} +module lang::pico::format::Formatting + +extend lang::box::util::Tree2Box; + +import analysis::diff::edits::ExecuteTextEdits; +import analysis::diff::edits::HiFiLayoutDiff; +import lang::box::\syntax::Box; +import lang::box::util::Box2Text; +import lang::pico::\syntax::Main; +import ParseTree; + +@synopsis{In-place formatting of an entire Pico file} +void formatPicoFile(loc file) { + // first we parse the program + start[Program] tree = parse(#start[Program], file); + + // then we apply an adaptable formatting style to every node + Box box = toBox(tree); + + // then we solve the two-dimensional layout problem, and get a formatted result + str formatted = format(box); + + // now we extract a list of exact differences from the old and a new parse tree + start[Program] formattedTree 
= parse(#start[Program], formatted, file); + list[TextEdit] edits = layoutDiff(tree, formattedTree); + + // finally we apply the differences to the original file + executeFileSystemChanges([changed(file, edits)]); +} + +@synopsis{Format a string that contains an entire Pico program} +str formatPicoString(str file) { + // first we parse the program + start[Program] tree = parse(#start[Program], file, |unknown:///|); + + // then we apply an adaptable formatting style to every node + Box box = toBox(tree); + + // then we solve the two-dimensional layout problem, and get a formatted result + str formatted = format(box); + + // now we extract a list of exact differences from the old and a new parse tree + start[Program] formattedTree = parse(#start[Program], formatted, |unknown:///|); + list[TextEdit] edits = layoutDiff(tree, formattedTree); + + // finally we apply the differences to the original contents + return executeTextEdits(file, edits); +} + +@synopsis{Pico Format function for use in an IDE} +list[FileSystemChange] formatPico(start[Program] file) + = [changed(file@\loc, layoutDiff(file, parse(#start[Program], (format o toBox)(file), |unknown:///|)))]; + +@synopsis{Make sure while loops are formatted the way we want them to be.} +Box toBox((Statement) `while do <{Statement ";"}* block> od`, FO opts = fo()) + = V([ + H([L("while"), toBox(e, opts=opts), L("do")]), + I([toBox(block, opts=opts)]), + L("od") + ]); \ No newline at end of file From 407b95bcc60b2ad04d0f29fcad4e4a05de159609 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Tue, 12 Aug 2025 15:46:24 +0200 Subject: [PATCH 71/76] minor refactoring --- src/org/rascalmpl/library/lang/pico/format/Formatting.rsc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc index e32660084aa..b505b640cbd 100644 --- a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc +++ b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc @@ -50,7 +50,7 @@ str formatPicoString(str file) { @synopsis{Pico Format function for use in an IDE} list[FileSystemChange] formatPico(start[Program] file) - = [changed(file@\loc, layoutDiff(file, parse(#start[Program], (format o toBox)(file), |unknown:///|)))]; + = [changed(file@\loc.top, layoutDiff(file, parse(#start[Program], (format o toBox)(file), file@\loc.top)))]; @synopsis{Make sure while loops are formatted the way we want them to be.} Box toBox((Statement) `while do <{Statement ";"}* block> od`, FO opts = fo()) @@ -58,4 +58,4 @@ Box toBox((Statement) `while do <{Statement ";"}* block> od`, FO H([L("while"), toBox(e, opts=opts), L("do")]), I([toBox(block, opts=opts)]), L("od") - ]); \ No newline at end of file + ]); \ No newline at end of file From 5d1606d60874366dc0cc24fd8268ec0e7ff6e764 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Tue, 12 Aug 2025 21:39:11 +0200 Subject: [PATCH 72/76] with layoutDiff in the game, comment preservation is no longer a task of toBox. 
This then makes the treatment of files with comments much easier, because every box will always have the same predictable amount of children --- .../analysis/diff/edits/HiFiLayoutDiff.rsc | 9 +++++---- .../library/lang/box/util/Tree2Box.rsc | 20 ++++++++----------- 2 files changed, 13 insertions(+), 16 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 409bea1f8b5..1820c7d0c3a 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -134,14 +134,14 @@ to end with a newline. * if comments are not marked with `@category("Comment")` in the original grammar, then this algorithm recovers nothing. } private str learnComments(Tree original, Tree replacement) { - originalComments = ["" | /c:appl(prod(_,_,{\tag("category"("Comment")), *_}), _) := original]; + originalComments = ["" | /c:appl(prod(_,_,{\tag("category"(/^[Cc]omment$/)), *_}), _) := original]; if (originalComments == []) { // if the original did not contain comments, stick with the replacements return ""; } - replacementComments = ["" | /c:appl(prod(_,_,{\tag("category"("Comment")), *_}), _) := replacement]; + replacementComments = ["" | /c:appl(prod(_,_,{\tag("category"(/^[Cc]omment$/)), *_}), _) := replacement]; if (replacementComments != []) { // if the replacement contains comments, we assume they've been accurately retained by a previous stage (like Tree2Box): @@ -159,8 +159,9 @@ private str learnComments(Tree original, Tree replacement) { str replacementIndent = split("\n", "")[0]; // trimming each line makes sure we forget about the original indentation, and drop accidental spaces after comment lines - return " - '<}>"; + return "" + indent(replacementIndent, + " + '<}>"[..-1], indentFirstLine=false) + ""; } private Symbol delabel(label(_, Symbol t)) = t; diff --git 
a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc index 22cedf08c1f..2ef4b09bb02 100644 --- a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc +++ b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc @@ -84,7 +84,7 @@ data CaseInsensitivity @synopsis{This is the generic default formatter} @description{ -This generic formatter is to be overridden by someone constructig a formatter tools +This generic formatter is to be overridden by someone constructing a formatter tools for a specific language. The goal is that this `toBox` default rule maps syntax trees to plausible Box expressions, and that only a minimal amount of specialization by the user is necessary. @@ -132,7 +132,7 @@ default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo()) { return HOV([ H([ toBox(elements[i], opts=opts), - H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=1) + H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=0) ], hs=0) | int i <- [0,4..size(elements)]]); // semi-colons are usually for statement separation @@ -140,7 +140,7 @@ default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo()) { return V([ H([ toBox(elements[i], opts=opts), - H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=1) + H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=0) ], hs=0) | int i <- [0,4..size(elements)]]); // semi-colons are usually for parameters separation @@ -164,17 +164,12 @@ default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo()) { case : return V([G([toBox(e, opts=opts) | e <- elements], gs=2, hs=0, op=H)], hs=0); - // if comments are found in layout trees, then we include them here - // and splice them into our context. If the deep match does not find any - // comments, then layout positions are reduced to U([]) which dissappears - // by splicing the empty list. 
+ // we remove all layout node positions to make the number of children predictable + // comments can be recovered by `layoutDiff` case : - return U([toBox(u, opts=opts) | /u:appl(prod(_, _, {*_,\tag("category"(/^[Cc]omment$/))}), _) <- content]); + return NULL(); - // single line comments are special, since they have the newline in a literal - // we must guarantee that the formatter will print the newline, but we don't - // want an additional newline due to the formatter. We do remove any unnecessary - // spaces + // if we are given a comment node, then we can format it here for use by layoutDiff case : return V([ H([toBox(elements[0], opts=opts), @@ -182,6 +177,7 @@ default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo()) { ], hs=1) ]); + // if we are given a comment node, then we can pretty print it here for use by layoutDiff case : return V([ H([toBox(elements[0], opts=opts), From c545a746584cf8437ce7d2ec38dc2d452b083ccb Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Wed, 13 Aug 2025 10:46:47 +0200 Subject: [PATCH 73/76] finetuning the pico demo --- .../library/lang/box/util/Tree2Box.rsc | 43 ++++++++++++++----- .../library/lang/pico/format/Formatting.rsc | 12 +++++- 2 files changed, 44 insertions(+), 11 deletions(-) diff --git a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc index 2ef4b09bb02..6aff94d64e0 100644 --- a/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc +++ b/src/org/rascalmpl/library/lang/box/util/Tree2Box.rsc @@ -124,32 +124,54 @@ default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo()) { return HOV([ H([ toBox(elements[i], opts=opts), - H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=1) - ], hs=0) | int i <- [0,4..size(elements)]]); + *[H([toBox(elements[i+2], opts=opts)], hs=1) | i + 2 < size(elements)] + ], hs=0) | int i <- [0,4..size(elements)] + ]); // comma's are usually for parameters separation case : return HOV([ H([ 
toBox(elements[i], opts=opts), - H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=0) - ], hs=0) | int i <- [0,4..size(elements)]]); + *[H([toBox(elements[i+2], opts=opts)], hs=1) | i + 2 < size(elements)] + ], hs=0) | int i <- [0,4..size(elements)] + ]); // semi-colons are usually for statement separation case : return V([ H([ toBox(elements[i], opts=opts), - H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=0) - ], hs=0) | int i <- [0,4..size(elements)]]); + *[H([toBox(elements[i+2], opts=opts)], hs=1) | i + 2 < size(elements)] + ], hs=0) | int i <- [0,4..size(elements)] + ]); + + // optional semi-colons also happen often + case : + return V([ + H([ + toBox(elements[i], opts=opts), + *[H([toBox(elements[i+2], opts=opts)], hs=1) | i + 2 < size(elements)] + ], hs=0) | int i <- [0,4..size(elements)] + ]); // semi-colons are usually for parameters separation case : return V([ H([ toBox(elements[i], opts=opts), - H([toBox(f, opts=opts) | f <- elements[i+1..i+3]], hs=1) - ], hs=0) | int i <- [0,4..size(elements)]]); + *[H([toBox(elements[i+2], opts=opts)], hs=1) | i + 2 < size(elements)] + ], hs=0) | int i <- [0,4..size(elements)] + ]); + + // optional semi-colons also happen often + case : + return V([ + H([ + toBox(elements[i], opts=opts), + *[H([toBox(elements[i+2], opts=opts)], hs=1) | i + 2 < size(elements)] + ], hs=0) | int i <- [0,4..size(elements)] + ]); case : return V([G([toBox(e, opts=opts) | e <- elements], gs=4, hs=0, op=H)], hs=1); @@ -164,8 +186,9 @@ default Box toBox(t:appl(Production p, list[Tree] args), FO opts = fo()) { case : return V([G([toBox(e, opts=opts) | e <- elements], gs=2, hs=0, op=H)], hs=0); - // we remove all layout node positions to make the number of children predictable - // comments can be recovered by `layoutDiff` + // We remove all layout node positions to make the number of children predictable + // Comments can be recovered by `layoutDiff`. 
By not recursing into layout + // positions `toBox` becomes more than twice as fast. case : return NULL(); diff --git a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc index b505b640cbd..594adbcd1c0 100644 --- a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc +++ b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc @@ -52,10 +52,20 @@ str formatPicoString(str file) { list[FileSystemChange] formatPico(start[Program] file) = [changed(file@\loc.top, layoutDiff(file, parse(#start[Program], (format o toBox)(file), file@\loc.top)))]; -@synopsis{Make sure while loops are formatted the way we want them to be.} +@synopsis{Format while} Box toBox((Statement) `while do <{Statement ";"}* block> od`, FO opts = fo()) = V([ H([L("while"), toBox(e, opts=opts), L("do")]), I([toBox(block, opts=opts)]), L("od") + ]); + +@synopsis{Format if-then-else } +Box toBox((Statement) `if then <{Statement ";"}* thenPart> else <{Statement ";"}* elsePart> fi`, FO opts = fo()) + = V([ + H([L("if"), toBox(e, opts=opts), L("then")]), + I([toBox(thenPart, opts=opts)]), + L("else"), + I([toBox(elsePart, opts=opts)]), + L("fi") ]); \ No newline at end of file From 9e07b1fed8f061df2a7e13a5e463da047233d064 Mon Sep 17 00:00:00 2001 From: "Jurgen J. 
Vinju" Date: Wed, 13 Aug 2025 13:39:01 +0200 Subject: [PATCH 74/76] factored clones in pico formatting demo --- .../library/lang/pico/format/Formatting.rsc | 38 ++++--------------- 1 file changed, 7 insertions(+), 31 deletions(-) diff --git a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc index 594adbcd1c0..42ef9fd3c75 100644 --- a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc +++ b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc @@ -12,45 +12,21 @@ import ParseTree; @synopsis{In-place formatting of an entire Pico file} void formatPicoFile(loc file) { - // first we parse the program - start[Program] tree = parse(#start[Program], file); - - // then we apply an adaptable formatting style to every node - Box box = toBox(tree); - - // then we solve the two-dimensional layout problem, and get a formatted result - str formatted = format(box); - - // now we extract a list of exact differences from the old and a new parse tree - start[Program] formattedTree = parse(#start[Program], formatted, file); - list[TextEdit] edits = layoutDiff(tree, formattedTree); - - // finally we apply the differences to the original file + edits = formatPicoTree(parse(#start[Program], file)); executeFileSystemChanges([changed(file, edits)]); } @synopsis{Format a string that contains an entire Pico program} str formatPicoString(str file) { - // first we parse the program start[Program] tree = parse(#start[Program], file, |unknown:///|); - - // then we apply an adaptable formatting style to every node - Box box = toBox(tree); - - // then we solve the two-dimensional layout problem, and get a formatted result - str formatted = format(box); - - // now we extract a list of exact differences from the old and a new parse tree - start[Program] formattedTree = parse(#start[Program], formatted, |unknown:///|); - list[TextEdit] edits = layoutDiff(tree, formattedTree); - - // finally we apply the differences to the 
original contents - return executeTextEdits(file, edits); + return executeTextEdits(file, formatPico(tree)[0].edits); } -@synopsis{Pico Format function for use in an IDE} -list[FileSystemChange] formatPico(start[Program] file) - = [changed(file@\loc.top, layoutDiff(file, parse(#start[Program], (format o toBox)(file), file@\loc.top)))]; +@synopsis{Pico Format function for reuse in file, str or IDE-based formatting contexts} +list[TextEdit] formatPicoTree(start[Program] file) { + formatted = format(toBox(file)); + return layoutDiff(file, parse(#start[Program], formatted, file@\loc.top)); +} @synopsis{Format while} Box toBox((Statement) `while do <{Statement ";"}* block> od`, FO opts = fo()) From 93e0b93d12936ba437fcc3b6ddfa526a434127d1 Mon Sep 17 00:00:00 2001 From: "Jurgen J. Vinju" Date: Wed, 13 Aug 2025 14:02:15 +0200 Subject: [PATCH 75/76] improved docs --- .../analysis/diff/edits/HiFiLayoutDiff.rsc | 4 +--- .../library/lang/pico/format/Formatting.rsc | 19 ++++++++++++++++++- 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc index 1820c7d0c3a..2bdc6a878bd 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc @@ -56,7 +56,7 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = list[TextEdit] rec( t:appl(prod(Symbol tS, _, _), list[Tree] tArgs), // layout is not necessarily parsed with the same rules (i.e. comments are lost!) u:appl(prod(Symbol uS, _, _), list[Tree] uArgs)) - = [replace(t@\loc, recoverComments ? learnComments(t, u) : "") | tArgs != uArgs] + = [replace(t@\loc, recoverComments ? 
learnComments(t, u) : "") | tArgs != uArgs, "" != "" /* avoid useless edits */] when delabel(tS) is layouts, delabel(uS) is layouts, @@ -108,8 +108,6 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments = appl(p /* must be the same by the above assert */, list[Tree] argsB)) = [*rec(a, b) | <- zip2(argsA, argsB)]; - - // first add required locations to layout nodes original = reposition(original, markLit=true, markLayout=true, markSubLayout=true); diff --git a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc index 42ef9fd3c75..2ac0a99eeb1 100644 --- a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc +++ b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc @@ -1,14 +1,31 @@ @synopsis{Demonstrates ((Tree2Box)), ((Box2Text)) and ((HiFiLayoutDiff)) for constructing a declarative and HiFi Pico formatting pipeline} +@description{ +Using four generic or generated, "language parametric", building blocks we construct a Pico formatting pipeline: + +* ((ParseTree)) is used to _generate_ a parser for Pico. +* ((Tree2Box)) provides the extensible/overridable and declarative ((toBox) function which maps language constructs to Box expressions. +The ((toBox)) function combines generic language-parametric rules, as well as bespoke language specific rules.. +* ((Box2Tree)) is a _generic_ reusable algorithm for two-dimensional string layout. +* Finally, ((HiFiLayoutDiff)) _generically_ extracts ((TextEdit))s from two trees which are equal modulo whitespace and comments. +} +@benefits{ +* The formatting style is programmed _declaratively_ by mapping language patterns to Box expressions. +* The pipeline never loses source code comments, and this requires no attention from the language engineer. +} +@pitfalls{ +* ((Box2Text)) must be _extended_ for the open recursive calls of ((toBox)) to reach the extensions in the current module.
+If you import ((Box2Text)) the extended ((toBox)) rules will only be found if they describe top-level tree nodes. +} module lang::pico::format::Formatting extend lang::box::util::Tree2Box; +import ParseTree; import analysis::diff::edits::ExecuteTextEdits; import analysis::diff::edits::HiFiLayoutDiff; import lang::box::\syntax::Box; import lang::box::util::Box2Text; import lang::pico::\syntax::Main; -import ParseTree; @synopsis{In-place formatting of an entire Pico file} void formatPicoFile(loc file) { From b56bb9346118599033ed4a914470526c40ef8846 Mon Sep 17 00:00:00 2001 From: Toine Hartman Date: Thu, 14 Aug 2025 10:59:20 +0200 Subject: [PATCH 76/76] Fix tutor & type errors. --- .../rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc | 2 +- src/org/rascalmpl/library/lang/pico/format/Formatting.rsc | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc index 609b57ad6f3..2fcb2d3b3d5 100644 --- a/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc +++ b/src/org/rascalmpl/library/analysis/diff/edits/HiFiTreeDiff.rsc @@ -179,7 +179,7 @@ edits = treeDiff(original, rewritten); // Wrap them in a single document edit edit = changed(original@\loc.top, edits); // Apply the document edit on disk: -executeDocumentEdit(edit); +executeDocumentEdits([edit]); // and when we read the result back, we see the transformation succeeded, and indentation was not lost: readFile(tmp://example.pico|); // It's also possible to directly rewrite the original string, for debugging purposes: diff --git a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc index 2ac0a99eeb1..20941f8a5f8 100644 --- a/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc +++ b/src/org/rascalmpl/library/lang/pico/format/Formatting.rsc @@ -3,9 +3,9 @@ Using four generic or generated, 
"language parametric", building blocks we construct a Pico formatting pipeline: * ((ParseTree)) is used to _generate_ a parser for Pico. -* ((Tree2Box)) provides the extensible/overridable and declarative ((toBox) function which maps language constructs to Box expressions. +* ((Tree2Box)) provides the extensible/overridable and declarative ((toBox)) function which maps language constructs to Box expressions. The ((toBox)) function combines generic language-parametric rules, as well as bespoke language specific rules.. -* ((Box2Tree)) is a _generic_ reusable algorithm for two-dimensional string layout. +* ((Box2Text)) is a _generic_ reusable algorithm for two-dimensional string layout. * Finally, ((HiFiLayoutDiff)) _generically_ extracts ((TextEdit))s from two trees which are equal modulo whitespace and comments. } @benefits{ @@ -36,7 +36,7 @@ void formatPicoFile(loc file) { @synopsis{Format a string that contains an entire Pico program} str formatPicoString(str file) { start[Program] tree = parse(#start[Program], file, |unknown:///|); - return executeTextEdits(file, formatPico(tree)[0].edits); + return executeTextEdits(file, formatPicoTree(tree)); } @synopsis{Pico Format function for reuse in file, str or IDE-based formatting contexts}