[TVMScript] AST, Source and diagnostics for Parser #12978

cyx-6 · 2022-10-04T00:48:32Z

This PR introduces AST, Source and diagnostics for Parser

Co-authored-by: yongwww <yongcale@gmail.com>

tests/python/unittest/test_tvmscript_parser_source.py

tkonolige

Hi @cyx-6, I appreciate the work you're doing to improve the tvmscript parser. I've left some feedback.

tkonolige · 2022-10-04T22:13:39Z

python/tvm/script/_parser/core/doc.py

+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=missing-docstring
+import ast


This should be using synr instead of the Python ast module. Synr provides a stable AST across multiple versions of Python, unlike ast, which changes from version to version. Synr is also used by the old TVMScript parser.

We are using the new stable AST provided by TVMScript. Its stability has been tested across Python 3.7 to 3.10 in the following two settings:

with 1000 real-world TVMScripts in the test suite;

with every single Python file in the TVM repo.

Not using synr is intentional, and the reason is stated and widely discussed in the RFC: metaprogramming. During metaprogramming, the Python interpreter is used to evaluate pieces of the Python AST, but faithful back-translation from synr to the Python AST is not possible because synr drops necessary information; for example, x += 1 is normalized into x = x + 1.

TVMScript's stable AST is designed with this in mind: it is an honest Python 3.8 AST without any change. ASTs from any other Python version are canonicalized to the Python 3.8 AST, and the same file allows them to be translated back.
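To make the canonicalization concrete, here is a minimal sketch covering one known version difference, assuming only subscripts matter: in Python 3.8 a subscript's `slice` field holds an `Index`, `Slice`, or `ExtSlice` node, whereas from Python 3.9 on it holds the bare expression, a `Slice`, or a `Tuple` that may contain slices. The helper `canonical_slice_kind` is hypothetical; it only reports which 3.8 node the slice corresponds to and is not the code in `doc.py`.

```python
import ast
import sys


def canonical_slice_kind(sub: ast.Subscript) -> str:
    """Name the Python 3.8 node that would wrap this subscript's slice.

    Hypothetical helper: it classifies the slice but does not rebuild nodes.
    """
    s = sub.slice
    if sys.version_info < (3, 9):
        # Python 3.7/3.8 already use Index / Slice / ExtSlice directly.
        return type(s).__name__
    if isinstance(s, ast.Slice):
        return "Slice"
    if isinstance(s, ast.Tuple) and any(isinstance(e, ast.Slice) for e in s.elts):
        # a[1:2, 3] used to be ExtSlice(dims=[Slice, Index]).
        return "ExtSlice"
    # Any other expression, including a plain tuple like a[1, 2], was Index(value=...).
    return "Index"


def kind_of(expr_source: str) -> str:
    sub = ast.parse(expr_source, mode="eval").body
    return canonical_slice_kind(sub)


for src in ["a[1]", "a[1:2]", "a[1:2, 3]", "a[1, 2]"]:
    print(src, "->", kind_of(src))
```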

> During metaprogramming, python interpreter is used to evaluate pieces of the python AST, but honest back-translation from synr to python AST is not possible because it drops necessary information, for example, x += 1 is normalized into x = x + 1.

I'm a little unclear on why normalization is a problem. x += 1 and x = x + 1 are equivalent statements.

Why not fix synr instead? It avoids having to rewrite large amounts of code. Sorry, but ditching synr wasn't made clear in the RFC. I was assuming you were going to use synr to maintain compatibility with multiple versions of python.

Glad you mentioned the engineering effort of maintaining synr; that's exactly why we need a stable AST that is identical to the Python 3.8 AST :-)

At the time of writing, there are 1725 lines of code in the synr folder, with 22 tests covering its parsing features across all Python versions.

By comparison, the new stable AST in TVMScript is only 361 lines of code, and it also handles translation to and from the Python AST. It has been tested against thousands of real-world cases, as mentioned in my previous reply.

The key here is that the stable AST is exactly faithful, with no tweaks to Python's AST, so much of the logic can be implemented with Python programming tricks, for example, `__getattr__`.
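A minimal sketch of the `__getattr__` idea, under the assumption that the doc AST simply mirrors whatever fields the running Python's `ast` classes declare in `_fields` and `_attributes`. `Doc`, `from_py`, and `to_py` are hypothetical names, not the TVMScript API, and the sketch skips the version canonicalization step. Because nothing is rewritten, `a[f()] += 1` stays an `AugAssign` and translates back to a tree that compiles and runs.

```python
import ast


class Doc:
    """Hypothetical version-independent mirror of one ast.AST node."""

    def __init__(self, node_type, children, position):
        self._type = node_type     # class name, e.g. "AugAssign"
        self._children = children  # converted fields, keyed by ast field name
        self._position = position  # lineno, col_offset, end_lineno, end_col_offset

    def __getattr__(self, name):
        # Called only when normal lookup fails, so doc.target, doc.op, doc.lineno
        # resolve to the mirrored data without writing one class per node type.
        for table in ("_children", "_position"):
            values = self.__dict__.get(table, {})
            if name in values:
                return values[name]
        raise AttributeError(name)


def from_py(node):
    """Python ast -> Doc, copying every field and position unchanged."""
    if isinstance(node, list):
        return [from_py(n) for n in node]
    if not isinstance(node, ast.AST):
        return node  # identifiers, constants, None
    children = {f: from_py(getattr(node, f, None)) for f in node._fields}
    position = {a: getattr(node, a, None) for a in node._attributes}
    return Doc(type(node).__name__, children, position)


def to_py(doc):
    """Doc -> Python ast, looking the node class up by name."""
    if isinstance(doc, list):
        return [to_py(d) for d in doc]
    if not isinstance(doc, Doc):
        return doc
    cls = getattr(ast, doc._type)
    node = cls(**{f: to_py(v) for f, v in doc._children.items()})
    for attr, value in doc._position.items():
        if value is not None:
            setattr(node, attr, value)
    return node


tree = ast.parse("a[f()] += 1\n")
doc = from_py(tree)
print(doc.body[0]._type, doc.body[0].target._type)  # AugAssign Subscript
back = to_py(doc)
print(ast.dump(back) == ast.dump(tree))  # True

# The round-tripped tree still compiles and keeps augmented-assignment semantics.
namespace = {"a": [10], "f": lambda: 0}
exec(compile(back, "<roundtrip>", "exec"), namespace)
print(namespace["a"])  # [11]
```

The design choice here is that one generic class plus a name lookup (`getattr(ast, ...)`) replaces a hand-written class per node type, which is one way a faithful mirror can stay small.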

To summarize, with this approach, the AST is more succinct, more stable, more stringently tested and more maintainable.

> x += 1 and x = x + 1 are equivalent statements.

They could be equivalent under certain assumptions, but not in the general case. For example,

a[f()] += 1

is not equivalent to

a[f()] = a[f()] + 1

If f has side effects, which is entirely possible in metaprogramming, then we cannot make that assumption: the augmented form calls f once, while the expanded form calls it twice.
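A quick way to check the side-effect argument in plain Python: the snippet counts how often a hypothetical `f` is called under each form, using only built-ins and the standard `ast` module.

```python
import ast

calls = []


def f():
    calls.append("f")  # side effect: record every call
    return 0


a = [10]
a[f()] += 1                  # augmented assignment
augmented_calls = len(calls)

calls.clear()
a = [10]
a[f()] = a[f()] + 1          # "normalized" form
expanded_calls = len(calls)

print(augmented_calls, expanded_calls)  # 1 2

# The two statements also parse to different node types.
print(type(ast.parse("a[f()] += 1").body[0]).__name__)          # AugAssign
print(type(ast.parse("a[f()] = a[f()] + 1").body[0]).__name__)  # Assign
```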

Just a question here, @junrushao:

> with 1000 real-world TVMScripts in the testsuite;

Do these tests make assertions about the IR to validate that it roundtrips correctly and that the parsed IR matches the Python? I agree there are a great many TVMScripts in the TVM repo, but I'm wondering how they all serve to validate this PR. I see one test case in this PR, so my concern is that we may silently introduce a bug, because those 1000 tests are not intended to validate TVMScript itself and therefore don't assert that it is parsed correctly. If the TVMScript parser's behavior changes, we could in effect decrease the coverage of the thousands of tests that rely on TVMScript to assert various compiler behaviors.

A more convincing argument would be that you've hooked the TVMScript parser before and after this change, dumped the IR, and validated case by case that the IR matches. Is that what's being argued?

> do these tests make asserts about the IR to validate that they roundtrip correctly and that the parsed IR matches the Python?

Yes, these are all roundtrip tests
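For readers unfamiliar with the term, a roundtrip test prints a parsed program back to source, parses it again, and checks that both results are structurally equal. This sketch uses Python's own `ast` module as a stand-in for TVMScript's printer and parser (`ast.unparse` needs Python 3.9+); for TIR, the comparison step would be something like `tvm.ir.assert_structural_equal`, which is not shown here. `roundtrips` is a hypothetical helper.

```python
import ast


def roundtrips(source: str) -> bool:
    """Parse, print, re-parse, and compare the two trees structurally."""
    first = ast.parse(source)
    printed = ast.unparse(first)  # ast.unparse requires Python 3.9+
    second = ast.parse(printed)
    # ast.dump omits line/column info by default, so only structure is compared.
    return ast.dump(first) == ast.dump(second)


print(roundtrips("a[f()] += 1\nx = (y if c else z)\n"))  # True
```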

Could you point me at the test suite so I can make sure we're on the same page?

@areusch https://github.com/junrushao/tvmscript-testsuite

OK, I agree that does look like a reasonable test. Is there any reason these aren't upstreamed?

We haven't upstreamed the parser yet, so there is a dependency.

python/tvm/script/_parser/core/diagnostics.py

python/tvm/script/_parser/core/doc.py

cyx-6 · 2022-10-05T00:02:01Z

Thanks @tkonolige for the valuable feedback on this PR. All the comments about documentation have been addressed.

junrushao · 2022-10-06T18:00:56Z

Gentle ping @tkonolige

junrushao · 2022-10-07T15:16:28Z

The thread has been stale for a couple of days. Any more feedback? Thanks a lot!

tkonolige

Thank you for the discussion thus far. I am not opposed to the introduction of this new AST, but before we do so I want to make sure it has solid test coverage and documentation so that it's extensible and maintainable going forward.

I would ask that we:

Integrate the test suite prior to the introduction of the new parser. This way we can verify it works with the current parser before switching over to the new one.
Add documentation to both doc_core.py and the parser. Notably, each AST node should have examples of how and where it occurs in a Python program; synr.ast.Call is a good example of how the docs can be written.
Correct the spans for parts of the Python AST: withitem, Slice, Index, and keyword all need corrections (you can see some here: https://github.com/octoml/synr/blob/main/synr/compiler.py#L60-L88).
Improve error messages so that they state where and why the parse failed. For example: https://github.com/octoml/synr/blob/main/synr/compiler.py#L295-L298.
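On the span corrections, here is a minimal sketch of one possible approach, assuming Python 3.8+ (which provides `end_lineno`/`end_col_offset`): `ast.keyword` only carries positions from Python 3.9 onward, and `ast.withitem` carries none, so the hypothetical `span_of` borrows them from child nodes. It is an approximation (the fallback keyword span omits the `name=` prefix), not synr's or TVMScript's implementation.

```python
import ast
from typing import Tuple

Span = Tuple[int, int, int, int]  # (lineno, col_offset, end_lineno, end_col_offset)


def span_of(node: ast.AST) -> Span:
    """Return a source span, borrowing from children when the node has none."""
    if getattr(node, "lineno", None) is not None:
        return (node.lineno, node.col_offset, node.end_lineno, node.end_col_offset)
    if isinstance(node, ast.keyword):
        # Older keyword nodes lack positions; fall back to the value,
        # which loses the "name=" prefix.
        return span_of(node.value)
    if isinstance(node, ast.withitem):
        # withitem has no positions: cover context_expr through optional_vars.
        start = span_of(node.context_expr)
        end = span_of(node.optional_vars) if node.optional_vars is not None else start
        return (start[0], start[1], end[2], end[3])
    raise ValueError(f"no span available for {type(node).__name__}")


item = ast.parse("with open(p) as fh:\n    pass\n").body[0].items[0]
print(span_of(item))  # (1, 5, 1, 18)
```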

These additions/changes would address my concerns regarding this PR. Please let me know if there's anything I can do to help clarify or address these points. Happy to discuss further as needed and work together to get this PR in!

Thank you again @cyx-6 for this contribution and @cyx-6 and @junrushao for the valuable discussion!

tkonolige · 2022-10-06T19:36:44Z

python/tvm/script/_parser/core/diagnostics.py

+        try:
+            # It will cause a problem when running in Jupyter Notebook.
+            # `mod` will be <module '__main__'>, which is a built-in module
+            # and `getsource` will throw a TypeError
+            mod = inspect.getmodule(program)
+            if mod:
+                self.full_source = inspect.getsource(mod)
+            else:
+                self.full_source = self.source
+        except TypeError:
+            # It's a work around for Jupyter problem.
+            # Since `findsource` is an internal API of inspect, we just use it
+            # as a fallback method.
+            src, _ = inspect.findsource(program)  # type: ignore
+            self.full_source = "".join(src)


This looks directly copied from synr (https://github.com/octoml/synr/blob/main/synr/compiler.py#L681-L695). I think you'll have to give attribution. @areusch how can this be done?

The author is @Hzfengsy. @cyx-6, please properly credit @Hzfengsy for his contribution in synr.

python/tvm/script/_parser/core/doc.py

python/tvm/script/_parser/core/diagnostics.py

python/tvm/script/_parser/core/doc.py

tkonolige · 2022-10-06T23:52:31Z

python/tvm/script/_parser/core/doc.py

+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""TVM Script Parser doc AST"""


Can you document how new AST nodes can be added to the parser?
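One way such documentation could read, as a hedged sketch: keep a registry keyed by AST node class name, so that supporting a new node means registering one handler, and have unregistered nodes fail with a message that states where (`line:column`) and why (the node type) parsing stopped. `HANDLERS`, `register`, `visit`, and `ParseError` are hypothetical names, not the actual TVMScript parser API.

```python
import ast
from typing import Callable, Dict


class ParseError(Exception):
    """Hypothetical parser error that carries a source location in its message."""


# Hypothetical registry: Python AST node class name -> handler function.
HANDLERS: Dict[str, Callable[[ast.AST], object]] = {}


def register(node_name: str):
    """Decorator registering a function as the handler for one AST node type."""
    def decorator(fn):
        HANDLERS[node_name] = fn
        return fn
    return decorator


def visit(node: ast.AST):
    handler = HANDLERS.get(type(node).__name__)
    if handler is None:
        # Report where (line:column) and why (which node) parsing failed.
        line = getattr(node, "lineno", "?")
        col = getattr(node, "col_offset", "?")
        raise ParseError(f"{line}:{col}: unsupported syntax: {type(node).__name__}")
    return handler(node)


@register("Constant")
def _visit_constant(node: ast.Constant):
    return node.value


@register("BinOp")
def _visit_binop(node: ast.BinOp):
    lhs, rhs = visit(node.left), visit(node.right)
    if isinstance(node.op, ast.Add):
        return lhs + rhs
    raise ParseError(f"{node.lineno}:{node.col_offset}: unsupported operator: "
                     f"{type(node.op).__name__}")


print(visit(ast.parse("1 + 2", mode="eval").body))  # 3

message = ""
try:
    visit(ast.parse("-1", mode="eval").body)  # UnaryOp has no handler yet
except ParseError as err:
    message = str(err)
print(message)  # 1:0: unsupported syntax: UnaryOp
```

Under this scheme, supporting `UnaryOp` would be one more `@register("UnaryOp")` function.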

python/tvm/script/_parser/core/doc.py

tkonolige · 2022-10-07T18:39:47Z

python/tvm/script/_parser/core/doc.py

+                    lineno=getattr(x.slice, "lineno", None),
+                    col_offset=getattr(x.slice, "col_offset", None),
+                    end_lineno=getattr(x.slice, "end_lineno", None),
+                    end_col_offset=getattr(x.slice, "end_col_offset", None),


You should use the line numbers from x.slice.lower and x.slice.upper here. The x.slice doesn't have the correct line numbers in some versions of the Python AST.
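A minimal sketch of the suggested fix, assuming Python 3.8+: derive the span from the slice's bound expressions rather than from the `Slice` node, which has no position attributes before Python 3.9. The hypothetical `slice_span` also considers `step`, and it cannot give a span for a bare `:`.

```python
import ast
from typing import Optional, Tuple


def slice_span(sl: ast.Slice) -> Optional[Tuple[int, int, int, int]]:
    """Span of a slice, built from its lower/upper/step expressions.

    The span runs from the first present bound to the last one, so leading or
    trailing colons are not included. A bare ":" has no bounds and no span.
    """
    parts = [p for p in (sl.lower, sl.upper, sl.step) if p is not None]
    if not parts:
        return None
    first, last = parts[0], parts[-1]
    return (first.lineno, first.col_offset, last.end_lineno, last.end_col_offset)


sub = ast.parse("a[1:20]", mode="eval").body
print(slice_span(sub.slice))  # (1, 2, 1, 6)
```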

jwfromm · 2022-10-07T19:53:35Z

@tkonolige I just want to clarify for point 1. Are you saying that the test suite for the new parser should be ported to work for the old parser, then upstreamed, then ported back for the new parser? I understand there is some benefit to making sure the tests work but it seems like a pretty high effort / reward ratio.

tkonolige · 2022-10-07T19:57:56Z

@jwfromm There shouldn't be any work needed to port the test suite. From looking through it, it is all tvmscript prim funcs. They should be parsed equivalently by the new and old parsers. Otherwise it would mean that there is a regression in the new parser (which is the whole point of adding these tests first).
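The check being proposed is essentially a differential test: run both parsers over the same corpus and report every program where their outputs differ. In this minimal sketch, `differential_mismatches` is a hypothetical helper and Python's `ast.parse` stands in for both parsers; the deliberately normalizing stand-in shows how a rewrite of `x += 1` into `x = x + 1` would surface as a mismatch.

```python
import ast
from typing import Callable, Dict, List


def differential_mismatches(
    corpus: Dict[str, str],
    parse_old: Callable[[str], object],
    parse_new: Callable[[str], object],
    same: Callable[[object, object], bool],
) -> List[str]:
    """Return the names of programs that the two parsers handle differently."""
    mismatches = []
    for name, source in corpus.items():
        try:
            ok = same(parse_old(source), parse_new(source))
        except Exception:  # a crash in either parser also counts as a mismatch
            ok = False
        if not ok:
            mismatches.append(name)
    return mismatches


def same_tree(a: ast.AST, b: ast.AST) -> bool:
    return ast.dump(a) == ast.dump(b)


def normalizing_parse(source: str) -> ast.AST:
    # Deliberately "buggy" stand-in that rewrites x += 1 into x = x + 1.
    return ast.parse(source.replace("+=", "= x +"))


corpus = {"augassign": "x += 1\n", "walrus": "(y := 1)\n"}
print(differential_mismatches(corpus, ast.parse, ast.parse, same_tree))          # []
print(differential_mismatches(corpus, ast.parse, normalizing_parse, same_tree))  # ['augassign']
```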

jwfromm · 2022-10-07T20:32:40Z

@cyx-6 and @junrushao, are there any blockers to doing a PR with the TVMScript tests before the infrastructure for the new parser is upstreamed? I think it's reasonable that, if those tests work regardless of parser, they could be done in a separate parallel PR.

jwfromm · 2022-10-08T03:18:06Z

After reading the comment again I'm still a little confused. Why does it matter that the tests are added first? As long as the tests themselves are good and are run on the new parser, there shouldn't be regressions. I'm having a hard time understanding why it's worth delaying this PR.

junrushao

LGTM

tkonolige · 2022-10-10T15:52:47Z

@jwfromm You say "As long as the tests themselves are good". That is what I would like checked by running with the existing parser first.

jwfromm · 2022-10-10T17:37:02Z

I guess I'm not understanding what outcome would be meaningful. If the tests work then great, there's no functionality difference between the two. If they fail on the current parser, then it means this parser implements some new features or support. Why would that be a problem? I would totally agree if we were planning on replacing rather than supplementing the current parser tests, but that's not the plan as far as I can tell.

tkonolige · 2022-10-10T18:20:51Z

> If they fail on the current parser, then it means this parser implements some new features or support. Why would that be a problem?

The alternative is that these tests fail on the current parser because the new parser behaves differently. This would mean that existing TVMScript programs could break. I want to avoid breaking changes because there is already a lot of TVMScript out there.

jwfromm · 2022-10-10T21:09:29Z

I see. To summarize, the concern is that there may be untested behaviors of the current parser that users rely on in external codebases, and adding more tests before the migration reduces the odds of breaking them. Although it's not clear to me that it's worth delaying this PR for, the argument does make sense. However, it would only work if the test suite is compatible with the parser on main today. @junrushao, is that the case, or does the new test suite rely on features introduced by this work?

junrushao · 2022-10-10T21:12:25Z

To be crystal clear, this PR, as C1 of a complete parser, is simple scaffolding that doesn't contain any TIR parsing logic, which means it doesn't parse TIR at all. The test suite would only pass once all pieces C1-C4 are upstreamed to mainline, at which point we will be able to parse TIR.

tkonolige · 2022-10-10T22:45:37Z

If we can't use the test suite for this PR, then I think some more tests need to be added. Looking at what is tested currently, the following AST nodes are missing testing:

ClassDef
Return
AnnAssign
AugAssign
If
Assert
BoolOp
UnaryOp
Call
ExtSlice
Tuple
Comparison operators (> < <= >= ==)
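As a hedged sketch of what such coverage could start from, each listed node type gets one snippet, and the test checks that the snippet actually produces that node; the comparison operators all map to a single `Compare` node. Python's `ast` is used here as a stand-in; a real test in this PR would presumably also push each tree through the doc AST conversion and check its spans, which is not shown. `CASES` and `node_kinds` are hypothetical names, and ExtSlice is omitted because Python 3.9+ represents it as a `Tuple`.

```python
import ast

# One snippet per node type from the list above.
CASES = {
    "ClassDef": "class A:\n    pass\n",
    "Return": "def f():\n    return 1\n",
    "AnnAssign": "x: int = 1\n",
    "AugAssign": "x += 1\n",
    "If": "if x:\n    pass\n",
    "Assert": "assert x\n",
    "BoolOp": "x and y\n",
    "UnaryOp": "-x\n",
    "Call": "f(x)\n",
    "Tuple": "(x, y)\n",
    "Compare": "a < b <= c > d >= e == f\n",
}


def node_kinds(source: str) -> set:
    """Names of every AST node type that appears in the parsed source."""
    return {type(n).__name__ for n in ast.walk(ast.parse(source))}


missing = [kind for kind, src in CASES.items() if kind not in node_kinds(src)]
print(missing)  # []
```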

junrushao · 2022-10-10T23:34:32Z

@tkonolige I would like to point out that there is no node-specific processing logic in this PR (besides ExtSlice) for the nodes you just listed, so I don't believe the test cases you proposed are relevant.

tkonolige

I'm removing my request for changes, that being said, not all my concerns here have been addressed, particularly those around test coverage.

junrushao · 2022-10-11T18:54:21Z

Thanks @tkonolige @areusch @jwfromm @spectrometerHBH for your valuable input! We will make sure tests are added according to the infra being built.

260e96d [TVMScript] AST, Source and diagnostics for Parser

cyx-6 force-pushed the upstream-c-ast branch from 47b0af1 to 260e96d on October 4, 2022 01:08

junrushao reviewed Oct 4, 2022: tests/python/unittest/test_tvmscript_parser_source.py (outdated)

7208596 apply code review suggestion

tkonolige requested changes Oct 4, 2022

d41a853 add doc for public APIs

7af4bc7 add module doc and function doc

tkonolige previously requested changes Oct 7, 2022

400ea02 add more detailed doc

junrushao approved these changes Oct 8, 2022

spectrometerHBH approved these changes Oct 8, 2022

junrushao mentioned this pull request Oct 10, 2022 in [Tracking Issue] TVMScript Metaprogramming #12442 (closed, 29 tasks)

tkonolige reviewed Oct 11, 2022

junrushao merged commit afeab6e into apache:main Oct 11, 2022

xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022: 157ea9e [TVMScript] AST, Source and diagnostics for Parser (apache#12978)

leandron mentioned this pull request Feb 1, 2023 in TVM v0.11.0 Release Candidate Notes #13899 (closed)

tkonolige mentioned this pull request Feb 6, 2023 in [TVMScript,Fix] Fix findsource when classes are indented #13924 (merged)
