Skip to content

Conversation

@cyx-6
Copy link
Contributor

@cyx-6 cyx-6 commented Oct 4, 2022

This PR introduces AST, Source and diagnostics for Parser

Co-authored-by: yongwww yongcale@gmail.com

This PR introduces AST, Source and diagnostics for Parser

Co-authored-by: yongwww <yongcale@gmail.com>
Copy link
Contributor

@tkonolige tkonolige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cyx-6, I appreciate the work you're doing to improve the tvmscript parser. I've left some feedback.

# specific language governing permissions and limitations
# under the License.
# pylint: disable=missing-docstring
import ast
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be using synr instead of the python ast module. Synr provides a stable ast across multiple versions of python, unlike ast which changes version to version. Synr is also used by the old tvmscript parser.

Copy link
Member

@junrushao junrushao Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using the new stable AST provided by TVMScript. Its stability has been tested across python 3.7 - 3.10 in the following two settings:

  • with 1000 real-world TVMScripts in the testsuite;
  • every single python file in TVM repo.

Not using synr is intentional, and the reason is stated and widely discussed in the RFC: for metaprogramming. During metaprogramming, python interpreter is used to evaluate pieces of the python AST, but honest back-translation from synr to python AST is not possible because it drops necessary information, for example, x += 1 is normalized into x = x + 1.

TVMScript's stable AST is designed with this in mind: it is an honest python 3.8 AST without any change. Any other versions of the AST is canonicalized to python 3.8 AST, and the file allow it to be translated back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During metaprogramming, python interpreter is used to evaluate pieces of the python AST, but honest back-translation from synr to python AST is not possible because it drops necessary information, for example, x += 1 is normalized into x = x + 1.

I'm a little unclear on why normalization is a problem. x += 1 and x = x + 1 are equivalent statements.

Why not fix synr instead? It avoids having to rewrite large amounts of code. Sorry, but ditching synr wasn't made clear in the RFC. I was assuming you were going to use synr to maintain compatibility with multiple versions of python.

Copy link
Member

@junrushao junrushao Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glad you mentioned the engineering effort of maintaining synr, and that's exactly why we need a stable AST that's exactly the same as the one as python 3.8 :-)

At the time of writing, there are 1725 lines of code in synr folder. There are 22 tests developed to test all the features for parsing across all python versions.

Comparing with synr, there are only 361 lines of code for the new stable AST in TVMScript, which in the meantime, handles translation and back-translation in-between python ast. It's tested against thousands of real-world cases as mentioned in my previous reply.

The key here is that the stable ast is exactly honest with no tweaking to python's AST, so many logics can be implemented with some python programming tricks, for example, __getattr__.

To summarize, with this approach, the AST is more succinct, more stable, more stringently tested and more maintainable.

x += 1 and x = x + 1 are equivalent statements.

it could be equivalent with certain assumptions, but not in general case, for example,

a[f()] += 1

is not equivalent to

a[f()] = a[f()] + 1

if f has side effects, which is completely possible in metaprogramming, then we cannot make assumption.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a question here @junrushao

with 1000 real-world TVMScripts in the testsuite;

do these tests make asserts about the IR to validate that they roundtrip correctly and that the parsed IR matches the Python? I agree there are a great many TVMScript in the tvm repo, but I am wondering about the logic that they all serve to validate this PR. I see one test case in this PR, so my concern here is that we may silently introduce a bug because those 1000 tests are not intended to validate the TVMScript and therefore don't assert it is parsed correctly. If the new TVMScript parser changes, then we could in effect decrease the coverage of 1000s of tests that rely on TVMScript to assert various compiler behaviors.

A more convincing argument here is that you've hooked the TVMScript parser before and after this change, dumped the IR, and validated case-by-case that the IR matches. Is that what's being argued?

Copy link
Member

@junrushao junrushao Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these tests make asserts about the IR to validate that they roundtrip correctly and that the parsed IR matches the Python?

Yes, these are all roundtrip tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you point me at the test suite so i can make sure we're on the same page?

Copy link
Member

@junrushao junrushao Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i agree that does look like a reasonable test. is there any reason these aren't upstreamed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we haven't got the parser upstreamed yet, so there is some dependency

@cyx-6
Copy link
Contributor Author

cyx-6 commented Oct 5, 2022

Thanks @tkonolige for valuable feedback over this pr. All those comments about documentation have been updated.

@junrushao
Copy link
Member

Gentle ping @tkonolige

@junrushao
Copy link
Member

The thread has been stale for a couple of days. Any more feedbacks? Thanks a lot!

Copy link
Contributor

@tkonolige tkonolige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the discussion thus far. I am not opposed to the introduction of this new AST, but before we do so I want to make sure it has solid test coverage and documentation so that it's extensible and maintainable going forward.

I would ask that we:

  1. Integrate the test suite prior to the introduction of the new parser. This way we can verify it works with the current parser before switching over to the new one.

  2. Add documentation to both doc_core.py and the parser. Notably there should be examples on each AST node of how and where they occur in a python program. synr.ast.Call is a good example of how the docs can be written.

  3. Correct the spans for parts of the python ast. withitem, Slice, Index, and keyword all need corrections (you can see some here: https://github.com/octoml/synr/blob/main/synr/compiler.py#L60-L88).

  4. Improve error messages so that they state where and why the parse failed. For example: https://github.com/octoml/synr/blob/main/synr/compiler.py#L295-L298.

These additions/changes would address my concerns regarding this PR. Please let me know if there's anything I can do to help clarify or address these points. Happy to discuss further as needed and work together to get this PR in!

Thank you again @cyx-6 for this contribution and @cyx-6 and @junrushao for the valuable discussion!

Comment on lines +57 to +71
try:
# It will cause a problem when running in Jupyter Notebook.
# `mod` will be <module '__main__'>, which is a built-in module
# and `getsource` will throw a TypeError
mod = inspect.getmodule(program)
if mod:
self.full_source = inspect.getsource(mod)
else:
self.full_source = self.source
except TypeError:
# It's a work around for Jupyter problem.
# Since `findsource` is an internal API of inspect, we just use it
# as a fallback method.
src, _ = inspect.findsource(program) # type: ignore
self.full_source = "".join(src)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks directly copied from synr (https://github.com/octoml/synr/blob/main/synr/compiler.py#L681-L695). I think you'll have to give attribution. @areusch how can this be done?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the author is @Hzfengsy. @cyx-6 please properly credit @Hzfengsy for his contribution in synr.

# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""TVM Script Parser doc AST"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document how new AST nodes can be added to the parser.

Comment on lines +314 to +317
lineno=getattr(x.slice, "lineno", None),
col_offset=getattr(x.slice, "col_offset", None),
end_lineno=getattr(x.slice, "end_lineno", None),
end_col_offset=getattr(x.slice, "end_col_offset", None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use the line numbers from x.slice.lower and x.slice.upper here. The x.slice doesn't have the correct line numbers in some versions of the python ast.

@jwfromm
Copy link
Contributor

jwfromm commented Oct 7, 2022

@tkonolige I just want to clarify for point 1. Are you saying that the test suite for the new parser should be ported to work for the old parser, then upstreamed, then ported back for the new parser? I understand there is some benefit to making sure the tests work but it seems like a pretty high effort / reward ratio.

@tkonolige
Copy link
Contributor

@jwfromm There shouldn't be any work needed to port the test suite. From looking through it, it is all tvmscript prim funcs. They should be parsed equivalently by the new and old parsers. Otherwise it would mean that there is a regression in the new parser (which is the whole point of adding these tests first).

@jwfromm
Copy link
Contributor

jwfromm commented Oct 7, 2022

@cyx-6 and @junrushao, are there any blockers to doing a PR with the tvmscript tests before the infrastructure for the new parser is upstreamed? I think its reasonable that if those tests work regardless of parser, they could be done in a separate parallel PR.

@jwfromm
Copy link
Contributor

jwfromm commented Oct 8, 2022

After reading the comment again I'm still a little confused. Why does it matter the tests are added first? As long as the tests themselves are good and used on the new parser there shouldnt be regressions. I'm having a hard time understanding why its worth delaying this PR.

Copy link
Member

@junrushao junrushao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tkonolige
Copy link
Contributor

@jwfromm You say "As long as the tests themselves are good". That is what I would like checked by running with the existing parser first.

@jwfromm
Copy link
Contributor

jwfromm commented Oct 10, 2022

I guess I'm not understanding what outcome would be meaningful. If the tests work then great, there's no functionality difference between the two. If they fail on the current parser, then it means this parser implements some new features or support. Why would that be a problem? I would totally agree if we were planning on replacing rather than supplementing the current parser tests, but that's not the plan as far as I can tell.

@tkonolige
Copy link
Contributor

If they fail on the current parser, then it means this parser implements some new features or support. Why would that be a problem?

The alternative is that these tests fail on the current parser because the new parser has different behavior. This would mean that existing tmvscript programs could be broken. I want to avoid breaking changes because there is already a lot of tvmscript out there.

@jwfromm
Copy link
Contributor

jwfromm commented Oct 10, 2022

I see, so to summarize the concern is that there may be untested behaviors of the current parser that users rely on in external codebases but adding more tests pre-migration reduces the odds of this. Although its not clear to me that its worth delaying this PR for, the argument does make sense. However, it only would work if the test suite is compatible with the parser on main today. @junrushao is that the case or does the new test suite rely on features being introduced by this work?

@junrushao
Copy link
Member

junrushao commented Oct 10, 2022

To be crystal clear, this PR, as C1 of a complete parser, is a simple scaffolding, which doesn't contain any TIR parsing logic, which means it doesn't parse TIR at all. The testsuite would only pass if we have all pieces C1-C4 upstreamed to mainline, and at that time we will be able to parse TIR.

@tkonolige
Copy link
Contributor

If we can't use the test suite for this PR, then I think some more tests need to be added. Looking at what is tested currently, the following AST nodes are missing testing:

  • ClassDef
  • Return
  • AnnAssign
  • AugAssign
  • If
  • Assert
  • BoolOp
  • UnaryOp
  • Call
  • ExtSlice
  • Tuple
  • Comparison operators (> < <= >= ==)

@junrushao
Copy link
Member

junrushao commented Oct 10, 2022

@tkonolige I would love to remind that there is no specific processing logic in the PR (besides ExtSlice) for the nodes you just listed, and thus I don't believe the testcases you proposed are relevant.

Copy link
Contributor

@tkonolige tkonolige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm removing my request for changes, that being said, not all my concerns here have been addressed, particularly those around test coverage.

@junrushao junrushao dismissed tkonolige’s stale review October 11, 2022 16:43

@tkonolige: I'm removing my request for changes, that being said, not all my concerns here have been addressed, particularly those around test coverage.

@junrushao junrushao merged commit afeab6e into apache:main Oct 11, 2022
@junrushao
Copy link
Member

Thanks @tkonolige @areusch @jwfromm @spectrometerHBH for your valuable input! We will make sure tests are added according to the infra being built.

xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
This PR introduces AST, Source and diagnostics for Parser
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants