
Conversation

@laenas
Contributor

@laenas laenas commented Dec 1, 2020

  • Went with a three-case union for the setting, since we want a default case and decomposing a two-case option was awkward.
  • Changed the name of the option since there's a little bit more going on with it now
  • Stacked it as a unique flag rather than an additive flag to simplify the CompilerOption
  • I am slightly less confident in the first-pass approach to the new handling in the parser:

We have a disjointed Venn diagram: three states, and two relevant outcomes of those states - two of the states share a tokenizer, and a different two share the consumption function. It seemed we would end up duplicating either the match deconstruction or the consumption; I chose the latter.
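A minimal sketch of the shape being described, with hypothetical names rather than the PR's actual types:

```fsharp
// A sketch only - the case names here are hypothetical, not the PR's code.
type TokenizeOption =
    | Off        // default: tokens flow on to the parser as usual
    | Filtered   // dump post-LexFilter tokens, then stop
    | Unfiltered // dump raw lexer tokens, then stop

// Off and Filtered share the filtered tokenizer, while Filtered and
// Unfiltered share the dump loop, so one match arm always repeats
// either the deconstruction or the consumption.
let run option (rawTokenizer: unit -> string) (filteredTokenizer: unit -> string) =
    let dump tokenizer = printfn "tokenize - got %s" (tokenizer ())
    match option with
    | Off -> ()                          // normal parse path, elided here
    | Filtered -> dump filteredTokenizer
    | Unfiltered -> dump rawTokenizer
```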

Also added a note about something that threw me for a brief moment at first glance: how the flag's while loop exits, and to what effect. A guidepost for the future.

@cartermp cartermp requested a review from dsyme December 5, 2020 02:07
@laenas laenas force-pushed the unfiltered-tokenize branch from 4c7f088 to 4ff375e December 18, 2020 22:00
@laenas
Contributor Author

laenas commented Dec 18, 2020

Rebased due to awkward habits - but since there was a semi-direct merge conflict, I think it mostly works out better to have a single, cleaner commit adding this behavior. The only change from the initial PR commit is that I've now added the Tokenizer alias, since it gets used in that pair of functions owing to the preceding change to LexFilter - and it seems clearer to do this than to use the raw function type signatures there. It requires wrapping the unfiltered lexer in a lambda, but that seems reasonable in this circumstance.
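For illustration, the alias-plus-lambda arrangement could look something like this - the names and signatures below are stand-ins, not the PR's actual Tokenizer definition:

```fsharp
// Hypothetical stand-ins; the compiler's real token and buffer types differ.
type Token = string
type Tokenizer = unit -> Token

// A raw lexing function takes a buffer on each call, so satisfying the
// Tokenizer alias means closing over the buffer with a lambda:
let wrapUnfiltered (lexer: 'buf -> Token) (lexbuf: 'buf) : Tokenizer =
    fun () -> lexer lexbuf
```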

Contributor

@KevinRansom KevinRansom left a comment


It would probably be nice to add a test case.
Miserably, these types of tests are still in our very worst test harness and haven't been ported to our newer, less (but still quite) horrific test harness.

For now the test belongs here:

The tests are really quite yukky!!

The command line goes in an env.lst file, like the following; this example validates the output of fsc /?:

ReqENU	SOURCE=dummy.fsx COMPILE_ONLY=1  PRECMD="\$FSC_PIPE >help40.txt /? 2>&1"     POSTCMD="\$FSI_PIPE --nologo --quiet --exec ..\\..\\..\\comparer.fsx help40.txt help40.437.1033.bsl"	# /?

The PRECMD here executes fsc and gathers the output.

The POSTCMD compares the output against a baseline (.bsl) file.

To verify the tests still succeed, use:

build -c release -testfsharpqa
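For illustration, an entry for this flag could follow the same pattern - the source file, output file, baseline name, and exact spelling of the new switch below are all assumptions, not existing artifacts:

ReqENU	SOURCE=dummy.fsx COMPILE_ONLY=1  PRECMD="\$FSC_PIPE >tokens.txt --tokenize:unfiltered dummy.fsx 2>&1"     POSTCMD="\$FSI_PIPE --nologo --quiet --exec ..\\..\\..\\comparer.fsx tokens.txt tokens.bsl"	# --tokenize:unfiltered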

@laenas
Contributor Author

laenas commented Dec 23, 2020

I'll hack it in and get a commit up over the holidays, thanks for the feedback!

@KevinRansom
Contributor

Other than that, the code looks good, mate. Thanks for this.

@laenas
Contributor Author

laenas commented Dec 28, 2020

Other than that, the code looks good, mate. Thanks for this.

Thanks!

I have two things to follow up on, before I've got a commit ready:

  1. I'm not sure how best to tackle actually doing a side-by-side comparison against a known-good baseline. The major issue is that when fsc operates with tokenize, it prints a fully qualified file path along with the range of the token - so the output of fsc is immediately very machine-specific (and possibly run-specific, in CI server settings). It feels very wrong to change how fsc prints this just to make tests work; updating the current comparer.fsx to have special cases for these tests also feels a little heavy-handed. So I'm left wondering whether the route we'd want is to add another special-case comparison script just for these tests that handles sanitizing the output - my concern with that approach being that we lose 'known good' output to some degree, as we'd likely need to trim lines that look something like tokenize - got IDENT @ C:\Users\ryan\Source\fsharp\artifacts\bin\fsc\Debug\net472\dummy.fsx(3,0)-(3,4) down to tokenize - got IDENT (3,0)-(3,4). Not difficult, but it suddenly adds the fragility of a special-case script (a sketch of that sanitizing step follows this list).
  2. Along with that, I wonder somewhat about the completeness we'd be looking for in these tests. Is it desirable just to have a quick smoke test - 'these two options produce specific known outputs, distinct from each other' - against a minimal dummy file? Or do we want a comprehensive sample script that covers every token?
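As flagged in point 1, here is a minimal sketch of the kind of sanitizing such a special-case script might do - the function and regex are hypothetical illustrations, not anything that exists in comparer.fsx:

```fsharp
open System.Text.RegularExpressions

// Hypothetical sanitizer: strip the machine-specific path between the "@"
// and the range, turning
//   tokenize - got IDENT @ C:\...\dummy.fsx(3,0)-(3,4)
// into
//   tokenize - got IDENT (3,0)-(3,4)
let sanitize (line: string) =
    Regex.Replace(line, @" @ .*?(\(\d+,\d+\)-\(\d+,\d+\))\s*$", " $1")
```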

@laenas
Contributor Author

laenas commented Mar 26, 2021

@dsyme - Context around this is starting to blur really badly in my memory. I summon thee to determine whether we should:

  1. Expand this out to tweak the tokenization output to use relative rather than absolute paths (which allows us to write clean tests)
  2. Write some extra (untested) logic into the tests to handle matching of paths
  3. Ignore both for a feature that's going to get used once a year 😅

@Happypig375
Member

Bump @dsyme @KevinRansom

@KevinRansom
Contributor

@laenas Can we please have some tests added?

Thanks

Kevin

@laenas
Contributor Author

laenas commented Jun 28, 2021

@KevinRansom
See the questions above from the initial request. Have decisions been made on the problems with adding such tests?

@vzarytovskii
Member

@KevinRansom @laenas yeah, we will need to add some sort of framework for comparing the lexer tokens (with paths and, possibly, ranges erased).
I can look into it. If it's low effort, I can quickly make it work; if not, I think we can merge this as-is, since not much new functionality was added here.

Also, I think I messed up a bit with reverts; I'm gonna take a look at it.

@laenas
Contributor Author

laenas commented Jun 29, 2021

@vzarytovskii I'm happy to pitch in on bolting a framework on for that output comparison; I'm just trying to think through the edge cases of parsing the string, since file paths can look so wildly similar to the other tokens we'd be looking for. Rather than trying to 'skip over' what looks like a file path, I suppose we could just read from both ends: up to the first @ from the front, and backwards to the start of the range from the end?
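A sketch of that read-from-both-ends idea, assuming every dumped line has the shape shown earlier (tokenize - got NAME @ path(a,b)-(c,d)); the helper below is hypothetical:

```fsharp
// Hypothetical helper: recover the token text and the range without ever
// parsing the path between them.
let splitLine (line: string) =
    // Front: everything up to the first " @ " gives "tokenize - got IDENT".
    let front =
        match line.IndexOf " @ " with
        | -1 -> line
        | i -> line.Substring(0, i)
    // Back: the range is the last two parenthesised pairs, so walk backwards
    // to the second-to-last '(' and take everything from there.
    let lastOpen = line.LastIndexOf '('
    let rangeOpen = if lastOpen > 0 then line.LastIndexOf('(', lastOpen - 1) else -1
    let range = if rangeOpen >= 0 then line.Substring rangeOpen else ""
    front, range
```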

@vzarytovskii
Member

Yeah, that's a good question... in the IL comparer/verifier, we just try to skip the paths and normalize certain things. Not sure whether the same will work for the tokenizer.

@laenas
Contributor Author

laenas commented Jun 29, 2021

At a glance it should; I'm just thinking through how to minimize the parsing complexity, since so much overlap exists between allowed file paths and 'known' tokens in the output string.

But second to that, what do we want these sorts of tests to even... test? All tokens? Only tokens that get filtered? I'm a little fuzzy and don't have the source up right now, but I don't think even normal --tokenize has tests around it to model this after.

@vzarytovskii
Member

vzarytovskii commented Jun 29, 2021

At a glance it should; I'm just thinking through how to minimize the parsing complexity, since so much overlap exists between allowed file paths and 'known' tokens in the output string.

But second to that, what do we want these sorts of tests to even... test? All tokens? Only tokens that get filtered? I'm a little fuzzy and don't have the source up right now, but I don't think even normal --tokenize has tests around it to model this after.

Yeah, I think the only tokenizer tests are the service ones (TokenizerTests), which are slightly different.

We pretty much only want to test that the command-line switch works and outputs something that looks like tokens (both normal and unfiltered). I don't think we want to test tokenizer correctness here, since you didn't change anything about the tokenizer itself.

I don't think we should even check ranges in such a test.
I would argue that running fsc --tokenize on a very simple program and checking certain things in the output should be enough. @KevinRansom may disagree though :)

@vzarytovskii
Member

vzarytovskii commented Jun 29, 2021

@laenas I've added a few basic tests which check the tokens for a given source. Do they look right to you?
I've removed paths and ranges, so these tokens should be a 1:1 match to the source.

@vzarytovskii
Member

CI is having a nap. There are some issues with AzDo currently.

@laenas
Contributor Author

laenas commented Jun 29, 2021

@vzarytovskii at a glance that all looks correct from the diffs. Could you elaborate for me - when you say 'removed paths and ranges', is that because the test harness can now use the <Expects> element to effectively do the equivalent of a Contains, just checking for that known-good string?

@vzarytovskii
Member

vzarytovskii commented Jun 29, 2021

@laenas

@vzarytovskii at a glance that all looks correct from the diffs. Could you elaborate for me - when you say 'removed paths and ranges', is that because the test harness can now use the <Expects> element to effectively do the equivalent of a Contains, just checking for that known-good string?

Yeah, the value inside <Expects></Expects> is treated as a regex by the FSharpQA Perl runner, and gets applied to the output (after a run or compilation).

I didn't include ranges and filenames in it (although that certainly could be done); the original output looked like this:

tokenize - got EQUALS @ C:\Users\vlza\code\fsharp\tests\fsharpqa\source\CompilerOptions\fsc\tokenize\tokenize02.fs(5,12)-(5,13)

So the <Expects></Expects> pattern could've looked more like ^tokenize - got EQUALS @ .+tokenize02.fs\(5,12\)-\(5,13\)$, but I decided to keep it simple.

Do you think we should include ranges and filenames in the tests? They should be consistent between runs.
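For illustration, a test source carrying its expectations as described might look like the following - a hedged sketch of the pattern, with the exact attribute set assumed:

```fsharp
// Each <Expects> regex below gets applied to fsc's output by the harness.
// <Expects status="success">tokenize - got LET</Expects>
// <Expects status="success">tokenize - got IDENT</Expects>
// <Expects status="success">tokenize - got EQUALS</Expects>
let x = 1
```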

@laenas
Contributor Author

laenas commented Jun 29, 2021

@vzarytovskii

So the <Expects></Expects> pattern could've looked more like ^tokenize - got EQUALS @ .+tokenize02.fs\(5,12\)-\(5,13\)$, but I decided to keep it simple.

That makes a lot of sense. Out of curiosity, is that whole behavior something new this year? Because that's pretty much the right balance for this, and seems a lot better than my initial concerns, which were based on how a bunch of the other tests do full-compiler-output string matching - which seemed like it was going to be a pain to work with for this use case.

@vzarytovskii
Member

@vzarytovskii

So the <Expects></Expects> pattern could've looked more like ^tokenize - got EQUALS @ .+tokenize02.fs\(5,12\)-\(5,13\)$, but I decided to keep it simple.

That makes a lot of sense. Out of curiosity, is that whole behavior something new this year? Because that's pretty much the right balance for this, and seems a lot better than my initial concerns, which were based on how a bunch of the other tests do full-compiler-output string matching - which seemed like it was going to be a pain to work with for this use case.

I don't think it's new; probably only the FSharpQA tests have this (and not the CompilerAssert-based ones).

@laenas
Contributor Author

laenas commented Jun 29, 2021

Aha, that's probably it. Makes tons of sense, thanks for polishing that up and explaining!

@vzarytovskii
Member

vzarytovskii commented Jun 29, 2021

Aha, that's probably it. Makes tons of sense, thanks for polishing that up and explaining!

Np, sorry it took so long, and thanks for implementing it!

@vzarytovskii
Member

CI is having a really bad day today :(

@vzarytovskii vzarytovskii merged commit 4157807 into dotnet:main Jun 29, 2021
@laenas laenas deleted the unfiltered-tokenize branch June 29, 2021 17:16