Fix 11438 and 10807: categorize user defined literal as literal instead of number #4701

gerboengels · 2023-01-09T21:52:13Z

This relies on one big assumption: if a token starts with a digit but doesn't parse properly as an int/float, it is treated as a user-defined literal.
So the code does not actually match existing user-defined literals; it just applies the rule "if it starts int/float-like but isn't an actual int/float, assume it is a user-defined literal".

Fixes https://trac.cppcheck.net/ticket/11438 and https://trac.cppcheck.net/ticket/10807
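
The fallback rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `TokKind`, `classify` and `parsesAsBuiltinNumber` are hypothetical names, the standard `strtoll`/`strtod` functions stand in for Cppcheck's own number parsing, and C++14 binary literals and digit separators are deliberately not handled.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <initializer_list>
#include <string>

// Hypothetical token categories, for illustration only.
enum class TokKind { Number, Literal, Other };

static std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if the whole token is a built-in int/float, optionally followed by a
// standard suffix such as u, l, ll or f (simplified on purpose).
static bool parsesAsBuiltinNumber(const std::string &tok) {
    char *end = nullptr;
    std::strtoll(tok.c_str(), &end, 0); // base 0: decimal, octal, 0x hex
    const std::string intRest = lower(end);
    for (const char *s : {"", "u", "l", "ul", "lu", "ll", "ull", "llu"})
        if (intRest == s)
            return true;
    std::strtod(tok.c_str(), &end);
    const std::string fltRest = lower(end);
    return fltRest.empty() || fltRest == "f" || fltRest == "l";
}

// The fallback: starts with a digit but is not a built-in number => literal.
TokKind classify(const std::string &tok) {
    if (tok.empty() || !std::isdigit(static_cast<unsigned char>(tok[0])))
        return TokKind::Other;
    return parsesAsBuiltinNumber(tok) ? TokKind::Number : TokKind::Literal;
}
```

With this sketch, `42` and `1.5e3` come out as numbers, while `100_km`, `12h` and `0.0d` all fall through to the literal category, which is why the asm and test changes discussed below became necessary.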

firewave · 2023-01-09T22:05:24Z

I guess the same issue exists in simplecpp in Token::flags().

The Cppcheck code also has the same issue as simplecpp when the numeric is prefixed with "+" (see danmar/simplecpp@acbb847).

I wonder if the logic can be de-duplicated somehow.

gerboengels · 2023-01-09T21:53:47Z

lib/tokenize.cpp

            const Token *firstSemiColon = nullptr;
            int comment = 0;
-            while (Token::Match(endasm, "%num%|%name%|,|:|;") || (endasm && endasm->linenr() == comment)) {
+            while (Token::Match(endasm, "%num%|%name%|,|:|;") || (endasm && endasm->isLiteral()) || (endasm && endasm->linenr() == comment)) {


Unit tests pointed me here: 12h is valid asm (12 in hex), but it is not a valid C++ int. So after my changes it is seen as a literal, hence this change.

We need to allow and handle 12h and 101010b until Tokenizer::simplifyAsm() is executed. The checks should not see such tokens at all. My guess is that it doesn't matter whether they are classified as number or literal.

That sounds like we need more tests for ASM blocks.
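
For readers unfamiliar with the notation: in Intel/MASM-style assembler a trailing `h` marks a hexadecimal value and a trailing `b` a binary one. A minimal sketch of how such tokens could be read is shown here; `parseAsmNumber` is a hypothetical helper for illustration and is not part of Cppcheck.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: reads Intel-style suffixed numbers such as 12h or 101010b.
// Returns false if the token does not have that shape.
bool parseAsmNumber(const std::string &tok, long &value) {
    if (tok.size() < 2 || !std::isdigit(static_cast<unsigned char>(tok[0])))
        return false;
    const char suffix =
        static_cast<char>(std::tolower(static_cast<unsigned char>(tok.back())));
    int base = 0;
    if (suffix == 'h')
        base = 16;
    else if (suffix == 'b')
        base = 2;
    else
        return false;
    const std::string digits = tok.substr(0, tok.size() - 1);
    char *end = nullptr;
    const long v = std::strtol(digits.c_str(), &end, base);
    if (*end != '\0') // leftover characters: not a valid number in this base
        return false;
    value = v;
    return true;
}
```

None of these spellings is a valid C++ integer, so a tokenizer that only knows C++ numbers has to let them pass through some other category until the asm block is simplified away.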

gerboengels · 2023-01-09T21:56:04Z

test/testgarbage.cpp

            "}");
    }
+
+    void userDefinedLiterals() {


I've put it in testgarbage.cpp, although it isn't garbage, because this was the one test case which triggered the right checks.
Alternatively, I think I could move the first check to testcondition.cpp (which triggers the right check) and the second case maybe to testtokenize.cpp?

Spontaneously, I feel that both of these tests should be in testtokenize.cpp, and you could assert that the tokens are literals.

gerboengels · 2023-01-09T21:57:20Z

test/testtoken.cpp

        givenACodeSampleToTokenize nonNumeric("abc", true);
        ASSERT_EQUALS(false, Token::Match(nonNumeric.tokens(), "%num%"));

-        givenACodeSampleToTokenize binary("101010b", true);


Don't know in what context 101010b was a valid integer? C? asm? Something else?
Same for 0.0d below.

I've rewritten them to valid C++ integers

I think 101010b is used in inline assembler.

gerboengels · 2023-01-09T22:02:00Z

test/testtoken.cpp

        tok.str("false");
        ASSERT(tok.tokType() == Token::eBoolean);
+        tok.str("\"foo\"_userDefinedLiteral");
+        ASSERT(tok.tokType() == Token::eOther); // should be eLiteral


User-defined string literals are still not properly processed, but this seems a lot harder to accomplish properly.
In my code base I don't run into issues with strings (I did run into issues with user-defined int literals, now fixed), so I left that unchanged.

Please use TODO_ASSERT so we know this is not the expected result.
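
For context, this is roughly what the two kinds of user-defined literal under discussion look like in C++11 and later. The string form reuses the `_userDefinedLiteral` suffix from the test above; the `Tagged` type and the `_km` suffix are made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Made-up result type for the string literal operator.
struct Tagged {
    std::string text;
};

// User-defined string literal: "foo"_userDefinedLiteral
Tagged operator""_userDefinedLiteral(const char *s, std::size_t n) {
    return Tagged{std::string(s, n)};
}

// User-defined integer literal: 100_km (hypothetical suffix, value in metres)
constexpr unsigned long long operator""_km(unsigned long long v) {
    return v * 1000ULL;
}
```

The integer form begins with a digit, so the digit-based fallback can catch it; the string form begins with a quote, which is presumably part of why it needs separate handling.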

gerboengels · 2023-01-11T17:19:53Z

The Cppcheck code also has the same issue as simplecpp when the numeric is prefixed with + - danmar/simplecpp@acbb847.

I wonder if the logic can be de-duplicated somehow.

Putting the line you linked to in a function, and using that in cppcheck as well, is of course easy.
Going a step further and using cppcheck's MathLib in simplecpp (and replacing, for example, simplecpp's stringToLL with MathLib) seems a fair bit more work. It also has different performance characteristics: MathLib's isIntHex, for example, checks the whole string, whereas simplecpp's isHex just checks the first two characters.

What are your thoughts about that?

I do think it is a bit out of scope for this PR.

Back to the point you made about a numeric prefixed with +: from what my debugger tells me: when you have -100, TokenList first creates 2 tokens ("-" and "100"), and then simplifyTokens1 merges those tokens to "-100", needing to parse the minus-sign.
In the case of "+100", first 2 tokens are created (as above), and then the "+"-token is simply deleted.
So from my basic observation, cppcheck doesn't need special parsing code for numerics prefixed with + in Token::update_property_info, as that case never occurs there.
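
The behaviour observed in the debugger can be modelled with a toy example. This is not Cppcheck's implementation: `foldUnarySigns` and the small set of tokens after which a sign counts as unary are assumptions made purely to illustrate the observation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

static bool startsWithDigit(const std::string &s) {
    return !s.empty() && std::isdigit(static_cast<unsigned char>(s[0]));
}

// Toy rule: a sign is unary at the start or after one of these tokens.
static bool allowsUnarySign(const std::vector<std::string> &out) {
    if (out.empty())
        return true;
    const std::string &prev = out.back();
    return prev == "=" || prev == "(" || prev == "," || prev == "return";
}

// "-" followed by a number is merged into one token; a unary "+" is dropped.
std::vector<std::string> foldUnarySigns(const std::vector<std::string> &in) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool sign = in[i] == "+" || in[i] == "-";
        if (sign && i + 1 < in.size() && startsWithDigit(in[i + 1]) &&
            allowsUnarySign(out)) {
            out.push_back(in[i] == "-" ? "-" + in[i + 1] : in[i + 1]);
            ++i; // the number token has been consumed
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

In this model a token string can begin with "-" after folding but never with "+", which matches the claim that only the minus case needs parsing support.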

firewave · 2023-01-11T19:16:18Z

The Cppcheck code also has the same issue as simplecpp when the numeric is prefixed with + - danmar/simplecpp@acbb847.
I wonder if the logic can be de-duplicated somehow.

Putting the line you linked to in a function, and using that in cppcheck as well, is of course easy. Going a step further and using cppcheck's MathLib in simplecpp (and replacing, for example, simplecpp's stringToLL with MathLib) seems a fair bit more work. It also has different performance characteristics: MathLib's isIntHex, for example, checks the whole string, whereas simplecpp's isHex just checks the first two characters.

What are your thoughts about that?

I was thinking more along the lines of Cppcheck leveraging more of the information available in simplecpp.

But I don't have many thoughts about this, since I am not familiar with the integration at all.

I do think it is a bit out of scope for this PR.

Correct. I was mainly pointing it out so the "duplicated" code is in sync.

Back to the point you made about a numeric prefixed with +: from what my debugger tells me: when you have -100, TokenList first creates 2 tokens ("-" and "100"), and then simplifyTokens1 merges those tokens to "-100", needing to parse the minus-sign. In the case of "+100", first 2 tokens are created (as above), and then the "+"-token is simply deleted. So from my basic observation, cppcheck doesn't need special parsing code for numerics prefixed with + in Token::update_property_info, as that case never occurs there.

Maybe we should add a comment about that.

firewave · 2023-01-12T12:32:26Z

I also noted some of the issues with that code here: https://trac.cppcheck.net/ticket/11428#comment:5.

firewave · 2023-01-14T21:10:19Z

BTW, thanks for working on this. I should have, since I caused this, but I couldn't really figure out how to address it yet. I should have made that clearer in the ticket comments.

danmar · 2023-01-15T19:22:13Z

Putting the line you linked to in a function, and using that in cppcheck as well, is of course easy.
Going a step further and using cppcheck's MathLib in simplecpp (and replacing for example simplecpp's stringToLL with MathLib) seems a fair bit more work.

I think that creating a function in simplecpp might be a good idea.

Moving MathLib to simplecpp is not wanted as far as I see.
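
To make the trade-off mentioned earlier concrete, here is a sketch of the difference between a prefix-only hex test and one that validates the whole string. Both functions are hypothetical stand-ins written for this illustration; they are not the actual simplecpp or MathLib implementations.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Cheap test: only looks at the first two characters ("0x" or "0X").
bool hasHexPrefix(const std::string &s) {
    return s.size() > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X');
}

// Stricter test: the prefix must be followed by hex digits only.
bool isHexWholeString(const std::string &s) {
    if (!hasHexPrefix(s))
        return false;
    for (std::size_t i = 2; i < s.size(); ++i) {
        if (!std::isxdigit(static_cast<unsigned char>(s[i])))
            return false;
    }
    return true;
}
```

The prefix test is constant-time but accepts something like `0x1F_id`, whereas the whole-string test rejects it at the cost of scanning every character. A shared helper would have to pick one of these behaviours.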

danmar · 2023-01-15T19:27:54Z

lib/tokenize.cpp

            const Token *firstSemiColon = nullptr;
            int comment = 0;
-            while (Token::Match(endasm, "%num%|%name%|,|:|;") || (endasm && endasm->linenr() == comment)) {
+            while (Token::Match(endasm, "%num%|%name%|,|:|;") || (endasm && endasm->isLiteral()) || (endasm && endasm->linenr() == comment)) {


We need to allow and handle 12h and 101010b until Tokenizer::simplifyAsm() is executed. The checks should not see such tokens at all. My guess is that it doesn't matter whether they are classified as number or literal.

danmar · 2023-01-15T19:36:55Z

test/testgarbage.cpp

            "}");
    }
+
+    void userDefinedLiterals() {


Spontaneously, I feel that both of these tests should be in testtokenize.cpp, and you could assert that the tokens are literals.

danmar · 2023-01-15T19:37:22Z

test/testtoken.cpp

        givenACodeSampleToTokenize nonNumeric("abc", true);
        ASSERT_EQUALS(false, Token::Match(nonNumeric.tokens(), "%num%"));

-        givenACodeSampleToTokenize binary("101010b", true);


I think 101010b is used in inline assembler.

danmar · 2023-01-15T19:40:39Z

test/testtoken.cpp

        ASSERT_EQUALS(true, Token::Match(floatingPoint.tokens(), "%num%"));

-        givenACodeSampleToTokenize doublePrecision("0.0d", true);
+        givenACodeSampleToTokenize doublePrecision("0.0", true);


I do not know where 0.0d comes from. I don't want to immediately say if we need that or not.

It seems to come from this commit: acad87c

I don't see why that was added; it could have been by mistake. I guess we can remove 0.0d.

firewave · 2023-01-21T15:54:37Z

I will take another look hopefully tomorrow.

BTW this should go in before 2.10 is released since it fixes regressions introduced during the current dev cycle.

gerboengels · 2023-01-21T23:04:32Z

The Cppcheck code also has the same issue as simplecpp when the numeric is prefixed with + - danmar/simplecpp@acbb847.

That commit isn't present (yet) in cppcheck? I extracted a function, but didn't see the +.

Furthermore, what's the policy regarding changes in simplecpp? I made a change (sharing the numeric logic) in the cppcheck repo; do I need to create a PR to get that updated in the simplecpp repo as well?

danmar · 2023-01-26T19:17:20Z

Furthermore, what's the policy regarding changes in simplecpp? I made a change (sharing the numeric logic) in the cppcheck repo; do I need to create a PR to get that updated in the simplecpp repo as well?

I would suggest that you update simplecpp repo first. Then we can "bump simplecpp" where we copy the simplecpp code to cppcheck repo. Then you can refactor cppcheck to use the function from simplecpp.

firewave · 2023-01-28T11:25:57Z

That commit isn't present (yet) in cppcheck? I extracted a function, but didn't see the +.

danmar/simplecpp#285 has not been merged yet.

firewave · 2023-02-21T17:43:42Z

danmar/simplecpp#285 has not been merged yet.

It was merged a while ago.

chrchr-github · 2023-03-17T13:06:42Z

There are some merge conflicts.

Based on #4701, #5418
A helper function for the `isdigit()` test should be introduced on the simplecpp side.
Co-authored-by: gerboengels <github@gerbo.org>

chrchr-github · 2023-09-15T15:21:00Z

Superseded by #5448

Fix 11438 and 10807: categorize user defined literal as literal inste…

4986637

…ad of number There's a big assumption, that if a token starts with a digit but isn't properly parsed as int/float, that it shall be a user defined literal as a fallback

gerboe-of added 3 commits January 21, 2023 22:48

Move tests to testtokenize

69b786d

Add test for asm binary token

3b8c088

Extract function with shared logic

2eac468

Update Makefile for new include dependency

6bd09b5

firewave mentioned this pull request Sep 11, 2023

Fix #11438 MathLib error on user defined literals #5418

Closed

chrchr-github mentioned this pull request Sep 13, 2023

Fix #11438 MathLib error on user defined literals #5448

Merged

chrchr-github closed this Sep 15, 2023
