Fix 11438 and 10807: categorize user defined literal as literal instead of number #4701
…ad of number There's a big assumption, that if a token starts with a digit but isn't properly parsed as int/float, that it shall be a user defined literal as a fallback
|
I guess the same issue exists in simplecpp in The Cppcheck code also has the same issue as simplecpp when the numeric is prefixed with I wonder if the logic can be de-duplicated somehow. |
  const Token *firstSemiColon = nullptr;
  int comment = 0;
- while (Token::Match(endasm, "%num%|%name%|,|:|;") || (endasm && endasm->linenr() == comment)) {
+ while (Token::Match(endasm, "%num%|%name%|,|:|;") || (endasm && endasm->isLiteral()) || (endasm && endasm->linenr() == comment)) {
Unit tests pointed me here: 12h is valid asm (12 in hex), but it is not a valid C++ int. So after my changes, this is seen as a literal. Therefore this change
We need to allow and handle 12h and 101010b until Tokenizer::simplifyAsm() is executed. The checks should not see such tokens at all. My guess is that it doesn't matter whether they are numbers or literals.
That sounds like we need more tests for ASM blocks.
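To make the asm notation in this thread concrete, here is a minimal sketch of how suffix-style numerals such as 12h and 101010b can be read. `parseAsmNumeral` is a hypothetical helper written for illustration only; it is not part of Cppcheck or simplecpp, and it assumes the common assembler convention that such numerals start with a decimal digit.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not Cppcheck API): interpret assembler-style numerals
// that carry their base as a suffix, e.g. "12h" (hex) or "101010b" (binary).
// Returns true and stores the value on success, false otherwise.
bool parseAsmNumeral(const std::string &tok, unsigned long &value) {
    if (tok.size() < 2)
        return false;
    // Assumption: the numeral must begin with a decimal digit ("0FFh", not "FFh").
    if (!std::isdigit(static_cast<unsigned char>(tok.front())))
        return false;

    int base = 0;
    const char suffix = tok.back();
    if (suffix == 'h' || suffix == 'H')
        base = 16;
    else if (suffix == 'b' || suffix == 'B')
        base = 2;
    else
        return false;

    // Everything before the suffix must be a digit of the chosen base.
    const std::string digits = tok.substr(0, tok.size() - 1);
    for (const char c : digits) {
        const bool ok = (base == 16)
            ? std::isxdigit(static_cast<unsigned char>(c)) != 0
            : (c == '0' || c == '1');
        if (!ok)
            return false;
    }
    value = std::stoul(digits, nullptr, base);
    return true;
}
```

With this reading, "12h" is hexadecimal 12 and "101010b" is a binary numeral, while neither is a valid C++ integer literal, which is why they end up classified as literals after this change.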
test/testgarbage.cpp
Outdated
  "}");
  }

  void userDefinedLiterals() {
I've put it in testgarbage.cpp, although it isn't garbage. But this was the one test case which triggered the right checks.
Alternatively, I think I could move the first check to testcondition.cpp (which triggers the right check) and the second case maybe to testtokenizer.cpp?
Spontaneously, I feel that both of these tests should be in testtokenize.cpp, and you could assert that the tokens are literals.
  givenACodeSampleToTokenize nonNumeric("abc", true);
  ASSERT_EQUALS(false, Token::Match(nonNumeric.tokens(), "%num%"));

  givenACodeSampleToTokenize binary("101010b", true);
I don't know in what context 101010b is a valid integer. C? asm? Something else?
The same goes for 0.0d below.
I've rewritten them to valid C++ integers.
I think 101010b is used in inline assembler.
  tok.str("false");
  ASSERT(tok.tokType() == Token::eBoolean);
  tok.str("\"foo\"_userDefinedLiteral");
  ASSERT(tok.tokType() == Token::eOther); // should be eLiteral
User-defined string literals are still not properly processed, but handling them properly seems a lot harder.
In my code base I don't run into issues with strings (I did run into issues with user-defined int literals, now fixed), so I left that unchanged.
Please use TODO_ASSERT so we know this is not the expected result.
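For illustration, a rough sketch of what recognizing a token like `"foo"_userDefinedLiteral` could look like. `isUserDefinedStringLiteral` is a hypothetical helper, not something this PR or Cppcheck provides. It deliberately ignores encoding prefixes (`u8`, `L`, ...) and raw strings, which is part of why the full problem is harder than it first appears.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not Cppcheck API): does the token look like a plain
// string literal immediately followed by an identifier suffix?
// Limitations of this sketch: no encoding prefixes, no raw string literals.
bool isUserDefinedStringLiteral(const std::string &tok) {
    if (tok.size() < 3 || tok.front() != '"')
        return false;
    // The last double quote in the token is taken as the closing quote.
    const std::string::size_type close = tok.rfind('"');
    if (close == 0 || close + 1 == tok.size())
        return false; // unterminated string, or no suffix at all

    // The suffix must be a valid identifier: [A-Za-z_][A-Za-z0-9_]*
    const std::string suffix = tok.substr(close + 1);
    const unsigned char first = static_cast<unsigned char>(suffix.front());
    if (!(std::isalpha(first) || first == '_'))
        return false;
    for (const char c : suffix) {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (!(std::isalnum(uc) || uc == '_'))
            return false;
    }
    return true;
}
```

A check along these lines would accept `"foo"_userDefinedLiteral` and reject a plain `"foo"`.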
Putting the line you linked to in a function, and using that in cppcheck as well, is of course easy. What are your thoughts about that? I do think it is a bit out of scope for this PR. Back to the point you made about a numeric prefixed with |
I was more thinking along the lines that Cppcheck leverages more of the information available in simplecpp. But I don't have much thoughts about this since I am not familiar with the integration at all.
Correct. I was mainly pointing it out so the "duplicated" code is in sync.
Maybe we should add a comment about that. |
|
I also noted some of the issues with that code here: https://trac.cppcheck.net/ticket/11428#comment:5
|
BTW, thanks for working on this. I should have done so myself since I caused this, but I couldn't really figure out how to address it yet. I should have made that clearer in the ticket comments.
I think that creating a function in simplecpp might be a good idea. Moving MathLib to simplecpp is not wanted as far as I see. |
  ASSERT_EQUALS(true, Token::Match(floatingPoint.tokens(), "%num%"));

- givenACodeSampleToTokenize doublePrecision("0.0d", true);
+ givenACodeSampleToTokenize doublePrecision("0.0", true);
I do not know where 0.0d comes from. I don't want to immediately say if we need that or not.
It seems to come from this commit: acad87c
I don't see why that was added; it could have been by mistake. I guess we can remove 0.0d.
|
I will take another look hopefully tomorrow. BTW this should go in before 2.10 is released since it fixes regressions introduced during the current dev cycle. |
That commit isn't present (yet) in cppcheck? I extracted a function, but didn't see the Furthermore, what's the policy regarding changes in simplecpp? I made a change (sharing the numeric logic) in the cppcheck repo; do I need to create a PR to get that updated in the simplecpp repo as well? |
I would suggest that you update simplecpp repo first. Then we can "bump simplecpp" where we copy the simplecpp code to cppcheck repo. Then you can refactor cppcheck to use the function from simplecpp. |
danmar/simplecpp#285 has not been merged yet, |
It was merged a while ago. |
|
There are some merge conflicts. |
|
Superseded by #5448 |
This relies on a big assumption: if a token starts with a digit but does not parse as an int or float, it is treated as a user-defined literal.
So the code does not actually match existing user-defined literals; it only applies the rule "if it starts like an int/float but isn't an actual int/float, assume it is a user-defined literal".
Fixes https://trac.cppcheck.net/ticket/11438 and https://trac.cppcheck.net/ticket/10807
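The fallback rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `TokKind`, `looksLikeInt`, `looksLikeFloat` and `classifyToken` are hypothetical names, and the two regular expressions are simplified stand-ins for the real number checks (for example, they do not handle digit separators or hexadecimal floats).

```cpp
#include <cctype>
#include <regex>
#include <string>

enum class TokKind { Number, Literal, Other };

// Simplified stand-in: decimal, hex, or 0b-prefixed binary integer with an
// optional u/l/ll-style suffix.
static bool looksLikeInt(const std::string &s) {
    static const std::regex re(
        R"((0[xX][0-9a-fA-F]+|0[bB][01]+|[0-9]+)([uU]?[lL]{0,2}|[lL]{1,2}[uU]))");
    return std::regex_match(s, re);
}

// Simplified stand-in: decimal floating-point with an optional f/l suffix.
static bool looksLikeFloat(const std::string &s) {
    static const std::regex re(
        R"((([0-9]+\.[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?|[0-9]+[eE][+-]?[0-9]+)[fFlL]?)");
    return std::regex_match(s, re);
}

// The heuristic: a real int/float is a number; anything else that merely
// starts with a digit is assumed to be a user-defined literal.
TokKind classifyToken(const std::string &tok) {
    if (tok.empty())
        return TokKind::Other;
    if (looksLikeInt(tok) || looksLikeFloat(tok))
        return TokKind::Number;
    if (std::isdigit(static_cast<unsigned char>(tok.front())))
        return TokKind::Literal; // fallback, not an actual match against known suffixes
    return TokKind::Other;
}
```

Under this sketch, `12_km` falls through to the literal branch, and so do asm-style tokens such as `12h` and the invalid `0.0d`, which matches the side effects discussed in the review comments.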