Skip to content

Conversation

@gerboengels
Copy link
Contributor

@gerboengels gerboengels commented Jan 9, 2023

There's a big assumption, that if a token starts with a digit but isn't properly parsed as int/float, that it shall be a user defined literal.
So the code is not actually matching existing user defined literals, just "if it starts int/float-like, but isn't an actual int/float, assume it is a user defined literal"

Fixes https://trac.cppcheck.net/ticket/11438 and https://trac.cppcheck.net/ticket/10807

…ad of number

There's a big assumption, that if a token starts with a digit but isn't properly parsed as int/float, that it shall be a user defined literal as a fallback
@firewave
Copy link
Collaborator

firewave commented Jan 9, 2023

I guess the same issue exists in simplecpp in Token::flags().

The Cppcheck code also has the same issue as simplecpp when the numeric is prefixed with + - danmar/simplecpp@acbb847.

I wonder if the logic can be de-duplicated somehow.

const Token *firstSemiColon = nullptr;
int comment = 0;
while (Token::Match(endasm, "%num%|%name%|,|:|;") || (endasm && endasm->linenr() == comment)) {
while (Token::Match(endasm, "%num%|%name%|,|:|;") || (endasm && endasm->isLiteral()) || (endasm && endasm->linenr() == comment)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests pointed me here: 12h is valid asm (12 in hex), but it is not a valid C++ int. So after my changes, this is seen as a literal. Therefore this change

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to allow and handle 12h and 101010b until Tokenizer::simplifyAsm() is executed. The checks should not see such tokens at all. My guess is that it doesn't matter if they are number or literal..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like we need more tests for ASM blocks.

"}");
}

void userDefinedLiterals() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put it in testgarbage.cpp, although it isn't garbage. But this was the one test case which triggered the right checks.
Alternatively, I think I could move the first check to testcondition.cpp (which triggers the right check) and the second case maybe to testtokenizer.cpp?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spontanously I feel that both these tests should be in testtokenize.cpp and you could assert that the tokens are literals.

givenACodeSampleToTokenize nonNumeric("abc", true);
ASSERT_EQUALS(false, Token::Match(nonNumeric.tokens(), "%num%"));

givenACodeSampleToTokenize binary("101010b", true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know in what context 101010b was a valid integer? C? asm? Something else?
Same for 0.0d below.

I've rewritten them to valid C++ integers

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 101010b is used in inline assembler.

tok.str("false");
ASSERT(tok.tokType() == Token::eBoolean);
tok.str("\"foo\"_userDefinedLiteral");
ASSERT(tok.tokType() == Token::eOther); // should be eLiteral
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User defined string literals are still not properly processed, but this seems a lot harder to (properly) accomplish.
In my code base I don't run into issues with strings (I did ran into issues with user defined int literals, now fixed), so left that unchanged.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use TODO_ASSERT so we know this is not the expected result.

@gerboengels
Copy link
Contributor Author

The Cppcheck code also has the same issue as simplecpp when the numeric is prefixed with + - danmar/simplecpp@acbb847.

I wonder if the logic can be de-duplicated somehow.

Putting the line you linked to in a function, and using that in cppcheck as well, is of course easy.
Going a step further and using cppcheck's MathLib in simplecpp (and replacing for example simplecpp's stringToLL with MathLib) seems a fair bit more work. Also, it has different performance characteristics: MathLibs isIntHex, for example, checks the whole string. simplecpp's isHex just checks the first two characters.

What are your thoughts about that?

I do think it is a bit out of scope for this PR.

Back to the point you made about a numeric prefixed with +: from what my debugger tells me: when you have -100, TokenList first creates 2 tokens ("-" and "100"), and then simplifyTokens1 merges those tokens to "-100", needing to parse the minus-sign.
In the case of "+100", first 2 tokens are created (as above), and then the "+"-token is simply deleted.
So from my basic observation, cppcheck doesn't need special parsing code for numerics prefixed with + in Token::update_property_info, as that case doesn't happen

@firewave
Copy link
Collaborator

The Cppcheck code also has the same issue as simplecpp when the numeric is prefixed with + - danmar/simplecpp@acbb847.
I wonder if the logic can be de-duplicated somehow.

Putting the line you linked to in a function, and using that in cppcheck as well, is of course easy. Going a step further and using cppcheck's MathLib in simplecpp (and replacing for example simplecpp's stringToLL with MathLib) seems a fair bit more work. Also, it has different performance characteristics: MathLibs isIntHex, for example, checks the whole string. simplecpp's isHex just checks the first two characters.

What are your thoughts about that?

I was more thinking along the lines that Cppcheck leverages more of the information available in simplecpp.

But I don't have much thoughts about this since I am not familiar with the integration at all.

I do think it is a bit out of scope for this PR.

Correct. I was mainly pointing it out so the "duplicated" code is in sync.

Back to the point you made about a numeric prefixed with +: from what my debugger tells me: when you have -100, TokenList first creates 2 tokens ("-" and "100"), and then simplifyTokens1 merges those tokens to "-100", needing to parse the minus-sign. In the case of "+100", first 2 tokens are created (as above), and then the "+"-token is simply deleted. So from my basic observation, cppcheck doesn't need special parsing code for numerics prefixed with + in Token::update_property_info, as that case doesn't happen

Maybe we should add a comment about that.

@firewave
Copy link
Collaborator

I also notes some of the issues with that code here: https://trac.cppcheck.net/ticket/11428#comment:5.

@firewave
Copy link
Collaborator

BTW thanks for working on this. I should have since I caused this but I didn't couldn't really figure out how to address that yet. I should have made that more clear in the ticket comments.

@danmar
Copy link
Owner

danmar commented Jan 15, 2023

Putting the line you linked to in a function, and using that in cppcheck as well, is of course easy.
Going a step further and using cppcheck's MathLib in simplecpp (and replacing for example simplecpp's stringToLL with MathLib) seems a fair bit more work.

I think that creating a function in simplecpp might be a good idea.

Moving MathLib to simplecpp is not wanted as far as I see.

const Token *firstSemiColon = nullptr;
int comment = 0;
while (Token::Match(endasm, "%num%|%name%|,|:|;") || (endasm && endasm->linenr() == comment)) {
while (Token::Match(endasm, "%num%|%name%|,|:|;") || (endasm && endasm->isLiteral()) || (endasm && endasm->linenr() == comment)) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to allow and handle 12h and 101010b until Tokenizer::simplifyAsm() is executed. The checks should not see such tokens at all. My guess is that it doesn't matter if they are number or literal..

"}");
}

void userDefinedLiterals() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spontanously I feel that both these tests should be in testtokenize.cpp and you could assert that the tokens are literals.

givenACodeSampleToTokenize nonNumeric("abc", true);
ASSERT_EQUALS(false, Token::Match(nonNumeric.tokens(), "%num%"));

givenACodeSampleToTokenize binary("101010b", true);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 101010b is used in inline assembler.

ASSERT_EQUALS(true, Token::Match(floatingPoint.tokens(), "%num%"));

givenACodeSampleToTokenize doublePrecision("0.0d", true);
givenACodeSampleToTokenize doublePrecision("0.0", true);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know where 0.0d comes from. I don't want to immediately say if we need that or not.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to come from this commit: acad87c

And I don't see why that was added it could be by mistake. I guess we can remove 0.0d.

@firewave
Copy link
Collaborator

I will take another look hopefully tomorrow.

BTW this should go in before 2.10 is released since it fixes regressions introduced during the current dev cycle.

@gerboengels
Copy link
Contributor Author

The Cppcheck code also has the same issue as simplecpp when the numeric is prefixed with + - danmar/simplecpp@acbb847.

That commit isn't present (yet) in cppcheck? I extracted a function, but didn't see the +.

Furthermore, whats the policy regarding changes in simplecpp? I made a change (share the numeric-logic) in the cppcheck repo, do I need to create a PR to get that updated in the simplecpp-repo as well?

@danmar
Copy link
Owner

danmar commented Jan 26, 2023

Furthermore, whats the policy regarding changes in simplecpp? I made a change (share the numeric-logic) in the cppcheck repo, do I need to create a PR to get that updated in the simplecpp-repo as well?

I would suggest that you update simplecpp repo first. Then we can "bump simplecpp" where we copy the simplecpp code to cppcheck repo. Then you can refactor cppcheck to use the function from simplecpp.

@firewave
Copy link
Collaborator

That commit isn't present (yet) in cppcheck? I extracted a function, but didn't see the +.

danmar/simplecpp#285 has not been merged yet,

@firewave
Copy link
Collaborator

danmar/simplecpp#285 has not been merged yet,

It was merged a while ago.

@chrchr-github
Copy link
Collaborator

There are some merge conflicts.

chrchr-github added a commit that referenced this pull request Sep 15, 2023
Based on #4701, #5418

A helper function for the `isdigit()` test should be introduced on the
simplecpp side.

Co-authored-by: gerboengels <github@gerbo.org>
@chrchr-github
Copy link
Collaborator

Superseded by #5448

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants