Skip to content

Conversation

@Tal500
Copy link
Contributor

@Tal500 Tal500 commented Jun 15, 2025

The specification from danmar/cppcheck#7554 (comment) :

A path (both for (1) top-level source files and (2) header files - i.e. files that are introduced by #include) will be absolute if and only if at least one of the condition is true:

  1. The top-level source file (for headers it is the one that includes it) is given as an input file to simplecpp in its absolute form, except for the case that this is a header file that its inclusion was resolved by inclusion dirs (i.e. -I) - and this condition is false in that latter case (note that sometimes condition 3 can be true in that case, resulting in absolute flavor).
  2. For header files only that are not system includes(i.e. #include "..."): The include path is absolute.
  3. For header files that were resolved by inclusion dirs (i.e. -I): The resolved inclusion dir is absolute.

Whenever the same file is introduced twice or more, only the first occurrence that simplecpp sees will determine the absoluteness flavor.

@danmar
Copy link
Owner

danmar commented Jun 20, 2025

Thanks! Let's try this..

@danmar danmar merged commit c62239e into danmar:master Jun 20, 2025
16 checks passed
@danmar
Copy link
Owner

danmar commented Jun 20, 2025

@Tal500 This worked fine. However it did not compile when I merged to master :-(

daniel@laptop:~/simplecpp$ make -j12
g++ -Wall -Wextra -pedantic -Wcast-qual -Wfloat-equal -Wmissing-declarations -Wmissing-format-attribute -Wredundant-decls -Wundef -Wno-multichar -Wold-style-cast -std=c++0x -g -c simplecpp.cpp
g++ -Wall -Wextra -pedantic -Wcast-qual -Wfloat-equal -Wmissing-declarations -Wmissing-format-attribute -Wredundant-decls -Wundef -Wno-multichar -Wold-style-cast -std=c++0x -g -c main.cpp
simplecpp.cpp: In function ‘std::string getFileIdPath(const std::map<std::__cxx11::basic_string<char>, simplecpp::TokenList*>&, const std::string&, const std::string&, const simplecpp::DUI&, bool)’:
simplecpp.cpp:3283:23: error: ‘relativeOrAbsoluteFilename’ was not declared in this scope
 3283 |         openHeader(f, relativeOrAbsoluteFilename);
      |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~

@danmar
Copy link
Owner

danmar commented Jun 20, 2025

@Tal500 I am trying to fix it.

@Tal500
Copy link
Contributor Author

Tal500 commented Jun 20, 2025

@Tal500 This worked fine. However it did not compile when I merged to master :-(

daniel@laptop:~/simplecpp$ make -j12
g++ -Wall -Wextra -pedantic -Wcast-qual -Wfloat-equal -Wmissing-declarations -Wmissing-format-attribute -Wredundant-decls -Wundef -Wno-multichar -Wold-style-cast -std=c++0x -g -c simplecpp.cpp
g++ -Wall -Wextra -pedantic -Wcast-qual -Wfloat-equal -Wmissing-declarations -Wmissing-format-attribute -Wredundant-decls -Wundef -Wno-multichar -Wold-style-cast -std=c++0x -g -c main.cpp
simplecpp.cpp: In function ‘std::string getFileIdPath(const std::map<std::__cxx11::basic_string<char>, simplecpp::TokenList*>&, const std::string&, const std::string&, const simplecpp::DUI&, bool)’:
simplecpp.cpp:3283:23: error: ‘relativeOrAbsoluteFilename’ was not declared in this scope
 3283 |         openHeader(f, relativeOrAbsoluteFilename);
      |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~

See my comment in 8e5c5f4#r160442750

@danmar
Copy link
Owner

danmar commented Jun 20, 2025

@Tal500 I will let you take a proper look at this. As far as I see the new test does fail without the glankk fix.

Also.. we have performance issues in simplecpp in windows. and glankk has been able to identify that the realFileName is causing it. We discussed yesterday and glankk suggest to use some "file id" instead of "file name" to distinguish files. I think glankk can explain it better in next week than I can.. but I would appreciate feedback from you about this.

@Tal500
Copy link
Contributor Author

Tal500 commented Jun 20, 2025

@Tal500 I am trying to fix it.

The fix is clear - please rename the variable of the @glankk commit from the "old" relativeOrAbsoluteFilename to the updated absoluteFilename var that this PR does, and the code shall compile (this PR intentionally converted the intermediate result to be always absolute).

@Tal500
Copy link
Contributor Author

Tal500 commented Jun 20, 2025

@Tal500 I will let you take a proper look at this. As far as I see the new test does fail without the glankk fix.

Also.. we have performance issues in simplecpp in windows. and glankk has been able to identify that the realFileName is causing it. We discussed yesterday and glankk suggest to use some "file id" instead of "file name" to distinguish files. I think glankk can explain it better in next week than I can.. but I would appreciate feedback from you about this.

My obvious guess is that the performance bottleneck is because of the lock (MyLock). To test if my suggestion is correct, you can omit the lock and measure the performance on a single threaded execution.

As far as I can see, you have two options to reduce the intense usage of a lock:

  1. (the simple one) Use the "file exist cache" per thread (i.e. executor, it is the DUI parameter if I understand correctly?). If two threads check with the OS if the file exists, nothing happen.
  2. A more complex mechanism to handle atomical (unordered possibly) map, maybe with per thread reflected cache. But I believe this way is complicated and we wouldn't get meaningful performance gain from that, comparing to 1.

BTW: My guess is that you can improve performance if you'd use std::unordered_map in the condition that the user cpp version is high enough (C++ 11 right?)

BTW2: Isn't there any API in windows to check if a file exist? It might be faster than actually opening the file.
We can also significantly improve performance potentially and simplify the code, if for both unix and windows we will have a cache of "visited files", which map string(file path) to boolean, specifying if the path exist in the OS or not. If a file path is not in this cached (unordered) map, then we need to query the OS to check if it exist. (Right now there is a cache of only "non existing files", and only for windows, but it can be improved as described here)

I do not understand how file "id" will give better performance than file "name"/"path".

@glankk
Copy link
Contributor

glankk commented Jun 20, 2025

Currently realFilename is necessary for #pragma once to work correctly on case insensitive file systems (i.e. windows). It's an expensive operation (even without locks) that is called very often, but is really only necessary for very few cases (it's rare that a file is included multiple times with different capitalization). For the vast majority of cases, simplifyPath (sans realFilename) is sufficient to detect path name collisions.
My suggestion is that simplifyPath be used optimistically, and only when that fails do we resort to some slower collision detection mechanism. I've done benchmarks with this change on windows with pretty good results, although with an admittedly contrived test case.

I suggest using file id's instead of path name matching for the slow execution path because it only requires three kernel calls per file (which can maybe be reduced to one with some refactoring), instead of two per path name component plus string operations, which must then also be done for both the relative and absolute case.
Maybe more importantly though, it allows #pragma once to work correctly with symlinks and case insensitive systems mounted on linux, which matches the behavior of gcc and clang. In general I think it would make the behavior more correct and uniform across platforms.

@Tal500
Copy link
Contributor Author

Tal500 commented Jun 20, 2025

Currently realFilename is necessary for #pragma once to work correctly on case insensitive file systems (i.e. windows). It's an expensive operation (even without locks) that is called very often, but is really only necessary for very few cases (it's rare that a file is included multiple times with different capitalization). For the vast majority of cases, simplifyPath (sans realFilename) is sufficient to detect path name collisions. My suggestion is that simplifyPath be used optimistically, and only when that fails do we resort to some slower collision detection mechanism. I've done benchmarks with this change on windows with pretty good results, although with an admittedly contrived test case.

I suggest using file id's instead of path name matching for the slow execution path because it only requires three kernel calls per file (which can maybe be reduced to one with some refactoring), instead of two per path name component plus string operations, which must then also be done for both the relative and absolute case. Maybe more importantly though, it allows #pragma once to work correctly with symlinks and case insensitive systems mounted on linux, which matches the behavior of gcc and clang. In general I think it would make the behavior more correct and uniform across platforms.

I'm not sure what are the exact behaviour for different paths to the same file, i.e. for symbolic links. According to this blog, symbolic link is consider to be a different file in pragma once for both msvc and gcc. (didn't run a full test)

@danmar
Copy link
Owner

danmar commented Jun 20, 2025

According to this blog, symbolic link is consider to be a different file in pragma once for both msvc and gcc.

ok. My guess is that this has not been an active decision then. The blog says it's a caveat not a feature.

In general I think it would make the behavior more correct and uniform across platforms.

This is the feeling I have actually.

Maybe more importantly though, it allows #pragma once to work correctly with symlinks and case insensitive systems mounted on linux, which matches the behavior of gcc and clang. In general I think it would make the behavior more correct and uniform across platforms.

I am basically interested to see if file ids will provide a speedup and simplify the realFileName code. Therefore I suggest that @glankk makes some proof of concept. I do hope you @Tal500 will review and see if there are problems..

My obvious guess is that the performance bottleneck is because of the lock (MyLock). To test if my suggestion is correct, you can omit the lock and measure the performance on a single threaded execution.

This is something we can also test.

@Tal500
Copy link
Contributor Author

Tal500 commented Jun 21, 2025

According to this blog, symbolic link is consider to be a different file in pragma once for both msvc and gcc.

ok. My guess is that this has not been an active decision then. The blog says it's a caveat not a feature.

I shall say that according to cppreference, #pragma once is implementation defined behaviour, meaning we need to test the output of compilers in Windows on symlinks, and case insensitive paths, assuming we want to mimic the majority compilers behaviour.

@danmar
Copy link
Owner

danmar commented Jun 22, 2025

I shall say that according to cppreference, #pragma once is implementation defined behaviour, meaning we need to test the output of compilers in Windows on symlinks, and case insensitive paths, assuming we want to mimic the majority compilers behaviour.

I don't know yet what the performance difference is.

If it's not much faster then I think let's drop file-id and keep existing filepaths code.

If file-id would be much faster then:

  • either we replace the code and start using file-id instead
  • if compatibility is needed then let's add a flag

There is a performance issue with simplecpp when it's executed in windows. I don't think we can solve it but I would be very interested in a improvement. For information, a customer has a project with thousands of source files and it takes several minutes just to preprocess many of those (per file).

assuming we want to mimic the majority compilers behaviour.

Well basically I want that Cppcheck is able to scan code from a wide range of compilers.

Tweaks to handle the major compilers better is fine. I.e. handling GCC extensions are fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants