Skip to content

Don't log linking build actions by default.#1450

Merged
dkrupp merged 1 commit intoEricsson:masterfrom
bruntib:keep_link
Mar 19, 2018
Merged

Don't log linking build actions by default.#1450
dkrupp merged 1 commit intoEricsson:masterfrom
bruntib:keep_link

Conversation

@bruntib
Copy link
Copy Markdown
Contributor

@bruntib bruntib commented Mar 12, 2018

The logger can be given a flag if the user wants to keep the linking
actions in the compilation database.

@bruntib bruntib added enhancement 🌟 CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands labels Mar 12, 2018
return 0;
}

int isObjectFile(const char* filename)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameter names should end with underscore: filename_

{
size_t i;
for (i = 0; i < vec_->size; ++i)
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should remove the blocks from here and below.

LoggerCmpFuc cmp_);

/**
* Funds an elemnt using the given predicate fuction.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finds, element

*
* @param vec_ a vector struct (must be not NULL).
* @param pred_ a predicate function.
* @return SIZE_MAX if the element not found otherwise the item`s index
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark the end of the sentence with dot: ..

if 'CC_LOGGER_GCC_LIKE' not in log_env:
log_env['CC_LOGGER_GCC_LIKE'] = 'gcc:g++:clang:clang++:cc:c++'
if keep_link or ('CC_LOGGER_KEEP_LINK' in log_env and
log_env['CC_LOGGER_KEEP_LINK'] == 'true'):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to check this condition. If it is true we set it to true?

@whisperity
Copy link
Copy Markdown
Contributor

How is executing CodeChecker log --build "g++ a.cpp" handled now? Will there be a compilation action in the JSON, and if keep linkage is set, two commands?

Copy link
Copy Markdown
Contributor

@gyorb gyorb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the feature based on the discussion. In the issue.

@gyorb
Copy link
Copy Markdown
Contributor

gyorb commented Mar 12, 2018

@whisperity the idea is that two separate json files will be generated (compilation, linking). If we can not decide right now if it is a linking only action I would still add it to the compilation actions, so we wont loose files which should be analyzed.

@bruntib
Copy link
Copy Markdown
Contributor Author

bruntib commented Mar 16, 2018

I'm in trouble a bit how this should be implemented. The goal was to omit linking actions from the compilation database, because these are

  1. not used by the checkers
  2. resulting too big files (>2GB in some cases) which can't be read by the Python scripts,
  3. not part of the compilation database format (I don't remember who mentioned this and what did he base it on).

On the other hand if we want to save them then the decision was to place them in another file (#1436). This doesn't fit in the current implementation of the build logger, since it is a general C/C++/Java logger, the log collector part can't distinguish between compilation and linking. And the linking actions can't be skipped in CodeChecker, because it's too late (see item 2.)
So my questions are: do we really need linking actions and if yes, then should this be implemented in the logger somehow?

@whisperity
Copy link
Copy Markdown
Contributor

In case CTU ever attempts to go beyond considering an ambiguous function call due to multiple implementations in the mangled name map as unknown, in some way, linkage information will be needed, just as how the clustering module in CodeCompass also requires them.

Until then, it looks safe enough to omit, as nothing currently understands them in any useful manner.

@whisperity
Copy link
Copy Markdown
Contributor

If I am taking it correctly the problem with too big JSONs is that we first load the entire JSON array into the memory and then try to iterate on it.

Could this be changed into a more dynamic, stream like approach, where the JSON is processed without it being fully loaded at any given time, and thus we can manually ignore link commands and only store build actions in the memory that are relevant? (Or perhaps store even less in memory!)

@dkrupp
Copy link
Copy Markdown
Member

dkrupp commented Mar 17, 2018 via email

@whisperity
Copy link
Copy Markdown
Contributor

@dkrupp It is not always trivial if a compile command is a simple compile command or also a link command. You are in luck with CMake as it, in most cases, generates separate compile and link commands (then again, CMake can also natively export a CCDB, so there is no reason to log the build at all), but older/custom/different systems can end up having their build logged like this:

/usr/bin/g++-5 /home/whisperity/Dummy/main.cpp

This is a compile and a link command. So the logger tool must either be able to see the ld (gold? lld?) calls "inside", or manually parse the arguments and try to figure out if it is a compilation only or a link only or a "both" command.

@Xazax-hun
Copy link
Copy Markdown
Contributor

In case we are not afraid of adding additional dependencies, there are som libraries for lazy json parsing which we could try:
https://github.com/fenhl/lazyjson
https://pypi.python.org/pypi/ijson/

I think it is a good idea to both remove linking commands and make handling of potentially large files lazy.

@bruntib
Copy link
Copy Markdown
Contributor Author

bruntib commented Mar 18, 2018

My problem was not related to distinction of linking and compilation commands. The problem is that the only .json output file has to be given to the CodeChecker log command by -o flag. So one option would be to introduce another output flag for linking actions, or another is using a "link_" prefix which is prepended to the output file name. Any option we choose, the current build logger implementation doesn't support several output files based on such a language specific build actions like compilation/linking, since the build logger logs Java actions too.

So for now I extend the build logger so it can be able to write two output files. The details can be specified later.

@whisperity
Copy link
Copy Markdown
Contributor

For now we can always say that in case of Java, the link command list will be empty, because Java always does dynamic linking and classloads. 😬 (Not that anyone would actively be using the ld-logger anyways.)

The logger can be given a flag if the user wants to keep the linking
actions in the compilation database.
Copy link
Copy Markdown
Member

@dkrupp dkrupp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified on xerces with or without logging linking.
compile_commands_with_linking.json 8.7MB
compile_commands_no_linking.json 447KB
impressive decrease!

Number of build actions analyzed 631 in both cases.
Looks good to me.

@dkrupp dkrupp merged commit dc41101 into Ericsson:master Mar 19, 2018
@bruntib bruntib deleted the keep_link branch March 22, 2018 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands enhancement 🌟

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants