Skip to content

Refactor logging macros#382

Merged
timmoon10 merged 12 commits intoNVIDIA:mainfrom
timmoon10:logging-refactor
Oct 24, 2023
Merged

Refactor logging macros#382
timmoon10 merged 12 commits intoNVIDIA:mainfrom
timmoon10:logging-refactor

Conversation

@timmoon10
Copy link
Collaborator

@timmoon10 timmoon10 commented Aug 16, 2023

Changes:

  • Removes the logging macros from the installed headers. The installed headers are now pure C. There are now separate headers in common and each of the frameworks (I'm open to suggestion on this design).
  • Reimplements the logging macros in common to use the string utility functions and to improve the error messages (see check_cublas file and line is not helpful #376).
  • Follow the Google style guide for header includes in the files that I touched.

The refactored logging macros are backward compatible, but it's now easier to make descriptive error messages. For example, consider:

NVTE_CHECK(input.data.shape.size() == 2, "Input must have 2 dimensions.");

We can now do something like:

NVTE_CHECK(input.data.shape.size() == 2,
           "Input must have 2 dimensions, ",
           "but found ", input.data.shape.size(), ".");

Closes #376.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 timmoon10 added enhancement New feature or request bug Something isn't working labels Aug 16, 2023
@timmoon10
Copy link
Collaborator Author

/te-ci

@timmoon10
Copy link
Collaborator Author

/te-ci

@ksivaman
Copy link
Member

We would have to also change to the correct includes for the cpptests

Use Google style for header includes.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10
Copy link
Collaborator Author

/te-ci

timmoon10 and others added 4 commits August 18, 2023 14:58
Incorporating changes from NVIDIA#389.

Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Jan Bielak <jbielak@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Hack to get around macro redefinition warning.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10
Copy link
Collaborator Author

/te-ci

@ksivaman
Copy link
Member

Why not reuse logging across the FWs?

@ksivaman ksivaman self-requested a review August 19, 2023 00:18
@timmoon10
Copy link
Collaborator Author

timmoon10 commented Aug 19, 2023

As I see it, common has three kinds of headers:

  1. Public C API headers (common/include/transformer_engine/transformer_engine.h)
  2. C++ headers that are shared between frameworks (common/include/transformer_engine/logging.h)
  3. C++ headers for internal implementation (common/util/vectorized_pointwise.h)

Currently (1) and (2) are combined together in common/include/transformer_engine. I'd prefer to avoid the frameworks accessing (3) as much as possible (although there is leakage in PyTorch due to the attention infrastructure). It would also be better if we didn't expose (2) to external users.

This PR makes logging.h depend on common/util/string.h. I see a few solutions:

  1. Move framework-common header to something like common/include/transformer_engine_utils, which is exposed to the frameworks but not installed with the C API headers.
  2. Remove logging.h from the installed headers and reimplement in frameworks. This gets rid of the framework-common headers.
  3. Move string.h to common/include/transformer_engine.
  4. Expose implementation headers in frameworks.

I think (1) is the right answer, but it's more complicated. (2) and (3) are simple, but have some iffiness. I don't like (4).

EDIT: Looking at #393, (4) doesn't seem as bad. The CUDA utilities are sufficiently complicated to make reimplementing a bad idea. I think the main difference is how the headers are called. #include "common/util/logging.h" is clearer than #include "util/logging.h".

@timmoon10
Copy link
Collaborator Author

/te-ci

@timmoon10 timmoon10 marked this pull request as ready for review September 5, 2023 23:34
@ptrendx
Copy link
Member

ptrendx commented Oct 3, 2023

I would probably prefer changing current common directory to core or libtransformer_engine and have common directory as an actual source for common code across both core and the FW-dependent code, like the logging.

@timmoon10
Copy link
Collaborator Author

timmoon10 commented Oct 6, 2023

I would probably prefer changing current common directory to core or libtransformer_engine and have common directory as an actual source for common code across both core and the FW-dependent code, like the logging.

Should we handle that now or in a future PR? That would significantly increase the scope of this refactor since we would need to change the build system to build the "common" library separately from the "core" library. I think there's value in improving the logging quickly.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 timmoon10 requested a review from ptrendx October 6, 2023 22:37
@timmoon10
Copy link
Collaborator Author

/te-ci

@timmoon10
Copy link
Collaborator Author

/te-ci

Copy link
Member

@ptrendx ptrendx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase to the current main.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10
Copy link
Collaborator Author

/te-ci

@timmoon10 timmoon10 merged commit 6b311da into NVIDIA:main Oct 24, 2023
@timmoon10 timmoon10 deleted the logging-refactor branch October 24, 2023 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

check_cublas file and line is not helpful

3 participants

Comments