Refactor logging macros by timmoon10 · Pull Request #382 · NVIDIA/TransformerEngine

timmoon10 · 2023-08-16T03:45:26Z

Changes:

Removes the logging macros from the installed headers, so the installed headers are now pure C. The logging macros now live in separate headers in common and in each of the frameworks (I'm open to suggestions on this design).
Reimplements the logging macros in common to use the string utility functions and to improve the error messages (see #376, "check_cublas file and line is not helpful").
Follows the Google style guide for header includes in the files that I touched.
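
To illustrate why error checks like these are usually written as macros, here is a minimal sketch. It assumes, judging only from the title of #376, that the complaint was about errors being attributed to a helper's own file and line rather than to the caller. The Status type, check_status_in_helper, and SKETCH_CHECK_STATUS are hypothetical names for illustration and are not taken from the Transformer Engine sources.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical status type standing in for a library error code.
enum class Status { kOk = 0, kFailed = 1 };

// Helper-function style: __FILE__ and __LINE__ expand here, inside the
// helper, so every failure reports this location regardless of the caller.
inline void check_status_in_helper(Status s) {
  if (s != Status::kOk) {
    std::ostringstream oss;
    oss << __FILE__ << ":" << __LINE__ << ": status check failed";
    throw std::runtime_error(oss.str());
  }
}

// Macro style: the body is expanded at the call site, so __FILE__ and
// __LINE__ refer to the caller, and #expr records the failing expression.
#define SKETCH_CHECK_STATUS(expr)                                        \
  do {                                                                   \
    const Status sketch_status_ = (expr);                                \
    if (sketch_status_ != Status::kOk) {                                 \
      std::ostringstream sketch_oss_;                                    \
      sketch_oss_ << __FILE__ << ":" << __LINE__ << ": " << #expr        \
                  << " failed";                                          \
      throw std::runtime_error(sketch_oss_.str());                       \
    }                                                                    \
  } while (false)
```

With the helper, two failing calls in different files produce the same location in the message. With the macro, each message carries the location of the call that actually failed.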

The refactored logging macros are backward compatible, but it's now easier to make descriptive error messages. For example, consider:

TransformerEngine/transformer_engine/common/transpose/transpose.cu

Line 130 in cbfb8c6

NVTE_CHECK(input.data.shape.size() == 2, "Input must have 2 dimensions.");

We can now do something like:

NVTE_CHECK(input.data.shape.size() == 2,
           "Input must have 2 dimensions, ",
           "but found ", input.data.shape.size(), ".");

Closes #376.
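
For readers unfamiliar with the pattern, the variadic NVTE_CHECK call shown above can be approximated with a string-concatenation helper and a variadic macro. This is only a sketch under stated assumptions: sketch::concat, SKETCH_CHECK, and require_2d are hypothetical names, the message layout is invented, and the actual macros in this PR may be implemented differently. It requires C++17 for the fold expression.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Streams every argument into one string (C++17 fold expression).
// Anything with an operator<< for std::ostream can be passed.
template <typename... Ts>
std::string concat(const Ts&... parts) {
  std::ostringstream oss;
  (oss << ... << parts);
  return oss.str();
}

}  // namespace sketch

// Hypothetical stand-in for a check macro: on failure, throws with the
// call site, the failed condition, and all message fragments joined.
// This sketch requires at least one message fragment after the condition.
#define SKETCH_CHECK(cond, ...)                                          \
  do {                                                                   \
    if (!(cond)) {                                                       \
      throw std::runtime_error(sketch::concat(                           \
          __FILE__, ":", __LINE__, " in function ", __func__, ": ",      \
          "Assertion failed: " #cond ". ", __VA_ARGS__));                \
    }                                                                    \
  } while (false)

// Usage mirroring the transpose example, with a plain vector as the shape.
inline void require_2d(const std::vector<std::size_t>& shape) {
  SKETCH_CHECK(shape.size() == 2,
               "Input must have 2 dimensions, ",
               "but found ", shape.size(), ".");
}
```

A call with a single string literal still compiles, which is one way such a macro can stay backward compatible with existing call sites while allowing values to be appended to the message.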

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 · 2023-08-16T04:36:57Z

/te-ci

timmoon10 · 2023-08-16T17:57:50Z

/te-ci

ksivaman · 2023-08-16T19:33:28Z

We would also have to switch to the correct includes in the cpptests.

timmoon10 · 2023-08-16T19:56:52Z

/te-ci

timmoon10 · 2023-08-18T23:01:45Z

/te-ci

ksivaman · 2023-08-19T00:18:35Z

Why not reuse logging across the FWs?

timmoon10 · 2023-08-19T01:08:44Z

As I see it, common has three kinds of headers:

1. Public C API headers (common/include/transformer_engine/transformer_engine.h)
2. C++ headers that are shared between frameworks (common/include/transformer_engine/logging.h)
3. C++ headers for internal implementation (common/util/vectorized_pointwise.h)

Currently (1) and (2) are combined together in common/include/transformer_engine. I'd prefer to avoid the frameworks accessing (3) as much as possible (although there is leakage in PyTorch due to the attention infrastructure). It would also be better if we didn't expose (2) to external users.

This PR makes logging.h depend on common/util/string.h. I see a few solutions:

1. Move framework-common header to something like common/include/transformer_engine_utils, which is exposed to the frameworks but not installed with the C API headers.
2. Remove logging.h from the installed headers and reimplement in frameworks. This gets rid of the framework-common headers.
3. Move string.h to common/include/transformer_engine.
4. Expose implementation headers in frameworks.

I think (1) is the right answer, but it's more complicated. (2) and (3) are simple, but have some iffiness. I don't like (4).

EDIT: Looking at #393, (4) doesn't seem as bad. The CUDA utilities are sufficiently complicated to make reimplementing a bad idea. I think the main difference is how the headers are called. #include "common/util/logging.h" is clearer than #include "util/logging.h".

timmoon10 · 2023-09-02T00:27:56Z

/te-ci

ptrendx · 2023-10-03T23:25:03Z

I would probably prefer changing current common directory to core or libtransformer_engine and have common directory as an actual source for common code across both core and the FW-dependent code, like the logging.

timmoon10 · 2023-10-06T22:11:06Z

> I would probably prefer changing current common directory to core or libtransformer_engine and have common directory as an actual source for common code across both core and the FW-dependent code, like the logging.

Should we handle that now or in a future PR? That would significantly increase the scope of this refactor since we would need to change the build system to build the "common" library separately from the "core" library. I think there's value in improving the logging quickly.

timmoon10 · 2023-10-06T22:37:34Z

/te-ci

timmoon10 · 2023-10-11T18:26:08Z

/te-ci

ptrendx

Please rebase to the current main.

timmoon10 · 2023-10-24T17:15:52Z

/te-ci

timmoon10 added 2 commits August 15, 2023 18:03

Do not include logging macros in installed C headers

59eef99

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Debug logging macros

39fcdf4

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 added the enhancement and bug labels Aug 16, 2023

sophiawisdom mentioned this pull request Aug 16, 2023

Update logging to give correct line/file/func numbers #381

Closed

Debug C++ tests

9140a42

Use Google style for header includes. Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 mentioned this pull request Aug 18, 2023

Improve upon error reporting in common #389

Closed

timmoon10 and others added 4 commits August 18, 2023 14:58

Update CUDA driver macros

01ce05c

Incorporating changes from NVIDIA#389. Co-authored-by: Tim Moon <tmoon@nvidia.com> Co-authored-by: Jan Bielak <jbielak@nvidia.com> Signed-off-by: Tim Moon <tmoon@nvidia.com>

Use core error checking macros in PyTorch extensions

546bcfc

Hack to get around macro redefinition warning. Signed-off-by: Tim Moon <tmoon@nvidia.com>

Fix missing arg when getting CUDA driver error string

62b429e

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into logging-refactor

6a28dbc

ksivaman self-requested a review August 19, 2023 00:18

timmoon10 mentioned this pull request Aug 21, 2023

Error handle for non-sm80/sm90 GPUs when using fused attention #393

Merged

2 tasks

timmoon10 marked this pull request as draft August 23, 2023 21:15

timmoon10 added 2 commits September 1, 2023 17:21

Merge branch 'main' into logging-refactor

62d7106

Reuse logging header in frameworks

598f5b5

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 marked this pull request as ready for review September 5, 2023 23:34

Merge branch 'main' into logging-refactor

d5d3398

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 requested a review from ptrendx October 6, 2023 22:37

Merge branch 'main' into logging-refactor

eab9148

ptrendx approved these changes Oct 20, 2023

Merge branch 'main' into logging-refactor

dd5ecc8

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 merged commit 6b311da into NVIDIA:main Oct 24, 2023

timmoon10 deleted the logging-refactor branch October 24, 2023 18:49
