fix bug in non-FP8 nvfuser path by ksivaman · Pull Request #81 · NVIDIA/TransformerEngine

ksivaman · 2023-02-25T00:38:33Z

This was mistakenly added during #41

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

ksivaman · 2023-02-25T00:46:47Z

/te-ci

ptrendx

Makes sense.

[New feature] With cudnn backend 9.2.0 and above, `Graph::check_support` can determine support for runtime engines without invoking the nvrtc compiler. This allows users to check the support surface of cudnn without triggering nvrtc compilation.

[New feature] Python pip wheel now contains the necessary C++ development headers.

[New feature] Sliding window attention is now supported as an attribute to the sdpa forward and bprop node. Usage: `sdpa_attributes.set_sliding_window_length(window_length)`

[New feature] Bottom right aligned causal masking is now supported as an attribute to the sdpa forward and bprop node. Usage: `sdpa_attributes.use_causal_mask_bottom_right(true)`

[New feature] SDPA bprop attributes can choose a deterministic algorithm using the `use_deterministic_algorithm` API.

[New feature] Allow users to filter candidate execution plans of a graph by their shared memory usage in cudnn 9.2.0 and later.

[Bug fix] A runtime error that occurred when the chosen execution plan candidate was incorrectly set in the backend has been fixed. This would happen when `check_support` did not correctly filter by the workspace size.

[Bug fix] Selecting/deselecting by behavior and numerical notes has been fixed and now works as intended.

[Debugging] A new tool for easy reproduction of a failure using the json representation of the graph can be found [here](tools/json_reproducer).

[Samples] Restructured the cpp samples into categories for easier navigation.

[Samples] Added a sample to showcase how different plans can be built in parallel in separate threads.

[Compilation enhancement] Added a new macro `CUDNN_FRONTEND_SKIP_NLOHMANN_JSON` as a compilation flag to drop nlohmann::json as a compilation dependency. Users lose access to certain API functions like `print`, `key`, `serialize`, `deserialize` that depend on the library.

[Enhancement] Serialization of the resample operation is now supported.

[Enhancement] Bug template has been added for new GitHub issues.

fix bug in non-FP8 nvfuser path

a89a57c

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

ksivaman requested review from ptrendx and timmoon10 February 25, 2023 00:38

ptrendx approved these changes Feb 25, 2023

ksivaman merged commit 67114f9 into NVIDIA:main Feb 25, 2023

ptrendx pushed a commit that referenced this pull request Mar 7, 2023

fix bug in non-FP8 nvfuser path (#81)

f18e677

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

cyanguwa pushed a commit to cyanguwa/TransformerEngine that referenced this pull request Apr 1, 2023

fix bug in non-FP8 nvfuser path (NVIDIA#81)

393ba99

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: Charlene Yang <charleney@nvidia.com>

ksivaman deleted the bias_gelu_nvfusion_bug_fix branch July 19, 2023 01:40

ksivaman merged 1 commit into NVIDIA:main from ksivaman:bias_gelu_nvfusion_bug_fix
