Conversation

@vinx13 (Member) commented Sep 21, 2022

This PR adds a tuple-sum based implementation of layer norm. It performs a one-pass reduction that computes the mean and variance at the same time.
A reducer pattern is also added so that LowerCrossThreadReduction can handle this case.
On CUDA this generates two kernels: one for the reduction and one for the element-wise operations. Because of current limitations of compute_at, we are not able to fuse them into a single kernel.
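
For illustration, here is a minimal TE sketch of the tuple-sum idea (the PR's actual implementation lives in the C++ TOPI code; names like `mean_var_one_pass` below are hypothetical): a single `te.comm_reducer` accumulates both sum(x) and sum(x²) in one pass, and the variance is then recovered as E[x²] − E[x]².

```python
import tvm
from tvm import te


def fcombine(x, y):
    # Combine two partial (sum, sum-of-squares) accumulators element-wise.
    return x[0] + y[0], x[1] + y[1]


def fidentity(t0, t1):
    # Identity element of the reducer: zero for both components.
    return tvm.tir.const(0, t0), tvm.tir.const(0, t1)


# Commutative reducer over pairs, analogous to the tuple reduction used
# by topi's argmax/argmin reducers.
tuple_sum = te.comm_reducer(fcombine, fidentity, name="tuple_sum")


def mean_var_one_pass(n, m, dtype="float32"):
    """Hypothetical sketch: per-row mean and variance in a single reduction."""
    data = te.placeholder((n, m), name="data", dtype=dtype)
    k = te.reduce_axis((0, m), name="k")
    # One reduction produces both accumulators as a multi-output compute.
    sum_x, sum_x2 = te.compute(
        (n,),
        lambda i: tuple_sum((data[i, k], data[i, k] * data[i, k]), axis=k),
        name="tuple_sum_xy",
    )
    inv_m = tvm.tir.const(1.0 / m, dtype)
    mean = te.compute((n,), lambda i: sum_x[i] * inv_m, name="mean")
    var = te.compute(
        (n,), lambda i: sum_x2[i] * inv_m - mean[i] * mean[i], name="var"
    )
    return data, mean, var
```

Note that this one-pass formula trades the second pass over the data for potentially worse numerical stability than the two-pass mean-then-variance computation.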

cc @MasterJH5574 @junrushao @AndrewZhaoLuo

@MasterJH5574 (Contributor) left a comment


LGTM! Thanks @vinx13 for implementing layer-norm! I just have two tiny nits.

from .bnn import *
from .qnn import *
from .upsampling import *
from .layer_norm import layer_norm
Contributor:

What about importing * 👀? I see all the other imports here use import *.

Suggested change
from .layer_norm import layer_norm
from .layer_norm import *

Member:

Wildcard importing is actually not a good idea though lol

Member Author:

Agreed, that's why I avoided the wildcard here. Perhaps we should clean up this file in the future.

@vinx13 merged commit 4e783a6 into apache:main on Sep 22, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* [TOPI] Add one-pass layer norm using tuple reduction

* Add reducer pattern for LowerCrossThreadReduction

* lint

* update docs
