@altanh (Contributor) commented Jun 23, 2021

Tuple constructions and projections were not handled correctly: they were not reconstructed using the let bindings of their inputs, which led to expression duplication. The CSE pass can often eliminate this erroneous duplication when first-order AD is paired with ToGNF, but sometimes it can't (we observed this in BERT training, where it led to many duplicated matmuls).
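To make the duplication concrete, here is a minimal sketch on a toy IR. It is not TVM's Relay or the code in this PR: `Node`, `Make`, `CountEmittedInlined`, `CountEmittedLetBound`, and `BuildExample` are hypothetical names made up for illustration. It only shows that when a tuple is rebuilt from scratch at every use, a matmul feeding it is emitted once per use, while binding each node to a variable on first visit emits it once.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy IR (not TVM's Relay): a node is an op name plus inputs.
struct Node {
  std::string op;
  std::vector<std::shared_ptr<Node>> inputs;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr Make(std::string op, std::vector<NodePtr> inputs = {}) {
  return std::make_shared<Node>(Node{std::move(op), std::move(inputs)});
}

// Rebuilds every use from scratch, so a shared input is emitted once per use.
int CountEmittedInlined(const NodePtr& n, const std::string& op) {
  assert(n != nullptr);
  int count = (n->op == op) ? 1 : 0;
  for (const auto& in : n->inputs) count += CountEmittedInlined(in, op);
  return count;
}

// Binds each node to a variable on first visit and refers to that variable
// afterwards, mimicking reconstruction from the let bindings of the inputs.
int CountEmittedLetBound(const NodePtr& n, const std::string& op,
                         std::unordered_map<const Node*, int>& bound) {
  assert(n != nullptr);
  if (bound.count(n.get()) > 0) return 0;  // already bound: reuse the variable
  bound.emplace(n.get(), static_cast<int>(bound.size()));
  int count = (n->op == op) ? 1 : 0;
  for (const auto& in : n->inputs) count += CountEmittedLetBound(in, op, bound);
  return count;
}

// tup = (matmul(x, w), x); two projections consume the same tuple.
NodePtr BuildExample() {
  NodePtr x = Make("x");
  NodePtr w = Make("w");
  NodePtr tup = Make("tuple", {Make("matmul", {x, w}), x});
  return Make("add", {Make("get0", {tup}), Make("get0", {tup})});
}
```

On the graph from `BuildExample()`, the inlined traversal counts the matmul twice and the let-bound traversal counts it once. A CSE pass can usually merge such copies afterwards, but avoiding the duplication in the first place does not rely on that.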

cc @MarisaKirisame @jroesch

TupleGetItem orig = TupleGetItem(tup->get<ADTensor>().forward, idx);
orig->checked_type_ = op->checked_type();
auto ret = std::make_shared<ADTensor>(ll, orig, diag_ctx);
// for orig = pi(tup, i), pi_grad(tup, i, g) = G where pi(G, i) = g and pi(G, j) = 0 for j != i
Member

Explain more here or delete

Contributor Author

sure, this is just describing how the gradient for a projection is propagated back to the original tuple
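The rule in the code comment can be sketched outside of TVM as follows. This is an illustration under simplifying assumptions, not the implementation in this PR: `Tensor`, `Tuple`, `ProjectionGrad`, and `AccumulateInto` are hypothetical stand-ins, with a tensor modeled as a flat vector of floats. For `orig = pi(tup, i)` and upstream gradient `g`, the contribution to the tuple's gradient is a tuple `G` whose field `i` is `g` and whose other fields are zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins (not TVM types): a tensor is a flat vector of
// floats and a tuple is a vector of tensors.
using Tensor = std::vector<float>;
using Tuple = std::vector<Tensor>;

// For orig = pi(tup, i) with upstream gradient g, returns G such that
// pi(G, i) = g and pi(G, j) = 0 for j != i.
Tuple ProjectionGrad(const Tuple& tup, std::size_t i, const Tensor& g) {
  assert(i < tup.size());
  assert(g.size() == tup[i].size());
  Tuple G;
  G.reserve(tup.size());
  for (std::size_t j = 0; j < tup.size(); ++j) {
    if (j == i) {
      G.push_back(g);  // the projected field receives the upstream gradient
    } else {
      G.push_back(Tensor(tup[j].size(), 0.0f));  // other fields receive zeros
    }
  }
  return G;
}

// Adds G into the tuple's running gradient, field by field.
void AccumulateInto(Tuple& tup_grad, const Tuple& G) {
  assert(tup_grad.size() == G.size());
  for (std::size_t j = 0; j < G.size(); ++j) {
    assert(tup_grad[j].size() == G[j].size());
    for (std::size_t k = 0; k < G[j].size(); ++k) {
      tup_grad[j][k] += G[j][k];
    }
  }
}
```

If the same tuple is projected more than once, each projection contributes its own `G`, and `AccumulateInto` sums them into a single gradient for the tuple.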

@jroesch jroesch merged commit 4f9e614 into apache:main Jun 24, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request Mar 4, 2022