Add interface for differentiating inputs and weights in CG & PCG#1493
Add interface for differentiating inputs and weights in CG & PCG#1493lockshaw merged 6 commits intoflexflow:repo-refactorfrom
Conversation
reyna-abhyankar
left a comment
There was a problem hiding this comment.
Reviewed 75 of 75 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/models/src/models/transformer.cc line 93 at r1 (raw file):
vdim, config.dropout, /*bias=*/false);
Should we make bias false by default?
lib/op-attrs/include/op-attrs/ops/batch_matmul.h line 12 at r1 (raw file):
namespace FlexFlow { CHECK_VALID_OP_ATTR(BatchMatmulAttrs);
What does this do / was there any reason it wasn't there before?
lib/pcg/include/pcg/computation_graph_builder.h line 166 at r1 (raw file):
std::optional<InitializerAttrs> const &bias_initializer = std::nullopt, std::optional<std::string> const &name = std::nullopt, std::optional<std::string> const &projection_name = std::nullopt,
why are names necessary?
lib/pcg/src/pcg/computation_graph.cc line 25 at r1 (raw file):
} LayerAddedResult add_layer(ComputationGraph &computation_graph,
Why a new interface?
lib/op-attrs/test/src/op-attrs/ops/conv_2d.cc line 23 at r1 (raw file):
SUBCASE("with bias") { Conv2DAttrs attrs = make_attrs(true);
Suggestion:
make_attrs(/*use_bias=*/true)lib/op-attrs/test/src/op-attrs/ops/conv_2d.cc line 37 at r1 (raw file):
SUBCASE("without bias") { Conv2DAttrs attrs = make_attrs(false);
Suggestion:
make_attrs(/*use_bias=*/false)
lockshaw
left a comment
There was a problem hiding this comment.
Reviewable status: 74 of 75 files reviewed, 6 unresolved discussions (waiting on @reyna-abhyankar and @wmdi)
lib/models/src/models/transformer.cc line 93 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Should we make
biasfalse by default?
Probably not, since bias=true is the pytorch default: https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html
lib/op-attrs/include/op-attrs/ops/batch_matmul.h line 12 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
What does this do / was there any reason it wasn't there before?
It just checks a couple basic properties (mainly just value/copy semantics) that honestly probably aren't all that necessary anymore now that we have an actual testing framework in place. I'll leave it in for now for consistency with the other operators, but I've created an issue to eventually go and remove them: #1498
lib/pcg/include/pcg/computation_graph_builder.h line 166 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
why are names necessary?
For testing--currently names are the way to retrieve layer_guid_ts from a ComputationGraph built by ComputationGraphBuilder--see https://github.com/lockshaw/FlexFlow/blob/3440b3e669943b69320b092fdd1e2975dbbf5496/lib/pcg/test/src/pcg/computation_graph.cc#L155-L204 for an example.
Currently there's a bit of awkwardness in that ComputationGraphBuilder is a bit high-level for a lot of tests (has the ability to auto-insert cast and broadcast operators, auto-create weights, etc. which makes getting layer_guid_ts and tensor_guid_ts a bit difficult and makes reasoning about what a test actually does a bit more challenging than it should be), but the low-level graph interface is often way too verbose. The solution is likely to add an interface in the middle (between ComputationGraphBuilder and the raw graph interface--tracked in #1499 and somewhat related to #1474), but I'm not yet ready to implement it, so for now we're using a balance of the two interfaces while I figure out a better long-term solution
lib/pcg/src/pcg/computation_graph.cc line 25 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Why a new interface?
This interface actually has been there for a while, it just wasn't implemented. That said, there is good reason for having interfaces between the raw graph interface and ComputationGraphBuilder, as outlined in https://reviewable.io/reviews/flexflow/FlexFlow/1493#-O6UVfTk8CqARIr-WYgQ
lib/op-attrs/test/src/op-attrs/ops/conv_2d.cc line 23 at r1 (raw file):
SUBCASE("with bias") { Conv2DAttrs attrs = make_attrs(true);
Done.
lib/op-attrs/test/src/op-attrs/ops/conv_2d.cc line 37 at r1 (raw file):
SUBCASE("without bias") { Conv2DAttrs attrs = make_attrs(false);
Done.
wmdi
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @reyna-abhyankar)
reyna-abhyankar
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! all files reviewed, all discussions resolved (waiting on @lockshaw)
lib/pcg/include/pcg/computation_graph_builder.h line 166 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For testing--currently names are the way to retrieve
layer_guid_ts from aComputationGraphbuilt byComputationGraphBuilder--see https://github.com/lockshaw/FlexFlow/blob/3440b3e669943b69320b092fdd1e2975dbbf5496/lib/pcg/test/src/pcg/computation_graph.cc#L155-L204 for an example.Currently there's a bit of awkwardness in that
ComputationGraphBuilderis a bit high-level for a lot of tests (has the ability to auto-insert cast and broadcast operators, auto-create weights, etc. which makes gettinglayer_guid_ts andtensor_guid_ts a bit difficult and makes reasoning about what a test actually does a bit more challenging than it should be), but the low-level graph interface is often way too verbose. The solution is likely to add an interface in the middle (betweenComputationGraphBuilderand the raw graph interface--tracked in #1499 and somewhat related to #1474), but I'm not yet ready to implement it, so for now we're using a balance of the two interfaces while I figure out a better long-term solution
Yeah, that's fair.
Description of changes:
PCGandop-attrstest suite organizingtransformerimplementation (MHAs should not usebias)Related Issues:
Linked Issues:
Issues closed by this PR:
This change is