Add ParallelComputationGraphBuilder#1411
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## repo-refactor #1411 +/- ##
==================================================
+ Coverage 25.60% 39.79% +14.19%
==================================================
Files 228 248 +20
Lines 7484 8744 +1260
Branches 294 320 +26
==================================================
+ Hits 1916 3480 +1564
+ Misses 5568 5264 -304
Flags with carried forward coverage won't be shown. Click here to find out more.
|
reyna-abhyankar
left a comment
There was a problem hiding this comment.
Reviewed 283 of 285 files at r1.
Reviewable status: 283 of 285 files reviewed, 8 unresolved discussions (waiting on @lockshaw)
lib/op-attrs/include/op-attrs/get_output_shapes.h line 119 at r1 (raw file):
ParallelTensorShape get_output_shape(ConcatAttrs const &, std::vector<ParallelTensorShape> const &); ParallelTensorShape get_output_shape(Conv2DAttrs const &,
Why delete?
lib/op-attrs/include/op-attrs/ops/cast.h line 14 at r1 (raw file):
CHECK_VALID_OP_ATTR(CastAttrs); tl::expected<TensorShape, std::string> get_output_shape(CastAttrs const &,
Why is this moved over from the other file?
lib/pcg/include/pcg/computation_graph_builder.h line 285 at r1 (raw file):
std::optional<std::string> const &name = std::nullopt); tensor_guid_t element_unary(ElementUnaryAttrs const &,
Why delete?
lib/pcg/include/pcg/file_format/v1/graphs/v1_operator_graph.struct.toml line 15 at r1 (raw file):
"<vector>", "<unordered_set>", "pcg/file_format/v1/graphs/v1_graph_edge.dtg.h",
Why is there a v1?
lib/pcg/include/pcg/parallel_computation_graph/parallel_computation_graph.h line 9 at r1 (raw file):
namespace FlexFlow {
I think an interface here for getting the topological / reverse topological ordering would be nice since thats what the eventual legion backing will use
lib/pcg/include/pcg/parallel_computation_graph/parallel_computation_graph_builder.h line 10 at r1 (raw file):
namespace FlexFlow { struct ParallelComputationGraphBuilder {
What about the rest of the operators? Is that a future todo
lib/utils/include/utils/fmt.h line 15 at r1 (raw file):
namespace fmt { template <typename T, typename Char>
Why are these chunks deleted / moved over?
lib/op-attrs/src/op-attrs/datatype.cc line 24 at r1 (raw file):
} bool can_strictly_promote_datatype_from_to(DataType src, DataType dst) {
Can we do this more explicitly? If someone changes the ordering of datatypes in the original enum, this can be wrong.
lockshaw
left a comment
There was a problem hiding this comment.
Reviewable status: 269 of 307 files reviewed, 8 unresolved discussions (waiting on @reyna-abhyankar)
lib/op-attrs/include/op-attrs/get_output_shapes.h line 119 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Why delete?
They're now declared in the corresponding header in op-attrs/ops/ (e.g., get_output_shape(CastAttrs const &, ParallelTensorShape const & is now declared in op-attrs/ops/cast.h)
lib/op-attrs/include/op-attrs/ops/cast.h line 14 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Why is this moved over from the other file?
To me it seemed to make more sense to define these per-operator (so that all of the functions for a single operator type are grouped together) rather than define these per-capability (so that all of parallel shape inference is together) since each operator has a lot of shared stuff between the functions and may have extra functions (get_kernel_shape, get_bias_shape, etc.). If we need a common interface to them all that can always be added later by wrapping all of the per-operator functions (it's actually not unlikely this will get added later, but I'd prefer to wait until the use-case is clear)
lib/pcg/include/pcg/computation_graph_builder.h line 285 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Why delete?
With the ElementScalarUnary and ElementUnaryAttrs combined the additional functions felt necessary. I actually should have probably deleted ComputationGraphBuilder::element_scalar_unary too, which I will do unless you object
lib/pcg/include/pcg/file_format/v1/graphs/v1_operator_graph.struct.toml line 15 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Why is there a v1?
V1 refers to the actual json file format--there's separate V1 types for most of PCG to try to make the file format more explicit
lib/pcg/include/pcg/parallel_computation_graph/parallel_computation_graph.h line 9 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
I think an interface here for getting the topological / reverse topological ordering would be nice since thats what the eventual legion backing will use
Done.
lib/pcg/include/pcg/parallel_computation_graph/parallel_computation_graph_builder.h line 10 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
What about the rest of the operators? Is that a future todo
Yeah, this is just the list of operators we have shape inference for at the moment
lib/utils/include/utils/fmt.h line 15 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Why are these chunks deleted / moved over?
The problem with grouping everything into a single fmt.h file is it means that anything that uses a single of these implementations gets rebuilt anytime any of them change. Having them per-file is a bit less convenient (as users have to choose which types they want to format in the includes) but leads to a lot less "rebuild the world" behavior. If this becomes too impractical I can move it back or add a wrapper header that just pulls the rest in (which may be the best solution), but for now I think utils could use more divided up headers rather than less
lib/op-attrs/src/op-attrs/datatype.cc line 24 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Can we do this more explicitly? If someone changes the ordering of datatypes in the original enum, this can be wrong.
The best I can come up with is to added wrapper types for the arguments (i.e., SrcDataType and DstDataType), but I'm not sure the extra code is worth it as the whole "from, to" order is explicitly in the function name. The real answer here is to use a language with named arguments. If you think it's worth the extra code I'm happy to add it, but taking that approach might get pretty crazy pretty fast
reyna-abhyankar
left a comment
There was a problem hiding this comment.
Reviewed 2 of 285 files at r1, 36 of 36 files at r2, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @lockshaw)
lib/op-attrs/include/op-attrs/get_output_shapes.h line 119 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
They're now declared in the corresponding header in
op-attrs/ops/(e.g.,get_output_shape(CastAttrs const &, ParallelTensorShape const &is now declared inop-attrs/ops/cast.h)
Should that also happen for other operators?
lib/op-attrs/include/op-attrs/ops/cast.h line 14 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
To me it seemed to make more sense to define these per-operator (so that all of the functions for a single operator type are grouped together) rather than define these per-capability (so that all of parallel shape inference is together) since each operator has a lot of shared stuff between the functions and may have extra functions (
get_kernel_shape,get_bias_shape, etc.). If we need a common interface to them all that can always be added later by wrapping all of the per-operator functions (it's actually not unlikely this will get added later, but I'd prefer to wait until the use-case is clear)
Then we should do this for the rest of the operators (that can be a future todo, along with some of the other operator-specific changes made in this PR)
lib/pcg/include/pcg/computation_graph_builder.h line 285 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
With the
ElementScalarUnaryandElementUnaryAttrscombined the additional functions felt necessary. I actually should have probably deletedComputationGraphBuilder::element_scalar_unarytoo, which I will do unless you object
Works for me.
lib/pcg/test/src/pcg/parallel_computation_graph/parallel_computation_graph.cc line 5 at r2 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("topological_ordering") {
What is the difference between this test and the test above?
lib/op-attrs/src/op-attrs/datatype.cc line 24 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
The best I can come up with is to added wrapper types for the arguments (i.e.,
SrcDataTypeandDstDataType), but I'm not sure the extra code is worth it as the whole "from, to" order is explicitly in the function name. The real answer here is to use a language with named arguments. If you think it's worth the extra code I'm happy to add it, but taking that approach might get pretty crazy pretty fast
Instead of doing that, you can just manually compare each possibility. There's only 6 of them so it shouldn't be that much code
...
} else if (src == DataType::INT32) {
return dst == DataType::INT32 || dst == DataType::INT64 ...
} ...
And you can assert if the datatype is not one of the listed options (so this breaks if a new datatype is added a new datatype). Where is this function used btw?
lib/pcg/test/src/pcg/dataflow_graph/algorithms.cc line 49 at r2 (raw file):
} TEST_CASE("topological_ordering") {
Should this PR test more complex cases or leave that for a future PR?
lockshaw
left a comment
There was a problem hiding this comment.
Reviewable status: 304 of 307 files reviewed, 4 unresolved discussions (waiting on @reyna-abhyankar)
lib/op-attrs/include/op-attrs/get_output_shapes.h line 119 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Should that also happen for other operators?
Eventually--this is being done incrementally as shape inference gets added for various operators
lib/op-attrs/include/op-attrs/ops/cast.h line 14 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Then we should do this for the rest of the operators (that can be a future todo, along with some of the other operator-specific changes made in this PR)
Yeah it's being incrementally done--each PR that adds shape inference moves the corresponding functions
lib/pcg/include/pcg/computation_graph_builder.h line 285 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Works for me.
Done
lib/pcg/test/src/pcg/parallel_computation_graph/parallel_computation_graph.cc line 5 at r2 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
What is the difference between this test and the test above?
This is for ParallelComputationGraph, not DataflowGraph
lib/op-attrs/src/op-attrs/datatype.cc line 24 at r1 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Instead of doing that, you can just manually compare each possibility. There's only 6 of them so it shouldn't be that much code
... } else if (src == DataType::INT32) { return dst == DataType::INT32 || dst == DataType::INT64 ... } ...And you can assert if the datatype is not one of the listed options (so this breaks if a new datatype is added a new datatype). Where is this function used btw?
In ParallelComputationGraphBuilder::as_type and in shape inference for Cast: https://github.com/lockshaw/FlexFlow/blob/a9cee72c2d317c1dd36c63b0bb66880530f5691a/lib/pcg/src/pcg/parallel_computation_graph/parallel_computation_graph_builder.cc#L425-L427
lib/pcg/test/src/pcg/dataflow_graph/algorithms.cc line 49 at r2 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Should this PR test more complex cases or leave that for a future PR?
Leave it for now, DataflowGraph is getting moved into utils/graph and there is currently work on improving unit testing there, so this can get dealt with then
|
Previously, lockshaw (Colin Unger) wrote…
I misread your initial comment, this is now done |
reyna-abhyankar
left a comment
There was a problem hiding this comment.
Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @lockshaw)
lib/op-attrs/test/src/datatype.cc line 30 at r3 (raw file):
RC_PRE(can_strictly_promote_datatype_from_to(d1, d2)); RC_PRE(can_strictly_promote_datatype_from_to(d2, d3)); RC_ASSERT(!can_strictly_promote_datatype_from_to(d1, d3));
Maybe I don't know the exact syntax here, but shouldn't this be
RC_ASSERT(can_strictly_promote_datatype_from_to(d1, d3));?
lib/op-attrs/src/op-attrs/datatype.cc line 31 at r3 (raw file):
break; case DataType::INT32: allowed = {DataType::INT64};
why not float or double for ints? or are we only allowing exact precision
lockshaw
left a comment
There was a problem hiding this comment.
Reviewable status: 301 of 319 files reviewed, 2 unresolved discussions (waiting on @reyna-abhyankar)
lib/op-attrs/test/src/datatype.cc line 30 at r3 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Maybe I don't know the exact syntax here, but shouldn't this be
RC_ASSERT(can_strictly_promote_datatype_from_to(d1, d3));?
Yup! Fixed
lib/op-attrs/src/op-attrs/datatype.cc line 31 at r3 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
why not float or double for ints? or are we only allowing exact precision
That's what I'm currently doing. Tbh I don't have a super clear idea of exactly what the correct behavior here is, so I'm starting with this and I'll figure it out later 🙂
reyna-abhyankar
left a comment
There was a problem hiding this comment.
Reviewed 18 of 18 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @lockshaw)
lib/op-attrs/test/src/datatype.cc line 30 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Yup! Fixed
And did you figure out why it was passing before?
lib/op-attrs/src/op-attrs/datatype.cc line 31 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
That's what I'm currently doing. Tbh I don't have a super clear idea of exactly what the correct behavior here is, so I'm starting with this and I'll figure it out later 🙂
Fair enough. Though if bool can be promoted to anything, then shouldn't int32 also be promotable to everything except bool?
|
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
Why should it? |
reyna-abhyankar
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @lockshaw)
lib/op-attrs/src/op-attrs/datatype.cc line 31 at r3 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why should it?
I guess if you're going for exact precision then it's fine as is.
lockshaw
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @reyna-abhyankar)
lib/op-attrs/test/src/datatype.cc line 30 at r3 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
And did you figure out why it was passing before?
Yes, that's why stuff is now using RC_SUBCASE--the error wasn't getting properly handed over to doctest
lib/op-attrs/src/op-attrs/datatype.cc line 31 at r3 (raw file):
Previously, reyna-abhyankar (Reyna Abhyankar) wrote…
I guess if you're going for exact precision then it's fine as is.
I'm not entirely sure what the right answer here is, so I figured I'd err on the side of of being too conservative about the behavior and we can tweak if it ends up being unnecessarily restrictive
Description of changes:
ParallelComputationGraphBuilder.dtgento setstructconstructors to beexplicit, both for clarity and to remove a bug in string conversion.Related Issues:
Linked Issues:
Issues closed by this PR:
This change is