Skip to content

Add tool for exporting and visualizing model architectures and SP decompositions#1490

Merged
lockshaw merged 117 commits intoflexflow:repo-refactorfrom
lockshaw:dump-model-arch
Sep 14, 2024
Merged

Add tool for exporting and visualizing model architectures and SP decompositions#1490
lockshaw merged 117 commits intoflexflow:repo-refactorfrom
lockshaw:dump-model-arch

Conversation

@lockshaw
Copy link
Collaborator

@lockshaw lockshaw commented Sep 6, 2024

Description of changes:

  • Add tool in bin/ for exporting and visualizing model architectures (export-model-arch)
  • Add utils/cli library to make a nice interface for the above
  • Fix bug in SP decomposition due to not checking in get_cbc_decomposition that the complete bipartite components were actually complete
  • Fix bug in and optimize transitive_reduction, add optimized transitive_closure implementation
  • Add infrastructure for generating, manipulating, and exporting binary SP decomposition trees
  • Rename serial-parallel to series-parallel to match literature
  • Add input/weight all-to-all insertion preprocessing for computation graphs to prevent inputs and weights from making graphs non-SP
  • Move doctest formatting out of utils/fmt and into test/utils/doctest/fmt, as having it in utils required adding a doctest dependency to utils which prevented binaries from getting built due to Linking errors, when using --no-undefined linking flag or setting default symbol visibility to hidden doctest/doctest#609

To run export-model-arch (from the root of the repository):

$ proj build
$ ./build/normal/bin/export-model-arch/export-model-arch -h

should print a help message.

Related Issues:

Linked Issues:

Issues closed by this PR:

  • Closes #

This change is Reviewable

@codecov
Copy link

codecov bot commented Sep 8, 2024

Codecov Report

Attention: Patch coverage is 55.47445% with 488 lines in your changes missing coverage. Please review.

Project coverage is 70.45%. Comparing base (2b4106f) to head (59d9068).
Report is 9 commits behind head on repo-refactor.

Files with missing lines Patch % Lines
bin/export-model-arch/src/export_model_arch.cc 0.00% 113 Missing ⚠️
...computation_graph_series_parallel_decomposition.cc 0.00% 54 Missing ⚠️
lib/pcg/src/pcg/computation_graph.cc 16.92% 54 Missing ⚠️
...allel/computation_graph_binary_sp_decomposition.cc 0.00% 42 Missing ⚠️
.../graph/instances/unordered_set_undirected_graph.cc 0.00% 33 Missing ⚠️
lib/models/src/models/split_test/split_test.cc 0.00% 26 Missing ⚠️
lib/compiler/src/machine_mapping.cc 0.00% 22 Missing ⚠️
...c/utils/graph/digraph/algorithms/digraph_as_dot.cc 0.00% 16 Missing ⚠️
lib/utils/src/utils/cli/cli_spec.cc 17.64% 14 Missing ⚠️
lib/op-attrs/src/op-attrs/ops/broadcast.cc 0.00% 12 Missing ⚠️
... and 34 more
Additional details and impacted files
@@                Coverage Diff                @@
##           repo-refactor    #1490      +/-   ##
=================================================
+ Coverage          68.49%   70.45%   +1.95%     
=================================================
  Files                612      705      +93     
  Lines              18120    20861    +2741     
  Branches             519      607      +88     
=================================================
+ Hits               12412    14698    +2286     
- Misses              5708     6163     +455     
Flag Coverage Δ
unittests 70.45% <55.47%> (+1.95%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
lib/compiler/include/compiler/graph_utils.h 0.00% <ø> (ø)
lib/compiler/include/compiler/machine_mapping.h 0.00% <ø> (ø)
lib/kernels/include/kernels/array_shape.h 100.00% <ø> (ø)
lib/kernels/include/kernels/attention_kernels.h 0.00% <ø> (ø)
lib/kernels/src/allocation.cc 62.50% <ø> (ø)
...-execution/include/local-execution/cost_estimate.h 0.00% <ø> (ø)
...cal-execution/include/local-execution/op_arg_ref.h 0.00% <ø> (ø)
...ution/include/local-execution/op_task_invocation.h 82.35% <ø> (ø)
lib/local-execution/src/legion_tensor_shape.cc 0.00% <ø> (ø)
lib/local-execution/src/local_slots_backing.cc 82.88% <ø> (ø)
... and 141 more

... and 98 files with indirect coverage changes

@lockshaw lockshaw requested review from Marsella8 and wmdi September 8, 2024 06:31
@lockshaw lockshaw marked this pull request as ready for review September 8, 2024 06:31
Copy link
Collaborator

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 232 of 437 files at r1, 18 of 36 files at r2, all commit messages.
Reviewable status: 250 of 441 files reviewed, 2 unresolved discussions (waiting on @lockshaw and @Marsella8)


lib/compiler/include/compiler/series_parallel/computation_graph_binary_sp_decomposition.struct.toml line 22 at r2 (raw file):

[[fields]]
name = "raw_tree"
type = "::FlexFlow::GenericBinarySPDecompositionTree<::FlexFlow::layer_guid_t>"

Then how do we apply SPD for PCGs? Shouldn't we implement SPD for PCGs, and apply SPD for CGs by first converting CGs to PCGs?


lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 15 at r2 (raw file):

    to_final_ast(std::variant<IntermediateSpDecompositionTree, Node> const &);

std::unordered_multiset<Node> get_nodes(SeriesParallelDecomposition const &sp);

Why is it multiset?

Copy link
Collaborator

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 173 of 437 files at r1, 18 of 36 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @lockshaw and @Marsella8)

Copy link
Collaborator Author

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Marsella8 and @wmdi)


lib/compiler/include/compiler/series_parallel/computation_graph_binary_sp_decomposition.struct.toml line 22 at r2 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Then how do we apply SPD for PCGs? Shouldn't we implement SPD for PCGs, and apply SPD for CGs by first converting CGs to PCGs?

This PR doesn't add SPD for PCGs (SPD for PCGs is more complex due to potential parallel ops being inserted on the weight nodes). Doing CG SPD by converting a CG to a PCG and then doing SPD would be fine, but that would require having PCG SPD implemented, which I'd rather not fold into this already large PR.


lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 15 at r2 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Why is it multiset?

Because in approximate sp-decomposition (i.e., @Marsella8's stuff) node duplication can result in having the same node id in multiple places in the SP-decomposition I believe (@Marsella8 is this correct?)

Copy link
Collaborator

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Marsella8)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants