Tweak MPMS materialization to handle some edge cases #624

majosm · 2025-11-14T20:08:31Z

This PR adds a couple of tweaks to the MPMS materialization scheme to try to handle the two major cases of arrays not being materialized when they should be (i.e., cases where leaving them unmaterialized contributes a lot of excess computation), as found when experimenting with mirgecom's gas-in-box with the flop counting from #623. The two cases are:

1. (Fix 1) Indexing operations that heavily reuse the entries of a small array to fill a big array. This occurs in mirgecom simulations when lifting fluxes from the boundary to all faces. To deal with this, I modified the materialization to compute the number of successors based not only on the number of successor nodes but also on the amount of entry reuse.
2. (Fix 2) Small subexpressions (`len(materialized_predecessors) < 2`) that are used by a large number of materialized nodes. In mirgecom simulations, this occurs most frequently with powers of temperature, where the subexpression depends on only one materialized array (temperature) and has a single computational node (`**`), but due to heavy reuse (up to 30x) it contributes a lot of flops. I attempted to fix this by modifying the MPMS materialization criterion slightly, changing `nsuccessors > 1 and len(materialized_predecessors) > 1` to `nsuccessors > 1 and nsuccessors + len(materialized_predecessors) >= 4`. This may have the side effect of materializing some operations in the DAG that don't strictly need to be (e.g., Reshapes). I'm not sure yet how much this matters.
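A rough sketch of the two tweaks, using plain integers instead of pytato's array classes. Everything here is hypothetical: `Successor`, `effective_nsuccessors`, and the two `should_materialize_*` helpers are illustrative names, not pytato's API, and weighting an indexing successor by `size // array_size` is an assumed stand-in for "amount of entry reuse", since the exact formula isn't spelled out above. The two criteria are transcribed from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Successor:
    """Hypothetical stand-in for a DAG node that reads the array in question."""

    is_indexing: bool
    # Number of entries the successor produces; None if not known statically.
    size: Optional[int] = None


def effective_nsuccessors(array_size: int, successors: List[Successor]) -> int:
    """Fix 1 (sketch): count successors, weighting indexing ops by entry reuse."""
    n = 0
    for succ in successors:
        if succ.is_indexing and succ.size is not None and array_size > 0:
            # An indexing op producing `size` entries from an array with
            # `array_size` entries reads each source entry about
            # size / array_size times on average (assumed weighting).
            n += max(1, succ.size // array_size)
        else:
            # Sizes unknown or not an indexing op: count it once.
            n += 1
    return n


def should_materialize_base(nsuccessors: int, n_materialized_preds: int) -> bool:
    """The MPMS criterion as quoted in the PR description (before fix 2)."""
    return nsuccessors > 1 and n_materialized_preds > 1


def should_materialize_fix2(nsuccessors: int, n_materialized_preds: int) -> bool:
    """The relaxed criterion from fix 2."""
    return nsuccessors > 1 and nsuccessors + n_materialized_preds >= 4


# A 100-entry array gathered into 3000 entries counts as ~30 successors.
print(effective_nsuccessors(100, [Successor(is_indexing=True, size=3000)]))  # 30

# Power-of-temperature case: many users, one materialized predecessor.
print(should_materialize_base(30, 1), should_materialize_fix2(30, 1))  # False True
```

Since `nsuccessors > 1` and `len(materialized_predecessors) > 1` together imply a sum of at least 4, the relaxed criterion materializes everything the original one did, plus nodes such as `nsuccessors == 3` with a single materialized predecessor, which is where the extra Reshapes can come from.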

Results for gas-in-box:

| Variant | Excess flops vs. all arrays materialized | Arrays materialized |
|---|---|---|
| base | 45.2% | 6.4% |
| fix 1 | 35.4% | 6.9% |
| fix 1 + fix 2 | -6.9% | 8.8% |

Results for wave (which wasn't that bad to begin with):

| Variant | Excess flops vs. all arrays materialized | Arrays materialized |
|---|---|---|
| base | 7.0% | 11.0% |
| fix 1 | 1.4% | 12.3% |
| fix 1 + fix 2 | -1.0% | 17.8% |

I'd still like to try this on the full prediction case to see if it behaves any differently from gas-in-box, and also to see how these changes affect timestep time.

Note 1: I moved the materialization code to a new module because I was originally planning to add this as a separate materializer. I ultimately decided to tweak the existing one instead, but I kept the new module; transform/__init__.py is crowded enough that it could use some splitting up anyway.

Note 2: Due to the module change, it's likely best to read (and merge) this PR commit by commit.

majosm · 2025-11-22T01:40:13Z

FWIW, here's the data from KS3D (with some slight tweaks to make it more like the actual prediction case) running serially on Tuolumne:

| Variant | Excess flops vs. all arrays materialized | Arrays materialized | Compile time (s) | Timestep time (s) |
|---|---|---|---|---|
| base | 76.1% | 3.5% | 497.2 | 0.0626 |
| fix 1 | 69.3% | 3.8% | 553.7 | 0.0610 |
| fix 1 + fix 2 | -2.0% | 5.4% | 782.1 | 0.0626 |

It turns out these tweaks don't really reduce the timestep time. My guess is that since they reduce the flop count (while potentially increasing the number of memory accesses), they mainly just lower arithmetic intensity?
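One way to frame that guess is a simple roofline model, a standard simplification that was not measured in this PR: a kernel's time is bounded by the larger of its compute time and its memory-traffic time. The machine peaks and kernel byte/flop counts below are made-up numbers, chosen only to show that cutting flops while adding some memory traffic need not reduce time when the kernel is memory-bound.

```python
def roofline_time(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Roofline estimate: time is set by compute or by memory, whichever is slower."""
    return max(flops / peak_flops_per_s, bytes_moved / peak_bytes_per_s)


def arithmetic_intensity(flops, bytes_moved):
    """Flops performed per byte moved to or from memory."""
    return flops / bytes_moved


# Hypothetical machine: 1 TFLOP/s peak compute, 100 GB/s memory bandwidth.
PEAK_FLOPS, PEAK_BW = 1e12, 1e11

# Hypothetical kernel before/after materializing more arrays:
# 40% fewer flops, 10% more memory traffic.
before = dict(flops=1.0e9, bytes_moved=1.0e9)
after = dict(flops=0.6e9, bytes_moved=1.1e9)

t_before = roofline_time(**before, peak_flops_per_s=PEAK_FLOPS, peak_bytes_per_s=PEAK_BW)
t_after = roofline_time(**after, peak_flops_per_s=PEAK_FLOPS, peak_bytes_per_s=PEAK_BW)

print(f"AI {arithmetic_intensity(**before):.2f} -> {arithmetic_intensity(**after):.2f}")
print(f"time {t_before * 1e3:.1f} ms -> {t_after * 1e3:.1f} ms")
```

With these numbers the kernel sits well below the machine's balance point (10 flops/byte), so the flop reduction is invisible and the extra traffic makes it slightly slower. Under this model, reducing flops only pays off once a kernel is compute-bound.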

I'm tempted to abandon Fix 2 since it has such a big effect on compile time. Fix 1 might be worth keeping though...
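To put "such a big effect on compile time" in numbers, the relative changes can be computed directly from the KS3D table; the values below are copied from it and nothing else is assumed.

```python
# Values copied from the KS3D table (serial run on Tuolumne).
compile_time_s = {"base": 497.2, "fix 1": 553.7, "fix 1 + fix 2": 782.1}
timestep_time_s = {"base": 0.0626, "fix 1": 0.0610, "fix 1 + fix 2": 0.0626}


def pct_change(new, old):
    """Relative change from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


for variant in ("fix 1", "fix 1 + fix 2"):
    dc = pct_change(compile_time_s[variant], compile_time_s["base"])
    dt = pct_change(timestep_time_s[variant], timestep_time_s["base"])
    print(f"{variant}: compile time {dc:+.1f}%, timestep time {dt:+.1f}%")
# fix 1: compile time +11.4%, timestep time -2.6%
# fix 1 + fix 2: compile time +57.3%, timestep time +0.0%
```

So fix 2 adds roughly 57% to compile time for no change in timestep time, while fix 1 costs about 11% in compile time for a small timestep improvement.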

majosm · 2025-12-10T21:40:41Z

I removed fix 2 from this PR for now. It's still available at majosm/refine-mpms-materialization-part2.

Otherwise, this looks ready for review (still best to look at it commit by commit).

inducer · 2025-12-15T16:16:02Z

pytato/transform/materialize.py


+    nsuccessors = 0
+    for successor in successors:
+        # Handle indexing with heavy reuse, if the sizes are known ahead of time


Add specific explanation here of the case where this is useful.
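A toy NumPy illustration of the indexing-with-reuse case from the PR description: a small array's entries are gathered many times into a bigger array. The sizes, the index map, and the per-entry flop cost are all made up; the point is only that if the small array is left unmaterialized, its expression is recomputed once per read rather than once per entry.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only.
n_small, n_big = 4, 12
small = np.arange(n_small, dtype=np.float64)

# Index map into `small`: every entry is read n_big // n_small = 3 times.
index_map = np.tile(np.arange(n_small), n_big // n_small)
big = small[index_map]

# How many times each entry of `small` is read by the indexing operation.
reads = np.bincount(index_map, minlength=n_small)

# Suppose each entry of `small` costs `flops_per_entry` flops to compute.
flops_per_entry = 5
flops_if_inlined = big.size * flops_per_entry        # recomputed for every read
flops_if_materialized = small.size * flops_per_entry  # computed once, then gathered

print(flops_if_inlined, flops_if_materialized)  # 60 20
```

A plain successor count sees a single indexing node here, so it would not flag `small` for materialization even though inlining it triples its cost.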


majosm force-pushed the refine-mpms-materialization branch 2 times, most recently from 3e5b747 to 3326f49, November 22, 2025 01:23

majosm force-pushed the refine-mpms-materialization branch from 3326f49 to 5423002, December 10, 2025 21:04

majosm marked this pull request as ready for review December 10, 2025 21:40

majosm requested a review from inducer December 10, 2025 21:40

inducer reviewed Dec 15, 2025


majosm added 4 commits December 15, 2025 11:40

move materialization code to its own module

1eaa42a

add get_list_of_users function in analysis

de8c707

tweak the definition of 'multiple successors' in MPMS materializer to handle indexing with heavy reuse

8d586c5

add more explanation for MPMS reuse tweak

6b5356f
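The commit list mentions a new `get_list_of_users` function in the analysis module; its signature is not shown here. As a rough, hypothetical illustration of what such a helper computes (inverting each node's predecessors into a per-node list of users, which is the raw input to a successor count), here is a sketch over a plain dict-based graph. `list_users` and the node names are illustrative, not pytato's API.

```python
from typing import Dict, List


def list_users(predecessors: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a {node: [predecessors]} map into {node: [users]} (hypothetical)."""
    users: Dict[str, List[str]] = {node: [] for node in predecessors}
    for node, preds in predecessors.items():
        for pred in preds:
            # Every node that reads `pred` is recorded as one of its users.
            users.setdefault(pred, []).append(node)
    return users


# Toy graph: a power of temperature reused by two downstream expressions.
graph = {
    "temperature": [],
    "t_squared": ["temperature"],
    "flux_a": ["t_squared", "temperature"],
    "flux_b": ["t_squared"],
}
users = list_users(graph)
print(len(users["t_squared"]))  # 2
```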

majosm force-pushed the refine-mpms-materialization branch 3 times, most recently from 1339d7a to 6b5356f, December 20, 2025 00:25
