@majosm (Collaborator) commented Nov 14, 2025

This PR adds a couple of tweaks to the MPMS materialization scheme. They are meant to handle the two major cases, found while experimenting with mirgecom's gas-in-box using the flop counting from #623, in which arrays are not materialized when they should be (i.e., leaving them unmaterialized causes a lot of excess computation). The two cases are:

  1. Indexing operations that heavily reuse the entries of a small array to fill a big array. This occurs in mirgecom simulations when lifting fluxes from the boundary to all faces. To deal with this, I modified the materialization to compute the number of successors based not only on the number of successor nodes but also on the amount of entry reuse.
  2. Small subexpressions (len(materialized_predecessors) < 2) that are used by a large number of materialized nodes. In mirgecom simulations, this occurs most frequently with powers of temperature, where the subexpression depends on only one materialized array (temperature) and has a single computational node (**), but contributes a lot of flops due to heavy reuse (up to 30x). I attempted to fix this by slightly modifying the MPMS materialization criterion, changing nsuccessors > 1 and len(materialized_predecessors) > 1 to nsuccessors > 1 and nsuccessors + len(materialized_predecessors) >= 4. This may have the side effect of materializing some operations in the DAG that don't strictly need to be materialized (such as Reshapes). I'm not sure yet how much this matters.
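A rough sketch of the two tweaks, to make the counting and the criteria concrete. Only the two materialization criteria are taken from the description above; `Successor`, `effective_nsuccessors`, and the reuse weight (ceil of the indexing result's size over the indexed array's size) are hypothetical illustrations, not pytato's API or the code in this PR.

```python
from dataclasses import dataclass
from math import ceil, prod
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Successor:
    """Hypothetical stand-in for a successor node in the DAG."""
    is_indexing: bool
    shape: Optional[Tuple[int, ...]]  # None if the size isn't known ahead of time


def effective_nsuccessors(pred_shape: Optional[Tuple[int, ...]],
                          successors: Sequence[Successor]) -> int:
    """Count successors, weighting indexing successors by entry reuse (fix 1 sketch)."""
    pred_size = prod(pred_shape) if pred_shape is not None else None
    nsuccessors = 0
    for successor in successors:
        # Handle indexing with heavy reuse, if the sizes are known ahead of time
        if successor.is_indexing and pred_size and successor.shape is not None:
            # Assumed weight: each predecessor entry is read about
            # out_size / in_size times on average.
            reuse = ceil(prod(successor.shape) / pred_size)
            nsuccessors += max(1, reuse)
        else:
            nsuccessors += 1
    return nsuccessors


def should_materialize_old(nsuccessors: int, materialized_predecessors) -> bool:
    return nsuccessors > 1 and len(materialized_predecessors) > 1


def should_materialize_new(nsuccessors: int, materialized_predecessors) -> bool:
    # Fix 2: enough reuse can compensate for few materialized predecessors
    return nsuccessors > 1 and nsuccessors + len(materialized_predecessors) >= 4


# Hypothetical sizes: a 100-entry array gathered into a 3000-entry array.
print(effective_nsuccessors((100,), [Successor(True, (3000,))]))  # 30
# A power of temperature with one materialized predecessor, used 30 times.
print(should_materialize_old(30, ["temperature"]),
      should_materialize_new(30, ["temperature"]))  # False True
```

The new criterion accepts everything the old one did: if nsuccessors > 1 and len(materialized_predecessors) > 1, both are at least 2, so their sum is at least 4. It only adds nodes with few materialized predecessors but many (effective) successors.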

Results for gas-in-box:

| | Excess flops vs. all arrays materialized | Arrays materialized |
|---|---|---|
| base | 45.2% | 6.4% |
| fix 1 | 35.4% | 6.9% |
| fix 1 + fix 2 | -6.9% | 8.8% |

Results for wave (which wasn't that bad to begin with):

| | Excess flops vs. all arrays materialized | Arrays materialized |
|---|---|---|
| base | 7.0% | 11.0% |
| fix 1 | 1.4% | 12.3% |
| fix 1 + fix 2 | -1.0% | 17.8% |

I'd still like to try this on the full prediction case to see if it behaves any differently from gas-in-box, and also to see how these changes affect timestep time.

Note 1: I moved the materialization code to a new module, because I was originally planning to add this as a separate materializer. I ultimately decided to tweak the existing one instead, but I kept the new module. I figure transform/__init__.py is too crowded anyway and could use some splitting up.

Note 2: Due to the module change, it's likely best to read (and merge) this PR commit by commit.

@majosm force-pushed the refine-mpms-materialization branch 2 times, most recently from 3e5b747 to 3326f49 on November 22, 2025 01:23
@majosm (Collaborator, Author) commented Nov 22, 2025

FWIW, here's the data from KS3D (with some slight tweaks to make it more like the actual prediction case) running serially on Tuolumne:

| | Excess flops vs. all arrays materialized | Arrays materialized | Compile time (s) | Timestep time (s) |
|---|---|---|---|---|
| base | 76.1% | 3.5% | 497.2 | 0.0626 |
| fix 1 | 69.3% | 3.8% | 553.7 | 0.0610 |
| fix 1 + fix 2 | -2.0% | 5.4% | 782.1 | 0.0626 |
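For reference, the relative changes with respect to base can be computed directly from the KS3D compile and timestep times in the table:

```python
# Compile time (s) and timestep time (s) for KS3D, copied from the table.
results = {
    "base": (497.2, 0.0626),
    "fix 1": (553.7, 0.0610),
    "fix 1 + fix 2": (782.1, 0.0626),
}


def relative_change(new: float, old: float) -> float:
    """Percent change of `new` relative to `old`."""
    return 100.0 * (new - old) / old


base_compile, base_step = results["base"]
for name, (compile_s, step_s) in results.items():
    print(f"{name}: compile {relative_change(compile_s, base_compile):+.1f}%, "
          f"timestep {relative_change(step_s, base_step):+.1f}%")
# base: compile +0.0%, timestep +0.0%
# fix 1: compile +11.4%, timestep -2.6%
# fix 1 + fix 2: compile +57.3%, timestep +0.0%
```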

It turns out these tweaks don't really help reduce the timestep time. My guess is that, since they reduce the flop count (while potentially increasing the number of memory accesses), they're mostly just reducing arithmetic intensity?
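A back-of-the-envelope roofline estimate shows how this could happen if the kernels are memory-bound. The machine and workload numbers below are made up for illustration and are not measurements from Tuolumne or KS3D.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Flops performed per byte moved to/from memory."""
    return flops / bytes_moved


def roofline_time(flops: float, bytes_moved: float,
                  peak_flops_per_s: float, bandwidth_bytes_per_s: float) -> float:
    """Lower-bound runtime under a simple roofline model."""
    return max(flops / peak_flops_per_s, bytes_moved / bandwidth_bytes_per_s)


# Hypothetical machine: 10 TFLOP/s peak, 1 TB/s memory bandwidth.
PEAK = 10e12
BW = 1e12

# Hypothetical workload: fewer flops after the fix, slightly more memory traffic.
before = dict(flops=1.0e9, bytes_moved=2.0e9)
after = dict(flops=0.6e9, bytes_moved=2.2e9)

for label, w in [("before", before), ("after", after)]:
    ai = arithmetic_intensity(**w)
    t = roofline_time(**w, peak_flops_per_s=PEAK, bandwidth_bytes_per_s=BW)
    print(f"{label}: AI = {ai:.2f} flop/byte, time >= {t * 1e3:.2f} ms")
# before: AI = 0.50 flop/byte, time >= 2.00 ms
# after: AI = 0.27 flop/byte, time >= 2.20 ms
```

In this regime the bound is set by bytes moved, so cutting flops does not shorten the estimate, and any extra memory traffic from materialization makes it longer.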

I'm tempted to abandon fix 2, since it has such a big effect on compile time. Fix 1 might be worth keeping, though...

@majosm force-pushed the refine-mpms-materialization branch from 3326f49 to 5423002 on December 10, 2025 21:04
@majosm marked this pull request as ready for review December 10, 2025 21:40
@majosm requested a review from inducer December 10, 2025 21:40
@majosm (Collaborator, Author) commented Dec 10, 2025

I removed fix 2 from this PR for now. It's still available at majosm/refine-mpms-materialization-part2.

Otherwise, this looks ready for review (still best to look at it commit by commit).


Review comment on the successor-counting loop:

```python
nsuccessors = 0
for successor in successors:
    # Handle indexing with heavy reuse, if the sizes are known ahead of time
    ...
```

Owner: Add a specific explanation here of the case where this is useful.

@majosm force-pushed the refine-mpms-materialization branch 3 times, most recently from 1339d7a to 6b5356f on December 20, 2025 00:25