44 changes: 24 additions & 20 deletions src/coreclr/jit/lowerwasm.cpp
@@ -552,6 +552,9 @@ void Lowering::AfterLowerBlocks()
{
assert(IsDataFlowRoot(node));
node = StackifyTree(node);
// We don't track liveness of temporaries more precisely since introducing eairler uses
// may interfere with later (by that point already inserted and stackified) stores.
Comment on lines +555 to +556
Copilot AI Apr 29, 2026

Typo in the new comment: "eairler" should be "earlier" (and consider tweaking wording for clarity since this comment explains the rationale for the new lifetime behavior).

Suggested change
// We don't track liveness of temporaries more precisely since introducing eairler uses
// may interfere with later (by that point already inserted and stackified) stores.
// We don't track temporary liveness more precisely because introducing earlier uses
// may interfere with stores that were already inserted and stackified later.

Copilot uses AI. Check for mistakes.
ReleaseTemporaries();
}
Comment on lines 554 to 558
Copilot AI Apr 28, 2026

ReleaseTemporaries() is called after every dataflow root, but it rebuilds the entire available-temp lists by iterating from m_minimumTempLclNum up to lvaCount each time. This changes the algorithm from freeing a single temp to O(totalTemps) work per root and could regress JIT throughput on large methods; consider tracking only temps used/created while stackifying the current root (e.g., with the intended bitset / min-max range) and releasing just those.

Contributor Author

Analysis shows that this probably isn't a huge issue since the number of stackifier temps will be small, but this could be worth doing.
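
The per-root tracking suggested above could look roughly like the following. This is a minimal standalone sketch, not the code in lowerwasm.cpp: `TempTracker`, `Request`, `ReleaseForRoot` and `AvailableCount` are hypothetical names, and types are ignored for brevity (the real stackifier keys its available lists by type). The point is only that the release cost scales with the temps the current root used, not with every temp in [m_minimumTempLclNum, lvaCount).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the stackifier's temp bookkeeping; none of these
// names come from lowerwasm.cpp.
class TempTracker
{
    std::vector<unsigned> m_usedInCurrentRoot; // temps handed out while processing the current root
    std::vector<unsigned> m_available;         // temps that are free for re-use
    unsigned              m_nextLclNum;        // next fresh local number to allocate

public:
    explicit TempTracker(unsigned firstTempLclNum)
        : m_nextLclNum(firstTempLclNum)
    {
    }

    // Hand out a temp, preferring a previously released one.
    unsigned Request()
    {
        unsigned lclNum;
        if (!m_available.empty())
        {
            lclNum = m_available.back();
            m_available.pop_back();
        }
        else
        {
            lclNum = m_nextLclNum++;
        }
        m_usedInCurrentRoot.push_back(lclNum);
        return lclNum;
    }

    // Release only the temps used by the current root and return how many were
    // released. The work is proportional to that count, not to the total temps.
    std::size_t ReleaseForRoot()
    {
        std::size_t released = m_usedInCurrentRoot.size();
        for (unsigned lclNum : m_usedInCurrentRoot)
        {
            m_available.push_back(lclNum);
        }
        m_usedInCurrentRoot.clear();
        assert(m_usedInCurrentRoot.empty());
        return released;
    }

    std::size_t AvailableCount() const
    {
        return m_available.size();
    }
};
```

If the number of stackifier temps really does stay small, as the reply above suggests, the simpler full rescan may be good enough in practice; the sketch only shows what the alternative bookkeeping would involve.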

m_lower->m_block = nullptr;

@@ -567,7 +570,6 @@ void Lowering::AfterLowerBlocks()
// Simple greedy algorithm working backwards. The invariant is that the stack top must be placed right next
// to (in normal linear order - before) the node we last stackified.
m_stack.Push(&root);
ReleaseTemporariesDefinedBy(root);

GenTree* lastStackified = root->gtNext;
while (m_stack.Height() != initialDepth)
@@ -668,8 +670,6 @@ void Lowering::AfterLowerBlocks()
*use = lclNode;

JITDUMP("Replaced [%06u] with a temporary:\n", Compiler::dspTreeID(node));
DISPNODE(node);
DISPNODE(lclNode);
Contributor

Why remove the dumps?

Contributor Author

Ah, accidental, will revert.


if ((node->gtLIRFlags & LIR::Flags::MultiplyUsed) == LIR::Flags::MultiplyUsed)
{
@@ -704,33 +704,37 @@ void Lowering::AfterLowerBlocks()
return lclNum;
}

void ReleaseTemporariesDefinedBy(GenTree* node)
void ReleaseTemporaries()
{
// We rely in this function on the lifetime of temporaries beginning (recall this is backwards traversal)
// at exactly "node"'s position, and not shrinking or extending after this call. This is currently true
// because we never move dataflow roots, and we only begin processing them after all subsequent nodes
// have already been stackified and thus won't move either.
assert(IsDataFlowRoot(node));
if (!node->OperIs(GT_STORE_LCL_VAR))
if (m_minimumTempLclNum == m_compiler->lvaCount)
{
// No temporaries were created
return;
}
assert(m_minimumTempLclNum < m_compiler->lvaCount);

unsigned lclNum = node->AsLclVar()->GetLclNum();
if (lclNum < m_minimumTempLclNum)
// Recycle all available temporaries as unused nodes
for (int i = 0; i < TYP_COUNT; i++)
{
return;
while (m_availableTemps[i] != nullptr)
{
Temporary* temp = Remove(&m_availableTemps[i]);
Append(&m_unusedTempNodes, temp);
Contributor

I understand you're working with the model from the previous algorithm, where we had piece-wise node addition and removal, but at this point it would make more sense to use the more traditional "available/in-use" free lists, i.e. make m_unusedTempNodes per-type (m_inUseTempNodes), allocate the nodes when requesting temporaries, and then only do the remove/append here. That would be O(1) and more obvious, I think.
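
One way to read the "available/in-use" suggestion is sketched below. It is an illustration under stated assumptions, not the JIT's implementation: `TempNode`, `TempList`, `TempPool`, `SpliceAll` and `kTypeCount` (standing in for TYP_COUNT) are all hypothetical names, and plain `new` replaces the JIT's arena allocation. Nodes are allocated when a temporary is requested and put on the per-type in-use list; releasing then splices each in-use list onto the matching available list, which is constant time per type because each list keeps a tail pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node and list types; the real code uses Temporary with
// Remove/Append helpers.
struct TempNode
{
    unsigned  LclNum;
    TempNode* Next;
};

struct TempList
{
    TempNode* Head = nullptr;
    TempNode* Tail = nullptr;

    void Push(TempNode* node)
    {
        node->Next = Head;
        Head       = node;
        if (Tail == nullptr)
        {
            Tail = node;
        }
    }

    TempNode* Pop()
    {
        TempNode* node = Head;
        if (node != nullptr)
        {
            Head = node->Next;
            if (Head == nullptr)
            {
                Tail = nullptr;
            }
            node->Next = nullptr;
        }
        return node;
    }

    // Move every node of "other" to the front of this list without walking it.
    void SpliceAll(TempList& other)
    {
        if (other.Head == nullptr)
        {
            return;
        }
        other.Tail->Next = Head;
        if (Tail == nullptr)
        {
            Tail = other.Tail;
        }
        Head       = other.Head;
        other.Head = nullptr;
        other.Tail = nullptr;
    }
};

constexpr int kTypeCount = 4; // hypothetical stand-in for TYP_COUNT

struct TempPool
{
    TempList Available[kTypeCount];
    TempList InUse[kTypeCount];
    unsigned NextLclNum;

    explicit TempPool(unsigned firstTempLclNum)
        : NextLclNum(firstTempLclNum)
    {
    }

    // Reuse an available temp of this type, or allocate a new node and local.
    unsigned Request(int type)
    {
        assert((type >= 0) && (type < kTypeCount));
        TempNode* node = Available[type].Pop();
        if (node == nullptr)
        {
            node = new TempNode{NextLclNum++, nullptr}; // the JIT would use its arena allocator
        }
        InUse[type].Push(node);
        return node->LclNum;
    }

    // Called after each dataflow root: every in-use temp becomes available.
    void ReleaseAll()
    {
        for (int type = 0; type < kTypeCount; type++)
        {
            Available[type].SpliceAll(InUse[type]);
        }
    }
};
```

With this layout the release step never touches individual nodes, so its cost depends on the number of types and not on how many temporaries exist.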

}
}

Temporary* local = Remove(&m_unusedTempNodes); // See if we have any free nodes in the pool.
if (local == nullptr)
for (unsigned lclNum = m_minimumTempLclNum; lclNum < m_compiler->lvaCount; lclNum++)
{
local = new (m_compiler, CMK_Lower) Temporary();
}
local->LclNum = lclNum;
Temporary* local = Remove(&m_unusedTempNodes); // See if we have any free nodes in the pool.
if (local == nullptr)
{
local = new (m_compiler, CMK_Lower) Temporary();
}
local->LclNum = lclNum;

JITDUMP("Temporary V%02u is now free and can be re-used\n", lclNum);
Append(&m_availableTemps[genActualType(node->TypeGet())], local);
JITDUMP("Temporary V%02u is now free and can be re-used\n", lclNum);
Append(&m_availableTemps[genActualType(m_compiler->lvaGetDesc(lclNum)->TypeGet())], local);
}
}
Comment on lines +707 to 738
Copilot AI Apr 28, 2026

PR description says this fix “adds a bit set which tracks temporaries that can be freed after tree processing completes”, but the current implementation doesn’t add such a bitset and instead recycles all temps in [m_minimumTempLclNum, lvaCount) on every root. Either update the description to match the implementation, or implement the described per-tree tracking to avoid unintended behavior/perf costs.


Temporary* Remove(Temporary** pTemps)