
Conversation

@tlopex
Member

@tlopex tlopex commented Oct 6, 2025

This PR adds support for gru.input to the ExportedProgram importer.
It relates to issue #18356.

@gemini-code-assist
Contributor

Summary of Changes

Hello @tlopex, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the PyTorch ExportedProgram importer by adding comprehensive support for the GRU (Gated Recurrent Unit) operator. This enables the successful translation and execution of PyTorch models that incorporate GRU layers, thereby broadening the range of supported recurrent neural network architectures within the system. The implementation meticulously reconstructs the GRU's internal logic using Relax operations and includes dedicated tests to ensure functional correctness.
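
For reference, the gate calculations the importer reconstructs follow the standard GRU cell equations (as documented for torch.nn.GRU):

    r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr})
    z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz})
    n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn}))
    h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}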

Highlights

  • GRU Operator Support: Implemented the _gru method in the ExportedProgramTranslator to handle PyTorch's gru.input operation, enabling the translation of GRU layers.
  • GRU Logic Implementation: The _gru method includes detailed logic for GRU computation, such as parsing input arguments, initializing hidden states, handling single and multi-layer configurations, and performing gate calculations (reset, update, new gates) using Relax operations.
  • Bidirectional GRU Limitation: Explicitly added a NotImplementedError for bidirectional GRU, indicating that this feature is not yet supported.
  • Test Coverage: Added new test cases in test_frontend_from_exported_program.py to verify the correctness of the GRU implementation for both batch_first=True and batch_first=False scenarios.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the GRU operation in the PyTorch ExportedProgram importer. The implementation correctly follows the GRU equations and includes basic tests. However, I've identified a significant performance issue in the GRU implementation due to redundant computations within a loop. Additionally, the tests could be more comprehensive to cover multi-layer GRUs and other configurations. I've provided specific suggestions to address these points.

Comment on lines +523 to +573
# Reset gate weights
weight_ih_r = self.block_builder.emit(
    relax.op.strided_slice(weight_ih, axes=[0], begin=[0], end=[gate_size])
)
weight_hh_r = self.block_builder.emit(
    relax.op.strided_slice(weight_hh, axes=[0], begin=[0], end=[gate_size])
)

# Update gate weights
weight_ih_z = self.block_builder.emit(
    relax.op.strided_slice(
        weight_ih, axes=[0], begin=[gate_size], end=[2 * gate_size]
    )
)
weight_hh_z = self.block_builder.emit(
    relax.op.strided_slice(
        weight_hh, axes=[0], begin=[gate_size], end=[2 * gate_size]
    )
)

# New gate weights
weight_ih_n = self.block_builder.emit(
    relax.op.strided_slice(
        weight_ih, axes=[0], begin=[2 * gate_size], end=[3 * gate_size]
    )
)
weight_hh_n = self.block_builder.emit(
    relax.op.strided_slice(
        weight_hh, axes=[0], begin=[2 * gate_size], end=[3 * gate_size]
    )
)

# Transpose weights for matmul
weight_ih_r_t = self.block_builder.emit(
    relax.op.permute_dims(weight_ih_r, axes=[1, 0])
)
weight_hh_r_t = self.block_builder.emit(
    relax.op.permute_dims(weight_hh_r, axes=[1, 0])
)
weight_ih_z_t = self.block_builder.emit(
    relax.op.permute_dims(weight_ih_z, axes=[1, 0])
)
weight_hh_z_t = self.block_builder.emit(
    relax.op.permute_dims(weight_hh_z, axes=[1, 0])
)
weight_ih_n_t = self.block_builder.emit(
    relax.op.permute_dims(weight_ih_n, axes=[1, 0])
)
weight_hh_n_t = self.block_builder.emit(
    relax.op.permute_dims(weight_hh_n, axes=[1, 0])
)

Severity: high

The weight slicing and transposition operations are performed inside the time-step loop (for t in range(seq_len)). Since these weights do not depend on the time step t, these computations are redundant and highly inefficient, especially for long sequences. They should be hoisted out of the time-step loop and computed only once per layer. The same applies to bias slicing (e.g., lines 583-588, 607-616, 635-644). This will result in a much smaller and more efficient computation graph.
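
For illustration, a minimal sketch of the suggested hoisting, reusing only the calls already present in the quoted snippet; the helper name and the per-layer wiring are made up for this sketch:

    from tvm import relax  # assumed import, matching the importer code quoted above

    def _split_and_transpose(bb, weight, gate_size):
        """Slice the stacked (3 * gate_size, feature) parameter into the
        reset/update/new gate blocks and transpose each for matmul.
        Done once per layer, outside the time-step loop."""
        gates = []
        for i in range(3):  # 0: reset, 1: update, 2: new
            g = bb.emit(
                relax.op.strided_slice(
                    weight, axes=[0], begin=[i * gate_size], end=[(i + 1) * gate_size]
                )
            )
            gates.append(bb.emit(relax.op.permute_dims(g, axes=[1, 0])))
        return gates

    # Inside _gru, once per layer (names follow the quoted snippet):
    # w_ih_r_t, w_ih_z_t, w_ih_n_t = _split_and_transpose(self.block_builder, weight_ih, gate_size)
    # w_hh_r_t, w_hh_z_t, w_hh_n_t = _split_and_transpose(self.block_builder, weight_hh, gate_size)
    # for t in range(seq_len):
    #     ...use the pre-sliced, pre-transposed weights for the gate matmuls...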

Comment on lines +425 to +430
if num_layers > 1:
# Multi-layer: params[0] is first layer's weight_ih
weight_ih = params[0]
else:
# Single layer: params[0] is weight_ih
weight_ih = params[0]

Severity: medium

This if/else block is redundant as both branches execute the same code (weight_ih = params[0]). This can be simplified to improve code clarity and maintainability.

            # For multi-layer, we need to extract the first layer's weights
            # to determine hidden size. params[0] is the first layer's weight_ih
            # for both single and multi-layer cases.
            weight_ih = params[0]

Comment on lines +6053 to +6121
def test_gru():
    class BasicGRU(nn.Module):
        def __init__(self):
            super().__init__()
            self.gru = nn.GRU(
                input_size=4,
                hidden_size=8,
                num_layers=1,
                batch_first=True,
                bidirectional=False,
            )

        def forward(self, x):
            y, _ = self.gru(x)
            return y

    torch.manual_seed(42)
    x = torch.randn(2, 3, 4, dtype=torch.float32)
    model = BasicGRU()
    with torch.no_grad():
        pytorch_output = model(x)
    exported_program = export(model, args=(x,))
    mod = from_exported_program(exported_program)
    target = tvm.target.Target("llvm")
    ex = relax.build(mod, target)
    vm = relax.VirtualMachine(ex, tvm.cpu())
    x_tvm = tvm.runtime.tensor(x.numpy())
    tvm_output = vm["main"](x_tvm)
    if hasattr(tvm_output, "numpy"):
        tvm_output_np = tvm_output.numpy()
    else:
        tvm_output_np = tvm_output[0].numpy()
    assert (
        pytorch_output.shape == tvm_output_np.shape
    ), f"Shape mismatch: PyTorch {pytorch_output.shape} vs TVM {tvm_output_np.shape}"
    np.testing.assert_allclose(pytorch_output.numpy(), tvm_output_np, rtol=1e-4, atol=1e-5)

    class SeqFirstGRU(nn.Module):
        def __init__(self):
            super().__init__()
            self.gru = nn.GRU(
                input_size=3,
                hidden_size=6,
                num_layers=1,
                batch_first=False,
                bidirectional=False,
            )

        def forward(self, x):
            y, _ = self.gru(x)
            return y

    torch.manual_seed(43)
    x2 = torch.randn(4, 2, 3, dtype=torch.float32)
    model2 = SeqFirstGRU()
    with torch.no_grad():
        pytorch_output2 = model2(x2)
    exported_program2 = export(model2, args=(x2,))
    mod2 = from_exported_program(exported_program2)
    ex2 = relax.build(mod2, target)
    vm2 = relax.VirtualMachine(ex2, tvm.cpu())
    x2_tvm = tvm.runtime.tensor(x2.numpy())
    tvm_output2 = vm2["main"](x2_tvm)
    if hasattr(tvm_output2, "numpy"):
        tvm_output2_np = tvm_output2.numpy()
    else:
        tvm_output2_np = tvm_output2[0].numpy()
    assert pytorch_output2.shape == tvm_output2_np.shape
    np.testing.assert_allclose(pytorch_output2.numpy(), tvm_output2_np, rtol=1e-4, atol=1e-5)

Severity: medium

The test function test_gru contains two very similar blocks of code for testing BasicGRU (batch_first=True) and SeqFirstGRU (batch_first=False). This code duplication makes the test harder to read and maintain. Consider refactoring the common testing logic into a helper function that can be called for both GRU configurations. This helper could take the model class, input data, and other relevant parameters as arguments.
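
For example, one possible shape for such a helper, sketched against the existing test imports; the helper name and parameters are hypothetical:

    def _check_gru_against_pytorch(model, x, rtol=1e-4, atol=1e-5):
        # Hypothetical helper: run the PyTorch model, export and import it,
        # build with Relax, then compare outputs against PyTorch.
        with torch.no_grad():
            pytorch_output = model(x)
        exported_program = export(model, args=(x,))
        mod = from_exported_program(exported_program)
        ex = relax.build(mod, tvm.target.Target("llvm"))
        vm = relax.VirtualMachine(ex, tvm.cpu())
        tvm_output = vm["main"](tvm.runtime.tensor(x.numpy()))
        tvm_output_np = (
            tvm_output.numpy() if hasattr(tvm_output, "numpy") else tvm_output[0].numpy()
        )
        assert pytorch_output.shape == tvm_output_np.shape
        np.testing.assert_allclose(pytorch_output.numpy(), tvm_output_np, rtol=rtol, atol=atol)

    # Usage inside test_gru:
    #     torch.manual_seed(42)
    #     _check_gru_against_pytorch(BasicGRU(), torch.randn(2, 3, 4, dtype=torch.float32))
    #     torch.manual_seed(43)
    #     _check_gru_against_pytorch(SeqFirstGRU(), torch.randn(4, 2, 3, dtype=torch.float32))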

Comment on lines +6053 to +6121
(This comment quotes the same test_gru code shown in the previous comment.)

Severity: medium

The _gru implementation supports multi-layer GRUs, GRUs with an initial hidden state (hx), and GRUs with/without biases. However, the tests only cover single-layer GRUs without an initial hidden state and with biases. To ensure the implementation is robust and prevent future regressions, please add test cases for the following; a sketch of one such case appears after the list:

  • Multi-layer GRU (num_layers > 1).
  • GRU with a provided initial hidden state (hx).
  • GRU without biases (bias=False in nn.GRU).
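
As one concrete illustration, a sketch of a multi-layer case following the existing test structure; the class name is made up for this sketch:

    class MultiLayerGRU(nn.Module):
        def __init__(self):
            super().__init__()
            self.gru = nn.GRU(
                input_size=4,
                hidden_size=8,
                num_layers=2,       # exercises the multi-layer code path
                batch_first=True,
                bidirectional=False,
            )

        def forward(self, x):
            y, _ = self.gru(x)
            return y

    # Checked the same way as the existing cases, e.g. against a (2, 3, 4) input.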

@mshr-h
Contributor

mshr-h commented Oct 6, 2025

@tlopex
The root problem is in exported_program.run_decompositions(), where run_decompositions() isn't an in-place operation. We need to change it to something like the following so that the decomposed ExportedProgram is used.

    decomposed_ep = exported_program.run_decompositions()

    return ExportedProgramImporter().from_exported_program(
        decomposed_ep,
        keep_params_as_input,
        unwrap_unit_return_tuple,
        no_bind_return_tuple,
    )

I wrote the original code and I think it was wrong.

@mshr-h
Contributor

mshr-h commented Oct 6, 2025

With proper decomposition, almost 40% of the exported program frontend tests will fail (75 failed, 101 passed, 2 warnings), so I think it might be easier to migrate gradually.

@tlopex
Member Author

tlopex commented Oct 6, 2025

@mshr-h Got it! I'll consider how to update it. Shall we merge this PR first and then get on it?

@mshr-h mshr-h merged commit 3b8d324 into apache:main Oct 6, 2025
13 checks passed
@mshr-h
Contributor

mshr-h commented Oct 6, 2025

@tlopex Okay!

@tlopex
Member Author

tlopex commented Oct 24, 2025

@mshr-h Sorry for the late reply. I checked the code and confirmed that about 40% of the exported program frontend tests will fail if the decomposition is applied correctly. Maybe the first step is to gradually add support for the ops that become unsupported because of decomposition? During that work, some tests may remain failing. If you think this is a good approach, I can get on it this week.

@mshr-h
Contributor

mshr-h commented Oct 24, 2025

@tlopex Thanks.
One possible approach is to add a new importer flag, such as run_ep_decomposition, defaulting to False, and then gradually switch it to True in the unit tests. Once every unit test has been set to True, we can remove the flag entirely.
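
A rough sketch of how such a flag could be threaded through the importer entry point; the existing parameters are taken from the snippet earlier in this thread, and run_ep_decomposition is the proposed flag that does not exist yet:

    def from_exported_program(
        exported_program,
        keep_params_as_input=False,
        unwrap_unit_return_tuple=False,
        no_bind_return_tuple=False,
        run_ep_decomposition=False,  # proposed flag; off by default during migration
    ):
        if run_ep_decomposition:
            # run_decompositions() is not in-place; use the returned ExportedProgram.
            exported_program = exported_program.run_decompositions()
        return ExportedProgramImporter().from_exported_program(
            exported_program,
            keep_params_as_input,
            unwrap_unit_return_tuple,
            no_bind_return_tuple,
        )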

@tlopex
Member Author

tlopex commented Oct 24, 2025

Got it! Let me start fixing it, @mshr-h.
