Add create new mlp variation with two gates #795
klei22 wants to merge 3 commits into ReaLLMASIC:master from
Conversation
Pull request overview
This PR adds a new MLP variant (swiglu_2gate_pre_act) to the model-variation system and wires it into CLI/config + exploration tooling to enable parameter-matched comparisons against existing MLP/SwiGLU variants.
Changes:
- Introduces `SwiGLUTwoGatesPreAct` and registers it as `swiglu_2gate_pre_act` in the MLP factory.
- Extends CLI argument choices to allow selecting the new MLP variant and makes the default `--device` more explicit (`cuda:0`).
- Adds a new exploration YAML to run parameter-matched sweeps comparing SwiGLU, the new 2-gate variant, and plain MLP activations on minipile.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| variations/mlp_variations.py | Implements and registers the new 2-gate pre-activation SwiGLU MLP module. |
| train_args.py | Exposes the new MLP variant in CLI choices and updates default device string. |
| explorations/mlp_equal_params_vs_swiglu_minipile.yaml | Adds an experiment grid to compare MLP activation variants under (approx.) parameter-matched settings. |
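The diff itself does not show the exploration YAML's schema. Purely as an illustration of the kind of parameter-matched sweep described here, a grid of named groups might look like the following; every key name and value below is hypothetical, not taken from the repo:

```yaml
# Hypothetical sketch - key names are NOT taken from the repository.
dataset: minipile
groups:
  - name: swiglu_baseline
    mlp_variant: [swiglu]
    mlp_size: [2048]
  - name: two_gate_pre_act
    mlp_variant: [swiglu_2gate_pre_act]
    mlp_size: [1536]   # 4 weight matrices, matched to 3 x 2048
  - name: plain_mlp_matched
    mlp_variant: [mlp]
    mlp_activation_variant: [gelu, relu, silu]
    mlp_size: [3072]   # 2 weight matrices, matched to 3 x 2048
```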
```python
if self.quantization_mlp_dict["quantize_mlp_act_activation_input"]:
    num_bits = self.quantization_mlp_dict["quantize_mlp_act_activation_input_bits"]
    quant_method = self.quantization_mlp_dict["activations_quant_method"]
    x_main = fake_quantize_act(self, "mlp_act_activation_input", x_main, num_bits, quant_method, iter_num)

gate1 = self._up_project(x, self.c_fc_gate1)
gate2 = self._up_project(x, self.c_fc_gate2)

x_out = (x_main * gate1) * gate2
```
In SwiGLUTwoGatesPreAct.forward, quantize_mlp_act_activation_input is applied to x_main, but the activation function is actually applied to x_out (after multiplying by both gates). This means the “activation input” quantization is quantizing the wrong tensor for this variant and will make quantization runs inconsistent with other MLPs (e.g., DualPathSwiglu quantizes the true pre-activation tensor). Apply this quantization step to x_out right before calling self.activation_variant(...) (and consider whether the gates themselves should be quantized under the same flag).
Suggested change:

```diff
-if self.quantization_mlp_dict["quantize_mlp_act_activation_input"]:
-    num_bits = self.quantization_mlp_dict["quantize_mlp_act_activation_input_bits"]
-    quant_method = self.quantization_mlp_dict["activations_quant_method"]
-    x_main = fake_quantize_act(self, "mlp_act_activation_input", x_main, num_bits, quant_method, iter_num)
-gate1 = self._up_project(x, self.c_fc_gate1)
-gate2 = self._up_project(x, self.c_fc_gate2)
-x_out = (x_main * gate1) * gate2
+gate1 = self._up_project(x, self.c_fc_gate1)
+gate2 = self._up_project(x, self.c_fc_gate2)
+x_out = (x_main * gate1) * gate2
+if self.quantization_mlp_dict["quantize_mlp_act_activation_input"]:
+    num_bits = self.quantization_mlp_dict["quantize_mlp_act_activation_input_bits"]
+    quant_method = self.quantization_mlp_dict["activations_quant_method"]
+    x_out = fake_quantize_act(self, "mlp_act_activation_input", x_out, num_bits, quant_method, iter_num)
```
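The corrected ordering can be checked with a minimal, self-contained sketch. This is an illustration only, assuming PyTorch: `TwoGatePreActSketch` and `fake_quantize` are hypothetical stand-ins for the repo's `SwiGLUTwoGatesPreAct` and `fake_quantize_act`, with the quantization config reduced to a single flag:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x, num_bits=8):
    # Stand-in for the repo's fake_quantize_act: symmetric per-tensor
    # fake quantization (quantize, round, clamp, then dequantize).
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax - 1, qmax) * scale

class TwoGatePreActSketch(nn.Module):
    def __init__(self, n_embd, n_hidden, quantize_act_input=False):
        super().__init__()
        self.c_fc_main = nn.Linear(n_embd, n_hidden)
        self.c_fc_gate1 = nn.Linear(n_embd, n_hidden)
        self.c_fc_gate2 = nn.Linear(n_embd, n_hidden)
        self.c_proj = nn.Linear(n_hidden, n_embd)
        self.quantize_act_input = quantize_act_input

    def forward(self, x):
        x_main = self.c_fc_main(x)
        gate1 = self.c_fc_gate1(x)
        gate2 = self.c_fc_gate2(x)
        x_out = (x_main * gate1) * gate2
        # Quantize the true activation input, i.e. the tensor the
        # non-linearity actually sees, as the review comment suggests.
        if self.quantize_act_input:
            x_out = fake_quantize(x_out)
        return self.c_proj(F.silu(x_out))

mlp = TwoGatePreActSketch(8, 16, quantize_act_input=True)
y = mlp(torch.randn(2, 4, 8))
print(y.shape)  # torch.Size([2, 4, 8])
```

The key point is simply that the fake-quantization call moves after the gating products, so the quantized tensor is the one fed to the activation function.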
This pull request introduces a new MLP variant called `swiglu_2gate_pre_act`, expands the configuration and experiment setup to compare this and other MLP activation variants under parameter-matched conditions, and makes minor improvements to argument parsing and configuration handling. The main focus is on enabling and evaluating the new two-gate SwiGLU pre-activation architecture alongside other variants.

Key changes:

New MLP variant and integration
- Adds the `SwiGLUTwoGatesPreAct` class in `mlp_variations.py`, which introduces a SwiGLU variant with two gates applied before the non-linearity, including all relevant quantization, normalization, and offset logic. This is now available as `swiglu_2gate_pre_act` in the activation dictionary and MLP instantiation logic.
- Adds `"swiglu_2gate_pre_act"` to the list of supported MLP variants in the argument parser in `train_args.py`, so it can be selected via CLI/config.

Experimental configuration and comparison
- Adds `mlp_equal_params_vs_swiglu_minipile.yaml`, which sets up a comprehensive comparison of regular SwiGLU, dual-path, and parameter-matched plain MLP variants (with various activations) on the minipile dataset. This includes a rationale for parameter matching and defines multiple named groups for systematic exploration.

Configuration and usability improvements
- Changes the default device in `train_args.py` from `'cuda'` to `'cuda:0'` for more explicit device selection.
- Adds `l2_norm_print_dims` in the MLP config initialization for potential debugging or logging.
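The parameter-matching idea behind the exploration config follows from counting weight matrices: a standard SwiGLU block has three `d × h` matrices (up, gate, down), a plain MLP has two, and the two-gate variant has four. A short sketch of that arithmetic, with illustrative (hypothetical) sizes:

```python
def mlp_params(d, h, n_mats):
    # Weight-only count for an MLP block built from n_mats
    # d-by-h (or h-by-d) matrices; biases ignored for simplicity.
    return n_mats * d * h

d = 768          # hypothetical embedding width
h_swiglu = 2048  # hypothetical SwiGLU hidden size

target = mlp_params(d, h_swiglu, 3)  # SwiGLU: up, gate, down
h_plain = target // (d * 2)          # plain MLP: up, down
h_2gate = target // (d * 4)          # 2-gate variant: up, gate1, gate2, down

print(target, h_plain, h_2gate)  # 4718592 3072 1536
```

So to match a SwiGLU with hidden size 2048, a plain MLP needs a hidden size about 1.5x larger and the two-gate variant one about 0.75x as large, which is the kind of adjustment the YAML's "parameter-matched" groups would encode.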