feat: Neuron initialization and rescaling from Variance Transfer #237
TheoRudkiewicz wants to merge 20 commits into growingnet:main from …
Conversation
Codecov Report: ❌ Patch coverage is …
Pull request overview
Adds variance-transfer (VT) weight rescaling and (V,V)/(Z,-Z) neuron pairing to the growing-module extension workflow, with a comprehensive test suite validating correctness and edge cases.
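For readers new to the pairing scheme: my reading of "(V,V)/(Z,-Z)" (the thread does not spell it out) is that each new neuron is duplicated, keeping its outgoing direction V for both copies and negating the incoming direction Z of one copy, so that for an odd activation the pair cancels and growth is function-preserving at initialization. A minimal sketch under that assumption (the helper name is hypothetical):

```python
import torch

def pair_vv_z_negz(w_in: torch.Tensor, w_out: torch.Tensor):
    """Duplicate k new neurons into 2k paired ones: inputs (Z, -Z), outputs (V, V)."""
    w_in_paired = torch.cat([w_in, -w_in], dim=0)    # rows are neurons: (Z, -Z)
    w_out_paired = torch.cat([w_out, w_out], dim=1)  # columns are neurons: (V, V)
    return w_in_paired, w_out_paired

# k = 3 new neurons reading 8 inputs and feeding 5 outputs -> 6 paired neurons.
z, v = torch.randn(3, 8), torch.randn(5, 3)
z2, v2 = pair_vv_z_negz(z, v)
x = torch.randn(8)
# With an odd activation (tanh), the paired contributions cancel exactly:
assert torch.allclose(v2 @ torch.tanh(z2 @ x), torch.zeros(5), atol=1e-6)
```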
Changes:
- Implemented VT rescaling strategies and neuron-pairing utilities in `GrowingModule`, and integrated them into `create_layer_extensions`.
- Exposed rescaling/pairing controls at the `GrowingBlock` container level for block-wide usage and FOGRO-style workflows.
- Added an extensive new test module covering smoke, semantic, variance, BatchNorm, edge-case, and standalone method-usage tests.
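To make the new surface concrete, here is a hedged usage sketch. The names `create_layer_extensions`, `apply_rescaling`, `apply_neuron_pairing`, `"default_vt"`, `"vt_constraint_old_shape"`, and `neuron_pairing="vv_z_negz"` appear in this PR, but the exact signatures, keyword names, and the `GrowingBlock` constructor arguments below are assumptions, not verified against the diff:

```python
# Hypothetical sketch of the API surface added by this PR; signatures are assumed.
from gromo.containers.growing_block import GrowingBlock

block = GrowingBlock(in_features=64, out_features=64)  # constructor args assumed

# Compute layer extensions with VT rescaling (Strategy A) and (V,V)/(Z,-Z) pairing.
block.create_layer_extensions(
    rescaling="default_vt",      # beta = sqrt(fan_in_old / fan_in_new), alpha = 1
    neuron_pairing="vv_z_negz",  # doubles the extension by pairing each new neuron
)

# The same controls are also exposed as standalone delegation helpers:
block.apply_rescaling(rescaling="vt_constraint_old_shape")
block.apply_neuron_pairing(neuron_pairing="vv_z_negz")
```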
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/gromo/modules/growing_module.py` | Adds `apply_rescaling`, `apply_neuron_pairing`, and integrates rescaling/pairing into extension creation. |
| `src/gromo/containers/growing_block.py` | Threads new rescaling/neuron-pairing options through the block API and adds delegation helpers. |
| `tests/test_variance_transfer.py` | New tests validating VT rescaling + pairing behavior, including BatchNorm running-stat scaling and edge cases. |
```python
self.apply_neuron_pairing(neuron_pairing=neuron_pairing)
```
`create_layer_extensions(..., neuron_pairing="vv_z_negz")` doubles `extended_output_layer` / `extended_input_layer` sizes, but nothing updates `extension_size` bookkeeping (e.g., `eigenvalues_extension` or a dedicated field). Downstream, `apply_change()` uses `extension_size` (or infers it from `eigenvalues_extension`) to grow sized post-layer functions (e.g., `GrowingBatchNorm` / `GrowableIdentity`). If pairing is enabled and callers keep passing the pre-pairing size (or pass `None`), the model can end up with mismatched channel counts in post-layer functions.
Suggested change:

```python
self.apply_neuron_pairing(neuron_pairing=neuron_pairing)
self._sync_extension_size_bookkeeping()
```

```python
def _sync_extension_size_bookkeeping(self) -> None:
    """
    Synchronize extension-size bookkeeping with the current extension layers.

    Neuron pairing can change the effective number of added neurons by
    modifying the extension layers after they have been created. Downstream
    code may rely on bookkeeping fields such as ``extension_size`` or infer
    the size from ``eigenvalues_extension``, so we keep them aligned with the
    actual extension-layer shape here.
    """

    def _layer_extension_size(layer: torch.nn.Module | None) -> int | None:
        # Prefer explicit layer attributes, then fall back to the weight shape.
        if layer is None:
            return None
        if hasattr(layer, "out_features"):
            return int(layer.out_features)
        if hasattr(layer, "out_channels"):
            return int(layer.out_channels)
        weight = getattr(layer, "weight", None)
        if isinstance(weight, torch.Tensor) and weight.ndim > 0:
            return int(weight.shape[0])
        return None

    extension_size = _layer_extension_size(getattr(self, "extended_input_layer", None))
    if extension_size is None:
        previous_module = getattr(self, "previous_module", None)
        extension_size = _layer_extension_size(
            getattr(previous_module, "extended_output_layer", None)
        )
    if extension_size is None:
        return
    self.extension_size = extension_size

    eigenvalues_extension = getattr(self, "eigenvalues_extension", None)
    if isinstance(eigenvalues_extension, torch.Tensor):
        current_size = int(eigenvalues_extension.numel())
        if current_size == 0 or current_size == extension_size:
            return
        # Pairing duplicates neurons, so repeat the eigenvalues to match.
        if extension_size % current_size == 0:
            repeat_factor = extension_size // current_size
            self.eigenvalues_extension = eigenvalues_extension.repeat(repeat_factor)
```
I think it's not a problem, since for now there is no plan to use pairing together with the eigenvalues extension.
stephane-rivaud left a comment:
I did not preview the docs changes nor did I thoroughly investigate the test cases, but the functional changes look good. I approve this PR.
Before merging, I need to check that the initialisation scale is good even with pairing.
In …
The bottom line is that we actually consider a tensor of size …
There was indeed a problem; I should have fixed it in d2d0d41129562d8e45b330377d01eaa79a3f5fa1.
Yes, I chose to ask for growth of …
Remark from @alexdavey: we should probably include the "gain" in the rescaling (even if variance transfer does not do it?).
alexdavey left a comment:
Thanks @TheoRudkiewicz, here are a few comments :)
| * ``"default_vt"`` (Strategy A): beta = sqrt(fan_in_old / fan_in_new), | ||
| alpha = 1 (the previous layer input is not extended). | ||
| * ``"vt_constraint_old_shape"`` (Strategy B): alpha and beta chosen so |
Why would we want to enforce variance with respect to the previous fan_in? If we consider a sequence of growth steps, the behaviour we get is:
- `default_vt`: Variance ∝ 1/current_fan_in (with no training).
- `vt_constraint_new_shape`: Variance ∝ 1/current_fan_in.
- `vt_constraint_old_shape`: Variance ∝ 1/penultimate_fan_in.

The behaviour of the last option, `vt_constraint_old_shape`, does not really make sense to me.
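For intuition (my derivation from the Strategy A factor quoted above, not part of the thread): applying beta_t = sqrt(fan_in_{t-1} / fan_in_t) at every growth step telescopes, which is why the first two strategies track the current fan-in:

```latex
% Variance after t growth steps under default_vt, assuming Var(w_0) = 1/fan_in_0
% and a rescaling by beta_s^2 = fan_in_{s-1} / fan_in_s at each step s:
\operatorname{Var}(w_t)
  = \frac{1}{\mathrm{fan\_in}_0}
    \prod_{s=1}^{t} \frac{\mathrm{fan\_in}_{s-1}}{\mathrm{fan\_in}_s}
  = \frac{1}{\mathrm{fan\_in}_t}
```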
Also: should we consider making the gain a parameter in the constraint strategies, so that we can use e.g. Kaiming's sqrt(2)/fan_in instead?
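One possible shape for that gain parameter (hypothetical signature; `gain = sqrt(2)` reproduces Kaiming/He scaling for ReLU, where Var(w) = gain² / fan_in):

```python
import math
import torch

def vt_rescale_with_gain(layer: torch.nn.Linear, fan_in_new: int, gain: float = 1.0) -> None:
    """Hypothetical constraint-style rescaling targeting Var(w) = gain**2 / fan_in_new."""
    target_std = gain / math.sqrt(fan_in_new)
    with torch.no_grad():
        layer.weight.mul_(target_std / layer.weight.std().item())

# gain=1 keeps the plain VT target; gain=sqrt(2) gives Kaiming/He scaling for ReLU.
layer = torch.nn.Linear(128, 64)
vt_rescale_with_gain(layer, fan_in_new=128, gain=math.sqrt(2))
```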
> Why?

Short answer: it's the one proposed by the VT paper.
Long answer: https://theorudkiewicz.github.io/gromo/tech_notes/variance_transfer.html#part-3-combined-analysis
About the gain parameter: probably.
In the end we add …
I leave this for future work, as it should be integrated in many places like …
See https://theorudkiewicz.github.io/gromo/tech_notes.html for updated tech notes.