doc: NeST #11
Conversation
Pull request overview
Expands the NeST algorithm documentation into a full, self-contained page with clearer notation and a structured breakdown of NeST’s growth policies (connections, neurons, feature maps) and the grow–prune loop.
Changes:
- Rewrites docs/algorithms/nest.rst with expanded sections (TLDR, setup/notation, connection growth, neuron growth, feature-map growth, training loop, snapshot, limitations).
- Clarifies notation (layer indices, batch indexing) and presents shape-consistent update formulas.
- Adds a compact "empirical snapshot" plus limitations/open questions scoped to NeST.
Quoted from the updated nest.rst, the neuron-growth initialization:

.. math::

   \epsilon &\sim \mathrm{Uniform}(\{-1, 1\}),\\
   \psi_{i^*} &= \epsilon \, \operatorname{sgn}\!\left(B^{(l-2)}_{i^*,j^*}\right)\sqrt{\left|B^{(l-2)}_{i^*,j^*}\right|},\\
   \omega_{j^*} &= \epsilon \sqrt{\left|B^{(l-2)}_{i^*,j^*}\right|},

followed by the magnitude rescaling:

.. math::

   \boldsymbol{\psi} &\leftarrow \alpha \, \boldsymbol{\psi} \, \frac{\bar{a}(\boldsymbol{W}^{(l-1)})}{\bar{a}(\boldsymbol{\psi})},\\
   \boldsymbol{\omega} &\leftarrow \alpha \, \boldsymbol{\omega} \, \frac{\bar{a}(\boldsymbol{W}^{(l)})}{\bar{a}(\boldsymbol{\omega})}.
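As a sanity check of the two formulas above, here is a minimal NumPy sketch; the function names, the mean-magnitude helper, and the default alpha are assumptions for illustration, not from the paper or the PR:

```python
import numpy as np

def bridge_init(B, i_star, j_star, rng):
    """One-sparse bridging init for a new neuron between layers l-2 and l.

    B is the activation-gradient correlation matrix B^{(l-2)}; the new
    neuron's fan-in weight psi and fan-out weight omega are chosen so that
    their product reproduces the bridged entry B[i*, j*].
    """
    eps = rng.choice([-1.0, 1.0])             # epsilon ~ Uniform({-1, 1})
    b = B[i_star, j_star]
    psi = eps * np.sign(b) * np.sqrt(abs(b))  # fan-in weight psi_{i*}
    omega = eps * np.sqrt(abs(b))             # fan-out weight omega_{j*}
    return psi, omega

def mean_abs(w):
    """Mean absolute value: the a-bar(.) statistic used for rescaling."""
    return np.mean(np.abs(w))

def rescale(psi, omega, W_prev, W_next, alpha=0.25):
    """Match the new weights' average magnitude to the host layers'."""
    psi = alpha * psi * mean_abs(W_prev) / mean_abs(psi)
    omega = alpha * omega * mean_abs(W_next) / mean_abs(omega)
    return psi, omega
```

By construction, ``psi * omega`` equals ``B[i_star, j_star]`` before rescaling, which is the point of the bridging rule.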
On the sentence:

> The full NeST method alternates the growth rules above with magnitude-based
> removal of weak connections and weak neurons. Effective
> (batch-normalized) weights may be used when judging magnitudes
> :cite:p:`daiNeSTNeuralNetwork2019`. As part of the broader

Suggested change:

> The full NeST method first grows the network using the rules above and then does magnitude-based
> removal of weak connections and weak neurons. Effective
> (batch-normalized) weights may be used when judging magnitudes
> :cite:p:`daiNeSTNeuralNetwork2019`. As part of the broader

In my understanding, NeST does not alternate: it grows, then prunes, and does not cycle.
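To make the pruning half of that grow-then-prune ordering concrete, here is a minimal sketch of magnitude-based removal; the keep fraction, function names, and the "dead fan-in or fan-out" neuron criterion are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def prune_connections(W, keep_frac=0.9):
    """Magnitude-based pruning: zero out the smallest-|w| weights.

    NeST judges connection importance by (effective) weight magnitude;
    keep_frac is an illustrative hyperparameter.
    """
    thresh = np.quantile(np.abs(W), 1.0 - keep_frac)
    mask = np.abs(W) >= thresh
    return W * mask, mask

def prune_neurons(W_in, W_out):
    """Remove neurons whose fan-in or fan-out is entirely zero."""
    alive = (np.abs(W_in).sum(axis=0) > 0) & (np.abs(W_out).sum(axis=1) > 0)
    return W_in[:, alive], W_out[alive, :], alive
```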
On the "Empirical snapshot" section:

> Empirical snapshot
> ------------------
>
> Within this page's scope, the paper contributes three practically relevant
> messages. First, sparse growth can be organized around activation-gradient
> correlations :math:`\boldsymbol{B}` rather than function-preserving morphisms.
> Second, the one-sparse bridging rule is mainly pedagogical: the published
> algorithm aggregates over a top-:math:`\beta` set and then rescales by
> :math:`\alpha`. Third, feature-map growth is treated separately from the
> fully connected score, with an explicit candidate search over forward losses
> :cite:p:`daiNeSTNeuralNetwork2019`.

This doesn't seem particularly relevant to me.
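For concreteness, the top-beta aggregation over activation-gradient scores mentioned above can be sketched as follows; the function name, the score-from-B initialization, and the fraction ``beta`` are illustrative assumptions:

```python
import numpy as np

def grow_connections(W, B, beta=0.1):
    """Activate the top-beta fraction of currently-absent connections.

    B holds activation-gradient correlations; the inactive positions with
    the largest |B| are switched on and initialized from B.
    """
    inactive = (W == 0)
    scores = np.abs(B) * inactive            # only score absent connections
    k = max(1, int(beta * inactive.sum()))   # number of connections to add
    flat = np.argsort(scores, axis=None)[::-1][:k]
    idx = np.unravel_index(flat, W.shape)
    W_new = W.copy()
    W_new[idx] = B[idx]                      # simple init from the correlation
    return W_new, idx
```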
On the limitations list:

> - NeST mixes two different types of growth rules: activation-gradient scoring
>   for connections and neurons, but a forward-loss comparison for feature maps.
>   This makes the method less uniform than purely function-preserving approaches.

Suggested change:

> - NeST proposes a closed-form formula for neuron addition in fully connected layers but relies on trial and error for channel growth.
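The trial-and-error channel growth mentioned in the suggestion amounts to a candidate search over forward losses. A minimal sketch, where ``forward_loss``, the candidate count, and the init scale are placeholder assumptions:

```python
import numpy as np

def grow_feature_map(forward_loss, kernel_shape, n_candidates=8, seed=0):
    """Pick the randomly initialized candidate kernel with the lowest
    forward loss, mirroring search-based feature-map growth.

    forward_loss(kernel) evaluates the network's loss with the candidate
    kernel temporarily inserted; no gradient is needed for the choice.
    """
    rng = np.random.default_rng(seed)
    best_kernel, best_loss = None, np.inf
    for _ in range(n_candidates):
        kernel = rng.standard_normal(kernel_shape) * 0.1  # random candidate
        loss = forward_loss(kernel)
        if loss < best_loss:
            best_kernel, best_loss = kernel, loss
    return best_kernel, best_loss
```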
> - The one-sparse neuron rule is useful for explanation, but the paper's
>   practical algorithm is denser and therefore somewhat less transparent.

I think it's already largely discussed above.
> - The broader grow-prune loop raises the same scheduling questions discussed in
>   [[When to grow?|when_to_grow]] and [[Where to grow?|where_to_grow]]:
>   how often should growth be triggered, and where should sparse capacity be
>   added?

Yes, but the main point is that, in my understanding, the full NeST pipeline does not use any gradient descent!

Reply: It seems they do use gradient descent: "and then retrain the whole DNN to recover its performance".
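To make the resolved ordering concrete, here is a minimal sketch of the pipeline as discussed in this thread; ``nest_pipeline`` and all of its callables are hypothetical placeholders, not from the paper or the PR:

```python
def nest_pipeline(net, grow, train, prune, retrain, grow_steps=3):
    """Grow-then-prune schedule (illustrative only).

    Each growth phase is followed by gradient-based training of the grown
    architecture; after the last growth phase the network is pruned once
    and then retrained to recover its performance.
    """
    for _ in range(grow_steps):
        net = grow(net)    # structural growth (connections / neurons / maps)
        net = train(net)   # gradient descent on the grown architecture
    net = prune(net)       # magnitude-based removal of weak weights/neurons
    return retrain(net)    # final retraining pass
```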
Commits (made with Cursor):
- Fix aligned math (aligned env) for Copilot/MathJax; align synthesis narrative with paper: growth then pruning phases; clarify gradient-based weight training vs growth initialization; add policies table; document Policy 4 partial-area convolution pruning; add compact experimental results from paper; tighten limitations.
- Add TikZ sources and SVGs for Policies 1-3 (connection, neuron, conv); wire docs Makefile html/stricthtml/livehtml to scripts/build_figures.sh; add visual quality gate helpers (eval_tikz_constraints, summarize, run gate); embed figures in nest.rst under PR prose (bf4b1d2 baseline).
- Expand nest.rst: blockquote TLDR, roadmap, notation, VT-style light/dark figures, growth-prune and optimization sections, results table, hyperparameters, Algorithm 1 gloss, split limitations vs open questions; add TikZ/SVG dark variants for Policies 1-3; document -dark naming in figures README; make build_figures.sh skip TikZ when pdflatex is absent if outputs exist, warn on stale .tex vs SVG, fail only when outputs missing.
- …thon3 in build_figures; replace malformed grid results table with list-table (tab-nest-results); convert Policy 1-3 container figures to paired .. figure:: (light named, dark); add numrefs for figures and table; wire eq labels in policies table and prose; clarify partial-area gamma: percentile prose + Algorithm 2 order-statistic rule; bridge sign convention between dL/dW and |B| scoring; prefer python3 over python in build_figures.sh for .py generators.
- Two .. figure:: directives each produced a numbered figcaption; only-light/dark hide images, not the extra <figure>. Revert Policy 1-3 blocks to the same container+two-images pattern as variance_transfer.rst and use :ref: in the intro.
Summary

Rewrites docs/algorithms/nest.rst into a fuller algorithm page with a TLDR, setup/notation, and clearer separation between connection growth, neuron growth, feature-map growth, and the grow-prune loop; clarifies l-1/l-2 layer references, consistent batch indexing, and a shape-consistent neuron contribution update.

Verification

- make -C docs html

Notes

- Only the docs/algorithms/nest.rst update is included; .env, .cursor/, and scratch images were left out of the PR.

Made with Cursor