
doc: NeST #11

Open

stephane-rivaud wants to merge 14 commits into main from pr/nest-page-clean

Conversation

@stephane-rivaud
Collaborator

Summary

  • rewrite docs/algorithms/nest.rst into a fuller algorithm page with a TLDR, setup/notation, and clearer separation between connection growth, neuron growth, feature-map growth, and the grow-prune loop
  • clean up the notation so the page uses explicit l-1 / l-2 layer references, consistent batch indexing, and a shape-consistent neuron contribution update
  • add a compact empirical snapshot plus limitations/open questions while keeping the scope focused on the NeST page itself

Verification

  • make -C docs html

Notes

  • This PR intentionally contains only the docs/algorithms/nest.rst update.
  • Local/dev artifacts such as .env, .cursor/, and scratch images were left out of the PR.
  • Earlier branch-local workflow/tooling files were not included so the PR stays focused on the page integration.


Made-with: Cursor
Copilot AI review requested due to automatic review settings April 14, 2026 09:26

Copilot AI left a comment


Pull request overview

Expands the NeST algorithm documentation into a full, self-contained page with clearer notation and a structured breakdown of NeST’s growth policies (connections, neurons, feature maps) and the grow–prune loop.

Changes:

  • Rewrites docs/algorithms/nest.rst with expanded sections (TLDR, setup/notation, connection growth, neuron growth, feature-map growth, training loop, snapshot, limitations).
  • Clarifies notation (layer indices, batch indexing) and presents shape-consistent update formulas.
  • Adds a compact “empirical snapshot” plus limitations/open questions scoped to NeST.


Comment thread docs/algorithms/nest.rst Outdated
Comment on lines +114 to +117

\epsilon &\sim \mathrm{Uniform}(\{-1, 1\}),\\
\psi_{i^*} &= \epsilon \, \operatorname{sgn}\!\left(B^{(l-2)}_{i^*,j^*}\right)\sqrt{\left|B^{(l-2)}_{i^*,j^*}\right|},\\
\omega_{j^*} &= \epsilon \sqrt{\left|B^{(l-2)}_{i^*,j^*}\right|},
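For readers following the thread, the quoted one-sparse bridging init can be sketched in Python. This is a minimal illustration, not the paper's implementation: the shape of `B` (activations of layer l-2 by pre-activation gradients of layer l) and the argmax selection of the bridged pair `(i*, j*)` are assumptions not fixed by the quoted lines.

```python
import numpy as np

rng = np.random.default_rng(0)

def bridge_neuron(B):
    """One-sparse bridging init for a new neuron (sketch of the quoted rule).

    B: bridging matrix B^{(l-2)} (assumed shape [n_{l-2}, n_l]).
    Picks the strongest entry (i*, j*), then sets the incoming weight psi
    and outgoing weight omega so that psi * omega recovers B[i*, j*].
    """
    i_star, j_star = np.unravel_index(np.argmax(np.abs(B)), B.shape)
    eps = rng.choice([-1.0, 1.0])  # random shared sign epsilon
    b = B[i_star, j_star]
    psi = eps * np.sign(b) * np.sqrt(np.abs(b))   # incoming weight at i*
    omega = eps * np.sqrt(np.abs(b))              # outgoing weight at j*
    # Check: psi * omega == sgn(b) * |b| == b
    return i_star, j_star, psi, omega
```

Note that the shared `eps` cancels in the product, so the new neuron's contribution matches the bridged correlation regardless of the sampled sign.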
Comment thread docs/algorithms/nest.rst Outdated
Comment on lines +141 to +143

\boldsymbol{\psi} &\leftarrow \alpha \, \boldsymbol{\psi} \, \frac{\bar{a}(\boldsymbol{W}^{(l-1)})}{\bar{a}(\boldsymbol{\psi})},\\
\boldsymbol{\omega} &\leftarrow \alpha \, \boldsymbol{\omega} \, \frac{\bar{a}(\boldsymbol{W}^{(l)})}{\bar{a}(\boldsymbol{\omega})}.
Comment thread docs/algorithms/nest.rst Outdated
Comment on lines +172 to +175
The full NeST method alternates the growth rules above with magnitude-based
removal of weak connections and weak neurons. Effective
(batch-normalized) weights may be used when judging magnitudes
:cite:p:`daiNeSTNeuralNetwork2019`. As part of the broader
Collaborator


Suggested change
The full NeST method alternates the growth rules above with magnitude-based
removal of weak connections and weak neurons. Effective
(batch-normalized) weights may be used when judging magnitudes
:cite:p:`daiNeSTNeuralNetwork2019`. As part of the broader
The full NeST method first grows the network using the rules above and then does magnitude-based
removal of weak connections and weak neurons. Effective
(batch-normalized) weights may be used when judging magnitudes
:cite:p:`daiNeSTNeuralNetwork2019`. As part of the broader

In my understanding, NeST does not alternate. It grows, then prunes, and does not cycle.
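Under this reading, the pruning phase that follows growth reduces to a thresholded magnitude mask. A minimal sketch (the threshold and masking convention are illustrative assumptions; the paper additionally allows batch-norm-effective weights when judging magnitudes):

```python
import numpy as np

def magnitude_prune(W, threshold):
    """Magnitude-based removal of weak connections (sketch).

    Zeroes every weight whose magnitude falls below `threshold` and
    returns the pruned matrix together with the surviving-connection mask.
    Neurons whose rows/columns end up all-zero would then be removed.
    """
    mask = np.abs(W) >= threshold
    return W * mask, mask
```

In a grow-then-prune pipeline this runs once after the growth phase, rather than interleaving with it.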

Comment thread docs/algorithms/nest.rst Outdated
Comment on lines +180 to +190
Empirical snapshot
------------------

Within this page's scope, the paper contributes three practically relevant
messages. First, sparse growth can be organized around activation-gradient
correlations :math:`\boldsymbol{B}` rather than function-preserving morphisms.
Second, the one-sparse bridging rule is mainly pedagogical: the published
algorithm aggregates over a top-:math:`\beta` set and then rescales by
:math:`\alpha`. Third, feature-map growth is treated separately from the
fully connected score, with an explicit candidate search over forward losses
:cite:p:`daiNeSTNeuralNetwork2019`.
Collaborator


Does not seem particularly relevant to me.
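For context on the top-β aggregation the quoted passage mentions: the published rule seeds the top fraction β of candidate connections by |B| rather than a single one-sparse bridge. A sketch, where β as a fraction of all entries and the flattening order are illustrative assumptions:

```python
import numpy as np

def top_beta_connections(B, beta=0.1):
    """Select the top-beta fraction of candidate connections by |B| (sketch).

    Returns the (i, j) index pairs of the strongest bridging entries,
    which would then be seeded and rescaled by alpha.
    """
    k = max(1, int(beta * B.size))
    flat = np.argsort(np.abs(B).ravel())[::-1][:k]  # strongest first
    return [np.unravel_index(idx, B.shape) for idx in flat]
```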

Comment thread docs/algorithms/nest.rst Outdated
Comment on lines +195 to +197
- NeST mixes two different types of growth rules: activation-gradient scoring
for connections and neurons, but a forward loss comparison for feature maps.
This makes the method less uniform than purely function-preserving approaches.
Collaborator


Suggested change
- NeST mixes two different types of growth rules: activation-gradient scoring
for connections and neurons, but a forward loss comparison for feature maps.
This makes the method less uniform than purely function-preserving approaches.
- NeST proposes a closed-form formula for neuron addition in fully connected layers but relies on trial and error for channel growth.

Comment thread docs/algorithms/nest.rst Outdated
Comment on lines +198 to +199
- The one-sparse neuron rule is useful for explanation, but the paper's
practical algorithm is denser and therefore somewhat less transparent.
Collaborator


I think it's already largely discussed above.

Comment thread docs/algorithms/nest.rst Outdated
Comment on lines +202 to +205
- The broader grow-prune loop raises the same scheduling questions discussed in
[[When to grow?|when_to_grow]] and [[Where to grow?|where_to_grow]]:
how often should growth be triggered, and where should sparse capacity be
added?
Collaborator


Yes, but the main point is that, in my understanding, the full NeST pipeline does not use any gradient descent!

Collaborator


It seems they do use gradient descent: "and then retrain the whole DNN to recover its performance".

- Fix aligned math (aligned env) for Copilot/MathJax
- Align synthesis narrative with paper: growth then pruning phases
- Clarify gradient-based weight training vs growth initialization
- Add policies table; document Policy 4 partial-area convolution pruning
- Add compact experimental results from paper; tighten limitations

Made-with: Cursor
- Add TikZ sources and SVGs for Policies 1–3 (connection, neuron, conv)
- Wire docs Makefile html/stricthtml/livehtml to scripts/build_figures.sh
- Add visual quality gate helpers (eval_tikz_constraints, summarize, run gate)
- Embed figures in nest.rst under PR prose (bf4b1d2 baseline)

Made-with: Cursor
- Expand nest.rst: blockquote TLDR, roadmap, notation, VT-style light/dark
  figures, growth-prune and optimization sections, results table, hyperparameters,
  Algorithm 1 gloss, split limitations vs open questions.
- Add TikZ/SVG dark variants for Policies 1–3; document -dark naming in figures README.
- Make build_figures.sh skip TikZ when pdflatex is absent if outputs exist; warn on
  stale .tex vs SVG; fail only when outputs missing.

Made-with: Cursor
…thon3 in build_figures

- Replace malformed grid results table with list-table (tab-nest-results)
- Convert Policy 1–3 container figures to paired .. figure:: (light named, dark)
- Add numrefs for figures and table; wire eq labels in policies table and prose
- Clarify partial-area γ: percentile prose + Algorithm 2 order-statistic rule
- Bridge sign convention between dL/dW and |B| scoring
- Prefer python3 over python in build_figures.sh for .py generators

Made-with: Cursor
Two .. figure:: directives each produced a numbered figcaption; only-light/dark
hide images, not the extra <figure>. Revert Policy 1–3 blocks to the same
container+two-images pattern as variance_transfer.rst and use :ref: in the intro.

Made-with: Cursor
