doc: NeST #11
Conversation
Pull request overview
Expands the NeST algorithm documentation into a full, self-contained page with clearer notation and a structured breakdown of NeST’s growth policies (connections, neurons, feature maps) and the grow–prune loop.
Changes:
- Rewrites docs/algorithms/nest.rst with expanded sections (TLDR, setup/notation, connection growth, neuron growth, feature-map growth, training loop, snapshot, limitations).
- Clarifies notation (layer indices, batch indexing) and presents shape-consistent update formulas.
- Adds a compact "empirical snapshot" plus limitations/open questions scoped to NeST.
Quoted from the updated nest.rst, the neuron-growth initialization:

.. math::

   \epsilon &\sim \mathrm{Uniform}(\{-1, 1\}),\\
   \psi_{i^*} &= \epsilon \, \operatorname{sgn}\!\left(B^{(l-2)}_{i^*,j^*}\right)\sqrt{\left|B^{(l-2)}_{i^*,j^*}\right|},\\
   \omega_{j^*} &= \epsilon \sqrt{\left|B^{(l-2)}_{i^*,j^*}\right|},

followed by the magnitude rescaling:

.. math::

   \boldsymbol{\psi} &\leftarrow \alpha \, \boldsymbol{\psi} \, \frac{\bar{a}(\boldsymbol{W}^{(l-1)})}{\bar{a}(\boldsymbol{\psi})},\\
   \boldsymbol{\omega} &\leftarrow \alpha \, \boldsymbol{\omega} \, \frac{\bar{a}(\boldsymbol{W}^{(l)})}{\bar{a}(\boldsymbol{\omega})}.
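As a sanity check of the two formulas above, here is a minimal NumPy sketch; the function names, the mean-magnitude helper, and the default alpha are assumptions for illustration, not from the paper or the PR:

```python
import numpy as np

def bridge_init(B, i_star, j_star, rng):
    """One-sparse bridging init for a new neuron between layers l-2 and l.

    B is the activation-gradient correlation matrix B^{(l-2)}; the new
    neuron's fan-in weight psi and fan-out weight omega are chosen so that
    their product reproduces the bridged entry B[i*, j*].
    """
    eps = rng.choice([-1.0, 1.0])             # epsilon ~ Uniform({-1, 1})
    b = B[i_star, j_star]
    psi = eps * np.sign(b) * np.sqrt(abs(b))  # fan-in weight psi_{i*}
    omega = eps * np.sqrt(abs(b))             # fan-out weight omega_{j*}
    return psi, omega

def mean_abs(w):
    """Mean absolute value: the a-bar(.) statistic used for rescaling."""
    return np.mean(np.abs(w))

def rescale(psi, omega, W_prev, W_next, alpha=0.25):
    """Match the new weights' average magnitude to the host layers'."""
    psi = alpha * psi * mean_abs(W_prev) / mean_abs(psi)
    omega = alpha * omega * mean_abs(W_next) / mean_abs(omega)
    return psi, omega
```

By construction, ``psi * omega`` equals ``B[i_star, j_star]`` before rescaling, which is the point of the bridging rule.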
On the sentence:

> The full NeST method alternates the growth rules above with magnitude-based
> removal of weak connections and weak neurons. Effective
> (batch-normalized) weights may be used when judging magnitudes
> :cite:p:`daiNeSTNeuralNetwork2019`. As part of the broader

Suggested change:

> The full NeST method first grows the network using the rules above and then does magnitude-based
> removal of weak connections and weak neurons. Effective
> (batch-normalized) weights may be used when judging magnitudes
> :cite:p:`daiNeSTNeuralNetwork2019`. As part of the broader

In my understanding, NeST does not alternate: it grows, then prunes, and does not cycle.
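To make the pruning half of that grow-then-prune ordering concrete, here is a minimal sketch of magnitude-based removal; the keep fraction, function names, and the "dead fan-in or fan-out" neuron criterion are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def prune_connections(W, keep_frac=0.9):
    """Magnitude-based pruning: zero out the smallest-|w| weights.

    NeST judges connection importance by (effective) weight magnitude;
    keep_frac is an illustrative hyperparameter.
    """
    thresh = np.quantile(np.abs(W), 1.0 - keep_frac)
    mask = np.abs(W) >= thresh
    return W * mask, mask

def prune_neurons(W_in, W_out):
    """Remove neurons whose fan-in or fan-out is entirely zero."""
    alive = (np.abs(W_in).sum(axis=0) > 0) & (np.abs(W_out).sum(axis=1) > 0)
    return W_in[:, alive], W_out[alive, :], alive
```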
On the "Empirical snapshot" section:

> Empirical snapshot
> ------------------
>
> Within this page's scope, the paper contributes three practically relevant
> messages. First, sparse growth can be organized around activation-gradient
> correlations :math:`\boldsymbol{B}` rather than function-preserving morphisms.
> Second, the one-sparse bridging rule is mainly pedagogical: the published
> algorithm aggregates over a top-:math:`\beta` set and then rescales by
> :math:`\alpha`. Third, feature-map growth is treated separately from the
> fully connected score, with an explicit candidate search over forward losses
> :cite:p:`daiNeSTNeuralNetwork2019`.

This doesn't seem particularly relevant to me.
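For concreteness, the top-beta aggregation over activation-gradient scores mentioned above can be sketched as follows; the function name, the score-from-B initialization, and the fraction ``beta`` are illustrative assumptions:

```python
import numpy as np

def grow_connections(W, B, beta=0.1):
    """Activate the top-beta fraction of currently-absent connections.

    B holds activation-gradient correlations; the inactive positions with
    the largest |B| are switched on and initialized from B.
    """
    inactive = (W == 0)
    scores = np.abs(B) * inactive            # only score absent connections
    k = max(1, int(beta * inactive.sum()))   # number of connections to add
    flat = np.argsort(scores, axis=None)[::-1][:k]
    idx = np.unravel_index(flat, W.shape)
    W_new = W.copy()
    W_new[idx] = B[idx]                      # simple init from the correlation
    return W_new, idx
```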
On the limitations list:

> - NeST mixes two different types of growth rules: activation-gradient scoring
>   for connections and neurons, but a forward-loss comparison for feature maps.
>   This makes the method less uniform than purely function-preserving approaches.

Suggested change:

> - NeST proposes a closed-form formula for neuron addition in fully connected layers but relies on trial and error for channel growth.
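The trial-and-error channel growth mentioned in the suggestion amounts to a candidate search over forward losses. A minimal sketch, where ``forward_loss``, the candidate count, and the init scale are placeholder assumptions:

```python
import numpy as np

def grow_feature_map(forward_loss, kernel_shape, n_candidates=8, seed=0):
    """Pick the randomly initialized candidate kernel with the lowest
    forward loss, mirroring search-based feature-map growth.

    forward_loss(kernel) evaluates the network's loss with the candidate
    kernel temporarily inserted; no gradient is needed for the choice.
    """
    rng = np.random.default_rng(seed)
    best_kernel, best_loss = None, np.inf
    for _ in range(n_candidates):
        kernel = rng.standard_normal(kernel_shape) * 0.1  # random candidate
        loss = forward_loss(kernel)
        if loss < best_loss:
            best_kernel, best_loss = kernel, loss
    return best_kernel, best_loss
```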
> - The one-sparse neuron rule is useful for explanation, but the paper's
>   practical algorithm is denser and therefore somewhat less transparent.

I think it's already largely discussed above.
> - The broader grow-prune loop raises the same scheduling questions discussed in
>   [[When to grow?|when_to_grow]] and [[Where to grow?|where_to_grow]]:
>   how often should growth be triggered, and where should sparse capacity be
>   added?

Yes, but the main point is that, in my understanding, the full NeST pipeline does not use any gradient descent!

Reply: It seems they do use gradient descent: "and then retrain the whole DNN to recover its performance".
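To make the resolved ordering concrete, here is a minimal sketch of the pipeline as discussed in this thread; ``nest_pipeline`` and all of its callables are hypothetical placeholders, not from the paper or the PR:

```python
def nest_pipeline(net, grow, train, prune, retrain, grow_steps=3):
    """Grow-then-prune schedule (illustrative only).

    Each growth phase is followed by gradient-based training of the grown
    architecture; after the last growth phase the network is pruned once
    and then retrained to recover its performance.
    """
    for _ in range(grow_steps):
        net = grow(net)    # structural growth (connections / neurons / maps)
        net = train(net)   # gradient descent on the grown architecture
    net = prune(net)       # magnitude-based removal of weak weights/neurons
    return retrain(net)    # final retraining pass
```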
Commits (made with Cursor):
- Fix aligned math (aligned env) for Copilot/MathJax; align synthesis narrative with paper: growth then pruning phases; clarify gradient-based weight training vs growth initialization; add policies table; document Policy 4 partial-area convolution pruning; add compact experimental results from paper; tighten limitations.
- Add TikZ sources and SVGs for Policies 1-3 (connection, neuron, conv); wire docs Makefile html/stricthtml/livehtml to scripts/build_figures.sh; add visual quality gate helpers (eval_tikz_constraints, summarize, run gate); embed figures in nest.rst under PR prose (bf4b1d2 baseline).
- Expand nest.rst: blockquote TLDR, roadmap, notation, VT-style light/dark figures, growth-prune and optimization sections, results table, hyperparameters, Algorithm 1 gloss, split limitations vs open questions; add TikZ/SVG dark variants for Policies 1-3; document -dark naming in figures README; make build_figures.sh skip TikZ when pdflatex is absent if outputs exist, warn on stale .tex vs SVG, fail only when outputs missing.
- …thon3 in build_figures; replace malformed grid results table with list-table (tab-nest-results); convert Policy 1-3 container figures to paired .. figure:: (light named, dark); add numrefs for figures and table; wire eq labels in policies table and prose; clarify partial-area gamma: percentile prose + Algorithm 2 order-statistic rule; bridge sign convention between dL/dW and |B| scoring; prefer python3 over python in build_figures.sh for .py generators.
- Two .. figure:: directives each produced a numbered figcaption; only-light/dark hide images, not the extra <figure>. Revert Policy 1-3 blocks to the same container+two-images pattern as variance_transfer.rst and use :ref: in the intro.
Summary

Rewrites docs/algorithms/nest.rst into a fuller algorithm page with a TLDR, setup/notation, and clearer separation between connection growth, neuron growth, feature-map growth, and the grow-prune loop; clarifies l-1/l-2 layer references, consistent batch indexing, and a shape-consistent neuron contribution update.

Verification

- make -C docs html

Notes

- Only the docs/algorithms/nest.rst update is included; .env, .cursor/, and scratch images were left out of the PR.

Made with Cursor