Allow preprocessing of NotebookNode

### Description / Summary

I would like to be able to preprocess my notebooks before they are parsed.
Ideally, this would behave similarly to `nbconvert` preprocessors.

### Value / benefit

The main motivation for such an extension is that it would be possible to easily include any notebook transformations and previously written `nbconvert` preprocessors.
An example of a custom preprocessor:
Instead of having to write cell metadata into the _special_ metadata field, one could simply write a magic comment that would automatically be converted to the metadata:

```python
# remove-output
print("hello world")
```

Could be preprocessed to:
```python
print("hello world")
```

with the metadata tag: `remove-output`.
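
A minimal sketch of what such a magic-comment preprocessor could look like is shown here. It works on a plain dict shaped like a notebook cell (a `NotebookNode` is dict-like), so it has no dependencies. The function name `magic_comments_to_tags`, the `KNOWN_TAGS` set and the exact comment syntax are assumptions made for illustration, not an existing API.

```python
import re

# Assumption: a magic comment is a leading line of the form "# <tag>".
# The set of recognised tags is hypothetical and only serves this sketch.
KNOWN_TAGS = {"remove-input", "remove-output", "remove-cell"}
MAGIC_COMMENT = re.compile(r"^#\s*([\w-]+)\s*$")


def magic_comments_to_tags(cell):
    """Move leading magic comments of a code cell into cell["metadata"]["tags"]."""
    if cell.get("cell_type") != "code":
        return cell
    lines = cell["source"].split("\n")
    tags = cell.setdefault("metadata", {}).setdefault("tags", [])
    # Consume magic comments from the top until the first ordinary line.
    while lines:
        match = MAGIC_COMMENT.match(lines[0])
        if match is None or match.group(1) not in KNOWN_TAGS:
            break
        if match.group(1) not in tags:
            tags.append(match.group(1))
        lines.pop(0)
    cell["source"] = "\n".join(lines)
    return cell


cell = {
    "cell_type": "code",
    "metadata": {},
    "source": '# remove-output\nprint("hello world")',
}
magic_comments_to_tags(cell)
```

With `nbconvert`, the same logic could presumably live in a `Preprocessor` subclass's `preprocess_cell(cell, resources, index)` method. Because the comment line is removed, running the function a second time leaves the cell unchanged, but the original source cannot be recovered from the result.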


I know that it would be possible to write the metadata directly into the cell's metadata field, but how this is done differs between `jupyterlab`, `VSCode`, etc.
Also, it is often not easy to see what metadata has been written to each cell.
A different solution is to use a preprocessing script that modifies the notebook before it is parsed, but that requires an extra step outside of the _normal_ pipeline, and some preprocessors are not idempotent (for example, the metadata-writer preprocessor removes the magic comment line so that it does not pollute the output).

Maybe this is a bit far-fetched, but I hope that it makes a little bit of sense. :sweat_smile: 
A top-down motivation would be for the use inside of jupyter-book, where it would be nice to use the metadata preprocessor that I've shown above to select what inputs/outputs to show _inside_ of the code cell and have this _magic comment_ removed to not pollute the printed output.

A different, simpler use-case would be to use a `RegexRemovePreprocessor` that would remove all cells that match the regular expression.
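
To illustrate the idea, here is a dependency-free sketch of regex-based cell removal. It is not `nbconvert`'s implementation: the function name `remove_matching_cells`, the example patterns, and the choice to match at the start of the cell source are assumptions of this sketch.

```python
import re


def remove_matching_cells(nb, patterns):
    """Drop every cell whose source matches one of the given patterns.

    Assumption of this sketch: patterns are matched at the start of the
    cell source (re.match), and an empty pattern list removes nothing.
    """
    if not patterns:
        return nb
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    nb["cells"] = [c for c in nb["cells"] if not combined.match(c["source"])]
    return nb


notebook = {
    "cells": [
        {"cell_type": "code", "source": "# scratch\nprint(1)"},
        {"cell_type": "code", "source": "print(2)"},
        {"cell_type": "markdown", "source": ""},
    ]
}
# Remove scratch cells and cells that contain only whitespace.
remove_matching_cells(notebook, [r"# scratch", r"\s*\Z"])
```

After the call, only the `print(2)` cell remains.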

### Implementation details

I believe that this should be done **after** executing the notebook.
That way, the _cached_ view and the original view stay identical; a difference between them might introduce issues,
for example when the `RegexRemovePreprocessor` removes a cell that produces an error.

Specifically, I would execute the preprocessors [here](https://github.com/executablebooks/MyST-NB/blob/6be37cd66d1b7668b16a4bab385222b58f38930d/myst_nb/parser.py#L139).

But then there might be an issue with the source-map/pseudo-line numbering when the preprocessor deletes entire cells.
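
As a rough sketch of the proposed ordering, the snippet below applies a chain of preprocessors to an already executed notebook. Each preprocessor takes and returns `(nb, resources)`, mirroring the `nbconvert` convention. `apply_preprocessors` and `drop_empty_cells` are hypothetical names, and the notebook is a plain dict standing in for a `NotebookNode`; nothing here is taken from the MyST-NB code base.

```python
def apply_preprocessors(nb, preprocessors, resources=None):
    """Run each preprocessor in order on an executed notebook."""
    resources = {} if resources is None else resources
    for preprocess in preprocessors:
        nb, resources = preprocess(nb, resources)
    return nb, resources


def drop_empty_cells(nb, resources):
    # Hypothetical preprocessor, used only for this demonstration.
    nb["cells"] = [c for c in nb["cells"] if c["source"].strip()]
    return nb, resources


# Stand-in for a notebook that has already been executed (outputs present).
executed = {
    "cells": [
        {"cell_type": "code", "source": "", "outputs": []},
        {
            "cell_type": "code",
            "source": "print(1)",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "1\n"}],
        },
    ]
}
executed, _ = apply_preprocessors(executed, [drop_empty_cells])
```

Note that the surviving cell moves from index 1 to index 0, which is exactly the kind of shift that could confuse any mapping from cells to source positions.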

**PS**: Maybe I am too focused/used to `NotebookNode` and should rather consider working on the `SyntaxTree`.


### Tasks to complete

_No response_

Allow preprocessing of NotebookNode #360
