Consolidate architecture docs #124
Merged: leynos merged 4 commits into `main` from `codex/combine-architecture-documents-into-one-file` · Jul 23, 2025
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,307 @@ | ||
| # Architecture | ||
|
|
||
| ## Contents | ||
|
|
||
| - [Markdown stream processor](#markdown-stream-processor) | ||
| - [Footnote conversion](#footnote-conversion) | ||
| - [HTML table support](#html-table-support-in-mdtablefix) | ||
| - [Module relationships](#module-relationships) | ||
| - [Concurrency with `rayon`](#concurrency-with-rayon) | ||
| - [Unicode width handling](#unicode-width-handling) | ||
|
|
||
| ## Markdown stream processor | ||
|
|
||
| `process_stream_inner` orchestrates line-by-line rewriting. The full | ||
| implementation lives in [src/process.rs](../src/process.rs). Its signature is: | ||
|
|
||
| ```rust | ||
| pub fn process_stream_inner(lines: &[String], opts: Options) -> Vec<String> | ||
| ``` | ||
|
|
||
| The function combines several helpers documented in `docs/`: | ||
|
|
||
| - `fences::compress_fences` and `attach_orphan_specifiers` normalize code block | ||
| delimiters. | ||
| - `html::convert_html_tables` transforms basic HTML tables into Markdown so | ||
| they can be reflowed like regular tables. See | ||
| [HTML table support](#html-table-support-in-mdtablefix). | ||
| - `wrap::wrap_text` applies optional line wrapping. It relies on the | ||
| `unicode-width` crate for accurate character widths. | ||
|
|
||
| The function maintains a small state machine that tracks whether it is inside a | ||
| Markdown table, an HTML table, or a fenced code block. The state determines how | ||
| incoming lines are buffered or emitted. Once the end of a table or fence is | ||
| reached, buffered lines are flushed and possibly reformatted. The simplified | ||
| behaviour is illustrated below. | ||
|
|
||
| ```mermaid | ||
| stateDiagram-v2 | ||
|
|
||
| [*] --> Streaming: Start | ||
|
|
||
| Streaming: Default state—processing lines individually | ||
|
|
||
| InMarkdownTable: Buffering lines of a Markdown table | ||
|
|
||
| InHtmlTable: Buffering lines of an HTML table | ||
|
|
||
| InCodeFence: Passing through lines within a fenced code block | ||
|
|
||
| Streaming --> InMarkdownTable: Line starts with "|" | ||
| Streaming --> InHtmlTable: Line contains table HTML tag | ||
| Streaming --> InCodeFence: Line is a fence delimiter ("```" or "~~~") | ||
|
|
||
| InMarkdownTable --> Streaming: Flush buffer and reflow table on non-table line (e.g., blank, heading) | ||
| InMarkdownTable --> InMarkdownTable: Line contains "|" or separator pattern | ||
|
|
||
| InHtmlTable --> Streaming: Flush buffer and convert table on final table HTML closing tag | ||
| InHtmlTable --> InHtmlTable: Line inside table tag | ||
|
|
||
| InCodeFence --> Streaming: Line is a fence delimiter | ||
| ``` | ||
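
The transitions in the diagram can be sketched as a pure function from the current state and the incoming line to the next state. This is a minimal sketch, not the code in `src/process.rs`: the `State` enum, `next_state`, `trace` and the local `is_fence` are hypothetical names, buffering and flushing are left out, and a one-line `<table>…</table>` is assumed to be handled without leaving `Streaming`.

```rust
/// Hypothetical states mirroring the diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Streaming,
    InMarkdownTable,
    InHtmlTable,
    InCodeFence,
}

/// True for a fence delimiter line ("```" or "~~~"), ignoring indentation.
fn is_fence(line: &str) -> bool {
    let t = line.trim_start();
    t.starts_with("```") || t.starts_with("~~~")
}

/// Computes the state after `line` has been consumed.
/// Simplification: a table that ends on a fence or HTML line returns to
/// `Streaming` rather than entering the new state directly.
fn next_state(state: State, line: &str) -> State {
    let trimmed = line.trim();
    let lower = trimmed.to_ascii_lowercase();
    match state {
        State::Streaming if is_fence(line) => State::InCodeFence,
        State::Streaming if trimmed.starts_with('|') => State::InMarkdownTable,
        State::Streaming if lower.contains("<table") && !lower.contains("</table>") => {
            State::InHtmlTable
        }
        State::Streaming => State::Streaming,
        State::InCodeFence if is_fence(line) => State::Streaming,
        State::InCodeFence => State::InCodeFence,
        State::InMarkdownTable if trimmed.contains('|') => State::InMarkdownTable,
        State::InMarkdownTable => State::Streaming,
        State::InHtmlTable if lower.contains("</table>") => State::Streaming,
        State::InHtmlTable => State::InHtmlTable,
    }
}

/// Records the state reached after each line, starting from `Streaming`.
fn trace(lines: &[&str]) -> Vec<State> {
    let mut state = State::Streaming;
    lines
        .iter()
        .map(|line| {
            state = next_state(state, line);
            state
        })
        .collect()
}
```

Calling `trace` on a short document shows where buffering would start and stop; a real processor would flush and reformat its buffer on each transition back to `Streaming`.
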
|
|
||
| Before: | ||
|
|
||
| ```markdown | ||
| |A|B| | ||
| |---|---| | ||
| |1|22| | ||
| <table><tr><td>3</td><td>4</td></tr></table> | ||
| ``` | ||
|
|
||
| After: | ||
|
|
||
| ```markdown | ||
| | A | B | | ||
| | --- | --- | | ||
| | 1 | 22 | | ||
| | 3 | 4 | | ||
| ``` | ||
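
The Markdown part of this example can be reproduced with a small reflow sketch: split each row into cells, take the widest cell per column (at least three, so the separator stays `---`), then pad. The helper names `split_row`, `is_separator` and `reflow_sketch` are hypothetical and are not the crate's `split_cells` or `reflow_table`. The sketch counts `char`s rather than display columns, drops alignment colons, and does not handle the HTML row.

```rust
/// Splits a table row into trimmed cells, dropping one outer pipe per side.
fn split_row(row: &str) -> Vec<String> {
    let t = row.trim();
    let t = t.strip_prefix('|').unwrap_or(t);
    let t = t.strip_suffix('|').unwrap_or(t);
    t.split('|').map(|c| c.trim().to_string()).collect()
}

/// True when every cell looks like `---`, optionally with alignment colons.
fn is_separator(cells: &[String]) -> bool {
    !cells.is_empty()
        && cells.iter().all(|c| {
            let core = c.trim_matches(':');
            !core.is_empty() && core.chars().all(|ch| ch == '-')
        })
}

/// Pads every column to a common width and rebuilds the rows.
fn reflow_sketch(lines: &[&str]) -> Vec<String> {
    let rows: Vec<Vec<String>> = lines.iter().map(|l| split_row(l)).collect();
    let cols = rows.iter().map(Vec::len).max().unwrap_or(0);

    // Minimum width of 3 so a separator cell is never shorter than "---".
    let mut widths = vec![3usize; cols];
    for row in rows.iter().filter(|r| !is_separator(r.as_slice())) {
        for (i, cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(cell.chars().count());
        }
    }

    rows.iter()
        .map(|row| {
            let sep = is_separator(row);
            let cells: Vec<String> = (0..cols)
                .map(|i| {
                    if sep {
                        "-".repeat(widths[i])
                    } else {
                        let cell = row.get(i).map(String::as_str).unwrap_or("");
                        format!("{:<w$}", cell, w = widths[i])
                    }
                })
                .collect();
            format!("| {} |", cells.join(" | "))
        })
        .collect()
}
```
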
|
|
||
| Code fences are passed through verbatim: | ||
|
|
||
| ```rust | ||
| | not | a | table | | ||
| ``` | ||
|
|
||
| After scanning all lines, the processor performs optional post-processing steps | ||
| such as ellipsis replacement and footnote conversion. See | ||
| [footnote conversion](#footnote-conversion) for details. The function then | ||
| returns the updated stream for writing to disk or further manipulation. | ||
|
|
||
| ## Footnote conversion | ||
|
|
||
| `mdtablefix` can optionally convert bare numeric references into | ||
| GitHub-flavoured Markdown footnotes. The `convert_footnotes` function performs | ||
| this operation and is exposed via the higher-level `process_stream_opts` | ||
| helper. Set `Options { footnotes: true, ..Default::default() }` when calling | ||
| `process_stream_opts` to enable the conversion logic. | ||
|
|
||
| Inline references that appear after punctuation are rewritten as footnote links. | ||
|
|
||
| Before: | ||
|
|
||
| ```markdown | ||
| A useful tip.1 | ||
| ``` | ||
|
|
||
| After: | ||
|
|
||
| ```markdown | ||
| A useful tip.[^1] | ||
| ``` | ||
|
|
||
| Numbers inside inline code or parentheses are ignored. | ||
|
|
||
| Before: | ||
|
|
||
| ```markdown | ||
| Look at `code 1` for details. | ||
| Refer to equation (1) for context. | ||
| ``` | ||
|
|
||
| After: | ||
|
|
||
| ```markdown | ||
| Look at `code 1` for details. | ||
| Refer to equation (1) for context. | ||
| ``` | ||
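
The inline rule can be sketched as a single left-to-right scan. This is an illustration under stated assumptions, not the logic of `convert_footnotes`: the function name `convert_inline_refs` is hypothetical, "punctuation" is taken to mean `. , ; : ! ?`, a reference must be followed by whitespace or the end of the line, and a digit before the punctuation (as in `3.14`) is assumed to rule out a reference.

```rust
/// Rewrites bare numbers that directly follow punctuation as `[^n]`.
fn convert_inline_refs(line: &str) -> String {
    let chars: Vec<char> = line.chars().collect();
    let mut out = String::with_capacity(line.len() + 8);
    let mut in_code = false;
    let mut i = 0;

    while i < chars.len() {
        let c = chars[i];
        if c == '`' {
            // Toggle inline-code mode; numbers inside code are left alone.
            in_code = !in_code;
        }
        out.push(c);
        i += 1;

        let is_punct = matches!(c, '.' | ',' | ';' | ':' | '!' | '?');
        // A digit before the punctuation suggests a decimal or a time.
        let after_digit = i >= 2 && chars[i - 2].is_ascii_digit();
        if in_code || !is_punct || after_digit {
            continue;
        }

        // Consume the run of digits that follows the punctuation, if any.
        let start = i;
        while i < chars.len() && chars[i].is_ascii_digit() {
            i += 1;
        }
        let at_boundary = i == chars.len() || chars[i].is_whitespace();
        if i > start && at_boundary {
            let digits: String = chars[start..i].iter().collect();
            out.push_str(&format!("[^{}]", digits));
        } else {
            // Not a reference: rewind so the characters are copied as-is.
            i = start;
        }
    }
    out
}
```

Parenthesised numbers are skipped without any special handling, because `(` is not in the assumed punctuation set.
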
|
|
||
| When the final lines of a document form a numbered list, they are replaced with | ||
| footnote definitions. | ||
|
|
||
| Before: | ||
|
|
||
| ```markdown | ||
| Text. | ||
|
|
||
| 1. First note | ||
| 2. Second note | ||
| ``` | ||
|
|
||
| After: | ||
|
|
||
| ```markdown | ||
| Text. | ||
|
|
||
| [^1] First note | ||
| [^2] Second note | ||
| ``` | ||
|
|
||
| `convert_footnotes` only processes the final contiguous list of numeric | ||
| references. | ||
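
Finding "the final contiguous list" amounts to walking backwards from the last non-blank line while the lines still look like numbered items. The sketch below only locates that block; `is_numbered_item` and `trailing_list_start` are hypothetical names, and the assumption that trailing blank lines are ignored is not taken from the source.

```rust
/// True for lines such as "1. First note" (leading indentation allowed).
fn is_numbered_item(line: &str) -> bool {
    let t = line.trim_start();
    // ASCII digits are one byte each, so the count is also a byte offset.
    let digits = t.chars().take_while(|c| c.is_ascii_digit()).count();
    digits > 0 && t[digits..].starts_with(". ")
}

/// Index of the first line of the trailing numbered list, if there is one.
fn trailing_list_start(lines: &[&str]) -> Option<usize> {
    // Assumption: blank lines at the very end of the document are ignored.
    let end = lines.iter().rposition(|l| !l.trim().is_empty())? + 1;
    let mut start = end;
    while start > 0 && is_numbered_item(lines[start - 1]) {
        start -= 1;
    }
    if start < end {
        Some(start)
    } else {
        None
    }
}
```

An earlier numbered list in the same document is left untouched, because the backwards walk stops at the first line that is not a numbered item.
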
|
|
||
| ## HTML table support in `mdtablefix` | ||
|
|
||
| `mdtablefix` can format simple HTML `<table>` elements embedded in Markdown. | ||
| These HTML tables are transformed into Markdown before the main table reflow | ||
| logic runs. That preprocessing is handled by the `convert_html_tables` function. | ||
|
|
||
| Only straightforward tables with `<tr>`, `<th>` and `<td>` tags are detected. | ||
| Attributes and tag casing are ignored, and complex nested or styled tables are | ||
| not supported. After conversion, each HTML table is represented as a Markdown | ||
| table, so the usual reflow algorithm can align its columns consistently with | ||
| the rest of the document. | ||
|
|
||
| ```html | ||
| <table> | ||
| <tr><th>A</th><th>B</th></tr> | ||
| <tr><td>1</td><td>2</td></tr> | ||
| </table> | ||
| ``` | ||
|
|
||
| The converter checks the first table row for `<th>` cells or for `<strong>` or | ||
| `<b>` tags inside `<td>` elements to decide whether it is a header. If no such | ||
| markers exist and the table contains multiple rows, the first row is still | ||
| treated as the header, so the Markdown output includes a separator line. This | ||
| last-resort behaviour keeps simple tables readable after conversion. | ||
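
The header decision described above can be written as a small predicate. This is a sketch with hypothetical names (`has_tag`, `has_header_markers`, `first_row_is_header`), working on the raw HTML of the first row; as a simplification it looks for `<strong>` and `<b>` anywhere in the row rather than strictly inside `<td>` elements.

```rust
/// True when `lower_html` contains an opening tag with exactly this name,
/// so "<b>" matches but "<br>" and "<body>" do not.
fn has_tag(lower_html: &str, name: &str) -> bool {
    let open = format!("<{}", name);
    lower_html.match_indices(open.as_str()).any(|(idx, _)| {
        match lower_html[idx + open.len()..].chars().next() {
            Some(c) => c == '>' || c == '/' || c.is_whitespace(),
            None => false,
        }
    })
}

/// Explicit header markers: `<th>` cells, or bold text via `<strong>`/`<b>`.
fn has_header_markers(first_row_html: &str) -> bool {
    // Tag casing is ignored, matching the behaviour described above.
    let lower = first_row_html.to_ascii_lowercase();
    has_tag(&lower, "th") || has_tag(&lower, "strong") || has_tag(&lower, "b")
}

/// Marked rows are headers; otherwise fall back to "first of several rows".
fn first_row_is_header(first_row_html: &str, row_count: usize) -> bool {
    has_header_markers(first_row_html) || row_count > 1
}
```

Under this reading, a single unmarked row is the only case that is not treated as a header.
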
|
|
||
| ## Module relationships | ||
|
|
||
| This diagram illustrates the connections between the crate's modules. | ||
|
|
||
| ```mermaid | ||
| classDiagram | ||
| class lib { | ||
| <<module>> | ||
| } | ||
| class html { | ||
| <<module>> | ||
| +convert_html_tables() | ||
| +html_table_to_markdown() | ||
| } | ||
| class table { | ||
| <<module>> | ||
| +reflow_table() | ||
| +split_cells() | ||
| +SEP_RE | ||
| } | ||
| class wrap { | ||
| <<module>> | ||
| +wrap_text() | ||
| +is_fence() | ||
| } | ||
| class lists { | ||
| <<module>> | ||
| +renumber_lists() | ||
| } | ||
| class breaks { | ||
| <<module>> | ||
| +format_breaks() | ||
| +THEMATIC_BREAK_LEN | ||
| } | ||
| class ellipsis { | ||
| <<module>> | ||
| +replace_ellipsis() | ||
| } | ||
| class fences { | ||
| <<module>> | ||
| +compress_fences() | ||
| +attach_orphan_specifiers() | ||
| } | ||
| class footnotes { | ||
| <<module>> | ||
| +convert_footnotes() | ||
| } | ||
| class process { | ||
| <<module>> | ||
| +process_stream() | ||
| +process_stream_no_wrap() | ||
| } | ||
| class io { | ||
| <<module>> | ||
| +rewrite() | ||
| +rewrite_no_wrap() | ||
| } | ||
| lib --> html | ||
| lib --> table | ||
| lib --> wrap | ||
| lib --> lists | ||
| lib --> breaks | ||
| lib --> ellipsis | ||
| lib --> fences | ||
| lib --> process | ||
| lib --> io | ||
| html ..> wrap : uses is_fence | ||
| table ..> reflow : uses parse_rows, etc. | ||
| lists ..> wrap : uses is_fence | ||
| breaks ..> wrap : uses is_fence | ||
| ellipsis ..> wrap : uses tokenize_markdown | ||
| process ..> html : uses convert_html_tables | ||
| process ..> table : uses reflow_table | ||
| process ..> wrap : uses wrap_text, is_fence | ||
| process ..> fences : uses compress_fences, attach_orphan_specifiers | ||
| process ..> ellipsis : uses replace_ellipsis | ||
| process ..> footnotes : uses convert_footnotes | ||
| io ..> process : uses process_stream, process_stream_no_wrap | ||
| ``` | ||
|
|
||
| The `lib` module re-exports the public API from the other modules. The | ||
| `ellipsis` module performs text normalization. The `process` module provides | ||
| streaming helpers that combine the lower-level functions, including ellipsis | ||
| replacement and footnote conversion. The `io` module handles filesystem | ||
| operations, delegating the text processing to `process`. | ||
|
|
||
| ## Concurrency with `rayon` | ||
|
|
||
| `mdtablefix` uses the `rayon` crate to process multiple files concurrently. | ||
| `rayon` provides a work-stealing thread pool and simple parallel iterators. The | ||
| tool relies on Rayon's global thread pool so that no manual setup is required. | ||
| The dependency is specified as `^1.0` in `Cargo.toml` to track stable API | ||
| changes within the same major release. | ||
|
|
||
| Parallelism is enabled automatically whenever more than one file path is | ||
| provided on the command line. Each worker buffers its output instead of | ||
| printing directly, and the main thread prints the buffered results in input | ||
| order. This buffering increases memory usage and may reduce performance if | ||
| many tiny files are processed. | ||
|
|
||
| ```mermaid | ||
| sequenceDiagram | ||
| actor User | ||
| participant CLI as CLI Main | ||
| participant FileHandler as handle_file | ||
| participant Stdout as Stdout | ||
| participant Stderr as Stderr | ||
|
|
||
| User->>CLI: Run CLI with multiple files (not in-place) | ||
| CLI->>FileHandler: handle_file(file1) | ||
| CLI->>FileHandler: handle_file(file2) | ||
| CLI->>FileHandler: handle_file(file3) | ||
| Note over CLI,FileHandler: Files processed in parallel | ||
| FileHandler-->>CLI: Result (Ok(Some(output)) or Err(error)) | ||
| loop For each file in input order | ||
| CLI->>Stdout: Print output (if Ok) | ||
| CLI->>Stderr: Print error (if Err) | ||
| end | ||
| CLI-->>User: Exit (with error if any file errored) | ||
| ``` | ||
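
The "process in parallel, print in input order" pattern from the diagram can be sketched with the standard library alone. The example below deliberately swaps in `std::thread::scope` for Rayon so that it needs no dependencies; `handle_input` and `process_all` are hypothetical stand-ins, not the tool's `handle_file`, and the per-file work is faked.

```rust
use std::thread;

/// Hypothetical stand-in for per-file work; the real tool would read and
/// rewrite a file here and return its buffered output.
fn handle_input(name: &str) -> Result<String, String> {
    if name.is_empty() {
        Err("empty path".to_string())
    } else {
        Ok(format!("processed {}", name))
    }
}

/// Runs every input on its own scoped thread and returns the results in
/// input order, regardless of which worker finishes first.
fn process_all(names: &[&str]) -> Vec<Result<String, String>> {
    thread::scope(|s| {
        // Spawn all workers first so they can run concurrently.
        let handles: Vec<_> = names
            .iter()
            .map(|&name| s.spawn(move || handle_input(name)))
            .collect();
        // Joining in spawn order is what preserves the input order.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

A caller would then loop over the returned vector, writing each `Ok` value to stdout and each `Err` to stderr, and exit with an error if any entry failed. Spawning one thread per file is only reasonable for a sketch; a work-stealing pool such as Rayon's bounds the number of threads.
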
|
|
||
| ## Unicode width handling | ||
|
|
||
| `mdtablefix` wraps paragraphs and list items while respecting the display width | ||
| of Unicode characters. The `unicode-width` crate is used to compute the width | ||
| of strings when deciding where to break lines. This prevents emoji and other | ||
| wide or multibyte characters from causing unexpected wraps or truncation. | ||
|
|
||
| Whenever wrapping logic examines the length of a token, it relies on | ||
| `UnicodeWidthStr::width` to measure visible columns rather than byte length. | ||
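
To see why byte length is the wrong measure, the sketch below compares `str::len` with a rough column count. `approx_char_width` and `approx_width` are hypothetical helpers with a hand-picked, incomplete set of ranges; they are a coarse stand-in for `UnicodeWidthStr::width`, not a reimplementation of the `unicode-width` tables.

```rust
/// Very rough display width of one character (assumed ranges, incomplete).
fn approx_char_width(c: char) -> usize {
    match c as u32 {
        // Combining diacritical marks occupy no column of their own.
        0x0300..=0x036F => 0,
        // Hangul Jamo, CJK blocks, Hangul syllables, full-width forms and a
        // common slice of emoji are drawn two columns wide.
        0x1100..=0x115F
        | 0x2E80..=0xA4CF
        | 0xAC00..=0xD7A3
        | 0xF900..=0xFAFF
        | 0xFE30..=0xFE4F
        | 0xFF00..=0xFF60
        | 0xFFE0..=0xFFE6
        | 0x1F300..=0x1F64F
        | 0x1F900..=0x1F9FF => 2,
        _ => 1,
    }
}

/// Approximate number of terminal columns needed to display `s`.
fn approx_width(s: &str) -> usize {
    s.chars().map(approx_char_width).sum()
}
```

For `"日本語"` the byte length is 9 while the approximate width is 6, and for `"caf\u{e9}"` the byte length is 5 while the width is 4. A wrapper that compared byte lengths against the column limit would break both lines too early.
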
🧹 Nitpick (assertive)
Wrap long list items to ≤ 80 columns.
Lines in this range overshoot the style limit for prose files. Hard-wrap to maintain consistency with the project’s Markdown guidelines.