leynos · leynos · Jul 23, 2025 · Jul 22, 2025 · Jul 22, 2025 · Jul 22, 2025
diff --git a/README.md b/README.md
@@ -159,12 +159,14 @@ Only simple tables composed of `<tr>`, `<th>`, and `<td>` tags are supported.
 Tag case and attributes are ignored. After conversion, they are reformatted
 alongside regular Markdown tables.
 
-See [HTML table support for more details](docs/html-table-support.md).
+See
+[HTML&nbsp;table&nbsp;support&nbsp;for&nbsp;more&nbsp;details](docs/architecture.md#html-table-support-in-mdtablefix)
+.
 
 ## Module structure
 
-For an overview of how the crate's internal modules relate to each other, see
-[Module relationships](docs/module-relationships.md).
+For an overview of how the crate's internal modules relate to each other, see \
+[Module relationships](docs/architecture.md#module-relationships).
 
 ## Testing
 

diff --git a/docs/architecture.md b/docs/architecture.md
@@ -0,0 +1,307 @@
+# Architecture
+
+## Contents
+
+- [Markdown stream processor](#markdown-stream-processor)
+- [Footnote conversion](#footnote-conversion)
+- [HTML table support](#html-table-support-in-mdtablefix)
+- [Module relationships](#module-relationships)
+- [Concurrency with `rayon`](#concurrency-with-rayon)
+- [Unicode width handling](#unicode-width-handling)
+
+## Markdown stream processor
+
+`process_stream_inner` orchestrates line-by-line rewriting. The full
+implementation lives in [src/process.rs](../src/process.rs). Its signature is:
+
+```rust
+pub fn process_stream_inner(lines: &[String], opts: Options) -> Vec<String>
+```
+
+The function combines several helpers documented in `docs/`:
+
+- `fences::compress_fences` and `attach_orphan_specifiers` normalize code block
+  delimiters.
+- `html::convert_html_tables` transforms basic HTML tables into Markdown so \
+    they can be reflowed like regular tables. See \
+    [HTML table support](#html-table-support-in-mdtablefix).
+- `wrap::wrap_text` applies optional line wrapping. It relies on the
+  `unicode-width` crate for accurate character widths.
+
+The function maintains a small state machine that tracks whether it is inside a
+Markdown table, an HTML table, or a fenced code block. The state determines how
+incoming lines are buffered or emitted. Once the end of a table or fence is
+reached, buffered lines are flushed and possibly reformatted. The simplified
+behaviour is illustrated below.
+
+```mermaid
+stateDiagram-v2
+
+    [*] --> Streaming: Start
+
+    Streaming: Default state—processing lines individually
+
+    InMarkdownTable: Buffering lines of a Markdown table
+
+    InHtmlTable: Buffering lines of an HTML table
+
+    InCodeFence: Passing through lines within a fenced code block
+
+    Streaming --> InMarkdownTable: Line starts with "|"
+    Streaming --> InHtmlTable: Line contains table HTML tag
+    Streaming --> InCodeFence: Line is a fence delimiter ("```" or "~~~")
+
+    InMarkdownTable --> Streaming: Flush buffer and reflow table on non-table line (e.g., blank, heading)
+    InMarkdownTable --> InMarkdownTable: Line contains "|" or separator pattern
+
+    InHtmlTable --> Streaming: Flush buffer and convert table on final table HTML closing tag
+    InHtmlTable --> InHtmlTable: Line inside table tag
+
+    InCodeFence --> Streaming: Line is a fence delimiter
+```
+
+Before:
+
+```markdown
+|A|B|
+|---|---|
+|1|22|
+<table><tr><td>3</td><td>4</td></tr></table>
+```
+
+After:
+
+```markdown
+| A | B  |
+| --- | --- |
+| 1 | 22 |
+| 3 | 4  |
+```
+
+Code fences are passed through verbatim:
+
+```rust
+| not | a | table |
+```
+
+After scanning all lines, the processor performs optional post-processing steps
+such as ellipsis replacement and footnote conversion. See \
+[footnote conversion](#footnote-conversion) for details. The function then
+returns the updated stream for writing to disk or further manipulation.
+
+## Footnote Conversion
+
+`mdtablefix` can optionally convert bare numeric references into
+GitHub-flavoured Markdown footnotes. The `convert_footnotes` function performs
+this operation and is exposed via the higher-level `process_stream_opts`
+helper. Set `Options { footnotes: true, ..Default::default() }` when calling
+`process_stream_opts` to enable the conversion logic.
+
+Inline references that appear after punctuation are rewritten as footnote links.
+
+Before:
+
+```markdown
+A useful tip.1
+```
+
+After:
+
+```markdown
+A useful tip.[^1]
+```
+
+Numbers inside inline code or parentheses are ignored.
+
+Before:
+
+```markdown
+Look at `code 1` for details.
+Refer to equation (1) for context.
+```
+
+After:
+
+```markdown
+Look at `code 1` for details.
+Refer to equation (1) for context.
+```
+
+When the final lines of a document form a numbered list, they are replaced with
+footnote definitions.
+
+Before:
+
+```markdown
+Text.
+
+ 1. First note
+ 2. Second note
+```
+
+After:
+
+```markdown
+Text.
+
+ [^1] First note
+[^2] Second note
+```
+
+`convert_footnotes` only processes the final contiguous list of numeric
+references.
+
+## HTML Table Support in `mdtablefix`
+
+`mdtablefix` can format simple HTML `<table>` elements embedded in Markdown.
+These HTML tables are transformed into Markdown before the main table reflow
+logic runs. That preprocessing is handled by the `convert_html_tables` function.
+
+Only straightforward tables with `<tr>`, `<th>` and `<td>` tags are detected.
+Attributes and tag casing are ignored, and complex nested or styled tables are
+not supported. After conversion, each HTML table is represented as a Markdown
+table, so the usual reflow algorithm can align its columns consistently with
+the rest of the document.
+
+```html
+<table>
+  <tr><th>A</th><th>B</th></tr>
+  <tr><td>1</td><td>2</td></tr>
+</table>
+```
+
+The converter checks the first table row for `<th>` cells or for `<strong>` or
+`<b>` tags inside `<td>` elements to decide whether it is a header. If no such
+markers exist and the table contains multiple rows, the first row is still
+treated as the header, so the Markdown output includes a separator line. This
+last-resort behaviour keeps simple tables readable after conversion.
+
+## Module Relationships
+
+This diagram illustrates the connections between the crate's modules.
+
+```mermaid
+classDiagram
+    class lib {
+        <<module>>
+    }
+    class html {
+        <<module>>
+        +convert_html_tables()
+        +html_table_to_markdown()
+    }
+    class table {
+        <<module>>
+        +reflow_table()
+        +split_cells()
+        +SEP_RE
+    }
+    class wrap {
+        <<module>>
+        +wrap_text()
+        +is_fence()
+    }
+    class lists {
+        <<module>>
+        +renumber_lists()
+    }
+    class breaks {
+        <<module>>
+        +format_breaks()
+        +THEMATIC_BREAK_LEN
+    }
+    class ellipsis {
+        <<module>>
+        +replace_ellipsis()
+    }
+    class fences {
+        <<module>>
+        +compress_fences()
+        +attach_orphan_specifiers()
+    }
+    class footnotes {
+        <<module>>
+        +convert_footnotes()
+    }
+    class process {
+        <<module>>
+        +process_stream()
+        +process_stream_no_wrap()
+    }
+    class io {
+        <<module>>
+        +rewrite()
+        +rewrite_no_wrap()
+    }
+    lib --> html
+    lib --> table
+    lib --> wrap
+    lib --> lists
+    lib --> breaks
+    lib --> ellipsis
+    lib --> fences
+    lib --> process
+    lib --> io
+    html ..> wrap : uses is_fence
+    table ..> reflow : uses parse_rows, etc.
+    lists ..> wrap : uses is_fence
+    breaks ..> wrap : uses is_fence
+    ellipsis ..> wrap : uses tokenize_markdown
+    process ..> html : uses convert_html_tables
+    process ..> table : uses reflow_table
+    process ..> wrap : uses wrap_text, is_fence
+    process ..> fences : uses compress_fences, attach_orphan_specifiers
+    process ..> ellipsis : uses replace_ellipsis
+    process ..> footnotes : uses convert_footnotes
+    io ..> process : uses process_stream, process_stream_no_wrap
+```
+
+The `lib` module re-exports the public API from the other modules. The
+`ellipsis` module performs text normalization. The `process` module provides
+streaming helpers that combine the lower-level functions, including ellipsis
+replacement and footnote conversion. The `io` module handles filesystem
+operations, delegating the text processing to `process`.
+
+## Concurrency with `rayon`
+
+`mdtablefix` uses the `rayon` crate to process multiple files concurrently.
+`rayon` provides a work-stealing thread pool and simple parallel iterators. The
+tool relies on Rayon's global thread pool so that no manual setup is required.
+The dependency is specified as `^1.0` in `Cargo.toml` to track stable API
+changes within the same major release.
+
+Parallelism is enabled automatically whenever more than one file path is
+provided on the command line. Each worker gathers its output before printing,
+so results appear in the original order. This buffering increases memory usage
+and may reduce performance if many tiny files are processed.
+
+```mermaid
+sequenceDiagram
+    participant User as actor User
+    participant CLI as CLI Main
+    participant FileHandler as handle_file
+    participant Stdout as Stdout
+    participant Stderr as Stderr
+
+    User->>CLI: Run CLI with multiple files (not in-place)
+    CLI->>FileHandler: handle_file(file1)
+    CLI->>FileHandler: handle_file(file2)
+    CLI->>FileHandler: handle_file(file3)
+    Note over CLI,FileHandler: Files processed in parallel
+    FileHandler-->>CLI: Result (Ok(Some(output)) or Err(error))
+    loop For each file in input order
+        CLI->>Stdout: Print output (if Ok)
+        CLI->>Stderr: Print error (if Err)
+    end
+    CLI-->>User: Exit (with error if any file errored)
+```
+
+## Unicode Width Handling
+
+`mdtablefix` wraps paragraphs and list items while respecting the display width
+of Unicode characters. The `unicode-width` crate is used to compute the width
+of strings when deciding where to break lines. This prevents emojis or other
+multibyte characters from causing unexpected wraps or truncation.
+
+Whenever wrapping logic examines the length of a token, it relies on
+`UnicodeWidthStr::width` to measure visible columns rather than byte length.
diff --git a/docs/footnote-conversion.md b/docs/footnote-conversion.md