Cargo binstall support #265

Merged
leynos merged 2 commits into main from mdtablefix-cargo-binstall-support on Apr 18, 2026

Conversation

@leynos
Owner

@leynos leynos commented Apr 18, 2026

Summary by Sourcery

Add cargo-binstall packaging support and refactor core Markdown processing utilities while preserving existing behavior.

New Features:

  • Publish cargo-binstall-compatible tarball archives for Linux GNU targets and configure Cargo binstall metadata for mdtablefix.

Enhancements:

  • Refactor footnote renumbering to extract reusable helpers and introduce small context/accumulator structs for clearer state management.
  • Restructure paragraph wrapping into a ParagraphState/ParagraphWriter abstraction and factor out helpers for prefix handling and line break detection.
  • Extract helpers for merging inline code and emphasis, simplifying fix_code_emphasis token handling logic.
  • Introduce small state structs and helpers for list renumbering, HTML table conversion, process_stream buffering, and table reflow width calculation to clarify control flow.
  • Refine inline wrapping and tokenization utilities by introducing range-based spans, a SplitContext helper, and clearer scanning helpers for backticks and plain text.
  • Simplify frontmatter and scanning tests by consolidating parameters into case structs for better readability and reuse.
  • Factor footnote inline conversion into small data structs and helpers and mark DefinitionParts as Copy where appropriate.

Build:

  • Extend the release workflow matrix with cargo-binstall archive flags and generate versioned tarball artifacts plus checksums for supported Linux targets.

Documentation:

  • Document cargo-binstall support, Linux-specific archive targets, and the revised set of release targets in the release process docs and README installation instructions.

Tests:

  • Refactor wrapping, scanning, frontmatter, and common test helpers into smaller reusable functions and table-driven cases without changing coverage or behavior.

leynos added 2 commits April 18, 2026 15:31
Declare the Linux GNU `cargo-binstall` package metadata in
`Cargo.toml` and teach the release workflow to build matching
`.tar.gz` archives for `x86_64-unknown-linux-gnu` and
`aarch64-unknown-linux-gnu`.

Document the new installation path in the README and release
process notes so the published assets and the user-facing
instructions stay aligned.

Break up the remaining high-nesting and high-arity helpers across
wrapping, footnote, HTML, and list processing so `make lint`
passes without suppressing the configured warnings.

Reshape the affected tests and shared test helpers alongside the
production refactors so the new helper boundaries remain covered
while preserving existing behaviour.
@coderabbitai
Contributor

coderabbitai Bot commented Apr 18, 2026

Overview

This PR adds cargo-binstall support to mdtablefix while conducting a comprehensive internal refactoring of the markdown processing pipeline. The changes maintain backward compatibility and preserve all existing formatting behaviour.

Cargo-binstall Support

New Installation Method:

  • Added [package.metadata.binstall] configuration to Cargo.toml with repository URL and archive instructions for Linux GNU targets (x86_64 and aarch64)
  • Extended GitHub Actions release workflow to build .tar.gz archives for supported Linux targets alongside existing binaries
  • Documented the new installation path (cargo binstall mdtablefix) in README and release-process documentation

Release Workflow Updates:

  • Modified the build matrix to include a cargo_binstall_archive flag (enabled for Linux GNU x86_64/aarch64; disabled for FreeBSD)
  • Updated artifact preparation with strict shell settings and logic to conditionally generate tar.gz archives with corresponding SHA-256 checksums
  • Release documentation narrowed to reflect actual supported targets (Linux x86_64/aarch64 and FreeBSD x86_64 only)

Markdown Processing Refactoring

The PR extensively refactors internal helpers across multiple modules to improve code organization and satisfy linting constraints:

Emphasis & Code Token Processing (src/code_emphasis.rs):

  • Extracted adjacency detection, text/code token handling, emphasis folding, and affix consumption into reusable helpers
  • Maintains identical outward behaviour for emphasis merging and code normalisation

Fence Handling (src/fences.rs):

  • Introduced helpers to locate orphan specifier targets and verify fence language attributes
  • Simplified control flow while preserving orphan specifier attachment logic

Footnote Processing (src/footnotes/):

  • Introduced InlineFootnote struct to bundle captured regex components instead of 5-tuples (inline.rs)
  • Added Copy derivation to DefinitionParts for implicit copying (parsing.rs)
  • Refactored definition collection into DefinitionScanContext, DefinitionAccumulator, and helper functions for structured scanning and numeric candidate processing (renumber.rs)
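The renumbering idea behind these helpers can be sketched with an `assign_new_number`-style function, as listed in the reviewer's guide below: look an old footnote number up in the mapping, allocating the next sequential number on first sight. The body here is an illustration, not the crate's actual code.

```rust
use std::collections::HashMap;

// Illustrative sketch: map an old footnote number to its new sequential
// number, allocating the next free number the first time it is seen.
fn assign_new_number(
    mapping: &mut HashMap<usize, usize>,
    number: usize,
    next: &mut usize,
) -> usize {
    *mapping.entry(number).or_insert_with(|| {
        let assigned = *next;
        *next += 1;
        assigned
    })
}

fn main() {
    let mut mapping = HashMap::new();
    let mut next = 1;
    // References appear out of order: 7, 2, then 7 again.
    assert_eq!(assign_new_number(&mut mapping, 7, &mut next), 1);
    assert_eq!(assign_new_number(&mut mapping, 2, &mut next), 2);
    assert_eq!(assign_new_number(&mut mapping, 7, &mut next), 1);
}
```

Bundling `mapping` and `next_number` into a `DefinitionScanContext` avoids threading both mutable references through every helper.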

HTML Table Support (src/html.rs):

  • Replaced free-function table buffering with stateful HtmlTableState struct encapsulating buffer, depth tracking, and conversion logic
  • Centralised whitespace-collapsing behaviour in push_collapsed_text_char helper
  • Improved table state transitions with append_html_table_line, flush_completed_html_table, and flush_raw methods
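The whitespace-collapsing helper can be sketched as follows, assuming the signature named above; the body is a plausible reconstruction, not necessarily what src/html.rs does.

```rust
// Illustrative sketch: append `ch` to `out`, folding any run of
// whitespace into a single space, with `last_space` carrying state
// between calls.
fn push_collapsed_text_char(ch: char, out: &mut String, last_space: &mut bool) {
    if ch.is_whitespace() {
        if !*last_space && !out.is_empty() {
            out.push(' ');
        }
        *last_space = true;
    } else {
        out.push(ch);
        *last_space = false;
    }
}

fn main() {
    let mut out = String::new();
    let mut last_space = false;
    for ch in "  a \t b".chars() {
        push_collapsed_text_char(ch, &mut out, &mut last_space);
    }
    assert_eq!(out, "a b");
}
```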

List Renumbering (src/lists.rs):

  • Introduced ListState struct to encapsulate indent stack and counter tracking, replacing threaded mutable references
  • Refactored list state management into struct methods while preserving renumbering semantics
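A minimal sketch of the `ListState` shape, using the field and method names from the summary; the method body is an assumption about how pruning nesting levels might look, not the crate's actual logic.

```rust
use std::collections::HashMap;

// Illustrative sketch: an indent stack plus a per-indent counter map.
struct ListState {
    indent_stack: Vec<usize>,
    counters: HashMap<usize, usize>,
}

impl ListState {
    // Drop nesting levels deeper than `indent` (and the level itself
    // when `inclusive` is set), discarding their counters.
    fn prune_deeper(&mut self, indent: usize, inclusive: bool) {
        while let Some(&top) = self.indent_stack.last() {
            if top > indent || (inclusive && top == indent) {
                self.indent_stack.pop();
                self.counters.remove(&top);
            } else {
                break;
            }
        }
    }
}

fn main() {
    let mut state = ListState {
        indent_stack: vec![0, 2, 4],
        counters: HashMap::from([(0, 3), (2, 1), (4, 2)]),
    };
    state.prune_deeper(2, false);
    assert_eq!(state.indent_stack, vec![0, 2]);
    assert!(state.counters.contains_key(&2));
}
```

Holding both fields in one struct replaces the pair of mutable references previously threaded through the free helper functions.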

Paragraph Wrapping (src/wrap.rs, src/wrap/paragraph.rs):

  • Introduced ParagraphState and ParagraphWriter types to centralise paragraph buffering and wrapping logic
  • Added PrefixLine struct carrying prefix (as Cow), remaining text, and repeat-prefix flag
  • Introduced helpers: is_table_or_separator, is_passthrough_block, prefix_line, and line_break_parts for clearer control flow
  • Replaced direct line emission with push_verbatim path for passthrough blocks
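The `PrefixLine` shape can be sketched like this; the `prefix_line` body handles only a blockquote marker and is an illustration of the idea, not the crate's full prefix handling.

```rust
use std::borrow::Cow;

// Illustrative sketch: a prefix (borrowed where possible via Cow), the
// remaining text, and whether the prefix repeats on wrapped lines.
struct PrefixLine<'a> {
    prefix: Cow<'a, str>,
    rest: &'a str,
    repeat_prefix: bool,
}

fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
    let rest = line.strip_prefix("> ")?;
    Some(PrefixLine {
        prefix: Cow::Borrowed("> "),
        rest,
        // Blockquote markers are repeated on every continuation line.
        repeat_prefix: true,
    })
}

fn main() {
    let parsed = prefix_line("> quoted text").expect("blockquote line");
    assert_eq!(parsed.prefix, "> ");
    assert_eq!(parsed.rest, "quoted text");
    assert!(parsed.repeat_prefix);
}
```

Using `Cow` lets static markers like `"> "` stay borrowed while computed prefixes (for example, indentation for numbered-list continuations) can be owned.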

Inline Wrapping & Tokenisation (src/wrap/inline.rs, src/wrap/tokenize/mod.rs):

  • Refactored push_span_with_carry to accept Range<usize> instead of separate indices
  • Introduced SplitContext wrapper in src/wrap/line_buffer.rs to bundle output buffer and width
  • Extracted scan_plain_text_end, should_stop_plain_text, and append_escaped_backtick helpers to improve inline tokenisation control flow
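The switch from separate start/end indices to `Range<usize>` can be illustrated with a hypothetical helper (`span_text` is invented for this example and is not part of the PR):

```rust
use std::ops::Range;

// Illustrative sketch: a single `Range` argument replaces a pair of
// `start`/`end` indices, so callers cannot swap or mismatch them.
fn span_text(tokens: &[String], span: Range<usize>) -> String {
    tokens[span].concat()
}

fn main() {
    let tokens: Vec<String> = ["foo", " ", "bar", " ", "baz"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    assert_eq!(span_text(&tokens, 0..3), "foo bar");
    assert_eq!(span_text(&tokens, 2..5), "bar baz");
}
```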

Stream Processing (src/process.rs):

  • Replaced free helper functions and external state variables with internal ProcessBuffer struct
  • Methods (flush, push_verbatim, handle_fence_line, handle_table_line) preserve operational boundaries while centralising buffering logic
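A minimal sketch of the `ProcessBuffer` idea, assuming the field and method names listed above; the `reflow` closure stands in for the real `reflow_table`, and the bodies are illustrative.

```rust
// Illustrative sketch: buffered table lines are reflowed on flush,
// everything else passes through verbatim.
struct ProcessBuffer {
    out: Vec<String>,
    buf: Vec<String>,
    in_table: bool,
}

impl ProcessBuffer {
    fn new() -> Self {
        Self { out: Vec::new(), buf: Vec::new(), in_table: false }
    }

    fn flush(&mut self, reflow: impl Fn(&[String]) -> Vec<String>) {
        if self.in_table {
            let reflowed = reflow(&self.buf);
            self.buf.clear();
            self.out.extend(reflowed);
        } else {
            self.out.extend(self.buf.drain(..));
        }
        self.in_table = false;
    }

    fn push_verbatim(&mut self, line: &str) {
        self.out.push(line.to_string());
    }
}

fn main() {
    let mut pb = ProcessBuffer::new();
    pb.push_verbatim("# Heading");
    pb.buf.push("| a | b |".to_string());
    pb.in_table = true;
    // Stand-in for reflow_table: pass rows through unchanged.
    pb.flush(|rows| rows.to_vec());
    assert_eq!(pb.out, vec!["# Heading", "| a | b |"]);
}
```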

Test Refactoring

  • Refactored frontmatter tests to use PrefixEmptyCase and FrontmatterSplitCase case structs instead of multiple parameters (src/frontmatter.rs)
  • Updated inline tokenisation tests to use ScanCollectCase struct for test case composition (src/wrap/tokenize/scanning.rs)
  • Updated wrap tests to pass SplitContext and range arguments to refactored helper functions (src/wrap/tests.rs)
  • Refactored common test utility assert_wrapped_list_item to use helpers for code-span detection (tests/common/mod.rs)
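The case-struct pattern used in these test refactors can be sketched as below. The struct, the cases, and the naive wrapper driving them are invented for illustration; the real table-driven tests live in the files listed above.

```rust
// Illustrative sketch of a table-driven test case struct: one named
// struct per case instead of a long positional parameter list.
struct WrapCase {
    input: &'static str,
    width: usize,
    expected_lines: usize,
}

// Naive greedy word wrapper, used only to drive the example.
fn wrapped_line_count(text: &str, width: usize) -> usize {
    let mut lines = 1;
    let mut len = 0;
    for word in text.split_whitespace() {
        if len > 0 && len + 1 + word.len() > width {
            lines += 1;
            len = word.len();
        } else {
            len += if len == 0 { word.len() } else { word.len() + 1 };
        }
    }
    lines
}

fn main() {
    let cases = [
        WrapCase { input: "a b c", width: 80, expected_lines: 1 },
        WrapCase { input: "aaa bbb ccc", width: 7, expected_lines: 2 },
    ];
    for case in cases {
        assert_eq!(wrapped_line_count(case.input, case.width), case.expected_lines);
    }
}
```

Named fields make each case self-documenting and let new fields be added without disturbing every existing case.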

Documentation

Updated docs/release-process.md to document the cargo-binstall distribution model, including archive naming conventions and the configuration location in [package.metadata.binstall]. The supported target list was refined to reflect actual release targets.

Walkthrough

The PR extends release infrastructure to support cargo-binstall distribution of prebuilt Linux GNU binaries whilst refactoring internal code through systematic extraction of helper functions and introduction of state-carrying structs to consolidate mutable parameter threading. No user-facing functional changes are introduced.

Changes

  • Release & Distribution Infrastructure (.github/workflows/release.yml, Cargo.toml, README.md, docs/release-process.md): Extended the build matrix with a cargo_binstall_archive flag and updated the workflow to conditionally produce .tar.gz archives with SHA-256 checksums for Linux GNU targets. Added binstall metadata to Cargo.toml, documented the installation method in the README, and updated the release-process documentation to reflect the expanded Linux distribution support whilst dropping macOS, Windows, and OpenBSD from the documented target list.
  • Code Processing Helpers (src/code_emphasis.rs, src/fences.rs, src/wrap/tokenize/mod.rs): Extracted internal helper functions to centralise adjacency detection, text-token handling, and plain-text scanning logic. Preserved outward behaviour whilst improving code organisation and readability.
  • Footnotes Module Refactoring (src/footnotes/inline.rs, src/footnotes/parsing.rs, src/footnotes/renumber.rs): Introduced an InlineFootnote struct to bundle regex captures; added a Copy derivation to DefinitionParts; refactored definition-update collection with the new DefinitionScanContext, DefinitionAccumulator, and helper functions for structured numeric-list processing.
  • State-Carrying Struct Introductions (src/lists.rs, src/process.rs, src/html.rs, src/table.rs): Introduced ListState, ProcessBuffer, and HtmlTableState structs to encapsulate mutable parameters previously threaded through free functions. Centralised control flow whilst maintaining identical operational semantics.
  • Text Wrapping Module Restructure (src/wrap.rs, src/wrap/paragraph.rs, src/wrap/line_buffer.rs, src/wrap/fence.rs, src/wrap/inline.rs): Introduced ParagraphState, ParagraphWriter, and SplitContext to replace scattered mutable-parameter threading. Refactored fence-line handling and span-based token indexing using Range<usize>. Centralised prefix-line processing and paragraph buffering logic.
  • Test Case Refactoring (src/frontmatter.rs, src/wrap/tokenize/scanning.rs, tests/common/mod.rs): Replaced multiple individual test parameters with single case-struct instances (PrefixEmptyCase, FrontmatterSplitCase, ScanCollectCase). Improved test readability and maintainability without altering assertions or control flow.

Poem

📦 Code grows cleaner, structs align,
Helper functions draw the line,
State flows through with purpose clear,
Release archives appear here,
Binstall brings the binaries near,
Refactored dreams now crystalline! ✨

🚥 Pre-merge checks | ✅ 5 | ❌ 2

❌ Failed checks (2 warnings)

  • Testing (⚠️ Warning): the PR lacks explicit test coverage for the edge-case logic at line 124 of src/code_emphasis.rs and dedicated unit tests for the new internal structures (ParagraphState, ParagraphWriter, SplitContext, ListState). Resolution: add an explicit unit test for the prefix-reset condition in code_emphasis.rs and create dedicated unit tests for the internal state structures to verify their behaviour in isolation.
  • Developer Documentation (⚠️ Warning): the pull request introduces substantial internal refactorings with multiple new state abstractions (ParagraphState, ProcessBuffer, HtmlTableState, ListState, SplitContext, footnote processing contexts) that are not documented in docs/architecture.md. Resolution: extend docs/architecture.md with a section documenting the introduced internal state abstractions, including the problems they solve, their responsibilities, and their pipeline integration.
✅ Passed checks (5 passed)

  • Title check: the title 'Cargo binstall support' directly and clearly summarises the primary change: adding cargo-binstall packaging support and metadata configuration for the project.
  • Description check: the description comprehensively details all major changes (new cargo-binstall features, refactorings across markdown processing utilities, workflow enhancements, documentation updates, and test improvements), all aligned with the changeset.
  • Docstring Coverage: docstring coverage is 100.00%, which exceeds the required threshold of 80.00%.
  • User-Facing Documentation: the cargo-binstall installation capability is documented in README.md with supported Linux targets and command instructions. Code changes preserve existing functionality.
  • Module-Level Documentation: all 19 modified modules carry appropriate module-level docstrings using Rust documentation conventions (//!), each clearly explaining the module's purpose.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai
Contributor

sourcery-ai Bot commented Apr 18, 2026

Reviewer's Guide

Refactors multiple markdown processing subsystems (footnotes, wrapping, inline tokenization, lists, HTML tables, process pipeline) into smaller helpers and state structs while adding cargo-binstall metadata and Linux tar.gz release artifacts, plus corresponding docs and tests updates.

Class diagram for refactored paragraph wrapping and tokenization

classDiagram
    class PrefixLine {
      +prefix: Cow<str>
      +rest: &str
      +repeat_prefix: bool
    }

    class ParagraphState {
      +buf: Vec<(String,bool)>
      +indent: String
      +clear()
      +note_indent(line: &str)
      +push(text: String, hard_break: bool)
    }

    class ParagraphWriter {
      -out: &mut Vec<String>
      -width: usize
      +new(out: &mut Vec<String>, width: usize) ParagraphWriter
      +flush_paragraph(state: &mut ParagraphState)
      +push_verbatim(state: &mut ParagraphState, line: &str)
      +handle_prefix_line(state: &mut ParagraphState, prefix_line: &PrefixLine)
      -append_wrapped_with_prefix(line: &PrefixLine)
      -push_wrapped_segment(indent: &str, segment: &str)
    }

    class LineBuffer {
      -text: String
      -width: usize
      -last_split: Option<usize>
      +new() LineBuffer
      +push_token(token: &str)
      +push_span(tokens: &[String], start: usize, end: usize)
      +split_with_span(ctx: &mut SplitContext, tokens: &[String], span: Range<usize>) bool
      +flush_trailing_whitespace(lines: &mut Vec<String>, tokens: &[String], span: Range<usize>) bool
      +flush_into(lines: &mut Vec<String>)
      +width() usize
    }

    class SplitContext {
      +lines: &mut Vec<String>
      +width: usize
    }

    class WrapInline {
      <<module>>
      +wrap_preserving_code(text: &str, width: usize) Vec<String>
      +attach_punctuation_to_previous_line(lines: &mut Vec<String>)
      -push_span_with_carry(buffer: &mut LineBuffer, tokens: &[String], span: Range<usize>, carried_whitespace: &mut String)
    }

    class WrapTokenizeMod {
      <<module>>
      +segment_inline(text: &str) Vec<String>
      -append_escaped_backtick(tokens: &mut Vec<String>)
      -scan_plain_text_end(text: &str, bytes: &[u8], index: usize) usize
      -should_stop_plain_text(text: &str, bytes: &[u8], index: usize, current: (char,bool)) bool
    }

    class WrapTopLevel {
      <<module>>
      +wrap_text(lines: &[String], width: usize) Vec<String>
      -is_table_or_separator(line: &str) bool
      -is_passthrough_block(line: &str) bool
      -prefix_line(line: &str) Option<PrefixLine>
      -line_break_parts(line: &str) (String,bool)
    }

    WrapTopLevel ..> ParagraphState : manages
    WrapTopLevel ..> ParagraphWriter : uses
    WrapTopLevel ..> PrefixLine : constructs
    WrapTopLevel ..> WrapInline : calls
    WrapTopLevel ..> WrapTokenizeMod : via WrapInline

    ParagraphWriter ..> PrefixLine : wraps
    ParagraphWriter ..> WrapInline : uses

    WrapInline ..> LineBuffer : buffers
    WrapInline ..> SplitContext : configures

    WrapTokenizeMod ..> WrapInline : token source

Class diagram for refactored footnote renumbering and inline handling

classDiagram
    class DefinitionParts {
      <<Copy>>
      +prefix: &str
      +number: usize
      +rest: &str
    }

    class DefinitionLine {
      +index: usize
      +new_number: usize
      +line: String
    }

    class DefinitionUpdates {
      +definitions: Vec<DefinitionLine>
      +is_definition_line: Vec<bool>
    }

    class DefinitionScanContext {
      +mapping: &mut HashMap<usize,usize>
      +next_number: &mut usize
      +numeric_list_range: Option<(usize,usize)>
      +skip_numeric_conversion: bool
    }

    class DefinitionAccumulator {
      +definitions: Vec<DefinitionLine>
      +is_definition_line: Vec<bool>
    }

    class NumericCandidate {
      +index: usize
      +number: usize
      +indent: String
      +whitespace: String
      +rest: String
    }

    class FootnoteRenumberMod {
      <<module>>
      +collect_reference_mapping(lines: &[String]) HashMap<usize,usize>
      -collect_reference_mapping_from_text(text: &str, mapping: &mut HashMap<usize,usize>, next: &mut usize)
      -collect_definition_updates(lines: &[String], mapping: &mut HashMap<usize,usize>) DefinitionUpdates
      -collect_scan_updates(lines: &[String], ctx: &mut DefinitionScanContext) (DefinitionAccumulator, Vec<NumericCandidate>)
      -finalize_numeric_candidates(numeric_candidates: Vec<NumericCandidate>, ctx: &mut DefinitionScanContext, acc: &mut DefinitionAccumulator)
      -definition_line_from_parts(index: usize, parts: DefinitionParts, mapping: &mut HashMap<usize,usize>, next_number: &mut usize) DefinitionLine
      -numeric_candidate_from_line(line: &str, index: usize) Option<NumericCandidate>
      -assign_new_number(mapping: &mut HashMap<usize,usize>, number: usize, next: &mut usize) usize
      -rewrite_tokens(rest: &str, mapping: &HashMap<usize,usize>) String
    }

    class InlineFootnote {
      +pre: &str
      +punc: &str
      +style: &str
      +num: &str
      +boundary: &str
    }

    class FootnoteInlineMod {
      <<module>>
      +convert_inline(text: &str) String
      -capture_parts(caps: &Captures) InlineFootnote
      -build_footnote(parts: InlineFootnote) String
    }

    FootnoteRenumberMod ..> DefinitionParts : uses
    FootnoteRenumberMod ..> DefinitionScanContext : configures
    FootnoteRenumberMod ..> DefinitionAccumulator : accumulates
    FootnoteRenumberMod ..> NumericCandidate : converts
    FootnoteRenumberMod ..> DefinitionUpdates : returns

    DefinitionAccumulator *-- DefinitionLine : aggregates
    DefinitionUpdates *-- DefinitionLine : aggregates

    FootnoteInlineMod ..> InlineFootnote : builds

Class diagram for refactored processing pipeline, HTML tables, and lists

classDiagram
    class ProcessBuffer {
      +out: Vec<String>
      +buf: Vec<String>
      +in_table: bool
      +flush()
      +push_verbatim(line: &str)
      +handle_fence_line(line: &str, fences: &mut FenceTracker) bool
      +handle_table_line(line: &str) bool
    }

    class HtmlTableState {
      +buf: Vec<String>
      +depth: usize
      +in_html: bool
      +flush_raw(out: &mut Vec<String>)
      +push_html_line(line: &str, out: &mut Vec<String>)
    }

    class HtmlMod {
      <<module>>
      +convert_html_tables(lines: &[String]) Vec<String>
      +html_table_to_markdown(lines: &[String]) Vec<String>
      -append_html_table_line(line: &str, buf: &mut Vec<String>, depth: &mut usize)
      -flush_completed_html_table(buf: &mut Vec<String>, depth: usize, out: &mut Vec<String>) bool
      -push_collapsed_text_char(ch: char, out: &mut String, last_space: &mut bool)
    }

    class ListState {
      +indent_stack: Vec<usize>
      +counters: HashMap<usize,usize>
      +prune_deeper(indent: usize, inclusive: bool)
      +handle_paragraph_restart(indent: usize, line: &str, prev_blank: bool) bool
    }

    class ListsMod {
      <<module>>
      +renumber_lists(lines: &[String]) Vec<String>
      -is_plain_paragraph_line(line: &str) bool
      -prune_deeper(indent: usize, inclusive: bool, indent_stack: &mut Vec<usize>, counters: &mut HashMap<usize,usize>)
    }

    class ProcessMod {
      <<module>>
      +process_stream_inner(lines: &[String], opts: Options) Vec<String>
    }

    class FenceTracker {
      +observe(line: &str) bool
      +in_fence() bool
    }

    class TableMod {
      <<module>>
      +reflow_table(lines: &[String]) Vec<String>
      -calculate_and_format(parsed: &ParsedTable, indent: &str) Vec<String>
    }

    class ParsedTable {
      +cleaned: Vec<Vec<String>>
      +output_rows: Vec<Vec<String>>
      +sep_cells: Option<Vec<String>>
      +max_cols: usize
    }

    ProcessMod ..> ProcessBuffer : manages
    ProcessMod ..> FenceTracker : tracks
    ProcessMod ..> HtmlMod : calls convert_html_tables

    ProcessBuffer ..> FenceTracker : uses
    ProcessBuffer ..> TableMod : uses reflow_table

    HtmlMod ..> HtmlTableState : uses

    ListsMod ..> ListState : manages

    TableMod ..> ParsedTable : consumes

File-Level Changes

Change Details Files
Refactor footnote renumbering to use reusable helpers and scan/accumulator context structs.
  • Extract collect_reference_mapping_from_text to handle mapping collection for text tokens
  • Introduce DefinitionScanContext and DefinitionAccumulator to manage mapping, numeric range, and collected definitions
  • Split definition scanning into collect_scan_updates and finalize_numeric_candidates for clearer numeric candidate handling
  • Factor definition_line_from_parts and numeric_candidate_from_line helpers to construct DefinitionLine and NumericCandidate values
src/footnotes/renumber.rs
src/footnotes/parsing.rs
Restructure paragraph wrapping into stateful writer and helpers, including fence and prefix handling.
  • Add ParagraphState and ParagraphWriter to encapsulate paragraph buffering, indent tracking, and flushing
  • Extract is_passthrough_block, prefix_line, and line_break_parts helpers for block classification, prefix extraction, and hard-break detection
  • Update wrap_text to operate via ParagraphWriter/ParagraphState and delegate fence handling to wrap::fence
  • Adjust wrap::paragraph helpers to methods on ParagraphWriter and use PrefixLine with Cow for prefixes
src/wrap.rs
src/wrap/paragraph.rs
src/wrap/fence.rs
Improve code/emphasis merging logic by extracting helpers and simplifying fix_code_emphasis.
  • Add has_code_emphasis_adjacent guard to avoid unnecessary processing
  • Introduce handle_text_token, try_fold_matching_emphasis, consume_code_affixes, and handle_code_token to encapsulate text/code handling
  • Refactor fix_code_emphasis main loop to delegate to the new helpers while preserving behavior
src/code_emphasis.rs
Refactor inline tokenization and scanning utilities for clarity and reuse.
  • Add append_escaped_backtick, scan_plain_text_end, and should_stop_plain_text to simplify segment_inline’s control flow
  • Extract ScanCollectCase struct in scanning tests to consolidate parameters and reduce repetition
src/wrap/tokenize/mod.rs
src/wrap/tokenize/scanning.rs
Introduce range-based line buffer splitting context for inline wrapping.
  • Add SplitContext struct to carry lines and width for splitting operations
  • Change LineBuffer::split_with_span and flush_trailing_whitespace to accept Range instead of start/end pairs
  • Update wrap::inline and wrap tests to use SplitContext and Range-based calls
src/wrap/line_buffer.rs
src/wrap/inline.rs
src/wrap/tests.rs
Refactor list renumbering into a ListState helper struct.
  • Introduce ListState to hold indent_stack and counters with helper methods prune_deeper and handle_paragraph_restart
  • Adjust renumber_lists to use ListState instead of passing raw stacks and maps to helpers
src/lists.rs
Refine HTML whitespace collapsing and table conversion using dedicated helpers and state structs.
  • Extract push_collapsed_text_char to handle whitespace collapsing in collect_text
  • Factor HTML table line accumulation into append_html_table_line and flush_completed_html_table helpers
  • Introduce HtmlTableState to manage HTML table buffering, depth, and flush behavior in convert_html_tables
src/html.rs
Simplify process pipeline buffering around tables and fences using a ProcessBuffer struct.
  • Add ProcessBuffer to own out/buf/in_table with methods flush, push_verbatim, handle_fence_line, and handle_table_line
  • Update process_stream_inner to route all buffering/table handling through ProcessBuffer
src/process.rs
Tidy up tests by using case structs and shared helpers.
  • Replace many rstest parameter lists with PrefixEmptyCase, FrontmatterSplitCase, and ScanCollectCase structs
  • Extract scan_code_spans, count_backticks, and toggle_code_span helpers for list-wrapping assertions in common test utilities
src/frontmatter.rs
src/wrap/tokenize/scanning.rs
tests/common/mod.rs
Streamline table reflow formatting interface.
  • Collapse calculate_and_format into a ParsedTable-based helper that uses its fields directly
  • Update reflow_table to call the new calculate_and_format signature
src/table.rs
Improve fence specifier attachment by extracting orphan target helpers.
  • Add orphan_specifier_target and orphan_specifier_target_without_language helpers to locate applicable fences
  • Use these helpers in attach_orphan_specifiers to simplify control flow and language checks
src/fences.rs
Add cargo-binstall support and Linux tar.gz release artifacts in CI.
  • Add package.metadata.binstall configuration in Cargo.toml for Linux GNU x86_64/aarch64 with tgz archives and bin-dir mapping
  • Extend release workflow matrix with cargo_binstall_archive flag per target
  • Update artifact preparation step to optionally build versioned { name }-{ version }-{ target }.tar.gz archives plus checksums for supported Linux targets
Cargo.toml
.github/workflows/release.yml
Update documentation for cargo-binstall and adjusted release targets.
  • Document cargo-binstall installation for Linux x86_64/aarch64 in README
  • Explain cargo-binstall archives, their naming, and matrix semantics in release-process docs; note Linux/FreeBSD support set
README.md
docs/release-process.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@leynos leynos marked this pull request as ready for review April 18, 2026 14:53
Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey, I've found 5 issues and left some high-level feedback:

  • The new ListState type still exposes its indent_stack and counters to external mutation in a few places (e.g. direct clear() calls); consider encapsulating these operations behind ListState methods so list-renumbering behaviour remains centralized and easier to reason about.
  • In the HTML table handling, HtmlTableState::flush_raw and the manual if !html_state.buf.is_empty() { out.extend(html_state.buf); } at the end of convert_html_tables now duplicate behaviour; you could reuse flush_raw there to keep the flushing logic in a single place.
  • The introduction of multiple small helpers in wrap::inline and wrap::line_buffer (e.g. SplitContext, passing Range instead of start/end indices) improves clarity but also adds boilerplate at call sites; you might consider providing thin convenience wrappers for the common cases (e.g. constructing SplitContext once per wrap call) to keep the main wrapping loop easier to scan.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `ListState` type still exposes its `indent_stack` and `counters` to external mutation in a few places (e.g. direct `clear()` calls); consider encapsulating these operations behind `ListState` methods so list-renumbering behaviour remains centralized and easier to reason about.
- In the HTML table handling, `HtmlTableState::flush_raw` and the manual `if !html_state.buf.is_empty() { out.extend(html_state.buf); }` at the end of `convert_html_tables` now duplicate behaviour; you could reuse `flush_raw` there to keep the flushing logic in a single place.
- The introduction of multiple small helpers in `wrap::inline` and `wrap::line_buffer` (e.g. `SplitContext`, passing `Range` instead of start/end indices) improves clarity but also adds boilerplate at call sites; you might consider providing thin convenience wrappers for the common cases (e.g. constructing `SplitContext` once per wrap call) to keep the main wrapping loop easier to scan.

## Individual Comments

### Comment 1
<location path="Cargo.toml" line_range="16" />
<code_context>
+
+[package.metadata.binstall.overrides.'cfg(all(target_os = "linux", any(target_arch = "x86_64", target_arch = "aarch64"), target_env = "gnu"))']
+pkg-url = "{ repo }/releases/download/v{ version }/{ name }-{ version }-{ target }.tar.gz"
+bin-dir = "{ bin }{ binary-ext }"
+pkg-fmt = "tgz"
+
</code_context>
<issue_to_address>
**issue (bug_risk):** The `bin-dir` value in the cargo-binstall metadata looks like a file path instead of a directory, which may break installation.

In cargo-binstall metadata, `bin-dir` should point to the directory inside the archive that contains the binary (e.g. `"."` or `"bin"`), not the full binary path. With `bin-dir = "{ bin }{ binary-ext }"`, this expands to the full filename, while your `.tar.gz` places the binary at the archive root. To match the archive layout and let cargo-binstall find the binary, this should be `bin-dir = "."` (or another actual directory path if you change the archive structure).
</issue_to_address>

### Comment 2
<location path="src/footnotes/renumber.rs" line_range="282" />
<code_context>
     is_definition_line: Vec<bool>,
 }

+struct DefinitionScanContext<'a> {
+    mapping: &'a mut HashMap<usize, usize>,
+    next_number: &'a mut usize,
</code_context>
<issue_to_address>
**issue (complexity):** Consider collapsing the new scan context and accumulator into a single state struct and driving the scan/finalize flow directly from `collect_definition_updates` to reduce indirection.

You can keep the new helpers/behavior but reduce indirection by collapsing the scan context/accumulator and merging the two-phase scan/finalize into a single function-local flow.

### 1. Collapse `DefinitionScanContext` + `DefinitionAccumulator` into one local state

Instead of passing a `DefinitionScanContext` and `DefinitionAccumulator` between helpers, you can use a single local state struct inside `collect_definition_updates`:

```rust
struct ScanState<'a> {
    mapping: &'a mut HashMap<usize, usize>,
    next_number: &'a mut usize,
    numeric_list_range: Option<(usize, usize)>,
    skip_numeric_conversion: bool,

    definitions: Vec<DefinitionLine>,
    is_definition_line: Vec<bool>,
    numeric_candidates: Vec<NumericCandidate>,
}
```

Then `collect_definition_updates` owns the control flow and state:

```rust
fn collect_definition_updates(
    lines: &[String],
    mapping: &mut HashMap<usize, usize>,
) -> DefinitionUpdates {
    let mut next_number = mapping.values().copied().max().unwrap_or(0) + 1;
    let numeric_list_range = footnote_block_range(lines);
    let skip_numeric_conversion = numeric_list_range
        .as_ref()
        .is_some_and(|(start, _)| has_existing_footnote_block(lines, *start));

    let mut state = ScanState {
        mapping,
        next_number: &mut next_number,
        numeric_list_range,
        skip_numeric_conversion,
        definitions: Vec::new(),
        is_definition_line: vec![false; lines.len()],
        numeric_candidates: Vec::new(),
    };

    collect_scan_updates(lines, &mut state);
    finalize_numeric_candidates(&mut state);

    DefinitionUpdates {
        definitions: state.definitions,
        is_definition_line: state.is_definition_line,
    }
}
```

### 2. Simplify `collect_scan_updates` / `finalize_numeric_candidates` signatures

With the unified `ScanState`, the helpers become simpler and avoid tuple juggling and separate accumulator types:

```rust
fn collect_scan_updates(lines: &[String], state: &mut ScanState<'_>) {
    let mut in_fence = false;

    for (index, line) in lines.iter().enumerate() {
        if is_fence_line(line) {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }

        if let Some(parts) = parse_definition(line) {
            state.definitions.push(definition_line_from_parts(
                index,
                parts,
                state.mapping,
                state.next_number,
            ));
            state.is_definition_line[index] = true;
            continue;
        }

        if !should_convert_numeric_line(
            index,
            state.numeric_list_range,
            state.skip_numeric_conversion,
        ) {
            continue;
        }
        if state.mapping.is_empty() && state.definitions.is_empty() {
            continue;
        }
        if let Some(candidate) = numeric_candidate_from_line(line, index) {
            state.numeric_candidates.push(candidate);
        }
    }
}

fn finalize_numeric_candidates(state: &mut ScanState<'_>) {
    for candidate in state.numeric_candidates.drain(..).rev() {
        let new_number = assign_new_number(state.mapping, candidate.number, state.next_number);
        let rewritten_rest = rewrite_tokens(&candidate.rest, state.mapping);
        let mut line = String::with_capacity(
            candidate.indent.len() + candidate.whitespace.len() + rewritten_rest.len() + 8,
        );
        line.push_str(&candidate.indent);
        write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");
        line.push_str(&candidate.whitespace);
        line.push_str(&rewritten_rest);
        state.definitions.push(DefinitionLine {
            index: candidate.index,
            new_number,
            line,
        });
        state.is_definition_line[candidate.index] = true;
    }
}
```

This keeps:

- `definition_line_from_parts` and `numeric_candidate_from_line` as reusable helpers.
- The “scan then reverse-apply numeric candidates” behavior.

But it:

- Removes `DefinitionScanContext` and `DefinitionAccumulator` as separate concepts.
- Avoids `(acc, numeric_candidates)` tuples and multiple lifetimes being threaded around.
- Keeps all the logic reachable from one primary function (`collect_definition_updates`), which matches the original linear mental model while retaining your refactoring improvements.
</issue_to_address>

### Comment 3
<location path="src/wrap.rs" line_range="124" />
<code_context>
 #[must_use]
 pub fn wrap_text(lines: &[String], width: usize) -> Vec<String> {
     let mut out = Vec::new();
-    let mut buf: Vec<(String, bool)> = Vec::new();
</code_context>
<issue_to_address>
**issue (complexity):** Consider merging `ParagraphState` into `ParagraphWriter`, simplifying `PrefixLine`, and inlining passthrough checks to make `wrap_text`’s control flow and APIs more linear and self‑contained.

1. Merge `ParagraphState` into `ParagraphWriter` to reduce cross‑type coupling  
Right now every call site has to thread `&mut writer` and `&mut state` together, and behavior depends on both. You can keep the same external behavior while collapsing them into a single type that owns the paragraph buffer and indentation, and exposes the same high‑level methods.

Example of the shape of the change:

```rust
// before
let mut state = ParagraphState::default();
let mut writer = ParagraphWriter::new(&mut out, width);

for line in lines {
    if fence::handle_fence_line(line, &mut writer, &mut state, &mut fence_tracker) { … }
    if fence_tracker.in_fence() {
        writer.push_verbatim(&mut state, line);
        continue;
    }
    if is_passthrough_block(line) {
        writer.push_verbatim(&mut state, line);
        continue;
    }
    if let Some(prefix_line) = prefix_line(line) {
        writer.handle_prefix_line(&mut state, &prefix_line);
        continue;
    }

    state.note_indent(line);
    let (text, hard_break) = line_break_parts(line);
    state.push(text, hard_break);
}

writer.flush_paragraph(&mut state);
```

```rust
// after: single struct owns the state
pub struct ParagraphWriter<'a> {
    out: &'a mut Vec<String>,
    width: usize,
    // moved fields from ParagraphState:
    buf: Vec<(String, bool)>,
    indent: String,
    // any other state...
}

impl<'a> ParagraphWriter<'a> {
    pub fn new(out: &'a mut Vec<String>, width: usize) -> Self { … }

    pub fn push_verbatim(&mut self, line: &str) { … }            // internally flushes buf/indent as needed
    pub fn handle_prefix_line(&mut self, prefix_line: &PrefixLine<'_>) { … }
    pub fn note_indent(&mut self, line: &str) { … }
    pub fn push_wrapped(&mut self, text: String, hard_break: bool) { … }
    pub fn flush_paragraph(&mut self) { … }
}
```

```rust
// wrap_text becomes simpler to read:
let mut writer = ParagraphWriter::new(&mut out, width);

for line in lines {
    if fence::handle_fence_line(line, &mut writer, &mut fence_tracker) {
        continue;
    }
    if fence_tracker.in_fence() {
        writer.push_verbatim(line);
        continue;
    }
    if is_passthrough_block(line) {
        writer.push_verbatim(line);
        continue;
    }
    if let Some(prefix_line) = prefix_line(line) {
        writer.handle_prefix_line(&prefix_line);
        continue;
    }

    writer.note_indent(line);
    let (text, hard_break) = line_break_parts(line);
    writer.push_wrapped(text, hard_break);
}

writer.flush_paragraph();
```

This keeps all current semantics (the logic inside `ParagraphState` just moves into `ParagraphWriter`), but reduces the number of moving parts and hidden early returns (only one object mutates paragraph state).

---

2. Simplify `PrefixLine` by avoiding `Cow` unless you really need it  
The only owned case is the footnote prefix `format!("{prefix}{marker}")`. If that optimization isn’t performance‑critical, you can drop `Cow` from public signatures and keep ownership concerns inside the writer. This makes lifetimes and method signatures easier to follow.

Example:

```rust
// before
pub struct PrefixLine<'a> {
    pub prefix: Cow<'a, str>,
    pub rest: &'a str,
    pub repeat_prefix: bool,
}
```

```rust
// after: only borrows; writer can allocate when necessary
pub struct PrefixLine<'a> {
    pub prefix: &'a str,
    pub rest: &'a str,
    pub repeat_prefix: bool,
}
```

```rust
fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
    if let Some(cap) = BULLET_RE.captures(line) {
        return Some(PrefixLine {
            prefix: cap.get(1).expect("bullet regex capture").as_str(),
            rest: cap.get(2).expect("bullet regex remainder capture").as_str(),
            repeat_prefix: false,
        });
    }

    if let Some(cap) = FOOTNOTE_RE.captures(line) {
        let prefix = cap.get(1).expect("footnote prefix capture").as_str();
        let marker = cap.get(2).expect("footnote marker capture").as_str();
        let rest = cap
            .get(3)
            .expect("footnote regex remainder capture")
            .as_str();
        // pass prefix + marker separately; concatenate inside writer
        return Some(PrefixLine {
            prefix,
            rest,
            repeat_prefix: false,
        });
    }

    BLOCKQUOTE_RE.captures(line).map(|cap| PrefixLine {
        prefix: cap.get(1).expect("blockquote prefix capture").as_str(),
        rest: cap.get(2).expect("blockquote regex remainder capture").as_str(),
        repeat_prefix: true,
    })
}
```

Then let `ParagraphWriter::handle_prefix_line` do the one allocation for footnotes when emitting the first line of the paragraph, instead of requiring `Cow` throughout the API.

---

3. Make passthrough control flow more linear in `wrap_text`  
The small helpers are fine, but you can make the early‑return cases easier to scan by inlining `is_table_or_separator` into `is_passthrough_block` and keeping all passthrough checks together:

```rust
fn is_passthrough_block(line: &str) -> bool {
    let trimmed = line.trim();
    line.trim_start().starts_with('|')
        || crate::table::SEP_RE.is_match(trimmed)
        || matches!(
            classify_block(line),
            Some(BlockKind::Heading | BlockKind::MarkdownlintDirective)
        )
        || trimmed.is_empty()
        || is_indented_code_line(line)
}
```

This keeps `wrap_text`’s top section as a clear sequence of “fence → passthrough → prefix → paragraph” checks without requiring the reader to jump between multiple tiny predicates.
</issue_to_address>

### Comment 4
<location path="src/wrap/paragraph.rs" line_range="18" />
<code_context>
+    true
+}
+
+#[derive(Default)]
+struct HtmlTableState {
+    buf: Vec<String>,
</code_context>
<issue_to_address>
**issue (complexity):** Consider merging `ParagraphState` into `ParagraphWriter`, inlining the prefix-wrapping logic into `handle_prefix_line`, and potentially simplifying `PrefixLine` to reduce cognitive overhead while preserving behavior.

You can retain the new behavior while reducing the number of moving parts by collapsing `ParagraphState` into `ParagraphWriter` and inlining some of the prefix handling. This removes the “two objects + lifetimes” cognitive load without changing semantics.

### 1. Merge `ParagraphState` into `ParagraphWriter`

Instead of passing `ParagraphState` around, let `ParagraphWriter` own the paragraph state. That way `wrap_text` only needs a single `&mut ParagraphWriter`.

```rust
pub(super) struct ParagraphWriter<'a> {
    out: &'a mut Vec<String>,
    width: usize,
    buf: Vec<(String, bool)>,
    indent: String,
}

impl<'a> ParagraphWriter<'a> {
    pub(super) fn new(out: &'a mut Vec<String>, width: usize) -> Self {
        Self {
            out,
            width,
            buf: Vec::new(),
            indent: String::new(),
        }
    }

    pub(super) fn clear(&mut self) {
        self.buf.clear();
        self.indent.clear();
    }

    pub(super) fn note_indent(&mut self, line: &str) {
        if self.buf.is_empty() {
            self.indent = line.chars().take_while(|c| c.is_whitespace()).collect();
        }
    }

    pub(super) fn push(&mut self, text: String, hard_break: bool) {
        self.buf.push((text, hard_break));
    }

    pub(super) fn flush_paragraph(&mut self) {
        if self.buf.is_empty() {
            return;
        }

        // Take the buffer and indent out of `self` so we can call
        // `&mut self` methods while iterating, avoiding a borrow
        // conflict between `&self.buf`/`&self.indent` and `&mut self`.
        let entries = std::mem::take(&mut self.buf);
        let indent = std::mem::take(&mut self.indent);

        let mut segment = String::new();
        for (text, hard_break) in &entries {
            if !segment.is_empty() {
                segment.push(' ');
            }
            segment.push_str(text);
            if *hard_break {
                self.push_wrapped_segment(&indent, &segment);
                segment.clear();
            }
        }

        if !segment.is_empty() {
            self.push_wrapped_segment(&indent, &segment);
        }
    }

    fn push_wrapped_segment(&mut self, indent: &str, segment: &str) {
        // Guard against indents wider than the target width.
        let available = self.width.saturating_sub(indent.len()).max(1);
        for line in wrap_preserving_code(segment, available) {
            self.out.push(format!("{indent}{line}"));
        }
    }

    pub(super) fn push_verbatim(&mut self, line: &str) {
        self.flush_paragraph();
        self.out.push(line.to_string());
    }
}
```

Callers that currently pass `&mut ParagraphState` can instead just use `&mut ParagraphWriter` (e.g. `writer.push(...)`, `writer.note_indent(...)`, etc.), and `flush_paragraph`/`push_verbatim` no longer need a `state` parameter.

### 2. Inline `append_wrapped_with_prefix` into `handle_prefix_line`

If `append_wrapped_with_prefix` is only used from `handle_prefix_line`, you can inline it to avoid one extra layer of indirection:

```rust
impl<'a> ParagraphWriter<'a> {
    pub(super) fn handle_prefix_line(&mut self, prefix_line: &PrefixLine<'_>) {
        self.flush_paragraph();

        let prefix = prefix_line.prefix.as_ref();
        let prefix_width = UnicodeWidthStr::width(prefix);
        let available = self.width.saturating_sub(prefix_width).max(1);

        let indent_str: String =
            prefix.chars().take_while(|c| c.is_whitespace()).collect();
        let indent_width = UnicodeWidthStr::width(indent_str.as_str());
        let wrapped_indent = if prefix_line.repeat_prefix {
            prefix.to_string()
        } else {
            format!("{}{}", indent_str, " ".repeat(prefix_width - indent_width))
        };

        let lines = wrap_preserving_code(prefix_line.rest, available);
        if lines.is_empty() {
            self.out.push(prefix.to_string());
            return;
        }

        for (index, wrapped_line) in lines.iter().enumerate() {
            if index == 0 {
                self.out.push(format!("{prefix}{wrapped_line}"));
            } else {
                self.out.push(format!("{wrapped_indent}{wrapped_line}"));
            }
        }
    }
}
```

This keeps the behavior identical but localizes all “prefix” behavior in one method.

### 3. Consider simplifying `PrefixLine` if `Cow` isn’t pulling its weight

If you don’t actually need to pass owned prefixes frequently, you can simplify the type:

```rust
pub(super) struct PrefixLine<'a> {
    pub(super) prefix: &'a str,
    pub(super) rest: &'a str,
    pub(super) repeat_prefix: bool,
}
```

Callers that need an owned prefix can still build a `String` and hold it locally, then pass `&prefix_string` into `PrefixLine`. This keeps the public API simpler while preserving all capabilities.
</issue_to_address>

### Comment 5
<location path="src/html.rs" line_range="228" />
<code_context>
+    true
+}
+
+#[derive(Default)]
+struct HtmlTableState {
+    buf: Vec<String>,
</code_context>
<issue_to_address>
**issue (complexity):** Consider consolidating table parsing into a single `HtmlTableState` abstraction that owns its helpers and state so callers no longer juggle parallel `buf + depth` logic and flags.

You now have two parallel abstractions for table state (`buf + depth` in `html_table_to_markdown` and `HtmlTableState` in `convert_html_tables`) plus tiny free helpers. You can keep the refactor while reducing moving parts by:

1. **Move the free helpers into `HtmlTableState` and hide the `depth` contract.**

```rust
impl HtmlTableState {
    fn push_html_line(&mut self, line: &str, out: &mut Vec<String>) {
        self.buf.push(line.to_string());
        self.depth += TABLE_START_RE.find_iter(line).count();
        self.depth = self
            .depth
            .saturating_sub(TABLE_END_RE.find_iter(line).count());
        if self.depth == 0 {
            out.extend(table_lines_to_markdown(&self.buf));
            self.buf.clear();
            self.in_html = false;
        }
    }
}
```

Then `convert_html_tables` doesn’t need to know about `append_html_table_line` / `flush_completed_html_table`:

```rust
if html_state.in_html {
    html_state.push_html_line(line, &mut out);
    continue;
}

if TABLE_START_RE.is_match(line.trim_start()) {
    html_state.in_html = true;
    html_state.push_html_line(line, &mut out);
    continue;
}
```

`append_html_table_line` and `flush_completed_html_table` can then be private helpers *inside* the impl or removed entirely.

2. **Reuse `HtmlTableState` in `html_table_to_markdown` to avoid duplicating the state machine.**

```rust
pub(crate) fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
    let mut out = Vec::new();
    let mut state = HtmlTableState::default();

    for line in lines {
        if state.depth > 0 || TABLE_START_RE.is_match(line.trim_start()) {
            state.in_html = true;
            state.push_html_line(line, &mut out);
            continue;
        }
        out.push(line.clone());
    }

    // `flush_raw` is a new helper to add: emit any buffered lines from an
    // unterminated table verbatim so input is never silently dropped.
    state.flush_raw(&mut out);
    out
}
```

This removes the second manual `buf + depth` pair and keeps all table behavior in one place.

3. **Consider dropping `in_html` and derive it from state.**

If you don’t have any “active but empty” table state, you can use `depth > 0` or `!buf.is_empty()` instead of a separate flag:

```rust
impl HtmlTableState {
    fn in_html(&self) -> bool {
        self.depth > 0
    }
}
```

Then `convert_html_tables` becomes:

```rust
if html_state.in_html() {
    html_state.push_html_line(line, &mut out);
    continue;
}

if TABLE_START_RE.is_match(line.trim_start()) {
    html_state.push_html_line(line, &mut out);
    continue;
}
```

This keeps the new organization but collapses the table handling into a single, self-contained abstraction with a clear API.
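As a quick check that `depth > 0` alone is enough to track an open table, here is a minimal, self-contained model of the counter. The regexes are replaced by plain substring checks and the `Depth` type is a stand-in invented for this sketch, not part of `src/html.rs`:

```rust
// Minimal stand-in for HtmlTableState's depth tracking: count
// "<table" openers and "</table>" closers per line, and derive
// "in table" from the depth rather than storing a separate flag.
#[derive(Default)]
struct Depth(usize);

impl Depth {
    fn feed(&mut self, line: &str) {
        self.0 += line.matches("<table").count();
        self.0 = self.0.saturating_sub(line.matches("</table>").count());
    }

    fn in_table(&self) -> bool {
        self.0 > 0
    }
}
```

A line that both opens and closes a table leaves the depth at zero, so the derived flag never goes stale the way a manually toggled boolean can.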
</issue_to_address>


Comment thread Cargo.toml

[package.metadata.binstall.overrides.'cfg(all(target_os = "linux", any(target_arch = "x86_64", target_arch = "aarch64"), target_env = "gnu"))']
pkg-url = "{ repo }/releases/download/v{ version }/{ name }-{ version }-{ target }.tar.gz"
bin-dir = "{ bin }{ binary-ext }"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): The bin-dir value in the cargo-binstall metadata looks like a file path instead of a directory, which may break installation.

In cargo-binstall metadata, bin-dir should point to the directory inside the archive that contains the binary (e.g. "." or "bin"), not the full binary path. With bin-dir = "{ bin }{ binary-ext }", this expands to the full filename, while your .tar.gz places the binary at the archive root. To match the archive layout and let cargo-binstall find the binary, this should be bin-dir = "." (or another actual directory path if you change the archive structure).

Comment thread src/footnotes/renumber.rs
is_definition_line: Vec<bool>,
}

struct DefinitionScanContext<'a> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider collapsing the new scan context and accumulator into a single state struct and driving the scan/finalize flow directly from collect_definition_updates to reduce indirection.

You can keep the new helpers/behavior but reduce indirection by collapsing the scan context/accumulator and merging the two-phase scan/finalize into a single function-local flow.

1. Collapse DefinitionScanContext + DefinitionAccumulator into one local state

Instead of passing a DefinitionScanContext and DefinitionAccumulator between helpers, you can use a single local state struct inside collect_definition_updates:

struct ScanState<'a> {
    mapping: &'a mut HashMap<usize, usize>,
    next_number: &'a mut usize,
    numeric_list_range: Option<(usize, usize)>,
    skip_numeric_conversion: bool,

    definitions: Vec<DefinitionLine>,
    is_definition_line: Vec<bool>,
    numeric_candidates: Vec<NumericCandidate>,
}

Then collect_definition_updates owns the control flow and state:

fn collect_definition_updates(
    lines: &[String],
    mapping: &mut HashMap<usize, usize>,
) -> DefinitionUpdates {
    let mut next_number = mapping.values().copied().max().unwrap_or(0) + 1;
    let numeric_list_range = footnote_block_range(lines);
    let skip_numeric_conversion = numeric_list_range
        .as_ref()
        .is_some_and(|(start, _)| has_existing_footnote_block(lines, *start));

    let mut state = ScanState {
        mapping,
        next_number: &mut next_number,
        numeric_list_range,
        skip_numeric_conversion,
        definitions: Vec::new(),
        is_definition_line: vec![false; lines.len()],
        numeric_candidates: Vec::new(),
    };

    collect_scan_updates(lines, &mut state);
    finalize_numeric_candidates(&mut state);

    DefinitionUpdates {
        definitions: state.definitions,
        is_definition_line: state.is_definition_line,
    }
}

2. Simplify collect_scan_updates / finalize_numeric_candidates signatures

With the unified ScanState, the helpers become simpler and avoid tuple juggling and separate accumulator types:

fn collect_scan_updates(lines: &[String], state: &mut ScanState<'_>) {
    let mut in_fence = false;

    for (index, line) in lines.iter().enumerate() {
        if is_fence_line(line) {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }

        if let Some(parts) = parse_definition(line) {
            state.definitions.push(definition_line_from_parts(
                index,
                parts,
                state.mapping,
                state.next_number,
            ));
            state.is_definition_line[index] = true;
            continue;
        }

        if !should_convert_numeric_line(
            index,
            state.numeric_list_range,
            state.skip_numeric_conversion,
        ) {
            continue;
        }
        if state.mapping.is_empty() && state.definitions.is_empty() {
            continue;
        }
        if let Some(candidate) = numeric_candidate_from_line(line, index) {
            state.numeric_candidates.push(candidate);
        }
    }
}

fn finalize_numeric_candidates(state: &mut ScanState<'_>) {
    for candidate in state.numeric_candidates.drain(..).rev() {
        let new_number = assign_new_number(state.mapping, candidate.number, state.next_number);
        let rewritten_rest = rewrite_tokens(&candidate.rest, state.mapping);
        let mut line = String::with_capacity(
            candidate.indent.len() + candidate.whitespace.len() + rewritten_rest.len() + 8,
        );
        line.push_str(&candidate.indent);
        write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");
        line.push_str(&candidate.whitespace);
        line.push_str(&rewritten_rest);
        state.definitions.push(DefinitionLine {
            index: candidate.index,
            new_number,
            line,
        });
        state.is_definition_line[candidate.index] = true;
    }
}

This keeps:

  • definition_line_from_parts and numeric_candidate_from_line as reusable helpers.
  • The “scan then reverse-apply numeric candidates” behavior.

But it:

  • Removes DefinitionScanContext and DefinitionAccumulator as separate concepts.
  • Avoids (acc, numeric_candidates) tuples and multiple lifetimes being threaded around.
  • Keeps all the logic reachable from one primary function (collect_definition_updates), which matches the original linear mental model while retaining your refactoring improvements.

Comment thread src/wrap.rs
/// # Panics
/// Panics if regex captures fail unexpectedly.
#[must_use]
pub fn wrap_text(lines: &[String], width: usize) -> Vec<String> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider merging ParagraphState into ParagraphWriter, simplifying PrefixLine, and inlining passthrough checks to make wrap_text’s control flow and APIs more linear and self‑contained.

  1. Merge ParagraphState into ParagraphWriter to reduce cross‑type coupling
    Right now every call site has to thread &mut writer and &mut state together, and behavior depends on both. You can keep the same external behavior while collapsing them into a single type that owns the paragraph buffer and indentation, and exposes the same high‑level methods.

Example of the shape of the change:

// before
let mut state = ParagraphState::default();
let mut writer = ParagraphWriter::new(&mut out, width);

for line in lines {
    if fence::handle_fence_line(line, &mut writer, &mut state, &mut fence_tracker) {}
    if fence_tracker.in_fence() {
        writer.push_verbatim(&mut state, line);
        continue;
    }
    if is_passthrough_block(line) {
        writer.push_verbatim(&mut state, line);
        continue;
    }
    if let Some(prefix_line) = prefix_line(line) {
        writer.handle_prefix_line(&mut state, &prefix_line);
        continue;
    }

    state.note_indent(line);
    let (text, hard_break) = line_break_parts(line);
    state.push(text, hard_break);
}

writer.flush_paragraph(&mut state);
// after: single struct owns the state
pub struct ParagraphWriter<'a> {
    out: &'a mut Vec<String>,
    width: usize,
    // moved fields from ParagraphState:
    buf: Vec<(String, bool)>,
    indent: String,
    // any other state...
}

impl<'a> ParagraphWriter<'a> {
    pub fn new(out: &'a mut Vec<String>, width: usize) -> Self {}

    pub fn push_verbatim(&mut self, line: &str) {}            // internally flushes buf/indent as needed
    pub fn handle_prefix_line(&mut self, prefix_line: &PrefixLine<'_>) {}
    pub fn note_indent(&mut self, line: &str) {}
    pub fn push_wrapped(&mut self, text: String, hard_break: bool) {}
    pub fn flush_paragraph(&mut self) {}
}
// wrap_text becomes simpler to read:
let mut writer = ParagraphWriter::new(&mut out, width);

for line in lines {
    if fence::handle_fence_line(line, &mut writer, &mut fence_tracker) {
        continue;
    }
    if fence_tracker.in_fence() {
        writer.push_verbatim(line);
        continue;
    }
    if is_passthrough_block(line) {
        writer.push_verbatim(line);
        continue;
    }
    if let Some(prefix_line) = prefix_line(line) {
        writer.handle_prefix_line(&prefix_line);
        continue;
    }

    writer.note_indent(line);
    let (text, hard_break) = line_break_parts(line);
    writer.push_wrapped(text, hard_break);
}

writer.flush_paragraph();

This keeps all current semantics (the logic inside ParagraphState just moves into ParagraphWriter), but reduces the number of moving parts and hidden early returns (only one object mutates paragraph state).


  1. Simplify PrefixLine by avoiding Cow unless you really need it
    The only owned case is the footnote prefix format!("{prefix}{marker}"). If that optimization isn’t performance‑critical, you can drop Cow from public signatures and keep ownership concerns inside the writer. This makes lifetimes and method signatures easier to follow.

Example:

// before
pub struct PrefixLine<'a> {
    pub prefix: Cow<'a, str>,
    pub rest: &'a str,
    pub repeat_prefix: bool,
}
// after: only borrows; writer can allocate when necessary
pub struct PrefixLine<'a> {
    pub prefix: &'a str,
    pub rest: &'a str,
    pub repeat_prefix: bool,
}
fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
    if let Some(cap) = BULLET_RE.captures(line) {
        return Some(PrefixLine {
            prefix: cap.get(1).expect("bullet regex capture").as_str(),
            rest: cap.get(2).expect("bullet regex remainder capture").as_str(),
            repeat_prefix: false,
        });
    }

    if let Some(cap) = FOOTNOTE_RE.captures(line) {
        let prefix = cap.get(1).expect("footnote prefix capture").as_str();
        let marker = cap.get(2).expect("footnote marker capture").as_str();
        let rest = cap
            .get(3)
            .expect("footnote regex remainder capture")
            .as_str();
        // pass prefix + marker separately; concatenate inside writer
        return Some(PrefixLine {
            prefix,
            rest,
            repeat_prefix: false,
        });
    }

    BLOCKQUOTE_RE.captures(line).map(|cap| PrefixLine {
        prefix: cap.get(1).expect("blockquote prefix capture").as_str(),
        rest: cap.get(2).expect("blockquote regex remainder capture").as_str(),
        repeat_prefix: true,
    })
}

Then let ParagraphWriter::handle_prefix_line do the one allocation for footnotes when emitting the first line of the paragraph, instead of requiring Cow throughout the API.


  1. Make passthrough control flow more linear in wrap_text
    The small helpers are fine, but you can make the early‑return cases easier to scan by inlining is_table_or_separator into is_passthrough_block and keeping all passthrough checks together:
fn is_passthrough_block(line: &str) -> bool {
    let trimmed = line.trim();
    line.trim_start().starts_with('|')
        || crate::table::SEP_RE.is_match(trimmed)
        || matches!(
            classify_block(line),
            Some(BlockKind::Heading | BlockKind::MarkdownlintDirective)
        )
        || trimmed.is_empty()
        || is_indented_code_line(line)
}

This keeps wrap_text’s top section as a clear sequence of “fence → passthrough → prefix → paragraph” checks without requiring the reader to jump between multiple tiny predicates.

Comment thread src/wrap/paragraph.rs
pub(super) repeat_prefix: bool,
}

#[derive(Default)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider merging ParagraphState into ParagraphWriter, inlining the prefix-wrapping logic into handle_prefix_line, and potentially simplifying PrefixLine to reduce cognitive overhead while preserving behavior.

You can retain the new behavior while reducing the number of moving parts by collapsing ParagraphState into ParagraphWriter and inlining some of the prefix handling. This removes the “two objects + lifetimes” cognitive load without changing semantics.

1. Merge ParagraphState into ParagraphWriter

Instead of passing ParagraphState around, let ParagraphWriter own the paragraph state. That way wrap_text only needs a single &mut ParagraphWriter.

pub(super) struct ParagraphWriter<'a> {
    out: &'a mut Vec<String>,
    width: usize,
    buf: Vec<(String, bool)>,
    indent: String,
}

impl<'a> ParagraphWriter<'a> {
    pub(super) fn new(out: &'a mut Vec<String>, width: usize) -> Self {
        Self {
            out,
            width,
            buf: Vec::new(),
            indent: String::new(),
        }
    }

    pub(super) fn clear(&mut self) {
        self.buf.clear();
        self.indent.clear();
    }

    pub(super) fn note_indent(&mut self, line: &str) {
        if self.buf.is_empty() {
            self.indent = line.chars().take_while(|c| c.is_whitespace()).collect();
        }
    }

    pub(super) fn push(&mut self, text: String, hard_break: bool) {
        self.buf.push((text, hard_break));
    }

    pub(super) fn flush_paragraph(&mut self) {
        if self.buf.is_empty() {
            return;
        }

        let mut segment = String::new();
        for (text, hard_break) in &self.buf {
            if !segment.is_empty() {
                segment.push(' ');
            }
            segment.push_str(text);
            if *hard_break {
                self.push_wrapped_segment(&self.indent, &segment);
                segment.clear();
            }
        }

        if !segment.is_empty() {
            self.push_wrapped_segment(&self.indent, &segment);
        }

        self.clear();
    }

    fn push_wrapped_segment(&mut self, indent: &str, segment: &str) {
        let available = self.width.saturating_sub(indent.len()).max(1);
        for line in wrap_preserving_code(segment, available) {
            self.out.push(format!("{indent}{line}"));
        }
    }

    pub(super) fn push_verbatim(&mut self, line: &str) {
        self.flush_paragraph();
        self.out.push(line.to_string());
    }
}

Callers that currently pass &mut ParagraphState can instead just use &mut ParagraphWriter (e.g. writer.push(...), writer.note_indent(...), etc.), and flush_paragraph/push_verbatim no longer need a state parameter.
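The buffering behaviour the merged writer relies on can be sketched in isolation. A minimal model of flush_paragraph follows, with wrapping stubbed out to one output line per segment (the real code calls wrap_preserving_code):

```rust
// Standalone model of flush_paragraph: buffered (text, hard_break)
// pairs are joined with spaces, and a hard break ends the current
// segment. Wrapping is stubbed out for illustration.
fn flush_paragraph(buf: &[(&str, bool)], indent: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut segment = String::new();
    for (text, hard_break) in buf {
        if !segment.is_empty() {
            segment.push(' ');
        }
        segment.push_str(text);
        if *hard_break {
            out.push(format!("{indent}{segment}"));
            segment.clear();
        }
    }
    if !segment.is_empty() {
        out.push(format!("{indent}{segment}"));
    }
    out
}

fn main() {
    let buf = [("first", false), ("line", true), ("second line", false)];
    let lines = flush_paragraph(&buf, "  ");
    assert_eq!(lines, vec!["  first line", "  second line"]);
    println!("{lines:?}");
}
```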

2. Inline append_wrapped_with_prefix into handle_prefix_line

If append_wrapped_with_prefix is only used from handle_prefix_line, you can inline it to avoid one extra layer of indirection:

impl<'a> ParagraphWriter<'a> {
    pub(super) fn handle_prefix_line(&mut self, prefix_line: &PrefixLine<'_>) {
        self.flush_paragraph();

        let prefix = prefix_line.prefix.as_ref();
        let prefix_width = UnicodeWidthStr::width(prefix);
        let available = self.width.saturating_sub(prefix_width).max(1);

        let indent_str: String =
            prefix.chars().take_while(|c| c.is_whitespace()).collect();
        let indent_width = UnicodeWidthStr::width(indent_str.as_str());
        let wrapped_indent = if prefix_line.repeat_prefix {
            prefix.to_string()
        } else {
            format!("{}{}", indent_str, " ".repeat(prefix_width - indent_width))
        };

        let lines = wrap_preserving_code(prefix_line.rest, available);
        if lines.is_empty() {
            self.out.push(prefix.to_string());
            return;
        }

        for (index, wrapped_line) in lines.iter().enumerate() {
            if index == 0 {
                self.out.push(format!("{prefix}{wrapped_line}"));
            } else {
                self.out.push(format!("{wrapped_indent}{wrapped_line}"));
            }
        }
    }
}

This keeps the behavior identical but localizes all “prefix” behavior in one method.

3. Consider simplifying PrefixLine if Cow isn’t pulling its weight

If you don’t actually need to pass owned prefixes frequently, you can simplify the type:

pub(super) struct PrefixLine<'a> {
    pub(super) prefix: &'a str,
    pub(super) rest: &'a str,
    pub(super) repeat_prefix: bool,
}

Callers that need an owned prefix can still build a String and hold it locally, then pass &prefix_string into PrefixLine. This keeps the public API simpler while preserving all capabilities.

Comment thread src/html.rs
true
}

#[derive(Default)]

issue (complexity): Consider consolidating table parsing into a single HtmlTableState abstraction that owns its helpers and state so callers no longer juggle parallel buf + depth logic and flags.

You now have two parallel abstractions for table state (buf + depth in html_table_to_markdown and HtmlTableState in convert_html_tables) plus tiny free helpers. You can keep the refactor while reducing moving parts by:

  1. Move the free helpers into HtmlTableState and hide the depth contract.
impl HtmlTableState {
    fn push_html_line(&mut self, line: &str, out: &mut Vec<String>) {
        self.buf.push(line.to_string());
        self.depth += TABLE_START_RE.find_iter(line).count();
        if TABLE_END_RE.is_match(line) {
            self.depth = self.depth.saturating_sub(TABLE_END_RE.find_iter(line).count());
        }
        if self.depth == 0 {
            out.extend(table_lines_to_markdown(&self.buf));
            self.buf.clear();
            self.in_html = false;
        }
    }
}

Then convert_html_tables doesn’t need to know about append_html_table_line / flush_completed_html_table:

if html_state.in_html {
    html_state.push_html_line(line, &mut out);
    continue;
}

if TABLE_START_RE.is_match(line.trim_start()) {
    html_state.in_html = true;
    html_state.push_html_line(line, &mut out);
    continue;
}

append_html_table_line and flush_completed_html_table can then be private helpers inside the impl or removed entirely.

  2. Reuse HtmlTableState in html_table_to_markdown to avoid duplicating the state machine.
pub(crate) fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
    let mut out = Vec::new();
    let mut state = HtmlTableState::default();

    for line in lines {
        if state.depth > 0 || TABLE_START_RE.is_match(line.trim_start()) {
            state.in_html = true;
            state.push_html_line(line, &mut out);
            continue;
        }
        out.push(line.clone());
    }

    state.flush_raw(&mut out);
    out
}

This removes the second manual buf + depth pair and keeps all table behavior in one place.

  3. Consider dropping in_html and deriving it from state.

If you don’t have any “active but empty” table state, you can use depth > 0 or !buf.is_empty() instead of a separate flag:

impl HtmlTableState {
    fn in_html(&self) -> bool {
        self.depth > 0
    }
}

Then convert_html_tables becomes:

if html_state.in_html() {
    html_state.push_html_line(line, &mut out);
    continue;
}

if TABLE_START_RE.is_match(line.trim_start()) {
    html_state.push_html_line(line, &mut out);
    continue;
}

This keeps the new organization but collapses the table handling into a single, self-contained abstraction with a clear API.
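The depth bookkeeping behind this single abstraction can be modelled without the regexes. A toy sketch that counts literal tag substrings (illustration only; the real code uses TABLE_START_RE and TABLE_END_RE and flushes through a markdown converter rather than emitting the raw buffer):

```rust
// Toy model of HtmlTableState: buffer lines while any <table> is open,
// flush the buffer once every table has closed. Tag counting is
// simplified to substring matches for illustration.
struct HtmlTableState {
    buf: Vec<String>,
    depth: usize,
}

impl HtmlTableState {
    fn push_html_line(&mut self, line: &str, out: &mut Vec<String>) {
        self.buf.push(line.to_string());
        self.depth += line.matches("<table").count();
        self.depth = self.depth.saturating_sub(line.matches("</table>").count());
        if self.depth == 0 {
            // Placeholder for the markdown conversion: emit the buffer.
            out.append(&mut self.buf);
        }
    }

    fn in_html(&self) -> bool {
        self.depth > 0 || !self.buf.is_empty()
    }
}

fn main() {
    let mut state = HtmlTableState { buf: Vec::new(), depth: 0 };
    let mut out = Vec::new();
    state.push_html_line("<table>", &mut out);
    assert!(state.in_html());
    state.push_html_line("</table>", &mut out);
    assert!(!state.in_html());
    assert_eq!(out, vec!["<table>", "</table>"]);
    println!("depth tracking ok");
}
```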


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d51aed524

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/wrap.rs
        classify_block(line),
        Some(BlockKind::Heading | BlockKind::MarkdownlintDirective)
    )
    || line.trim().is_empty()

P2: Treat whitespace-only passthrough lines as empty

Including line.trim().is_empty() in passthrough detection causes wrap_text to emit blank lines verbatim via writer.push_verbatim, so inputs like " " or "\t\t" are preserved instead of normalized to empty lines. That changes user-visible output (and conflicts with existing process_stream expectations for whitespace-only lines), leaving trailing whitespace in formatted documents.
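The distinction is easy to demonstrate. A minimal sketch follows (normalize_blank is a hypothetical helper, not the project's actual fix):

```rust
// Whitespace-only lines match a trim-based emptiness check, so a
// passthrough branch would keep "  " or "\t\t" verbatim. Normalizing
// them to "" instead avoids emitting trailing whitespace.
fn normalize_blank(line: &str) -> &str {
    if line.trim().is_empty() { "" } else { line }
}

fn main() {
    // Both inputs satisfy the passthrough condition in the review.
    assert!("  ".trim().is_empty());
    assert!("\t\t".trim().is_empty());
    // Normalization collapses them to genuinely empty lines.
    assert_eq!(normalize_blank("  "), "");
    assert_eq!(normalize_blank("\t\t"), "");
    assert_eq!(normalize_blank("text  "), "text  ");
    println!("normalization ok");
}
```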



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/code_emphasis.rs`:
- Around line 101-128: Add a unit test that exercises the edge case in
consume_code_affixes where a non-empty pending prefix combines with a next Text
token whose split_marks(mid) is non-empty so the function resets prefix to empty
(the branch with "else { prefix = \"\"; }"). Construct tokens (using the Token
enum and IntoIter<Token<'a>> in the same way other tests do) so pending is a
non-empty marker string, the next Token::Text contains lead+mid+suffix (i.e.
both markers and content), and assert the returned tuple is ("" /*empty
prefix*/, lead or suffix as appropriate, true) and that the next text is sliced
by lead.len(); place the test alongside existing tests for code_emphasis.rs to
cover the documented "Mixed surrounding markers are left untouched" behavior.

In `@src/footnotes/renumber.rs`:
- Line 418: The use of .expect() on write!(&mut line, "[^{new_number}]:") must
be replaced with proper error propagation/handling: remove .expect(...) and use
the ? operator on the write! result (i.e. write!(...)?) or map the error into
the function's existing error type, and update the enclosing function's return
type to Result<..., std::fmt::Error> (or convert via .map_err(...) if changing
the signature is not desired). Locate the write! invocation in renumber.rs (the
line variable and new_number) and apply the same fix as done earlier at the
other occurrence (replace .expect with ?/map_err and adjust the enclosing
function signature or error conversion accordingly).
- Around line 344-350: The code currently uses .expect() on regex captures
(caps.name("num") and caps.name("rest")) which is forbidden; replace both
.expect() calls with fallible handling using let-else guards (e.g., let
Some(num_match) = caps.name("num") else { return Err(...) } and similarly for
rest_match) or propagate a suitable error/result from the surrounding function
instead of panicking; then compute whitespace from line using the obtained
num_match and rest_match as before.
- Line 330: Replace the .expect() on the infallible write! call by ignoring the
fmt::Result instead of panicking; specifically update the write!(&mut line,
"[^{new_number}]:") usage (where `line` and `new_number` are defined) to use the
infallibility pattern (e.g., let _ = write!(...)) or otherwise handle the
fmt::Result, removing the .expect() call to comply with the no
.expect()/unwrap() guideline.

In `@src/html.rs`:
- Around line 211-216: The function append_html_table_line pushes the raw line
then counts opening <table> tags using TABLE_START_RE which is anchored at the
line start; trim the input before counting starts so indented/multiline tables
are recognized: replace the use of line when calling
TABLE_START_RE.find_iter(...) with a trimmed version (e.g., let trimmed =
line.trim_start()) and keep TABLE_END_RE matching on the original line as
before; update the increment to use TABLE_START_RE.find_iter(trimmed).count()
while leaving the decrement logic unchanged.

In `@src/lists.rs`:
- Around line 100-103: Add a derived Default impl to the ListState struct and
use its default constructor when initializing: add #[derive(Default)] above the
ListState declaration (which contains indent_stack: Vec<usize> and counters:
HashMap<usize, usize>) and replace the manual initializer with let mut state =
ListState::default(); so initialization is centralized and boilerplate is
removed.

In `@src/wrap.rs`:
- Around line 65-98: The prefix_line function currently calls .expect() on regex
capture groups (BULLET_RE, FOOTNOTE_RE, BLOCKQUOTE_RE) which must be removed;
change each branch to use let-else guards or safe captures and return None when
a capture is missing: for the BULLET_RE and FOOTNOTE_RE branches extract
cap.get(1), cap.get(2) (and cap.get(3) for FOOTNOTE_RE) with let Some(...) = ...
else { return None } before building the PrefixLine, and for the BLOCKQUOTE_RE
branch similarly map the capture to a PrefixLine only if cap.get(1) and
cap.get(2) are Some; ensure PrefixLine fields (prefix, rest, repeat_prefix) are
set the same way but without any .expect() calls.

In `@src/wrap/paragraph.rs`:
- Around line 100-103: The push_wrapped_segment implementation uses indent.len()
(byte length) causing wrong available width and potential underflow; change it
to compute the display width via UnicodeWidthStr::width(indent), then compute
available = self.width.saturating_sub(indent_width).max(1) and pass that
available value into wrap_preserving_code instead of self.width - indent.len();
update references in push_wrapped_segment and keep prefixing lines with the
original indent as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f781191a-957a-4f54-9cb6-154c1a435958

📥 Commits

Reviewing files that changed from the base of the PR and between 8c50fc8 and 3d51aed.

📒 Files selected for processing (23)
  • .github/workflows/release.yml
  • Cargo.toml
  • README.md
  • docs/release-process.md
  • src/code_emphasis.rs
  • src/fences.rs
  • src/footnotes/inline.rs
  • src/footnotes/parsing.rs
  • src/footnotes/renumber.rs
  • src/frontmatter.rs
  • src/html.rs
  • src/lists.rs
  • src/process.rs
  • src/table.rs
  • src/wrap.rs
  • src/wrap/fence.rs
  • src/wrap/inline.rs
  • src/wrap/line_buffer.rs
  • src/wrap/paragraph.rs
  • src/wrap/tests.rs
  • src/wrap/tokenize/mod.rs
  • src/wrap/tokenize/scanning.rs
  • tests/common/mod.rs

Comment thread src/code_emphasis.rs
Comment on lines +101 to +128
fn consume_code_affixes<'a>(
    tokens: &mut Peekable<IntoIter<Token<'a>>>,
    pending: &mut &'a str,
) -> (&'a str, &'a str, bool) {
    let mut prefix = std::mem::take(pending);
    let mut suffix = "";
    let mut modified = !prefix.is_empty();

    let Some(Token::Text(next)) = tokens.peek_mut() else {
        return (prefix, suffix, modified);
    };

    let (lead, mid, _) = split_marks(next);
    if lead.is_empty() {
        return (prefix, suffix, modified);
    }

    modified = true;
    if prefix.is_empty() {
        prefix = lead;
    } else if mid.is_empty() {
        suffix = lead;
    } else {
        prefix = "";
    }
    *next = &next[lead.len()..];
    (prefix, suffix, modified)
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for test cases covering consume_code_affixes edge cases
rg -n "consume_code_affixes\|code_emphasis" --type rust -A 5 -B 2 src/code_emphasis.rs

Repository: leynos/mdtablefix

Length of output: 43


🏁 Script executed:

# First, let's confirm the file exists and check the repository structure
fd -type f -name "*.rs" | head -20

Repository: leynos/mdtablefix

Length of output: 231


🏁 Script executed:

# Search more broadly for the file and function
git ls-files | grep -E "(code|emphasis)"

Repository: leynos/mdtablefix

Length of output: 156


🏁 Script executed:

# Look for test files that might contain tests for this function
git ls-files | grep -i test

Repository: leynos/mdtablefix

Length of output: 3034


🏁 Script executed:

# Check the size of the source file
wc -l src/code_emphasis.rs

Repository: leynos/mdtablefix

Length of output: 85


🏁 Script executed:

# Read the source file
cat -n src/code_emphasis.rs

Repository: leynos/mdtablefix

Length of output: 8421


🏁 Script executed:

# Check test file size and read it
wc -l tests/code_emphasis.rs

Repository: leynos/mdtablefix

Length of output: 87


🏁 Script executed:

# Read the test file
cat -n tests/code_emphasis.rs

Repository: leynos/mdtablefix

Length of output: 5151


🏁 Script executed:

# Check if there are test files with mixed emphasis patterns
grep -r "\\*\\*" tests/code_emphasis.rs tests/data/ | head -20

Repository: leynos/mdtablefix

Length of output: 2305


🏁 Script executed:

# Look for any edge case patterns in test data files
find tests/data -name "*emphasis*" -o -name "*code*" | xargs cat 2>/dev/null | head -50

Repository: leynos/mdtablefix

Length of output: 43


🏁 Script executed:

# Let's create a focused test to understand the edge case
cat > /tmp/test_edge_case.rs << 'EOF'
// Testing the specific edge case:
// What happens when prefix is non-empty AND mid is non-empty?
// 
// Example input that would trigger this:
// `code` followed by **bold** (where ** comes with text)

// split_marks("**bold**") returns ("**", "bold", "**")
// If pending = "**" from previous token
// Then: prefix="**" (non-empty)
//       lead="**" (non-empty)
//       mid="bold" (non-empty)
// This hits line 124: prefix = ""

// The question: is this the correct handling?
EOF
cat /tmp/test_edge_case.rs

Repository: leynos/mdtablefix

Length of output: 539


🏁 Script executed:

# Search for any documentation about the intended behaviour
grep -r "mixed" src/code_emphasis.rs

Repository: leynos/mdtablefix

Length of output: 43


🏁 Script executed:

# Look at the module-level documentation more carefully
head -20 src/code_emphasis.rs

Repository: leynos/mdtablefix

Length of output: 868


🏁 Script executed:

# Let me trace through a specific example to verify the logic
cat > /tmp/trace_logic.py << 'EOF'
# Simulating the consume_code_affixes logic

def split_marks(s):
    """Split emphasis markers at both ends"""
    first = next((i for i, c in enumerate(s) if c not in '*_'), len(s))
    last = next((i for i in range(len(s)-1, -1, -1) if s[i] not in '*_'), first-1) + 1
    if last <= first:
        last = first
    return s[:first], s[first:last], s[last:]

def consume_code_affixes_trace(pending_in, next_text):
    """Trace through the logic with examples"""
    prefix = pending_in
    suffix = ""
    modified = bool(pending_in)
    
    print(f"Input: pending='{pending_in}', next_text='{next_text}'")
    
    if not next_text:
        print(f"  No next text token")
        return prefix, suffix, modified
    
    lead, mid, trail = split_marks(next_text)
    print(f"  split_marks('{next_text}') = ('{lead}', '{mid}', '{trail}')")
    
    if not lead:
        print(f"  No leading marks, return early")
        return prefix, suffix, modified
    
    modified = True
    print(f"  Set modified=True")
    
    if not prefix:
        prefix = lead
        print(f"  prefix was empty, set prefix='{lead}'")
    elif not mid:
        suffix = lead
        print(f"  mid was empty, set suffix='{lead}'")
    else:
        prefix = ""
        print(f"  EDGE CASE: mid is not empty, reset prefix to '' (avoid mixed markers)")
    
    print(f"  Result: prefix='{prefix}', suffix='{suffix}', modified={modified}")
    return prefix, suffix, modified

# Test cases
print("=== Case 1: No pending, plain text ===")
consume_code_affixes_trace("", "text")

print("\n=== Case 2: Pending markers, next text is all markers ===")
consume_code_affixes_trace("**", "**")

print("\n=== Case 3: EDGE CASE - Pending markers, next text has markers AND content ===")
consume_code_affixes_trace("**", "**more**")

print("\n=== Case 4: EDGE CASE - Pending underscore, next text has asterisks ===")
consume_code_affixes_trace("_", "*more*")
EOF
python3 /tmp/trace_logic.py

Repository: leynos/mdtablefix

Length of output: 1048


🏁 Script executed:

# Let's create a comprehensive test to verify the actual behaviour with the example cases
cat > /tmp/verify_behavior.py << 'EOF'
# Test the actual scenario the review mentions

def trace_scenario(input_str, label):
    print(f"\n=== {label} ===")
    print(f"Input: {repr(input_str)}")
    # The tokenizer would break this into tokens
    # Then consume_code_affixes logic applies
    # We need to understand what the expected output should be
    print(f"(Requires understanding of tokenization)")

trace_scenario("**code**", "Simple case: both leading and trailing emphasis")
trace_scenario("`code`**text**", "Code followed by emphasis and text")
trace_scenario("**`code`**", "Emphasis wrapping code (no affixes needed)")
trace_scenario("`code`**", "Code followed by emphasis (no following text)")

# The review asks: does the edge case (line 124: prefix = "") handle **code** correctly?
# But **code** without surrounding backticks wouldn't be tokenized as code
# The function only processes Text tokens adjacent to Code tokens

# The actual concern seems to be: when there's pending emphasis from a Text token
# followed by a Code token, and then another Text token with both markers and content,
# should we keep or discard the pending emphasis?

print("\n=== Analysis of edge case ===")
print("When prefix is non-empty AND mid is non-empty (line 124):")
print("- prefix = '' clears the pending emphasis markers")
print("- modified = True still wraps the code")
print("- This avoids: **`code`**more** (ambiguous nesting)")
print("- Instead: `code`**more** (next text handles its own markers)")
EOF
python3 /tmp/verify_behavior.py

Repository: leynos/mdtablefix

Length of output: 804


🏁 Script executed:

# Let me check if there's test coverage specifically for the edge case scenario
grep -n "prefix\|mid\|suffix" src/code_emphasis.rs | head -20

Repository: leynos/mdtablefix

Length of output: 766


🏁 Script executed:

# Let me check if there are any tests that would trigger the line 124 edge case
# by looking for test inputs with pending emphasis + content + markers

cat > /tmp/analyze_tests.py << 'EOF'
# Analyzing test coverage for the edge case

test_cases = [
    ("`StepContext`** Enhancement (in **`crates/rstest-bdd/src/context.rs`**)**", 
     "Complex case with nested code and emphasis"),
    ("**`code`**", "Emphasis only around code"),
    ("`code` text", "Standalone code"),
]

print("Test Coverage Analysis:")
print("=" * 60)

for input_str, description in test_cases:
    print(f"\nTest: {description}")
    print(f"Input: {repr(input_str)}")
    
    # The edge case occurs when:
    # 1. There's pending emphasis from previous text token (prefix non-empty)
    # 2. Next token is code
    # 3. After code, there's text with BOTH markers AND content
    #    (i.e., split_marks returns non-empty lead AND non-empty mid)
    
    if "**" in input_str and "`" in input_str:
        print("  ✓ Contains both emphasis and code")
    
    # Check if it would trigger the edge case
    # The edge case: prefix non-empty AND mid non-empty
    # This happens when: pending emphasis + code + text with markers+content

print("\n" + "=" * 60)
print("\nEdge Case Scenario:")
print("- Pending emphasis markers from previous text token")
print("- Code token is processed")
print("- Next text token has split_marks returning:")
print("  - lead (markers): non-empty")
print("  - mid (content): non-empty")
print("  - This triggers line 124: prefix = ''")
print("\nNo explicit test case found for this specific scenario!")
EOF
python3 /tmp/analyze_tests.py

Repository: leynos/mdtablefix

Length of output: 809


Add explicit test coverage for the edge case where pending emphasis combines with code and marked content.

The logic at lines 119–125 intentionally handles three scenarios. When prefix is non-empty and the next text token's mid (content between markers) is also non-empty, the code resets prefix to empty (line 124). This prevents ambiguous marker nesting—avoiding patterns like **code**more**.

This behaviour is by design per the module documentation ("Mixed surrounding markers are left untouched"), but it lacks explicit test coverage. Add a test case triggering the scenario where pending emphasis from one text token combines with code followed by another text token containing both markers and content. This confirms the edge case handling is correct.
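A standalone model of the branch in question can make the expected assertions concrete. Here split_marks and the affix logic are re-implemented on plain &str purely for illustration; the real function consumes Token streams:

```rust
// Model of the edge case: a pending marker ("**") followed by text
// that carries both leading markers and content ("**more**"). The
// mixed-marker branch drops the pending prefix rather than nesting it.
fn split_marks(s: &str) -> (&str, &str, &str) {
    let is_mark = |c: char| c == '*' || c == '_';
    let lead_len = s.chars().take_while(|&c| is_mark(c)).count();
    let trail_len = s[lead_len..].chars().rev().take_while(|&c| is_mark(c)).count();
    let mid_end = s.len() - trail_len;
    (&s[..lead_len], &s[lead_len..mid_end], &s[mid_end..])
}

fn consume_affixes<'a>(pending: &'a str, next: &'a str) -> (&'a str, &'a str, bool) {
    let mut prefix = pending;
    let mut suffix = "";
    let mut modified = !prefix.is_empty();
    let (lead, mid, _) = split_marks(next);
    if lead.is_empty() {
        return (prefix, suffix, modified);
    }
    modified = true;
    if prefix.is_empty() {
        prefix = lead;
    } else if mid.is_empty() {
        suffix = lead;
    } else {
        prefix = ""; // mixed markers: drop the pending prefix
    }
    (prefix, suffix, modified)
}

fn main() {
    // Pending "**" plus "**more**" hits the mixed-marker branch.
    assert_eq!(consume_affixes("**", "**more**"), ("", "", true));
    // Pending "**" plus a pure-marker token becomes a suffix instead.
    assert_eq!(consume_affixes("**", "**"), ("**", "**", true));
    println!("edge case ok");
}
```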


Comment thread src/footnotes/renumber.rs
let rewritten_rest = rewrite_tokens(parts.rest, mapping);
let mut line = String::with_capacity(parts.prefix.len() + rewritten_rest.len() + 8);
line.push_str(parts.prefix);
write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");

⚠️ Potential issue | 🟡 Minor

Replace .expect() on infallible write with explicit handling.

Writing to a String via write! cannot fail, but the coding guidelines forbid .expect() outside tests. Use let _ = write!(...) to discard the fmt::Result, relying on the infallibility of fmt::Write for String.

🛠️ Proposed fix
-    write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");
+    // Writing to String is infallible; discard the result.
+    let _ = write!(&mut line, "[^{new_number}]:");

As per coding guidelines: ".expect() and .unwrap() are forbidden outside of tests."
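A minimal sketch of the pattern (footnote_marker is a hypothetical helper for this example, not the function in renumber.rs):

```rust
use std::fmt::Write;

// Build a footnote definition marker without .expect(): writing to a
// String is infallible, so the fmt::Result can be discarded.
fn footnote_marker(indent: &str, new_number: usize) -> String {
    let mut line = String::from(indent);
    let _ = write!(&mut line, "[^{new_number}]:");
    line
}

fn main() {
    assert_eq!(footnote_marker("  ", 3), "  [^3]:");
    assert_eq!(footnote_marker("", 7), "[^7]:");
    println!("{}", footnote_marker("", 7));
}
```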


Comment thread src/footnotes/renumber.rs
Comment on lines +344 to +350
let num_match = caps
    .name("num")
    .expect("numeric list capture missing number");
let rest_match = caps
    .name("rest")
    .expect("numeric list capture missing rest");
let whitespace = line[num_match.end() + 1..rest_match.start()].to_string();

⚠️ Potential issue | 🟠 Major

Replace .expect() with fallible handling or use let-else.

Per coding guidelines, .expect() is forbidden outside tests. Even though the regex guarantees these captures exist after a successful match, propagate the error or use a let-else guard to satisfy the lint.

🛠️ Proposed fix using let-else guards
-    let num_match = caps
-        .name("num")
-        .expect("numeric list capture missing number");
-    let rest_match = caps
-        .name("rest")
-        .expect("numeric list capture missing rest");
+    let Some(num_match) = caps.name("num") else {
+        return None;
+    };
+    let Some(rest_match) = caps.name("rest") else {
+        return None;
+    };

As per coding guidelines: ".expect() and .unwrap() are forbidden outside of tests. Errors must be propagated."
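A self-contained sketch of the let-else shape (the capture lookup is a hand-rolled stand-in for regex::Captures::name, and renumber is a hypothetical caller):

```rust
// Stand-in for regex::Captures::name: a lookup that may miss.
fn name<'a>(caps: &[(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    caps.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

fn renumber(caps: &[(&str, &str)]) -> Option<String> {
    // let-else keeps the happy path flat and returns None instead of
    // panicking via .expect() when a capture is absent.
    let Some(num) = name(caps, "num") else {
        return None;
    };
    let Some(rest) = name(caps, "rest") else {
        return None;
    };
    Some(format!("{num}. {rest}"))
}

fn main() {
    let caps = [("num", "2"), ("rest", "second item")];
    assert_eq!(renumber(&caps), Some("2. second item".to_string()));
    assert_eq!(renumber(&[("num", "2")]), None);
    println!("let-else ok");
}
```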


Comment thread src/footnotes/renumber.rs
new_line.push_str(&rewritten_rest);
definitions.push(DefinitionLine {
line.push_str(&candidate.indent);
write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");

⚠️ Potential issue | 🟡 Minor

Same .expect() pattern requires adjustment.

Apply the same fix as suggested for line 330.

🛠️ Proposed fix
-        write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");
+        let _ = write!(&mut line, "[^{new_number}]:");

As per coding guidelines: ".expect() and .unwrap() are forbidden outside of tests."


Comment thread src/html.rs
Comment on lines +211 to +216
fn append_html_table_line(line: &str, buf: &mut Vec<String>, depth: &mut usize) {
buf.push(line.to_string());
*depth += TABLE_START_RE.find_iter(line).count();
if TABLE_END_RE.is_match(line) {
*depth = depth.saturating_sub(TABLE_END_RE.find_iter(line).count());
if *depth == 0 {
out.extend(html_table_to_markdown(buf));
buf.clear();
*in_html = false;
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, examine the append_html_table_line function and surrounding context
sed -n '200,230p' src/html.rs

# Check the regex definitions
grep -n 'TABLE_START_RE\|TABLE_END_RE' src/html.rs | head -20

# Look at call sites mentioned in the review
sed -n '275,290p' src/html.rs
sed -n '340,355p' src/html.rs

Repository: leynos/mdtablefix

Length of output: 2112


🏁 Script executed:

# Get the regex definitions
sed -n '18,24p' src/html.rs

Repository: leynos/mdtablefix

Length of output: 369


🏁 Script executed:

# Search for tests covering tables, especially indented ones
rg -n 'TABLE|table|indent' src/html.rs tests/ | grep -i 'test\|fn\|<table'

# Look for test files
find . -name '*test*' -o -name 'tests.rs' -o -path '*/tests/*' | head -20

Repository: leynos/mdtablefix

Length of output: 23848


🏁 Script executed:

# Search more specifically for indented HTML table tests
grep -n 'indent.*table\|table.*indent' tests/table/*.rs

# Check the convert_html tests more carefully
head -100 tests/table/convert_html.rs

# Look at process_stream_tests for HTML table handling
grep -B5 -A5 'TABLE_START_RE\|trim_start' src/html.rs | head -50

Repository: leynos/mdtablefix

Length of output: 4055


🏁 Script executed:

# Verify if there's a test for indented HTML tables
grep -r 'indented.*html\|html.*indented' tests/

# Check if any HTML table tests use indentation
grep -B2 -A2 'html_table' tests/table/convert_html.rs | grep -E '^\s{4}<'

# Let's look at the TABLE_END_RE usage in append_html_table_line
sed -n '211,216p' src/html.rs

# Verify the TABLE_END_RE pattern doesn't have anchoring
grep 'TABLE_END_RE' src/html.rs | head -5

Repository: leynos/mdtablefix

Length of output: 501


🏁 Script executed:

# Check if any test includes an indented HTML table (4+ spaces before <table>)
grep -r '^\s\s\s\s<table' tests/

# Look at what indented_table fixture contains
sed -n '41,45p' tests/table/mod.rs

# Verify the actual behavior with a simple check of how depth tracking works
# Look at process_stream to see if it calls convert_html_tables
grep -n 'convert_html_tables\|html_table_to_markdown' src/html.rs

Repository: leynos/mdtablefix

Length of output: 517


🏁 Script executed:

# Verify TABLE_END_RE doesn't have the same issue (no anchor)
grep -A1 'TABLE_END_RE.*lazy_regex' src/html.rs

# Create a minimal test case to confirm the bug
# Simulate what happens with an indented HTML table
python3 << 'EOF'
import re

TABLE_START_RE = re.compile(r"(?i)^<table(?:\s|>|$)")
TABLE_END_RE = re.compile(r"(?i)</table>")

# Indented table line
line = "    <table>"
trimmed = line.lstrip()

print(f"Raw line: '{line}'")
print(f"Trimmed line: '{trimmed}'")
print(f"TABLE_START_RE matches raw: {bool(TABLE_START_RE.search(line))}")
print(f"TABLE_START_RE matches trimmed: {bool(TABLE_START_RE.search(trimmed))}")
print(f"find_iter on raw: {len(TABLE_START_RE.findall(line))}")
print(f"find_iter on trimmed: {len(TABLE_START_RE.findall(trimmed))}")

# Simulating the flow
print("\n--- Simulating current code flow ---")
depth = 0
print(f"Initial depth: {depth}")
print(f"Check: depth > 0 or TABLE_START_RE.is_match('{trimmed}'): {depth > 0 or bool(TABLE_START_RE.search(trimmed))}")
if depth > 0 or bool(TABLE_START_RE.search(trimmed)):
    print("Entered table mode")
    depth += len(TABLE_START_RE.findall(line))  # Uses raw line
    print(f"After counting on raw line: depth = {depth}")
    if TABLE_END_RE.search(line):
        depth -= len(TABLE_END_RE.findall(line))
    print(f"Would flush? {depth == 0}")
EOF

Repository: leynos/mdtablefix

Length of output: 414


🏁 Script executed:

# Verify TABLE_END_RE behavior with indented closing tag
python3 << 'EOF'
import re

TABLE_END_RE = re.compile(r"(?i)</table>")

# Test with indented closing tag
line = "    </table>"

print(f"Line: '{line}'")
print(f"TABLE_END_RE.is_match: {bool(TABLE_END_RE.search(line))}")
print(f"TABLE_END_RE.find_iter count: {len(TABLE_END_RE.findall(line))}")
print("\nCONCLUSION: TABLE_END_RE matches regardless of indentation because it has no anchor")
EOF

# Look at the complete flow in both callers
echo "=== html_table_to_markdown ===" 
sed -n '275,290p' src/html.rs

echo -e "\n=== convert_html_tables ===" 
sed -n '321,360p' src/html.rs

Repository: leynos/mdtablefix

Length of output: 1732


Trim the line before counting <table> openings.

Count start tags on line.trim_start(), not on the raw line. TABLE_START_RE is anchored at ^<table, so an indented multiline table inside a list or block quote never increments depth; the callers at lines 283 and 348 then flush the buffer after the first line.

Patch
fn append_html_table_line(line: &str, buf: &mut Vec<String>, depth: &mut usize) {
    buf.push(line.to_string());
-    *depth += TABLE_START_RE.find_iter(line).count();
+    let trimmed = line.trim_start();
+    *depth += TABLE_START_RE.find_iter(trimmed).count();
     if TABLE_END_RE.is_match(line) {
         *depth = depth.saturating_sub(TABLE_END_RE.find_iter(line).count());
     }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/html.rs` around lines 211 - 216, The function append_html_table_line
pushes the raw line then counts opening <table> tags using TABLE_START_RE which
is anchored at the line start; trim the input before counting starts so
indented/multiline tables are recognized: replace the use of line when calling
TABLE_START_RE.find_iter(...) with a trimmed version (e.g., let trimmed =
line.trim_start()) and keep TABLE_END_RE matching on the original line as
before; update the increment to use TABLE_START_RE.find_iter(trimmed).count()
while leaving the decrement logic unchanged.
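
The patched behaviour can be simulated outside Rust, in the same style as the scripts above. This sketch assumes the regex patterns shown earlier and a hypothetical four-space-indented table; it is an illustration of the depth-tracking fix, not the project's actual code:

```python
import re

# Patterns mirroring the review's TABLE_START_RE / TABLE_END_RE
TABLE_START_RE = re.compile(r"(?i)^<table(?:\s|>|$)")
TABLE_END_RE = re.compile(r"(?i)</table>")

def append_html_table_line(line, buf, depth):
    """Patched depth tracking: count openings on the trimmed line."""
    buf.append(line)
    trimmed = line.lstrip()
    depth += len(TABLE_START_RE.findall(trimmed))
    if TABLE_END_RE.search(line):
        depth = max(depth - len(TABLE_END_RE.findall(line)), 0)
    return depth

buf, depth = [], 0
for line in ["    <table>", "    <tr><td>x</td></tr>", "    </table>"]:
    depth = append_html_table_line(line, buf, depth)

# With the trim in place, the indented table closes at depth 0,
# so the caller would flush the whole three-line buffer as one table.
print(depth, len(buf))  # → 0 3
```

Without the trim, the anchored start pattern never matches the indented `<table>`, depth stays at 0, and the buffer is flushed after the first line — exactly the failure mode described above.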

Comment thread src/lists.rs
Comment on lines +100 to +103
let mut state = ListState {
indent_stack: Vec::new(),
counters: HashMap::new(),
};
Contributor

🧹 Nitpick | 🔵 Trivial

Derive Default for ListState and initialize via default().

Remove boilerplate and centralize initialization behaviour.

Apply concise initialization
+#[derive(Default)]
 struct ListState {
     indent_stack: Vec<usize>,
     counters: HashMap<usize, usize>,
 }
@@
-    let mut state = ListState {
-        indent_stack: Vec::new(),
-        counters: HashMap::new(),
-    };
+    let mut state = ListState::default();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lists.rs` around lines 100 - 103, Add a derived Default impl to the
ListState struct and use its default constructor when initializing: add
#[derive(Default)] above the ListState declaration (which contains indent_stack:
Vec<usize> and counters: HashMap<usize, usize>) and replace the manual
initializer with let mut state = ListState::default(); so initialization is
centralized and boilerplate is removed.

Comment thread src/wrap.rs
Comment on lines +65 to +98
fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
if let Some(cap) = BULLET_RE.captures(line) {
let prefix = cap.get(1).expect("bullet regex capture").as_str();
let rest = cap.get(2).expect("bullet regex remainder capture").as_str();
return Some(PrefixLine {
prefix: Cow::Borrowed(prefix),
rest,
repeat_prefix: false,
});
}

if let Some(cap) = FOOTNOTE_RE.captures(line) {
let prefix = cap.get(1).expect("footnote prefix capture").as_str();
let marker = cap.get(2).expect("footnote marker capture").as_str();
let rest = cap
.get(3)
.expect("footnote regex remainder capture")
.as_str();
return Some(PrefixLine {
prefix: Cow::Owned(format!("{prefix}{marker}")),
rest,
repeat_prefix: false,
});
}

BLOCKQUOTE_RE.captures(line).map(|cap| PrefixLine {
prefix: Cow::Borrowed(cap.get(1).expect("blockquote prefix capture").as_str()),
rest: cap
.get(2)
.expect("blockquote regex remainder capture")
.as_str(),
repeat_prefix: true,
})
}
Contributor

⚠️ Potential issue | 🟠 Major

Replace .expect() calls with propagation or let-else guards.

The prefix_line function uses .expect() on regex capture groups (lines 67-68, 77-82, 91-95). Per coding guidelines, .expect() is forbidden outside tests. Since the function returns Option, return None on capture failure.

🛠️ Proposed fix using let-else guards
 fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
     if let Some(cap) = BULLET_RE.captures(line) {
-        let prefix = cap.get(1).expect("bullet regex capture").as_str();
-        let rest = cap.get(2).expect("bullet regex remainder capture").as_str();
+        let prefix = cap.get(1)?.as_str();
+        let rest = cap.get(2)?.as_str();
         return Some(PrefixLine {
             prefix: Cow::Borrowed(prefix),
             rest,
             repeat_prefix: false,
         });
     }

     if let Some(cap) = FOOTNOTE_RE.captures(line) {
-        let prefix = cap.get(1).expect("footnote prefix capture").as_str();
-        let marker = cap.get(2).expect("footnote marker capture").as_str();
-        let rest = cap
-            .get(3)
-            .expect("footnote regex remainder capture")
-            .as_str();
+        let prefix = cap.get(1)?.as_str();
+        let marker = cap.get(2)?.as_str();
+        let rest = cap.get(3)?.as_str();
         return Some(PrefixLine {
             prefix: Cow::Owned(format!("{prefix}{marker}")),
             rest,
             repeat_prefix: false,
         });
     }

-    BLOCKQUOTE_RE.captures(line).map(|cap| PrefixLine {
-        prefix: Cow::Borrowed(cap.get(1).expect("blockquote prefix capture").as_str()),
-        rest: cap
-            .get(2)
-            .expect("blockquote regex remainder capture")
-            .as_str(),
+    let cap = BLOCKQUOTE_RE.captures(line)?;
+    Some(PrefixLine {
+        prefix: Cow::Borrowed(cap.get(1)?.as_str()),
+        rest: cap.get(2)?.as_str(),
         repeat_prefix: true,
     })
 }

As per coding guidelines: ".expect() and .unwrap() are forbidden outside of tests. Errors must be propagated."

📝 Committable suggestion


Suggested change
fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
if let Some(cap) = BULLET_RE.captures(line) {
let prefix = cap.get(1).expect("bullet regex capture").as_str();
let rest = cap.get(2).expect("bullet regex remainder capture").as_str();
return Some(PrefixLine {
prefix: Cow::Borrowed(prefix),
rest,
repeat_prefix: false,
});
}
if let Some(cap) = FOOTNOTE_RE.captures(line) {
let prefix = cap.get(1).expect("footnote prefix capture").as_str();
let marker = cap.get(2).expect("footnote marker capture").as_str();
let rest = cap
.get(3)
.expect("footnote regex remainder capture")
.as_str();
return Some(PrefixLine {
prefix: Cow::Owned(format!("{prefix}{marker}")),
rest,
repeat_prefix: false,
});
}
BLOCKQUOTE_RE.captures(line).map(|cap| PrefixLine {
prefix: Cow::Borrowed(cap.get(1).expect("blockquote prefix capture").as_str()),
rest: cap
.get(2)
.expect("blockquote regex remainder capture")
.as_str(),
repeat_prefix: true,
})
}
fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
if let Some(cap) = BULLET_RE.captures(line) {
let prefix = cap.get(1)?.as_str();
let rest = cap.get(2)?.as_str();
return Some(PrefixLine {
prefix: Cow::Borrowed(prefix),
rest,
repeat_prefix: false,
});
}
if let Some(cap) = FOOTNOTE_RE.captures(line) {
let prefix = cap.get(1)?.as_str();
let marker = cap.get(2)?.as_str();
let rest = cap.get(3)?.as_str();
return Some(PrefixLine {
prefix: Cow::Owned(format!("{prefix}{marker}")),
rest,
repeat_prefix: false,
});
}
let cap = BLOCKQUOTE_RE.captures(line)?;
Some(PrefixLine {
prefix: Cow::Borrowed(cap.get(1)?.as_str()),
rest: cap.get(2)?.as_str(),
repeat_prefix: true,
})
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/wrap.rs` around lines 65 - 98, The prefix_line function currently calls
.expect() on regex capture groups (BULLET_RE, FOOTNOTE_RE, BLOCKQUOTE_RE) which
must be removed; change each branch to use let-else guards or safe captures and
return None when a capture is missing: for the BULLET_RE and FOOTNOTE_RE
branches extract cap.get(1), cap.get(2) (and cap.get(3) for FOOTNOTE_RE) with
let Some(...) = ... else { return None } before building the PrefixLine, and for
the BLOCKQUOTE_RE branch similarly map the capture to a PrefixLine only if
cap.get(1) and cap.get(2) are Some; ensure PrefixLine fields (prefix, rest,
repeat_prefix) are set the same way but without any .expect() calls.

Comment thread src/wrap/paragraph.rs
Comment on lines +100 to 103
fn push_wrapped_segment(&mut self, indent: &str, segment: &str) {
for line in wrap_preserving_code(segment, self.width - indent.len()) {
self.out.push(format!("{indent}{line}"));
}
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read the target function and surrounding context
echo "=== Lines 95-110 of src/wrap/paragraph.rs ==="
sed -n '95,110p' src/wrap/paragraph.rs

echo
echo "=== Search for UnicodeWidthStr usage and imports ==="
rg -n 'UnicodeWidthStr|use.*unicode' src/wrap/paragraph.rs | head -20

echo
echo "=== Search for wrap_preserving_code definition ==="
rg -n 'fn wrap_preserving_code' src/

echo
echo "=== Check the pattern mentioned in comment (saturating_sub.max) ==="
rg -n 'saturating_sub.*max|width.*saturating_sub' src/wrap/paragraph.rs

Repository: leynos/mdtablefix

Length of output: 1476


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read the wrap_preserving_code function signature and context
echo "=== wrap_preserving_code function (lines 195-210) ==="
sed -n '195,210p' src/wrap/inline.rs

echo
echo "=== flush_paragraph to see how push_wrapped_segment is called ==="
rg -n 'flush_paragraph|push_wrapped_segment' src/wrap/paragraph.rs | head -15

echo
echo "=== Check the full append_wrapped_with_prefix for context ==="
sed -n '45,65p' src/wrap/paragraph.rs

Repository: leynos/mdtablefix

Length of output: 2054


Use display width for wrapped indent to guard against underflow.

indent.len() counts bytes, not display columns. For non-ASCII indentation, this miscalculates available width. When indent exceeds self.width, the subtraction panics in checked builds and wraps in unchecked builds.

Adopt the pattern already used at line 52 in append_wrapped_with_prefix:

Patch
fn push_wrapped_segment(&mut self, indent: &str, segment: &str) {
-    for line in wrap_preserving_code(segment, self.width - indent.len()) {
+    let indent_width = UnicodeWidthStr::width(indent);
+    let available = self.width.saturating_sub(indent_width).max(1);
+    for line in wrap_preserving_code(segment, available) {
         self.out.push(format!("{indent}{line}"));
     }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/wrap/paragraph.rs` around lines 100 - 103, The push_wrapped_segment
implementation uses indent.len() (byte length) causing wrong available width and
potential underflow; change it to compute the display width via
UnicodeWidthStr::width(indent), then compute available =
self.width.saturating_sub(indent_width).max(1) and pass that available value
into wrap_preserving_code instead of self.width - indent.len(); update
references in push_wrapped_segment and keep prefixing lines with the original
indent as before.
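
The byte-length versus display-width distinction the review relies on can be demonstrated directly. This is a rough Python stand-in (not the unicode-width crate itself): it approximates column width via East Asian width classes, and uses a hypothetical ideographic-space indent to show where `indent.len()` goes wrong:

```python
import unicodedata

def display_width(s):
    # Rough stand-in for Rust's UnicodeWidthStr::width: East Asian
    # wide/fullwidth characters occupy two columns, everything else one.
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in s)

width = 8
indent = "\u3000\u3000"  # two ideographic spaces: 6 UTF-8 bytes, 4 columns

# Byte-based arithmetic (what `self.width - indent.len()` does in Rust):
byte_based = width - len(indent.encode("utf-8"))      # 8 - 6 = 2, too narrow

# Column-based arithmetic with the saturating_sub(...).max(1) guard:
column_based = max(width - display_width(indent), 1)  # 8 - 4 = 4

print(byte_based, column_based)  # → 2 4
```

If the indent's byte length exceeded `width`, the byte-based subtraction would go negative (a panic in a checked Rust build); the `max(..., 1)` guard mirrors the `saturating_sub(indent_width).max(1)` pattern the patch adopts.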

@leynos
Owner Author

leynos commented Apr 18, 2026

@coderabbitai please can you give me an assessment of the cargo-binstall support. This is the focus of this pr.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 18, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== Cargo.toml ==="
cat Cargo.toml

echo ""
echo "=== .github/workflows/release.yml ==="
cat .github/workflows/release.yml

echo ""
echo "=== README.md ==="
cat README.md

echo ""
echo "=== docs/release-process.md ==="
cat docs/release-process.md

Length of output: 16628


@leynos: Here is a full assessment of the cargo-binstall support.


Cargo-binstall Support Assessment

1. Cargo.toml Metadata

The [package.metadata.binstall] configuration is correct.

repository field — Added as a prerequisite for the { repo } template variable used in pkg-url. Without it, { repo } would be empty and binstall would produce a broken URL.

cfg override target filter:

cfg(all(target_os = "linux", any(target_arch = "x86_64", target_arch = "aarch64"), target_env = "gnu"))

This precisely matches the two targets that have cargo_binstall_archive: true in the workflow matrix (x86_64-unknown-linux-gnu and aarch64-unknown-linux-gnu). The FreeBSD target is correctly excluded, and binstall will fall back to building from source for it.
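
That target filter can be sketched as a predicate. The triple parsing below is an illustrative assumption (cargo-binstall evaluates the real `cfg()` expression itself, not a string split):

```python
def binstall_archive_available(triple):
    """Mirror of the cfg override: Linux GNU on x86_64 or aarch64 only."""
    parts = triple.split("-")  # arch-vendor-os[-env]
    arch, env = parts[0], parts[-1]
    return "linux" in parts and arch in {"x86_64", "aarch64"} and env == "gnu"

for t in ["x86_64-unknown-linux-gnu", "aarch64-unknown-linux-gnu",
          "x86_64-unknown-freebsd"]:
    print(t, binstall_archive_available(t))
# → the two Linux GNU triples are True; the FreeBSD triple is False,
#   so binstall falls back to building from source for it.
```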

pkg-url template:

pkg-url = "{ repo }/releases/download/v{ version }/{ name }-{ version }-{ target }.tar.gz"

Expands for a v0.3.1 tag to:

https://github.com/leynos/mdtablefix/releases/download/v0.3.1/mdtablefix-0.3.1-x86_64-unknown-linux-gnu.tar.gz

bin-dir:

bin-dir = "{ bin }{ binary-ext }"

This correctly expects the binary at the archive root, with no subdirectory. The { binary-ext } resolves to an empty string on Linux. ✅


2. Workflow Archive Production

The archive is created as:

archive_name="${{ env.REPO_NAME }}-${version}-${{ matrix.target }}.tar.gz"
tar -C "target/${{ matrix.target }}/release" -czf \
  "${artifact_dir}/${archive_name}" "${{ env.REPO_NAME }}${{ matrix.ext }}"

The -C flag changes into the release directory before archiving, so the binary lands at the archive root with no path prefix. This aligns exactly with bin-dir = "{ bin }{ binary-ext }". ✅
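
The archive-root layout can be verified with a small simulation using Python's tarfile module; the stub payload below stands in for the real binary:

```python
import io
import tarfile

# Simulate `tar -C target/<target>/release -czf ... mdtablefix`:
# adding the file under its bare name puts it at the archive root,
# which is what bin-dir = "{ bin }{ binary-ext }" expects.
payload = b"stub bytes standing in for the real mdtablefix binary"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="mdtablefix")  # no directory prefix
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    names = tar.getnames()
print(names)  # → ['mdtablefix']
```

Had the workflow archived `target/<target>/release/mdtablefix` without `-C`, the member name would carry that path prefix and binstall's root-relative `bin-dir` lookup would fail.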

The version variable strips the leading v from the tag (version="${GITHUB_REF_NAME#v}"), so for tag v0.3.1 the archive is named mdtablefix-0.3.1-x86_64-unknown-linux-gnu.tar.gz. The pkg-url template uses v{ version } in the download path and { version } (without v) in the filename — this is consistent. ✅

Name/URL alignment summary:

| Component | Value |
| --- | --- |
| Workflow produces | `mdtablefix-0.3.1-x86_64-unknown-linux-gnu.tar.gz` |
| `pkg-url` resolves to | `.../v0.3.1/mdtablefix-0.3.1-x86_64-unknown-linux-gnu.tar.gz` |

Both match.
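
The alignment can be checked mechanically. The expansion helper below is a simplified stand-in for binstall's template engine (it only handles the `{ key }` placeholder form used here):

```python
PKG_URL = "{ repo }/releases/download/v{ version }/{ name }-{ version }-{ target }.tar.gz"

def expand(template, variables):
    # binstall templates use "{ key }" placeholders; plain substitution
    # is enough for this check.
    for key, value in variables.items():
        template = template.replace("{ " + key + " }", value)
    return template

tag = "v0.3.1"
version = tag.removeprefix("v")  # mirrors version="${GITHUB_REF_NAME#v}"
url = expand(PKG_URL, {
    "repo": "https://github.com/leynos/mdtablefix",
    "version": version,
    "name": "mdtablefix",
    "target": "x86_64-unknown-linux-gnu",
})
archive_name = f"mdtablefix-{version}-x86_64-unknown-linux-gnu.tar.gz"
print(url.endswith(f"/{tag}/{archive_name}"))  # → True
```

The tag keeps its `v` in the download path while the stripped version appears in the filename, so the workflow's artifact name and the resolved `pkg-url` coincide.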

3. Release Job Sequencing

The release job correctly:

  1. Creates the GitHub release (via softprops/action-gh-release@v1)
  2. Downloads all build artifacts
  3. Uploads every file via gh release upload

The artifact directory glob artifacts/${{ env.REPO_NAME }}-* correctly enumerates all per-target directories and picks up the binary, its SHA-256, the tar.gz archive, and the archive SHA-256. ✅


4. Documentation

Both README.md and docs/release-process.md accurately document the new installation path and clarify which targets produce binstall archives. The documented target list matches the workflow matrix and the cfg override. ✅


Summary

The implementation is well-structured and internally consistent. The three moving parts — the Cargo.toml override, the workflow archive naming, and the release upload glob — are all aligned. No issues found.

@leynos leynos merged commit b8962c0 into main Apr 18, 2026
3 checks passed
@leynos leynos deleted the mdtablefix-cargo-binstall-support branch April 18, 2026 15:07