Conversation
Declare the Linux GNU `cargo-binstall` package metadata in `Cargo.toml` and teach the release workflow to build matching `.tar.gz` archives for `x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu`. Document the new installation path in the README and release process notes so the published assets and the user-facing instructions stay aligned.
Break up the remaining high-nesting and high-arity helpers across wrapping, footnote, HTML, and list processing so `make lint` passes without suppressing the configured warnings. Reshape the affected tests and shared test helpers alongside the production refactors so the new helper boundaries remain covered while preserving existing behaviour.
Overview

This PR adds cargo-binstall support to mdtablefix while conducting a comprehensive internal refactoring of the markdown processing pipeline. The changes maintain backward compatibility and preserve all existing formatting behaviour.

Cargo-binstall Support

New Installation Method:
Release Workflow Updates:
Markdown Processing Refactoring

The PR extensively refactors internal helpers across multiple modules to improve code organization and satisfy linting constraints:

- Emphasis & Code Token Processing (…)
- Fence Handling (…)
- Footnote Processing (…)
- HTML Table Support (…)
- List Renumbering (…)
- Paragraph Wrapping (…)
- Inline Wrapping & Tokenisation (…)
- Stream Processing (…)
Test Refactoring
Documentation

Updated Walkthrough

The PR extends release infrastructure to support cargo-binstall distribution of prebuilt Linux GNU binaries whilst refactoring internal code through systematic extraction of helper functions and introduction of state-carrying structs to consolidate mutable parameter threading. No user-facing functional changes introduced.

Changes
🚥 Pre-merge checks | ✅ 5 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (5 passed)
Reviewer's Guide

Refactors multiple markdown processing subsystems (footnotes, wrapping, inline tokenization, lists, HTML tables, process pipeline) into smaller helpers and state structs while adding cargo-binstall metadata and Linux tar.gz release artifacts, plus corresponding docs and test updates.

Class diagram for refactored paragraph wrapping and tokenization

classDiagram
class PrefixLine {
+prefix: Cow<str>
+rest: &str
+repeat_prefix: bool
}
class ParagraphState {
+buf: Vec<(String,bool)>
+indent: String
+clear()
+note_indent(line: &str)
+push(text: String, hard_break: bool)
}
class ParagraphWriter {
-out: &mut Vec<String>
-width: usize
+new(out: &mut Vec<String>, width: usize) ParagraphWriter
+flush_paragraph(state: &mut ParagraphState)
+push_verbatim(state: &mut ParagraphState, line: &str)
+handle_prefix_line(state: &mut ParagraphState, prefix_line: &PrefixLine)
-append_wrapped_with_prefix(line: &PrefixLine)
-push_wrapped_segment(indent: &str, segment: &str)
}
class LineBuffer {
-text: String
-width: usize
-last_split: Option<usize>
+new() LineBuffer
+push_token(token: &str)
+push_span(tokens: &[String], start: usize, end: usize)
+split_with_span(ctx: &mut SplitContext, tokens: &[String], span: Range<usize>) bool
+flush_trailing_whitespace(lines: &mut Vec<String>, tokens: &[String], span: Range<usize>) bool
+flush_into(lines: &mut Vec<String>)
+width() usize
}
class SplitContext {
+lines: &mut Vec<String>
+width: usize
}
class WrapInline {
<<module>>
+wrap_preserving_code(text: &str, width: usize) Vec<String>
+attach_punctuation_to_previous_line(lines: &mut Vec<String>)
-push_span_with_carry(buffer: &mut LineBuffer, tokens: &[String], span: Range<usize>, carried_whitespace: &mut String)
}
class WrapTokenizeMod {
<<module>>
+segment_inline(text: &str) Vec<String>
-append_escaped_backtick(tokens: &mut Vec<String>)
-scan_plain_text_end(text: &str, bytes: &[u8], index: usize) usize
-should_stop_plain_text(text: &str, bytes: &[u8], index: usize, current: (char,bool)) bool
}
class WrapTopLevel {
<<module>>
+wrap_text(lines: &[String], width: usize) Vec<String>
-is_table_or_separator(line: &str) bool
-is_passthrough_block(line: &str) bool
-prefix_line(line: &str) Option<PrefixLine>
-line_break_parts(line: &str) (String,bool)
}
WrapTopLevel ..> ParagraphState : manages
WrapTopLevel ..> ParagraphWriter : uses
WrapTopLevel ..> PrefixLine : constructs
WrapTopLevel ..> WrapInline : calls
WrapTopLevel ..> WrapTokenizeMod : via WrapInline
ParagraphWriter ..> PrefixLine : wraps
ParagraphWriter ..> WrapInline : uses
WrapInline ..> LineBuffer : buffers
WrapInline ..> SplitContext : configures
WrapTokenizeMod ..> WrapInline : token source
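To ground the `LineBuffer` role sketched in the diagram above, here is a deliberately simplified, hypothetical model of greedy width-limited token buffering. It ignores split-point tracking, code spans, carried whitespace, and Unicode display width — all of which the real buffer handles — and is not the crate's actual implementation:

```rust
/// Minimal greedy line buffer: accumulate whitespace-separated tokens,
/// flushing a line whenever appending the next token would exceed `width`.
/// Illustrative simplification only; uses byte length as a stand-in for
/// display width.
struct LineBuffer {
    text: String,
    width: usize,
}

impl LineBuffer {
    fn new(width: usize) -> Self {
        Self { text: String::new(), width }
    }

    fn push_token(&mut self, token: &str, lines: &mut Vec<String>) {
        // Width the current line would have after appending a space + token.
        let extra = if self.text.is_empty() { token.len() } else { token.len() + 1 };
        if !self.text.is_empty() && self.text.len() + extra > self.width {
            lines.push(std::mem::take(&mut self.text));
        }
        if !self.text.is_empty() {
            self.text.push(' ');
        }
        self.text.push_str(token);
    }

    fn flush_into(&mut self, lines: &mut Vec<String>) {
        if !self.text.is_empty() {
            lines.push(std::mem::take(&mut self.text));
        }
    }
}

fn wrap_tokens(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut buf = LineBuffer::new(width);
    for token in text.split_whitespace() {
        buf.push_token(token, &mut lines);
    }
    buf.flush_into(&mut lines);
    lines
}
```

An overlong token simply lands on its own line rather than being split, which matches the usual behaviour of greedy word wrapping.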
Class diagram for refactored footnote renumbering and inline handling

classDiagram
class DefinitionParts {
<<Copy>>
+prefix: &str
+number: usize
+rest: &str
}
class DefinitionLine {
+index: usize
+new_number: usize
+line: String
}
class DefinitionUpdates {
+definitions: Vec<DefinitionLine>
+is_definition_line: Vec<bool>
}
class DefinitionScanContext {
+mapping: &mut HashMap<usize,usize>
+next_number: &mut usize
+numeric_list_range: Option<(usize,usize)>
+skip_numeric_conversion: bool
}
class DefinitionAccumulator {
+definitions: Vec<DefinitionLine>
+is_definition_line: Vec<bool>
}
class NumericCandidate {
+index: usize
+number: usize
+indent: String
+whitespace: String
+rest: String
}
class FootnoteRenumberMod {
<<module>>
+collect_reference_mapping(lines: &[String]) HashMap<usize,usize>
-collect_reference_mapping_from_text(text: &str, mapping: &mut HashMap<usize,usize>, next: &mut usize)
-collect_definition_updates(lines: &[String], mapping: &mut HashMap<usize,usize>) DefinitionUpdates
-collect_scan_updates(lines: &[String], ctx: &mut DefinitionScanContext) (DefinitionAccumulator, Vec<NumericCandidate>)
-finalize_numeric_candidates(numeric_candidates: Vec<NumericCandidate>, ctx: &mut DefinitionScanContext, acc: &mut DefinitionAccumulator)
-definition_line_from_parts(index: usize, parts: DefinitionParts, mapping: &mut HashMap<usize,usize>, next_number: &mut usize) DefinitionLine
-numeric_candidate_from_line(line: &str, index: usize) Option<NumericCandidate>
-assign_new_number(mapping: &mut HashMap<usize,usize>, number: usize, next: &mut usize) usize
-rewrite_tokens(rest: &str, mapping: &HashMap<usize,usize>) String
}
class InlineFootnote {
+pre: &str
+punc: &str
+style: &str
+num: &str
+boundary: &str
}
class FootnoteInlineMod {
<<module>>
+convert_inline(text: &str) String
-capture_parts(caps: &Captures) InlineFootnote
-build_footnote(parts: InlineFootnote) String
}
FootnoteRenumberMod ..> DefinitionParts : uses
FootnoteRenumberMod ..> DefinitionScanContext : configures
FootnoteRenumberMod ..> DefinitionAccumulator : accumulates
FootnoteRenumberMod ..> NumericCandidate : converts
FootnoteRenumberMod ..> DefinitionUpdates : returns
DefinitionAccumulator *-- DefinitionLine : aggregates
DefinitionUpdates *-- DefinitionLine : aggregates
FootnoteInlineMod ..> InlineFootnote : builds
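The renumbering flow above amounts to assigning fresh sequential numbers in order of first appearance. A hypothetical, simplified version of the `assign_new_number` idea follows — the real helper threads this mapping through definition rewriting as well, so treat this as a model of the contract, not the crate's code:

```rust
use std::collections::HashMap;

/// Map an original footnote number to its renumbered value, allocating the
/// next sequential number the first time an original number is seen.
/// Simplified model of the `assign_new_number` helper in the diagram above.
fn assign_new_number(
    mapping: &mut HashMap<usize, usize>,
    number: usize,
    next: &mut usize,
) -> usize {
    *mapping.entry(number).or_insert_with(|| {
        let assigned = *next;
        *next += 1;
        assigned
    })
}
```

With references encountered in the order `[^7]`, `[^2]`, `[^7]`, this yields the stable renumbering 1, 2, 1.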
Class diagram for refactored processing pipeline, HTML tables, and lists

classDiagram
class ProcessBuffer {
+out: Vec<String>
+buf: Vec<String>
+in_table: bool
+flush()
+push_verbatim(line: &str)
+handle_fence_line(line: &str, fences: &mut FenceTracker) bool
+handle_table_line(line: &str) bool
}
class HtmlTableState {
+buf: Vec<String>
+depth: usize
+in_html: bool
+flush_raw(out: &mut Vec<String>)
+push_html_line(line: &str, out: &mut Vec<String>)
}
class HtmlMod {
<<module>>
+convert_html_tables(lines: &[String]) Vec<String>
+html_table_to_markdown(lines: &[String]) Vec<String>
-append_html_table_line(line: &str, buf: &mut Vec<String>, depth: &mut usize)
-flush_completed_html_table(buf: &mut Vec<String>, depth: usize, out: &mut Vec<String>) bool
-push_collapsed_text_char(ch: char, out: &mut String, last_space: &mut bool)
}
class ListState {
+indent_stack: Vec<usize>
+counters: HashMap<usize,usize>
+prune_deeper(indent: usize, inclusive: bool)
+handle_paragraph_restart(indent: usize, line: &str, prev_blank: bool) bool
}
class ListsMod {
<<module>>
+renumber_lists(lines: &[String]) Vec<String>
-is_plain_paragraph_line(line: &str) bool
-prune_deeper(indent: usize, inclusive: bool, indent_stack: &mut Vec<usize>, counters: &mut HashMap<usize,usize>)
}
class ProcessMod {
<<module>>
+process_stream_inner(lines: &[String], opts: Options) Vec<String>
}
class FenceTracker {
+observe(line: &str) bool
+in_fence() bool
}
class TableMod {
<<module>>
+reflow_table(lines: &[String]) Vec<String>
-calculate_and_format(parsed: &ParsedTable, indent: &str) Vec<String>
}
class ParsedTable {
+cleaned: Vec<Vec<String>>
+output_rows: Vec<Vec<String>>
+sep_cells: Option<Vec<String>>
+max_cols: usize
}
ProcessMod ..> ProcessBuffer : manages
ProcessMod ..> FenceTracker : tracks
ProcessMod ..> HtmlMod : calls convert_html_tables
ProcessBuffer ..> FenceTracker : uses
ProcessBuffer ..> TableMod : uses reflow_table
HtmlMod ..> HtmlTableState : uses
ListsMod ..> ListState : manages
TableMod ..> ParsedTable : consumes
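One plausible reading of the `FenceTracker` contract shown above: `observe` reports whether a line is a code-fence delimiter and toggles the open/closed state, so `ProcessBuffer`-style callers can skip table handling while `in_fence()` is true. The crate's real matcher is likely stricter (info strings, matching fence lengths); this sketch only captures the toggle semantics:

```rust
/// Toy model of the `FenceTracker` role from the diagram: a line opening
/// or closing a fenced code block toggles the tracker, and `in_fence`
/// reports whether subsequent lines are inside a fence.
#[derive(Default)]
struct FenceTracker {
    open: bool,
}

impl FenceTracker {
    /// Returns true if `line` is a fence delimiter (``` or ~~~), toggling
    /// the open/closed state as a side effect.
    fn observe(&mut self, line: &str) -> bool {
        let trimmed = line.trim_start();
        let backticks = trimmed.chars().take_while(|&c| c == '`').count();
        let tildes = trimmed.chars().take_while(|&c| c == '~').count();
        if backticks >= 3 || tildes >= 3 {
            self.open = !self.open;
            return true;
        }
        false
    }

    fn in_fence(&self) -> bool {
        self.open
    }
}
```

This explains why the pipeline routes every line through the tracker before any table detection: pipe characters inside a fenced block must never be treated as table rows.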
File-Level Changes
Hey - I've found 5 issues and left some high-level feedback:
- The new `ListState` type still exposes its `indent_stack` and `counters` to external mutation in a few places (e.g. direct `clear()` calls); consider encapsulating these operations behind `ListState` methods so list-renumbering behaviour remains centralized and easier to reason about.
- In the HTML table handling, `HtmlTableState::flush_raw` and the manual `if !html_state.buf.is_empty() { out.extend(html_state.buf); }` at the end of `convert_html_tables` now duplicate behaviour; you could reuse `flush_raw` there to keep the flushing logic in a single place.
- The introduction of multiple small helpers in `wrap::inline` and `wrap::line_buffer` (e.g. `SplitContext`, passing `Range` instead of start/end indices) improves clarity but also adds boilerplate at call sites; you might consider providing thin convenience wrappers for the common cases (e.g. constructing `SplitContext` once per wrap call) to keep the main wrapping loop easier to scan.
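As a minimal sketch of the encapsulation suggested in the first bullet: field names follow the reviewer's guide diagram, but the method bodies are illustrative assumptions, not the crate's actual renumbering logic.

```rust
use std::collections::HashMap;

/// Keep `indent_stack` and `counters` private and route every mutation
/// through methods, so callers cannot clear or pop state ad hoc.
/// Hypothetical sketch only; bodies are assumptions, not mdtablefix code.
struct ListState {
    indent_stack: Vec<usize>,
    counters: HashMap<usize, usize>,
}

impl ListState {
    fn new() -> Self {
        Self { indent_stack: Vec::new(), counters: HashMap::new() }
    }

    /// Replaces external `state.indent_stack.clear()` /
    /// `state.counters.clear()` calls with one named operation.
    fn reset(&mut self) {
        self.indent_stack.clear();
        self.counters.clear();
    }

    /// Drop tracking for levels deeper than `indent` (optionally including
    /// `indent` itself), mirroring the diagram's `prune_deeper`.
    fn prune_deeper(&mut self, indent: usize, inclusive: bool) {
        while let Some(&top) = self.indent_stack.last() {
            if top > indent || (inclusive && top == indent) {
                self.indent_stack.pop();
                self.counters.remove(&top);
            } else {
                break;
            }
        }
    }

    /// Advance and return the ordinal counter for a list level.
    fn next_number(&mut self, indent: usize) -> usize {
        if self.indent_stack.last() != Some(&indent) {
            self.indent_stack.push(indent);
        }
        let counter = self.counters.entry(indent).or_insert(0);
        *counter += 1;
        *counter
    }
}
```

With this shape, the renumbering loop never touches the stack or counter map directly, which keeps the invariants (stack entries always have a counter) enforceable in one place.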
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `ListState` type still exposes its `indent_stack` and `counters` to external mutation in a few places (e.g. direct `clear()` calls); consider encapsulating these operations behind `ListState` methods so list-renumbering behaviour remains centralized and easier to reason about.
- In the HTML table handling, `HtmlTableState::flush_raw` and the manual `if !html_state.buf.is_empty() { out.extend(html_state.buf); }` at the end of `convert_html_tables` now duplicate behaviour; you could reuse `flush_raw` there to keep the flushing logic in a single place.
- The introduction of multiple small helpers in `wrap::inline` and `wrap::line_buffer` (e.g. `SplitContext`, passing `Range` instead of start/end indices) improves clarity but also adds boilerplate at call sites; you might consider providing thin convenience wrappers for the common cases (e.g. constructing `SplitContext` once per wrap call) to keep the main wrapping loop easier to scan.
## Individual Comments
### Comment 1
<location path="Cargo.toml" line_range="16" />
<code_context>
+
+[package.metadata.binstall.overrides.'cfg(all(target_os = "linux", any(target_arch = "x86_64", target_arch = "aarch64"), target_env = "gnu"))']
+pkg-url = "{ repo }/releases/download/v{ version }/{ name }-{ version }-{ target }.tar.gz"
+bin-dir = "{ bin }{ binary-ext }"
+pkg-fmt = "tgz"
+
</code_context>
<issue_to_address>
**issue (bug_risk):** The `bin-dir` value in the cargo-binstall metadata looks like a file path instead of a directory, which may break installation.
In cargo-binstall metadata, `bin-dir` should point to the directory inside the archive that contains the binary (e.g. `"."` or `"bin"`), not the full binary path. With `bin-dir = "{ bin }{ binary-ext }"`, this expands to the full filename, while your `.tar.gz` places the binary at the archive root. To match the archive layout and let cargo-binstall find the binary, this should be `bin-dir = "."` (or another actual directory path if you change the archive structure).
</issue_to_address>
### Comment 2
<location path="src/footnotes/renumber.rs" line_range="282" />
<code_context>
is_definition_line: Vec<bool>,
}
+struct DefinitionScanContext<'a> {
+ mapping: &'a mut HashMap<usize, usize>,
+ next_number: &'a mut usize,
</code_context>
<issue_to_address>
**issue (complexity):** Consider collapsing the new scan context and accumulator into a single state struct and driving the scan/finalize flow directly from `collect_definition_updates` to reduce indirection.
You can keep the new helpers/behavior but reduce indirection by collapsing the scan context/accumulator and merging the two-phase scan/finalize into a single function-local flow.
### 1. Collapse `DefinitionScanContext` + `DefinitionAccumulator` into one local state
Instead of passing a `DefinitionScanContext` and `DefinitionAccumulator` between helpers, you can use a single local state struct inside `collect_definition_updates`:
```rust
struct ScanState<'a> {
mapping: &'a mut HashMap<usize, usize>,
next_number: &'a mut usize,
numeric_list_range: Option<(usize, usize)>,
skip_numeric_conversion: bool,
definitions: Vec<DefinitionLine>,
is_definition_line: Vec<bool>,
numeric_candidates: Vec<NumericCandidate>,
}
```
Then `collect_definition_updates` owns the control flow and state:
```rust
fn collect_definition_updates(
lines: &[String],
mapping: &mut HashMap<usize, usize>,
) -> DefinitionUpdates {
let mut next_number = mapping.values().copied().max().unwrap_or(0) + 1;
let numeric_list_range = footnote_block_range(lines);
let skip_numeric_conversion = numeric_list_range
.as_ref()
.is_some_and(|(start, _)| has_existing_footnote_block(lines, *start));
let mut state = ScanState {
mapping,
next_number: &mut next_number,
numeric_list_range,
skip_numeric_conversion,
definitions: Vec::new(),
is_definition_line: vec![false; lines.len()],
numeric_candidates: Vec::new(),
};
collect_scan_updates(lines, &mut state);
finalize_numeric_candidates(&mut state);
DefinitionUpdates {
definitions: state.definitions,
is_definition_line: state.is_definition_line,
}
}
```
### 2. Simplify `collect_scan_updates` / `finalize_numeric_candidates` signatures
With the unified `ScanState`, the helpers become simpler and avoid tuple juggling and separate accumulator types:
```rust
fn collect_scan_updates(lines: &[String], state: &mut ScanState<'_>) {
let mut in_fence = false;
for (index, line) in lines.iter().enumerate() {
if is_fence_line(line) {
in_fence = !in_fence;
continue;
}
if in_fence {
continue;
}
if let Some(parts) = parse_definition(line) {
state.definitions.push(definition_line_from_parts(
index,
parts,
state.mapping,
state.next_number,
));
state.is_definition_line[index] = true;
continue;
}
if !should_convert_numeric_line(
index,
state.numeric_list_range,
state.skip_numeric_conversion,
) {
continue;
}
if state.mapping.is_empty() && state.definitions.is_empty() {
continue;
}
if let Some(candidate) = numeric_candidate_from_line(line, index) {
state.numeric_candidates.push(candidate);
}
}
}
fn finalize_numeric_candidates(state: &mut ScanState<'_>) {
for candidate in state.numeric_candidates.drain(..).rev() {
let new_number = assign_new_number(state.mapping, candidate.number, state.next_number);
let rewritten_rest = rewrite_tokens(&candidate.rest, state.mapping);
let mut line = String::with_capacity(
candidate.indent.len() + candidate.whitespace.len() + rewritten_rest.len() + 8,
);
line.push_str(&candidate.indent);
write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");
line.push_str(&candidate.whitespace);
line.push_str(&rewritten_rest);
state.definitions.push(DefinitionLine {
index: candidate.index,
new_number,
line,
});
state.is_definition_line[candidate.index] = true;
}
}
```
This keeps:
- `definition_line_from_parts` and `numeric_candidate_from_line` as reusable helpers.
- The “scan then reverse-apply numeric candidates” behavior.
But it:
- Removes `DefinitionScanContext` and `DefinitionAccumulator` as separate concepts.
- Avoids `(acc, numeric_candidates)` tuples and multiple lifetimes being threaded around.
- Keeps all the logic reachable from one primary function (`collect_definition_updates`), which matches the original linear mental model while retaining your refactoring improvements.
</issue_to_address>
### Comment 3
<location path="src/wrap.rs" line_range="124" />
<code_context>
#[must_use]
pub fn wrap_text(lines: &[String], width: usize) -> Vec<String> {
let mut out = Vec::new();
- let mut buf: Vec<(String, bool)> = Vec::new();
</code_context>
<issue_to_address>
**issue (complexity):** Consider merging `ParagraphState` into `ParagraphWriter`, simplifying `PrefixLine`, and inlining passthrough checks to make `wrap_text`’s control flow and APIs more linear and self‑contained.
1. Merge `ParagraphState` into `ParagraphWriter` to reduce cross‑type coupling
Right now every call site has to thread `&mut writer` and `&mut state` together, and behavior depends on both. You can keep the same external behavior while collapsing them into a single type that owns the paragraph buffer and indentation, and exposes the same high‑level methods.
Example of the shape of the change:
```rust
// before
let mut state = ParagraphState::default();
let mut writer = ParagraphWriter::new(&mut out, width);
for line in lines {
if fence::handle_fence_line(line, &mut writer, &mut state, &mut fence_tracker) { … }
if fence_tracker.in_fence() {
writer.push_verbatim(&mut state, line);
continue;
}
if is_passthrough_block(line) {
writer.push_verbatim(&mut state, line);
continue;
}
if let Some(prefix_line) = prefix_line(line) {
writer.handle_prefix_line(&mut state, &prefix_line);
continue;
}
state.note_indent(line);
let (text, hard_break) = line_break_parts(line);
state.push(text, hard_break);
}
writer.flush_paragraph(&mut state);
```
```rust
// after: single struct owns the state
pub struct ParagraphWriter<'a> {
out: &'a mut Vec<String>,
width: usize,
// moved fields from ParagraphState:
buf: Vec<(String, bool)>,
indent: String,
// any other state...
}
impl<'a> ParagraphWriter<'a> {
pub fn new(out: &'a mut Vec<String>, width: usize) -> Self { … }
pub fn push_verbatim(&mut self, line: &str) { … } // internally flushes buf/indent as needed
pub fn handle_prefix_line(&mut self, prefix_line: &PrefixLine<'_>) { … }
pub fn note_indent(&mut self, line: &str) { … }
pub fn push_wrapped(&mut self, text: String, hard_break: bool) { … }
pub fn flush_paragraph(&mut self) { … }
}
```
```rust
// wrap_text becomes simpler to read:
let mut writer = ParagraphWriter::new(&mut out, width);
for line in lines {
if fence::handle_fence_line(line, &mut writer, &mut fence_tracker) {
continue;
}
if fence_tracker.in_fence() {
writer.push_verbatim(line);
continue;
}
if is_passthrough_block(line) {
writer.push_verbatim(line);
continue;
}
if let Some(prefix_line) = prefix_line(line) {
writer.handle_prefix_line(&prefix_line);
continue;
}
writer.note_indent(line);
let (text, hard_break) = line_break_parts(line);
writer.push_wrapped(text, hard_break);
}
writer.flush_paragraph();
```
This keeps all current semantics (the logic inside `ParagraphState` just moves into `ParagraphWriter`), but reduces the number of moving parts and hidden early returns (only one object mutates paragraph state).
---
2. Simplify `PrefixLine` by avoiding `Cow` unless you really need it
The only owned case is the footnote prefix `format!("{prefix}{marker}")`. If that optimization isn’t performance‑critical, you can drop `Cow` from public signatures and keep ownership concerns inside the writer. This makes lifetimes and method signatures easier to follow.
Example:
```rust
// before
pub struct PrefixLine<'a> {
pub prefix: Cow<'a, str>,
pub rest: &'a str,
pub repeat_prefix: bool,
}
```
```rust
// after: only borrows; writer can allocate when necessary
pub struct PrefixLine<'a> {
pub prefix: &'a str,
pub rest: &'a str,
pub repeat_prefix: bool,
}
```
```rust
fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
if let Some(cap) = BULLET_RE.captures(line) {
return Some(PrefixLine {
prefix: cap.get(1).expect("bullet regex capture").as_str(),
rest: cap.get(2).expect("bullet regex remainder capture").as_str(),
repeat_prefix: false,
});
}
if let Some(cap) = FOOTNOTE_RE.captures(line) {
let prefix = cap.get(1).expect("footnote prefix capture").as_str();
let marker = cap.get(2).expect("footnote marker capture").as_str();
let rest = cap
.get(3)
.expect("footnote regex remainder capture")
.as_str();
// pass prefix + marker separately; concatenate inside writer
return Some(PrefixLine {
prefix,
rest,
repeat_prefix: false,
});
}
BLOCKQUOTE_RE.captures(line).map(|cap| PrefixLine {
prefix: cap.get(1).expect("blockquote prefix capture").as_str(),
rest: cap.get(2).expect("blockquote regex remainder capture").as_str(),
repeat_prefix: true,
})
}
```
Then let `ParagraphWriter::handle_prefix_line` do the one allocation for footnotes when emitting the first line of the paragraph, instead of requiring `Cow` throughout the API.
---
3. Make passthrough control flow more linear in `wrap_text`
The small helpers are fine, but you can make the early‑return cases easier to scan by inlining `is_table_or_separator` into `is_passthrough_block` and keeping all passthrough checks together:
```rust
fn is_passthrough_block(line: &str) -> bool {
let trimmed = line.trim();
line.trim_start().starts_with('|')
|| crate::table::SEP_RE.is_match(trimmed)
|| matches!(
classify_block(line),
Some(BlockKind::Heading | BlockKind::MarkdownlintDirective)
)
|| trimmed.is_empty()
|| is_indented_code_line(line)
}
```
This keeps `wrap_text`’s top section as a clear sequence of “fence → passthrough → prefix → paragraph” checks without requiring the reader to jump between multiple tiny predicates.
</issue_to_address>
### Comment 4
<location path="src/wrap/paragraph.rs" line_range="18" />
<code_context>
+ true
+}
+
+#[derive(Default)]
+struct HtmlTableState {
+ buf: Vec<String>,
</code_context>
<issue_to_address>
**issue (complexity):** Consider merging `ParagraphState` into `ParagraphWriter`, inlining the prefix-wrapping logic into `handle_prefix_line`, and potentially simplifying `PrefixLine` to reduce cognitive overhead while preserving behavior.
You can retain the new behavior while reducing the number of moving parts by collapsing `ParagraphState` into `ParagraphWriter` and inlining some of the prefix handling. This removes the “two objects + lifetimes” cognitive load without changing semantics.
### 1. Merge `ParagraphState` into `ParagraphWriter`
Instead of passing `ParagraphState` around, let `ParagraphWriter` own the paragraph state. That way `wrap_text` only needs a single `&mut ParagraphWriter`.
```rust
pub(super) struct ParagraphWriter<'a> {
out: &'a mut Vec<String>,
width: usize,
buf: Vec<(String, bool)>,
indent: String,
}
impl<'a> ParagraphWriter<'a> {
pub(super) fn new(out: &'a mut Vec<String>, width: usize) -> Self {
Self {
out,
width,
buf: Vec::new(),
indent: String::new(),
}
}
pub(super) fn clear(&mut self) {
self.buf.clear();
self.indent.clear();
}
pub(super) fn note_indent(&mut self, line: &str) {
if self.buf.is_empty() {
self.indent = line.chars().take_while(|c| c.is_whitespace()).collect();
}
}
pub(super) fn push(&mut self, text: String, hard_break: bool) {
self.buf.push((text, hard_break));
}
pub(super) fn flush_paragraph(&mut self) {
if self.buf.is_empty() {
return;
}
let mut segment = String::new();
for (text, hard_break) in &self.buf {
if !segment.is_empty() {
segment.push(' ');
}
segment.push_str(text);
if *hard_break {
self.push_wrapped_segment(&self.indent, &segment);
segment.clear();
}
}
if !segment.is_empty() {
self.push_wrapped_segment(&self.indent, &segment);
}
self.clear();
}
fn push_wrapped_segment(&mut self, indent: &str, segment: &str) {
for line in wrap_preserving_code(segment, self.width - indent.len()) {
self.out.push(format!("{indent}{line}"));
}
}
pub(super) fn push_verbatim(&mut self, line: &str) {
self.flush_paragraph();
self.out.push(line.to_string());
}
}
```
Callers that currently pass `&mut ParagraphState` can instead just use `&mut ParagraphWriter` (e.g. `writer.push(...)`, `writer.note_indent(...)`, etc.), and `flush_paragraph`/`push_verbatim` no longer need a `state` parameter.
### 2. Inline `append_wrapped_with_prefix` into `handle_prefix_line`
If `append_wrapped_with_prefix` is only used from `handle_prefix_line`, you can inline it to avoid one extra layer of indirection:
```rust
impl<'a> ParagraphWriter<'a> {
pub(super) fn handle_prefix_line(&mut self, prefix_line: &PrefixLine<'_>) {
self.flush_paragraph();
let prefix = prefix_line.prefix.as_ref();
let prefix_width = UnicodeWidthStr::width(prefix);
let available = self.width.saturating_sub(prefix_width).max(1);
let indent_str: String =
prefix.chars().take_while(|c| c.is_whitespace()).collect();
let indent_width = UnicodeWidthStr::width(indent_str.as_str());
let wrapped_indent = if prefix_line.repeat_prefix {
prefix.to_string()
} else {
format!("{}{}", indent_str, " ".repeat(prefix_width - indent_width))
};
let lines = wrap_preserving_code(prefix_line.rest, available);
if lines.is_empty() {
self.out.push(prefix.to_string());
return;
}
for (index, wrapped_line) in lines.iter().enumerate() {
if index == 0 {
self.out.push(format!("{prefix}{wrapped_line}"));
} else {
self.out.push(format!("{wrapped_indent}{wrapped_line}"));
}
}
}
}
```
This keeps the behavior identical but localizes all “prefix” behavior in one method.
### 3. Consider simplifying `PrefixLine` if `Cow` isn’t pulling its weight
If you don’t actually need to pass owned prefixes frequently, you can simplify the type:
```rust
pub(super) struct PrefixLine<'a> {
pub(super) prefix: &'a str,
pub(super) rest: &'a str,
pub(super) repeat_prefix: bool,
}
```
Callers that need an owned prefix can still build a `String` and hold it locally, then pass `&prefix_string` into `PrefixLine`. This keeps the public API simpler while preserving all capabilities.
</issue_to_address>
### Comment 5
<location path="src/html.rs" line_range="228" />
<code_context>
+ true
+}
+
+#[derive(Default)]
+struct HtmlTableState {
+ buf: Vec<String>,
</code_context>
<issue_to_address>
**issue (complexity):** Consider consolidating table parsing into a single `HtmlTableState` abstraction that owns its helpers and state so callers no longer juggle parallel `buf + depth` logic and flags.
You now have two parallel abstractions for table state (`buf + depth` in `html_table_to_markdown` and `HtmlTableState` in `convert_html_tables`) plus tiny free helpers. You can keep the refactor while reducing moving parts by:
1. **Move the free helpers into `HtmlTableState` and hide the `depth` contract.**
```rust
impl HtmlTableState {
fn push_html_line(&mut self, line: &str, out: &mut Vec<String>) -> bool {
self.buf.push(line.to_string());
self.depth += TABLE_START_RE.find_iter(line).count();
if TABLE_END_RE.is_match(line) {
self.depth = self.depth.saturating_sub(TABLE_END_RE.find_iter(line).count());
}
if self.depth == 0 {
out.extend(table_lines_to_markdown(&self.buf));
self.buf.clear();
self.in_html = false;
return true;
}
true
}
}
```
Then `convert_html_tables` doesn’t need to know about `append_html_table_line` / `flush_completed_html_table`:
```rust
if html_state.in_html {
html_state.push_html_line(line, &mut out);
continue;
}
if TABLE_START_RE.is_match(line.trim_start()) {
html_state.in_html = true;
html_state.push_html_line(line, &mut out);
continue;
}
```
`append_html_table_line` and `flush_completed_html_table` can then be private helpers *inside* the impl or removed entirely.
2. **Reuse `HtmlTableState` in `html_table_to_markdown` to avoid duplicating the state machine.**
```rust
pub(crate) fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
let mut out = Vec::new();
let mut state = HtmlTableState::default();
for line in lines {
if state.depth > 0 || TABLE_START_RE.is_match(line.trim_start()) {
state.in_html = true;
state.push_html_line(line, &mut out);
continue;
}
out.push(line.clone());
}
state.flush_raw(&mut out);
out
}
```
This removes the second manual `buf + depth` pair and keeps all table behavior in one place.
3. **Consider dropping `in_html` and derive it from state.**
If you don’t have any “active but empty” table state, you can use `depth > 0` or `!buf.is_empty()` instead of a separate flag:
```rust
impl HtmlTableState {
fn in_html(&self) -> bool {
self.depth > 0
}
}
```
Then `convert_html_tables` becomes:
```rust
if html_state.in_html() {
html_state.push_html_line(line, &mut out);
continue;
}
if TABLE_START_RE.is_match(line.trim_start()) {
html_state.push_html_line(line, &mut out);
continue;
}
```
This keeps the new organization but collapses the table handling into a single, self-contained abstraction with a clear API.
</issue_to_address>
|
|
||
| [package.metadata.binstall.overrides.'cfg(all(target_os = "linux", any(target_arch = "x86_64", target_arch = "aarch64"), target_env = "gnu"))'] | ||
| pkg-url = "{ repo }/releases/download/v{ version }/{ name }-{ version }-{ target }.tar.gz" | ||
| bin-dir = "{ bin }{ binary-ext }" |
There was a problem hiding this comment.
issue (bug_risk): The bin-dir value in the cargo-binstall metadata looks like a file path instead of a directory, which may break installation.
In cargo-binstall metadata, bin-dir should point to the directory inside the archive that contains the binary (e.g. "." or "bin"), not the full binary path. With bin-dir = "{ bin }{ binary-ext }", this expands to the full filename, while your .tar.gz places the binary at the archive root. To match the archive layout and let cargo-binstall find the binary, this should be bin-dir = "." (or another actual directory path if you change the archive structure).
| is_definition_line: Vec<bool>, | ||
| } | ||
|
|
||
| struct DefinitionScanContext<'a> { |
There was a problem hiding this comment.
issue (complexity): Consider collapsing the new scan context and accumulator into a single state struct and driving the scan/finalize flow directly from collect_definition_updates to reduce indirection.
You can keep the new helpers/behavior but reduce indirection by collapsing the scan context/accumulator and merging the two-phase scan/finalize into a single function-local flow.
1. Collapse DefinitionScanContext + DefinitionAccumulator into one local state
Instead of passing a DefinitionScanContext and DefinitionAccumulator between helpers, you can use a single local state struct inside collect_definition_updates:
struct ScanState<'a> {
mapping: &'a mut HashMap<usize, usize>,
next_number: &'a mut usize,
numeric_list_range: Option<(usize, usize)>,
skip_numeric_conversion: bool,
definitions: Vec<DefinitionLine>,
is_definition_line: Vec<bool>,
numeric_candidates: Vec<NumericCandidate>,
}Then collect_definition_updates owns the control flow and state:
fn collect_definition_updates(
lines: &[String],
mapping: &mut HashMap<usize, usize>,
) -> DefinitionUpdates {
let mut next_number = mapping.values().copied().max().unwrap_or(0) + 1;
let numeric_list_range = footnote_block_range(lines);
let skip_numeric_conversion = numeric_list_range
.as_ref()
.is_some_and(|(start, _)| has_existing_footnote_block(lines, *start));
let mut state = ScanState {
mapping,
next_number: &mut next_number,
numeric_list_range,
skip_numeric_conversion,
definitions: Vec::new(),
is_definition_line: vec![false; lines.len()],
numeric_candidates: Vec::new(),
};
collect_scan_updates(lines, &mut state);
finalize_numeric_candidates(&mut state);
DefinitionUpdates {
definitions: state.definitions,
is_definition_line: state.is_definition_line,
}
}2. Simplify collect_scan_updates / finalize_numeric_candidates signatures
With the unified ScanState, the helpers become simpler and avoid tuple juggling and separate accumulator types:
fn collect_scan_updates(lines: &[String], state: &mut ScanState<'_>) {
    let mut in_fence = false;
    for (index, line) in lines.iter().enumerate() {
        if is_fence_line(line) {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        if let Some(parts) = parse_definition(line) {
            state.definitions.push(definition_line_from_parts(
                index,
                parts,
                state.mapping,
                state.next_number,
            ));
            state.is_definition_line[index] = true;
            continue;
        }
        if !should_convert_numeric_line(
            index,
            state.numeric_list_range,
            state.skip_numeric_conversion,
        ) {
            continue;
        }
        if state.mapping.is_empty() && state.definitions.is_empty() {
            continue;
        }
        if let Some(candidate) = numeric_candidate_from_line(line, index) {
            state.numeric_candidates.push(candidate);
        }
    }
}
fn finalize_numeric_candidates(state: &mut ScanState<'_>) {
    for candidate in state.numeric_candidates.drain(..).rev() {
        let new_number = assign_new_number(state.mapping, candidate.number, state.next_number);
        let rewritten_rest = rewrite_tokens(&candidate.rest, state.mapping);
        let mut line = String::with_capacity(
            candidate.indent.len() + candidate.whitespace.len() + rewritten_rest.len() + 8,
        );
        line.push_str(&candidate.indent);
        write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");
        line.push_str(&candidate.whitespace);
        line.push_str(&rewritten_rest);
        state.definitions.push(DefinitionLine {
            index: candidate.index,
            new_number,
            line,
        });
        state.is_definition_line[candidate.index] = true;
    }
}

This keeps:

- definition_line_from_parts and numeric_candidate_from_line as reusable helpers.
- The "scan then reverse-apply numeric candidates" behavior.

But it:

- Removes DefinitionScanContext and DefinitionAccumulator as separate concepts.
- Avoids (acc, numeric_candidates) tuples and multiple lifetimes being threaded around.
- Keeps all the logic reachable from one primary function (collect_definition_updates), which matches the original linear mental model while retaining your refactoring improvements.
/// # Panics
/// Panics if regex captures fail unexpectedly.
#[must_use]
pub fn wrap_text(lines: &[String], width: usize) -> Vec<String>
issue (complexity): Consider merging ParagraphState into ParagraphWriter, simplifying PrefixLine, and inlining passthrough checks to make wrap_text’s control flow and APIs more linear and self‑contained.
- Merge ParagraphState into ParagraphWriter to reduce cross-type coupling

Right now every call site has to thread &mut writer and &mut state together, and behavior depends on both. You can keep the same external behavior while collapsing them into a single type that owns the paragraph buffer and indentation, and exposes the same high-level methods.
Example of the shape of the change:
// before
let mut state = ParagraphState::default();
let mut writer = ParagraphWriter::new(&mut out, width);
for line in lines {
    if fence::handle_fence_line(line, &mut writer, &mut state, &mut fence_tracker) { … }
    if fence_tracker.in_fence() {
        writer.push_verbatim(&mut state, line);
        continue;
    }
    if is_passthrough_block(line) {
        writer.push_verbatim(&mut state, line);
        continue;
    }
    if let Some(prefix_line) = prefix_line(line) {
        writer.handle_prefix_line(&mut state, &prefix_line);
        continue;
    }
    state.note_indent(line);
    let (text, hard_break) = line_break_parts(line);
    state.push(text, hard_break);
}
writer.flush_paragraph(&mut state);

// after: single struct owns the state
pub struct ParagraphWriter<'a> {
    out: &'a mut Vec<String>,
    width: usize,
    // moved fields from ParagraphState:
    buf: Vec<(String, bool)>,
    indent: String,
    // any other state...
}

impl<'a> ParagraphWriter<'a> {
    pub fn new(out: &'a mut Vec<String>, width: usize) -> Self { … }
    pub fn push_verbatim(&mut self, line: &str) { … } // internally flushes buf/indent as needed
    pub fn handle_prefix_line(&mut self, prefix_line: &PrefixLine<'_>) { … }
    pub fn note_indent(&mut self, line: &str) { … }
    pub fn push_wrapped(&mut self, text: String, hard_break: bool) { … }
    pub fn flush_paragraph(&mut self) { … }
}

// wrap_text becomes simpler to read:
let mut writer = ParagraphWriter::new(&mut out, width);
for line in lines {
    if fence::handle_fence_line(line, &mut writer, &mut fence_tracker) {
        continue;
    }
    if fence_tracker.in_fence() {
        writer.push_verbatim(line);
        continue;
    }
    if is_passthrough_block(line) {
        writer.push_verbatim(line);
        continue;
    }
    if let Some(prefix_line) = prefix_line(line) {
        writer.handle_prefix_line(&prefix_line);
        continue;
    }
    writer.note_indent(line);
    let (text, hard_break) = line_break_parts(line);
    writer.push_wrapped(text, hard_break);
}
writer.flush_paragraph();

This keeps all current semantics (the logic inside ParagraphState just moves into ParagraphWriter), but reduces the number of moving parts and hidden early returns (only one object mutates paragraph state).
- Simplify PrefixLine by avoiding Cow unless you really need it

The only owned case is the footnote prefix format!("{prefix}{marker}"). If that optimization isn't performance-critical, you can drop Cow from public signatures and keep ownership concerns inside the writer. This makes lifetimes and method signatures easier to follow.
Example:
// before
pub struct PrefixLine<'a> {
    pub prefix: Cow<'a, str>,
    pub rest: &'a str,
    pub repeat_prefix: bool,
}

// after: only borrows; writer can allocate when necessary
pub struct PrefixLine<'a> {
    pub prefix: &'a str,
    pub rest: &'a str,
    pub repeat_prefix: bool,
}

fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
    if let Some(cap) = BULLET_RE.captures(line) {
        return Some(PrefixLine {
            prefix: cap.get(1).expect("bullet regex capture").as_str(),
            rest: cap.get(2).expect("bullet regex remainder capture").as_str(),
            repeat_prefix: false,
        });
    }
    if let Some(cap) = FOOTNOTE_RE.captures(line) {
        let prefix = cap.get(1).expect("footnote prefix capture").as_str();
        let marker = cap.get(2).expect("footnote marker capture").as_str();
        let rest = cap
            .get(3)
            .expect("footnote regex remainder capture")
            .as_str();
        // pass prefix + marker separately; concatenate inside writer
        return Some(PrefixLine {
            prefix,
            rest,
            repeat_prefix: false,
        });
    }
    BLOCKQUOTE_RE.captures(line).map(|cap| PrefixLine {
        prefix: cap.get(1).expect("blockquote prefix capture").as_str(),
        rest: cap.get(2).expect("blockquote regex remainder capture").as_str(),
        repeat_prefix: true,
    })
}

Then let ParagraphWriter::handle_prefix_line do the one allocation for footnotes when emitting the first line of the paragraph, instead of requiring Cow throughout the API.
- Make passthrough control flow more linear in wrap_text

The small helpers are fine, but you can make the early-return cases easier to scan by inlining is_table_or_separator into is_passthrough_block and keeping all passthrough checks together:

fn is_passthrough_block(line: &str) -> bool {
    let trimmed = line.trim();
    line.trim_start().starts_with('|')
        || crate::table::SEP_RE.is_match(trimmed)
        || matches!(
            classify_block(line),
            Some(BlockKind::Heading | BlockKind::MarkdownlintDirective)
        )
        || trimmed.is_empty()
        || is_indented_code_line(line)
}

This keeps wrap_text's top section as a clear sequence of "fence → passthrough → prefix → paragraph" checks without requiring the reader to jump between multiple tiny predicates.
pub(super) repeat_prefix: bool,
}

#[derive(Default)]
issue (complexity): Consider merging ParagraphState into ParagraphWriter, inlining the prefix-wrapping logic into handle_prefix_line, and potentially simplifying PrefixLine to reduce cognitive overhead while preserving behavior.
You can retain the new behavior while reducing the number of moving parts by collapsing ParagraphState into ParagraphWriter and inlining some of the prefix handling. This removes the “two objects + lifetimes” cognitive load without changing semantics.
1. Merge ParagraphState into ParagraphWriter
Instead of passing ParagraphState around, let ParagraphWriter own the paragraph state. That way wrap_text only needs a single &mut ParagraphWriter.
pub(super) struct ParagraphWriter<'a> {
    out: &'a mut Vec<String>,
    width: usize,
    buf: Vec<(String, bool)>,
    indent: String,
}

impl<'a> ParagraphWriter<'a> {
    pub(super) fn new(out: &'a mut Vec<String>, width: usize) -> Self {
        Self {
            out,
            width,
            buf: Vec::new(),
            indent: String::new(),
        }
    }

    pub(super) fn clear(&mut self) {
        self.buf.clear();
        self.indent.clear();
    }

    pub(super) fn note_indent(&mut self, line: &str) {
        if self.buf.is_empty() {
            self.indent = line.chars().take_while(|c| c.is_whitespace()).collect();
        }
    }

    pub(super) fn push(&mut self, text: String, hard_break: bool) {
        self.buf.push((text, hard_break));
    }

    pub(super) fn flush_paragraph(&mut self) {
        if self.buf.is_empty() {
            return;
        }
        let mut segment = String::new();
        for (text, hard_break) in &self.buf {
            if !segment.is_empty() {
                segment.push(' ');
            }
            segment.push_str(text);
            if *hard_break {
                self.push_wrapped_segment(&self.indent, &segment);
                segment.clear();
            }
        }
        if !segment.is_empty() {
            self.push_wrapped_segment(&self.indent, &segment);
        }
        self.clear();
    }

    fn push_wrapped_segment(&mut self, indent: &str, segment: &str) {
        for line in wrap_preserving_code(segment, self.width - indent.len()) {
            self.out.push(format!("{indent}{line}"));
        }
    }

    pub(super) fn push_verbatim(&mut self, line: &str) {
        self.flush_paragraph();
        self.out.push(line.to_string());
    }
}

Callers that currently pass &mut ParagraphState can instead just use &mut ParagraphWriter (e.g. writer.push(...), writer.note_indent(...), etc.), and flush_paragraph/push_verbatim no longer need a state parameter.
2. Inline append_wrapped_with_prefix into handle_prefix_line
If append_wrapped_with_prefix is only used from handle_prefix_line, you can inline it to avoid one extra layer of indirection:
impl<'a> ParagraphWriter<'a> {
    pub(super) fn handle_prefix_line(&mut self, prefix_line: &PrefixLine<'_>) {
        self.flush_paragraph();
        let prefix = prefix_line.prefix.as_ref();
        let prefix_width = UnicodeWidthStr::width(prefix);
        let available = self.width.saturating_sub(prefix_width).max(1);
        let indent_str: String =
            prefix.chars().take_while(|c| c.is_whitespace()).collect();
        let indent_width = UnicodeWidthStr::width(indent_str.as_str());
        let wrapped_indent = if prefix_line.repeat_prefix {
            prefix.to_string()
        } else {
            format!("{}{}", indent_str, " ".repeat(prefix_width - indent_width))
        };
        let lines = wrap_preserving_code(prefix_line.rest, available);
        if lines.is_empty() {
            self.out.push(prefix.to_string());
            return;
        }
        for (index, wrapped_line) in lines.iter().enumerate() {
            if index == 0 {
                self.out.push(format!("{prefix}{wrapped_line}"));
            } else {
                self.out.push(format!("{wrapped_indent}{wrapped_line}"));
            }
        }
    }
}

This keeps the behavior identical but localizes all "prefix" behavior in one method.
3. Consider simplifying PrefixLine if Cow isn’t pulling its weight
If you don’t actually need to pass owned prefixes frequently, you can simplify the type:
pub(super) struct PrefixLine<'a> {
    pub(super) prefix: &'a str,
    pub(super) rest: &'a str,
    pub(super) repeat_prefix: bool,
}

Callers that need an owned prefix can still build a String and hold it locally, then pass &prefix_string into PrefixLine. This keeps the public API simpler while preserving all capabilities.
true
}

#[derive(Default)]
issue (complexity): Consider consolidating table parsing into a single HtmlTableState abstraction that owns its helpers and state so callers no longer juggle parallel buf + depth logic and flags.
You now have two parallel abstractions for table state (buf + depth in html_table_to_markdown and HtmlTableState in convert_html_tables) plus tiny free helpers. You can keep the refactor while reducing moving parts by:
- Move the free helpers into HtmlTableState and hide the depth contract.
impl HtmlTableState {
    fn push_html_line(&mut self, line: &str, out: &mut Vec<String>) -> bool {
        self.buf.push(line.to_string());
        self.depth += TABLE_START_RE.find_iter(line).count();
        if TABLE_END_RE.is_match(line) {
            self.depth = self.depth.saturating_sub(TABLE_END_RE.find_iter(line).count());
        }
        if self.depth == 0 {
            out.extend(table_lines_to_markdown(&self.buf));
            self.buf.clear();
            self.in_html = false;
            return true;
        }
        true
    }
}

Then convert_html_tables doesn't need to know about append_html_table_line / flush_completed_html_table:
if html_state.in_html {
    html_state.push_html_line(line, &mut out);
    continue;
}
if TABLE_START_RE.is_match(line.trim_start()) {
    html_state.in_html = true;
    html_state.push_html_line(line, &mut out);
    continue;
}

append_html_table_line and flush_completed_html_table can then be private helpers inside the impl or removed entirely.
- Reuse HtmlTableState in html_table_to_markdown to avoid duplicating the state machine.
pub(crate) fn html_table_to_markdown(lines: &[String]) -> Vec<String> {
    let mut out = Vec::new();
    let mut state = HtmlTableState::default();
    for line in lines {
        if state.depth > 0 || TABLE_START_RE.is_match(line.trim_start()) {
            state.in_html = true;
            state.push_html_line(line, &mut out);
            continue;
        }
        out.push(line.clone());
    }
    state.flush_raw(&mut out);
    out
}

This removes the second manual buf + depth pair and keeps all table behavior in one place.
- Consider dropping in_html and deriving it from state.

If you don't have any "active but empty" table state, you can use depth > 0 or !buf.is_empty() instead of a separate flag:
impl HtmlTableState {
    fn in_html(&self) -> bool {
        self.depth > 0
    }
}

Then convert_html_tables becomes:
if html_state.in_html() {
    html_state.push_html_line(line, &mut out);
    continue;
}
if TABLE_START_RE.is_match(line.trim_start()) {
    html_state.push_html_line(line, &mut out);
    continue;
}

This keeps the new organization but collapses the table handling into a single, self-contained abstraction with a clear API.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d51aed524
classify_block(line),
Some(BlockKind::Heading | BlockKind::MarkdownlintDirective)
)
|| line.trim().is_empty()
Treat whitespace-only passthrough lines as empty
Including line.trim().is_empty() in passthrough detection causes wrap_text to emit blank lines verbatim via writer.push_verbatim, so inputs like " " or "\t\t" are preserved instead of normalized to empty lines. That changes user-visible output (and conflicts with existing process_stream expectations for whitespace-only lines), leaving trailing whitespace in formatted documents.
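The fix this comment points at can be sketched in isolation. Everything below is a stand-in (the real is_passthrough_block and wrap_text live in src/wrap.rs, and the names here are illustrative): the key step is normalizing whitespace-only lines to truly empty lines before they are emitted verbatim.

```rust
// Illustrative stand-in for the crate's predicate, reduced to the cases
// relevant here: blank lines and table rows pass through untouched.
fn is_passthrough_block(line: &str) -> bool {
    line.trim().is_empty() || line.trim_start().starts_with('|')
}

// Decide what to emit verbatim: whitespace-only lines come out as empty
// strings, so "  " or "\t\t" never leave trailing whitespace in the output.
fn passthrough_line(line: &str) -> Option<&str> {
    if !is_passthrough_block(line) {
        return None;
    }
    if line.trim().is_empty() {
        Some("") // normalize instead of preserving the whitespace
    } else {
        Some(line)
    }
}
```

This keeps the passthrough short-circuit while restoring the blank-line normalization the comment says the current check loses.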
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/code_emphasis.rs`:
- Around line 101-128: Add a unit test that exercises the edge case in
consume_code_affixes where a non-empty pending prefix combines with a next Text
token whose split_marks(mid) is non-empty so the function resets prefix to empty
(the branch with "else { prefix = \"\"; }"). Construct tokens (using the Token
enum and IntoIter<Token<'a>> in the same way other tests do) so pending is a
non-empty marker string, the next Token::Text contains lead+mid+suffix (i.e.
both markers and content), and assert the returned tuple is ("" /*empty
prefix*/, lead or suffix as appropriate, true) and that the next text is sliced
by lead.len(); place the test alongside existing tests for code_emphasis.rs to
cover the documented "Mixed surrounding markers are left untouched" behavior.
In `@src/footnotes/renumber.rs`:
- Line 418: The use of .expect() on write!(&mut line, "[^{new_number}]:") must
be replaced with proper error propagation/handling: remove .expect(...) and use
the ? operator on the write! result (i.e. write!(...).? ) or map the error into
the function's existing error type, and update the enclosing function's return
type to Result<..., std::fmt::Error> (or convert via .map_err(...) if changing
the signature is not desired). Locate the write! invocation in renumber.rs (the
line variable and new_number) and apply the same fix as done earlier at the
other occurrence (replace .expect with ?/map_err and adjust the enclosing
function signature or error conversion accordingly).
- Around line 344-350: The code currently uses .expect() on regex captures
(caps.name("num") and caps.name("rest")) which is forbidden; replace both
.expect() calls with fallible handling using let-else guards (e.g., let
Some(num_match) = caps.name("num") else { return Err(...) } and similarly for
rest_match) or propagate a suitable error/result from the surrounding function
instead of panicking; then compute whitespace from line using the obtained
num_match and rest_match as before.
- Line 330: Replace the .expect() on the infallible write! call by ignoring the
fmt::Result instead of panicking; specifically update the write!(&mut line,
"[^{new_number}]:") usage (where `line` and `new_number` are defined) to use the
infallibility pattern (e.g., let _ = write!(...)) or otherwise handle the
fmt::Result, removing the .expect() call to comply with the no
.expect()/unwrap() guideline.
In `@src/html.rs`:
- Around line 211-216: The function append_html_table_line pushes the raw line
then counts opening <table> tags using TABLE_START_RE which is anchored at the
line start; trim the input before counting starts so indented/multiline tables
are recognized: replace the use of line when calling
TABLE_START_RE.find_iter(...) with a trimmed version (e.g., let trimmed =
line.trim_start()) and keep TABLE_END_RE matching on the original line as
before; update the increment to use TABLE_START_RE.find_iter(trimmed).count()
while leaving the decrement logic unchanged.
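The anchoring problem described in this prompt is easy to reproduce without the regex crate. These stand-in predicates (hypothetical names, not the crate's API; starts_with stands in for the anchored TABLE_START_RE) show why the count must run on the trimmed line:

```rust
// An anchored pattern only matches at the very start of the haystack,
// so indented "<table>" lines are missed.
fn starts_table(line: &str) -> bool {
    line.starts_with("<table")
}

// Trimming leading whitespace first recognizes indented tables too.
fn starts_table_trimmed(line: &str) -> bool {
    line.trim_start().starts_with("<table")
}
```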
In `@src/lists.rs`:
- Around line 100-103: Add a derived Default impl to the ListState struct and
use its default constructor when initializing: add #[derive(Default)] above the
ListState declaration (which contains indent_stack: Vec<usize> and counters:
HashMap<usize, usize>) and replace the manual initializer with let mut state =
ListState::default(); so initialization is centralized and boilerplate is
removed.
In `@src/wrap.rs`:
- Around line 65-98: The prefix_line function currently calls .expect() on regex
capture groups (BULLET_RE, FOOTNOTE_RE, BLOCKQUOTE_RE) which must be removed;
change each branch to use let-else guards or safe captures and return None when
a capture is missing: for the BULLET_RE and FOOTNOTE_RE branches extract
cap.get(1), cap.get(2) (and cap.get(3) for FOOTNOTE_RE) with let Some(...) = ...
else { return None } before building the PrefixLine, and for the BLOCKQUOTE_RE
branch similarly map the capture to a PrefixLine only if cap.get(1) and
cap.get(2) are Some; ensure PrefixLine fields (prefix, rest, repeat_prefix) are
set the same way but without any .expect() calls.
In `@src/wrap/paragraph.rs`:
- Around line 100-103: The push_wrapped_segment implementation uses indent.len()
(byte length) causing wrong available width and potential underflow; change it
to compute the display width via UnicodeWidthStr::width(indent), then compute
available = self.width.saturating_sub(indent_width).max(1) and pass that
available value into wrap_preserving_code instead of self.width - indent.len();
update references in push_wrapped_segment and keep prefixing lines with the
original indent as before.
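The width fix can be sketched without the unicode-width dependency; display_width below uses char count as a stand-in for UnicodeWidthStr::width, which is enough to show why byte length miscounts multi-byte indents and why the subtraction must saturate with a floor of one column:

```rust
// Stand-in for UnicodeWidthStr::width (the real fix uses the unicode-width
// crate); char count already diverges from byte length for multi-byte
// indents such as non-breaking spaces.
fn display_width(s: &str) -> usize {
    s.chars().count()
}

// saturating_sub avoids underflow when the indent exceeds the wrap width;
// max(1) guarantees at least one column for content.
fn available_width(total: usize, indent: &str) -> usize {
    total.saturating_sub(display_width(indent)).max(1)
}
```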
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f781191a-957a-4f54-9cb6-154c1a435958
📒 Files selected for processing (23)
- .github/workflows/release.yml
- Cargo.toml
- README.md
- docs/release-process.md
- src/code_emphasis.rs
- src/fences.rs
- src/footnotes/inline.rs
- src/footnotes/parsing.rs
- src/footnotes/renumber.rs
- src/frontmatter.rs
- src/html.rs
- src/lists.rs
- src/process.rs
- src/table.rs
- src/wrap.rs
- src/wrap/fence.rs
- src/wrap/inline.rs
- src/wrap/line_buffer.rs
- src/wrap/paragraph.rs
- src/wrap/tests.rs
- src/wrap/tokenize/mod.rs
- src/wrap/tokenize/scanning.rs
- tests/common/mod.rs
fn consume_code_affixes<'a>(
    tokens: &mut Peekable<IntoIter<Token<'a>>>,
    pending: &mut &'a str,
) -> (&'a str, &'a str, bool) {
    let mut prefix = std::mem::take(pending);
    let mut suffix = "";
    let mut modified = !prefix.is_empty();

    let Some(Token::Text(next)) = tokens.peek_mut() else {
        return (prefix, suffix, modified);
    };

    let (lead, mid, _) = split_marks(next);
    if lead.is_empty() {
        return (prefix, suffix, modified);
    }

    modified = true;
    if prefix.is_empty() {
        prefix = lead;
    } else if mid.is_empty() {
        suffix = lead;
    } else {
        prefix = "";
    }
    *next = &next[lead.len()..];
    (prefix, suffix, modified)
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for test cases covering consume_code_affixes edge cases
rg -n "consume_code_affixes\|code_emphasis" --type rust -A 5 -B 2 src/code_emphasis.rs

Repository: leynos/mdtablefix
Length of output: 43
🏁 Script executed:
# First, let's confirm the file exists and check the repository structure
fd -type f -name "*.rs" | head -20

Repository: leynos/mdtablefix
Length of output: 231
🏁 Script executed:
# Search more broadly for the file and function
git ls-files | grep -E "(code|emphasis)"

Repository: leynos/mdtablefix
Length of output: 156
🏁 Script executed:
# Look for test files that might contain tests for this function
git ls-files | grep -i test

Repository: leynos/mdtablefix
Length of output: 3034
🏁 Script executed:
# Check the size of the source file
wc -l src/code_emphasis.rs

Repository: leynos/mdtablefix
Length of output: 85
🏁 Script executed:
# Read the source file
cat -n src/code_emphasis.rs

Repository: leynos/mdtablefix
Length of output: 8421
🏁 Script executed:
# Check test file size and read it
wc -l tests/code_emphasis.rs

Repository: leynos/mdtablefix
Length of output: 87
🏁 Script executed:
# Read the test file
cat -n tests/code_emphasis.rs

Repository: leynos/mdtablefix
Length of output: 5151
🏁 Script executed:
# Check if there are test files with mixed emphasis patterns
grep -r "\\*\\*" tests/code_emphasis.rs tests/data/ | head -20

Repository: leynos/mdtablefix
Length of output: 2305
🏁 Script executed:
# Look for any edge case patterns in test data files
find tests/data -name "*emphasis*" -o -name "*code*" | xargs cat 2>/dev/null | head -50

Repository: leynos/mdtablefix
Length of output: 43
🏁 Script executed:
# Let's create a focused test to understand the edge case
cat > /tmp/test_edge_case.rs << 'EOF'
// Testing the specific edge case:
// What happens when prefix is non-empty AND mid is non-empty?
//
// Example input that would trigger this:
// `code` followed by **bold** (where ** comes with text)
// split_marks("**bold**") returns ("**", "bold", "**")
// If pending = "**" from previous token
// Then: prefix="**" (non-empty)
// lead="**" (non-empty)
// mid="bold" (non-empty)
// This hits line 124: prefix = ""
// The question: is this the correct handling?
EOF
cat /tmp/test_edge_case.rs

Repository: leynos/mdtablefix
Length of output: 539
🏁 Script executed:
# Search for any documentation about the intended behaviour
grep -r "mixed" src/code_emphasis.rs

Repository: leynos/mdtablefix
Length of output: 43
🏁 Script executed:
# Look at the module-level documentation more carefully
head -20 src/code_emphasis.rs

Repository: leynos/mdtablefix
Length of output: 868
🏁 Script executed:
# Let me trace through a specific example to verify the logic
cat > /tmp/trace_logic.py << 'EOF'
# Simulating the consume_code_affixes logic
def split_marks(s):
    """Split emphasis markers at both ends"""
    first = next((i for i, c in enumerate(s) if c not in '*_'), len(s))
    last = next((i for i in range(len(s) - 1, -1, -1) if s[i] not in '*_'), first - 1) + 1
    if last <= first:
        last = first
    return s[:first], s[first:last], s[last:]

def consume_code_affixes_trace(pending_in, next_text):
    """Trace through the logic with examples"""
    prefix = pending_in
    suffix = ""
    modified = bool(pending_in)
    print(f"Input: pending='{pending_in}', next_text='{next_text}'")
    if not next_text:
        print("  No next text token")
        return prefix, suffix, modified
    lead, mid, trail = split_marks(next_text)
    print(f"  split_marks('{next_text}') = ('{lead}', '{mid}', '{trail}')")
    if not lead:
        print("  No leading marks, return early")
        return prefix, suffix, modified
    modified = True
    print("  Set modified=True")
    if not prefix:
        prefix = lead
        print(f"  prefix was empty, set prefix='{lead}'")
    elif not mid:
        suffix = lead
        print(f"  mid was empty, set suffix='{lead}'")
    else:
        prefix = ""
        print("  EDGE CASE: mid is not empty, reset prefix to '' (avoid mixed markers)")
    print(f"  Result: prefix='{prefix}', suffix='{suffix}', modified={modified}")
    return prefix, suffix, modified

# Test cases
print("=== Case 1: No pending, plain text ===")
consume_code_affixes_trace("", "text")
print("\n=== Case 2: Pending markers, next text is all markers ===")
consume_code_affixes_trace("**", "**")
print("\n=== Case 3: EDGE CASE - Pending markers, next text has markers AND content ===")
consume_code_affixes_trace("**", "**more**")
print("\n=== Case 4: EDGE CASE - Pending underscore, next text has asterisks ===")
consume_code_affixes_trace("_", "*more*")
EOF
python3 /tmp/trace_logic.py

Repository: leynos/mdtablefix
Length of output: 1048
🏁 Script executed:
# Let's create a comprehensive test to verify the actual behaviour with the example cases
cat > /tmp/verify_behavior.py << 'EOF'
# Test the actual scenario the review mentions
def trace_scenario(input_str, label):
    print(f"\n=== {label} ===")
    print(f"Input: {repr(input_str)}")
    # The tokenizer would break this into tokens
    # Then consume_code_affixes logic applies
    # We need to understand what the expected output should be
    print("(Requires understanding of tokenization)")

trace_scenario("**code**", "Simple case: both leading and trailing emphasis")
trace_scenario("`code`**text**", "Code followed by emphasis and text")
trace_scenario("**`code`**", "Emphasis wrapping code (no affixes needed)")
trace_scenario("`code`**", "Code followed by emphasis (no following text)")

# The review asks: does the edge case (line 124: prefix = "") handle **code** correctly?
# But **code** without surrounding backticks wouldn't be tokenized as code.
# The function only processes Text tokens adjacent to Code tokens.
# The actual concern seems to be: when there's pending emphasis from a Text token
# followed by a Code token, and then another Text token with both markers and content,
# should we keep or discard the pending emphasis?
print("\n=== Analysis of edge case ===")
print("When prefix is non-empty AND mid is non-empty (line 124):")
print("- prefix = '' clears the pending emphasis markers")
print("- modified = True still wraps the code")
print("- This avoids: **`code`**more** (ambiguous nesting)")
print("- Instead: `code`**more** (next text handles its own markers)")
EOF
python3 /tmp/verify_behavior.py

Repository: leynos/mdtablefix
Length of output: 804
🏁 Script executed:
# Let me check if there's test coverage specifically for the edge case scenario
grep -n "prefix\|mid\|suffix" src/code_emphasis.rs | head -20

Repository: leynos/mdtablefix
Length of output: 766
🏁 Script executed:
# Let me check if there are any tests that would trigger the line 124 edge case
# by looking for test inputs with pending emphasis + content + markers
cat > /tmp/analyze_tests.py << 'EOF'
# Analyzing test coverage for the edge case
test_cases = [
    ("`StepContext`** Enhancement (in **`crates/rstest-bdd/src/context.rs`**)**",
     "Complex case with nested code and emphasis"),
    ("**`code`**", "Emphasis only around code"),
    ("`code` text", "Standalone code"),
]
print("Test Coverage Analysis:")
print("=" * 60)
for input_str, description in test_cases:
    print(f"\nTest: {description}")
    print(f"Input: {repr(input_str)}")
    # The edge case occurs when:
    # 1. There's pending emphasis from previous text token (prefix non-empty)
    # 2. Next token is code
    # 3. After code, there's text with BOTH markers AND content
    #    (i.e., split_marks returns non-empty lead AND non-empty mid)
    if "**" in input_str and "`" in input_str:
        print("  ✓ Contains both emphasis and code")
        # Check if it would trigger the edge case
        # The edge case: prefix non-empty AND mid non-empty
        # This happens when: pending emphasis + code + text with markers+content
print("\n" + "=" * 60)
print("\nEdge Case Scenario:")
print("- Pending emphasis markers from previous text token")
print("- Code token is processed")
print("- Next text token has split_marks returning:")
print("  - lead (markers): non-empty")
print("  - mid (content): non-empty")
print("  - This triggers line 124: prefix = ''")
print("\nNo explicit test case found for this specific scenario!")
EOF
python3 /tmp/analyze_tests.py

Repository: leynos/mdtablefix
Length of output: 809
Add explicit test coverage for the edge case where pending emphasis combines with code and marked content.
The logic at lines 119–125 intentionally handles three scenarios. When prefix is non-empty and the next text token's mid (content between markers) is also non-empty, the code resets prefix to empty (line 124). This prevents ambiguous marker nesting—avoiding patterns like **code**more**.
This behaviour is by design per the module documentation ("Mixed surrounding markers are left untouched"), but it lacks explicit test coverage. Add a test case triggering the scenario where pending emphasis from one text token combines with code followed by another text token containing both markers and content. This confirms the edge case handling is correct.
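A standalone analogue of the traced logic (split_marks and the affix branches are re-implemented here from the trace above, not imported from the crate) shows the shape such a test could take:

```rust
// Re-implementation of the traced split_marks semantics: split a leading and
// trailing run of '*'/'_' markers off the text between them.
fn split_marks(s: &str) -> (&str, &str, &str) {
    let is_mark = |c: char| c == '*' || c == '_';
    let first = s.find(|c| !is_mark(c)).unwrap_or(s.len());
    let last = s.rfind(|c| !is_mark(c)).map_or(first, |i| i + 1);
    let last = last.max(first);
    (&s[..first], &s[first..last], &s[last..])
}

// Mirror of the branch under test: a non-empty pending prefix combined with a
// next token that has both leading markers and content resets the prefix.
fn affix_branch<'a>(pending: &'a str, next: &'a str) -> (&'a str, &'a str, bool) {
    let (lead, mid, _) = split_marks(next);
    if lead.is_empty() {
        return (pending, "", !pending.is_empty());
    }
    if pending.is_empty() {
        (lead, "", true)
    } else if mid.is_empty() {
        (pending, lead, true)
    } else {
        ("", "", true) // the documented "mixed markers" reset
    }
}
```

A test in the crate would build Token values instead, but the assertions would follow the same pattern: pending markers plus markers-and-content yields an empty prefix with modified set.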
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/code_emphasis.rs` around lines 101 - 128, Add a unit test that exercises
the edge case in consume_code_affixes where a non-empty pending prefix combines
with a next Text token whose split_marks(mid) is non-empty so the function
resets prefix to empty (the branch with "else { prefix = \"\"; }"). Construct
tokens (using the Token enum and IntoIter<Token<'a>> in the same way other tests
do) so pending is a non-empty marker string, the next Token::Text contains
lead+mid+suffix (i.e. both markers and content), and assert the returned tuple
is ("" /*empty prefix*/, lead or suffix as appropriate, true) and that the next
text is sliced by lead.len(); place the test alongside existing tests for
code_emphasis.rs to cover the documented "Mixed surrounding markers are left
untouched" behavior.
let rewritten_rest = rewrite_tokens(parts.rest, mapping);
let mut line = String::with_capacity(parts.prefix.len() + rewritten_rest.len() + 8);
line.push_str(parts.prefix);
write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");
Replace .expect() on infallible write with explicit handling.
Writing to a String via write! cannot fail, but the coding guidelines forbid .expect() outside tests. Use let _ = write!(...) or the write! macro's infallibility guarantee via fmt::Write for String.
🛠️ Proposed fix
- write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");
+ // Writing to String is infallible; discard the result.
+ let _ = write!(&mut line, "[^{new_number}]:");

As per coding guidelines: ".expect() and .unwrap() are forbidden outside of tests."
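For reference, the infallibility the fix relies on can be demonstrated in isolation; this is a minimal sketch (the `footnote_label` helper is hypothetical, not from the codebase):

```rust
use std::fmt::Write;

// Builds "[^n]:" by writing into a String; the fmt::Result is
// discarded because the String implementation of fmt::Write
// never returns Err.
fn footnote_label(prefix: &str, n: u32) -> String {
    let mut line = String::from(prefix);
    let _ = write!(&mut line, "[^{n}]:");
    line
}

fn main() {
    assert_eq!(footnote_label("  ", 3), "  [^3]:");
}
```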
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/footnotes/renumber.rs` at line 330, Replace the .expect() on the
infallible write! call by ignoring the fmt::Result instead of panicking;
specifically update the write!(&mut line, "[^{new_number}]:") usage (where
`line` and `new_number` are defined) to use the infallibility pattern (e.g., let
_ = write!(...)) or otherwise handle the fmt::Result, removing the .expect()
call to comply with the no .expect()/unwrap() guideline.
```rust
let num_match = caps
    .name("num")
    .expect("numeric list capture missing number");
let rest_match = caps
    .name("rest")
    .expect("numeric list capture missing rest");
let whitespace = line[num_match.end() + 1..rest_match.start()].to_string();
```
Replace .expect() with fallible handling or use let-else.
Per coding guidelines, .expect() is forbidden outside tests. Even though the regex guarantees these captures exist after a successful match, propagate the error or use a let-else guard to satisfy the lint.
🛠️ Proposed fix using let-else guards

```diff
-    let num_match = caps
-        .name("num")
-        .expect("numeric list capture missing number");
-    let rest_match = caps
-        .name("rest")
-        .expect("numeric list capture missing rest");
+    let Some(num_match) = caps.name("num") else {
+        return None;
+    };
+    let Some(rest_match) = caps.name("rest") else {
+        return None;
+    };
```

As per coding guidelines: ".expect() and .unwrap() are forbidden outside of tests. Errors must be propagated."
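The let-else pattern proposed above can be sketched standalone; the `named_part` helper below is a hypothetical stand-in for a regex capture lookup, purely for illustration:

```rust
// Hypothetical stand-in for caps.name(...): returns the named part
// of a numbered-list line, or None when it is absent.
fn named_part<'a>(line: &'a str, name: &str) -> Option<&'a str> {
    match name {
        "num" => line.split('.').next().filter(|s| !s.is_empty()),
        "rest" => line.split_once(". ").map(|(_, rest)| rest),
        _ => None,
    }
}

// Mirrors the reviewed pattern: let-else guards replace .expect(),
// returning None instead of panicking when a capture is missing.
fn split_numbered(line: &str) -> Option<(&str, &str)> {
    let Some(num) = named_part(line, "num") else {
        return None;
    };
    let Some(rest) = named_part(line, "rest") else {
        return None;
    };
    Some((num, rest))
}

fn main() {
    assert_eq!(split_numbered("1. item"), Some(("1", "item")));
    assert_eq!(split_numbered("no marker"), None);
}
```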
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/footnotes/renumber.rs` around lines 344 - 350, The code currently uses
.expect() on regex captures (caps.name("num") and caps.name("rest")) which is
forbidden; replace both .expect() calls with fallible handling using let-else
guards (e.g., let Some(num_match) = caps.name("num") else { return Err(...) }
and similarly for rest_match) or propagate a suitable error/result from the
surrounding function instead of panicking; then compute whitespace from line
using the obtained num_match and rest_match as before.
```rust
new_line.push_str(&rewritten_rest);
definitions.push(DefinitionLine {
line.push_str(&candidate.indent);
write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");
```
Same .expect() pattern requires adjustment.
Apply the same fix as suggested for line 330.
🛠️ Proposed fix

```diff
-    write!(&mut line, "[^{new_number}]:").expect("write to string cannot fail");
+    let _ = write!(&mut line, "[^{new_number}]:");
```

As per coding guidelines: ".expect() and .unwrap() are forbidden outside of tests."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/footnotes/renumber.rs` at line 418, The use of .expect() on write!(&mut
line, "[^{new_number}]:") must be replaced with proper error
propagation/handling: remove .expect(...) and use the ? operator on the write!
result (i.e. `write!(...)?`) or map the error into the function's existing error
type, and update the enclosing function's return type to Result<...,
std::fmt::Error> (or convert via .map_err(...) if changing the signature is not
desired). Locate the write! invocation in renumber.rs (the line variable and
new_number) and apply the same fix as done earlier at the other occurrence
(replace .expect with ?/map_err and adjust the enclosing function signature or
error conversion accordingly).
```rust
fn append_html_table_line(line: &str, buf: &mut Vec<String>, depth: &mut usize) {
    buf.push(line.to_string());
    *depth += TABLE_START_RE.find_iter(line).count();
    if TABLE_END_RE.is_match(line) {
        *depth = depth.saturating_sub(TABLE_END_RE.find_iter(line).count());
        if *depth == 0 {
            out.extend(html_table_to_markdown(buf));
            buf.clear();
            *in_html = false;
        }
```
🧩 Analysis chain

🏁 Script executed:

```shell
# First, examine the append_html_table_line function and surrounding context
sed -n '200,230p' src/html.rs
# Check the regex definitions
grep -n 'TABLE_START_RE\|TABLE_END_RE' src/html.rs | head -20
# Look at call sites mentioned in the review
sed -n '275,290p' src/html.rs
sed -n '340,355p' src/html.rs
```

🏁 Script executed:

```shell
# Get the regex definitions
sed -n '18,24p' src/html.rs
```

🏁 Script executed:

```shell
# Search for tests covering tables, especially indented ones
rg -n 'TABLE|table|indent' src/html.rs tests/ | grep -i 'test\|fn\|<table'
# Look for test files
find . -name '*test*' -o -name 'tests.rs' -o -path '*/tests/*' | head -20
```

🏁 Script executed:

```shell
# Search more specifically for indented HTML table tests
grep -n 'indent.*table\|table.*indent' tests/table/*.rs
# Check the convert_html tests more carefully
head -100 tests/table/convert_html.rs
# Look at process_stream_tests for HTML table handling
grep -B5 -A5 'TABLE_START_RE\|trim_start' src/html.rs | head -50
```

🏁 Script executed:

```shell
# Verify if there's a test for indented HTML tables
grep -r 'indented.*html\|html.*indented' tests/
# Check if any HTML table tests use indentation
grep -B2 -A2 'html_table' tests/table/convert_html.rs | grep -E '^\s{4}<'
# Let's look at the TABLE_END_RE usage in append_html_table_line
sed -n '211,216p' src/html.rs
# Verify the TABLE_END_RE pattern doesn't have anchoring
grep 'TABLE_END_RE' src/html.rs | head -5
```

🏁 Script executed:

```shell
# Check if any test includes an indented HTML table (4+ spaces before <table>)
grep -r '^\s\s\s\s<table' tests/
# Look at what indented_table fixture contains
sed -n '41,45p' tests/table/mod.rs
# Verify the actual behavior with a simple check of how depth tracking works
# Look at process_stream to see if it calls convert_html_tables
grep -n 'convert_html_tables\|html_table_to_markdown' src/html.rs
```

🏁 Script executed:

```shell
# Verify TABLE_END_RE doesn't have the same issue (no anchor)
grep -A1 'TABLE_END_RE.*lazy_regex' src/html.rs
# Create a minimal test case to confirm the bug
# Simulate what happens with an indented HTML table
python3 << 'EOF'
import re
TABLE_START_RE = re.compile(r"(?i)^<table(?:\s|>|$)")
TABLE_END_RE = re.compile(r"(?i)</table>")
# Indented table line
line = "    <table>"
trimmed = line.lstrip()
print(f"Raw line: '{line}'")
print(f"Trimmed line: '{trimmed}'")
print(f"TABLE_START_RE matches raw: {bool(TABLE_START_RE.search(line))}")
print(f"TABLE_START_RE matches trimmed: {bool(TABLE_START_RE.search(trimmed))}")
print(f"find_iter on raw: {len(TABLE_START_RE.findall(line))}")
print(f"find_iter on trimmed: {len(TABLE_START_RE.findall(trimmed))}")
# Simulating the flow
print("\n--- Simulating current code flow ---")
depth = 0
print(f"Initial depth: {depth}")
print(f"Check: depth > 0 or TABLE_START_RE.is_match('{trimmed}'): {depth > 0 or bool(TABLE_START_RE.search(trimmed))}")
if depth > 0 or bool(TABLE_START_RE.search(trimmed)):
    print("Entered table mode")
    depth += TABLE_START_RE.findall(line).__len__()  # Uses raw line
    print(f"After counting on raw line: depth = {depth}")
    if TABLE_END_RE.search(line):
        depth -= TABLE_END_RE.findall(line).__len__()
    print(f"Would flush? {depth == 0}")
EOF
```

🏁 Script executed:

```shell
# Verify TABLE_END_RE behavior with indented closing tag
python3 << 'EOF'
import re
TABLE_END_RE = re.compile(r"(?i)</table>")
# Test with indented closing tag
line = "    </table>"
print(f"Line: '{line}'")
print(f"TABLE_END_RE.is_match: {bool(TABLE_END_RE.search(line))}")
print(f"TABLE_END_RE.find_iter count: {len(TABLE_END_RE.findall(line))}")
print("\nCONCLUSION: TABLE_END_RE matches regardless of indentation because it has no anchor")
EOF
# Look at the complete flow in both callers
echo "=== html_table_to_markdown ==="
sed -n '275,290p' src/html.rs
echo -e "\n=== convert_html_tables ==="
sed -n '321,360p' src/html.rs
```
Trim the line before counting `<table>` openings.
Count start tags on `line.trim_start()`, not on the raw line. `TABLE_START_RE` is anchored at `^<table`, so an indented multiline table inside a list or block quote never increments `depth`; the callers at lines 283 and 348 then flush the buffer after the first line.
Patch

```diff
 fn append_html_table_line(line: &str, buf: &mut Vec<String>, depth: &mut usize) {
     buf.push(line.to_string());
-    *depth += TABLE_START_RE.find_iter(line).count();
+    let trimmed = line.trim_start();
+    *depth += TABLE_START_RE.find_iter(trimmed).count();
     if TABLE_END_RE.is_match(line) {
         *depth = depth.saturating_sub(TABLE_END_RE.find_iter(line).count());
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/html.rs` around lines 211 - 216, The function append_html_table_line
pushes the raw line then counts opening <table> tags using TABLE_START_RE which
is anchored at the line start; trim the input before counting starts so
indented/multiline tables are recognized: replace the use of line when calling
TABLE_START_RE.find_iter(...) with a trimmed version (e.g., let trimmed =
line.trim_start()) and keep TABLE_END_RE matching on the original line as
before; update the increment to use TABLE_START_RE.find_iter(trimmed).count()
while leaving the decrement logic unchanged.
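The anchoring pitfall can be illustrated without the regex crate; this sketch uses `str::starts_with` as a stand-in for the anchored `^<table` pattern (an approximation, since the real pattern is case-insensitive and also matches whitespace or `>` after the tag name):

```rust
// Stand-in for the anchored TABLE_START_RE: matches only when the
// tag sits at the very start of the string, like `^<table`.
fn starts_table(line: &str) -> bool {
    line.starts_with("<table")
}

fn main() {
    let indented = "    <table>";
    // The anchored check misses the indented opening tag...
    assert!(!starts_table(indented));
    // ...but matches once the leading whitespace is trimmed.
    assert!(starts_table(indented.trim_start()));
}
```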
```rust
let mut state = ListState {
    indent_stack: Vec::new(),
    counters: HashMap::new(),
};
```
🧹 Nitpick | 🔵 Trivial
Derive Default for ListState and initialize via default().
Remove boilerplate and centralize initialization behaviour.
Apply concise initialization

```diff
+#[derive(Default)]
 struct ListState {
     indent_stack: Vec<usize>,
     counters: HashMap<usize, usize>,
 }
@@
-    let mut state = ListState {
-        indent_stack: Vec::new(),
-        counters: HashMap::new(),
-    };
+    let mut state = ListState::default();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lists.rs` around lines 100 - 103, Add a derived Default impl to the
ListState struct and use its default constructor when initializing: add
#[derive(Default)] above the ListState declaration (which contains indent_stack:
Vec<usize> and counters: HashMap<usize, usize>) and replace the manual
initializer with let mut state = ListState::default(); so initialization is
centralized and boilerplate is removed.
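The derive-based initialization can be sketched in a standalone form; the field names mirror the ones quoted in the review:

```rust
use std::collections::HashMap;

// Deriving Default lets callers build an empty state without
// spelling out each field's zero value.
#[derive(Default)]
struct ListState {
    indent_stack: Vec<usize>,
    counters: HashMap<usize, usize>,
}

fn main() {
    let state = ListState::default();
    // Both containers start empty, matching the manual initializer.
    assert!(state.indent_stack.is_empty());
    assert!(state.counters.is_empty());
}
```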
```rust
fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
    if let Some(cap) = BULLET_RE.captures(line) {
        let prefix = cap.get(1).expect("bullet regex capture").as_str();
        let rest = cap.get(2).expect("bullet regex remainder capture").as_str();
        return Some(PrefixLine {
            prefix: Cow::Borrowed(prefix),
            rest,
            repeat_prefix: false,
        });
    }

    if let Some(cap) = FOOTNOTE_RE.captures(line) {
        let prefix = cap.get(1).expect("footnote prefix capture").as_str();
        let marker = cap.get(2).expect("footnote marker capture").as_str();
        let rest = cap
            .get(3)
            .expect("footnote regex remainder capture")
            .as_str();
        return Some(PrefixLine {
            prefix: Cow::Owned(format!("{prefix}{marker}")),
            rest,
            repeat_prefix: false,
        });
    }

    BLOCKQUOTE_RE.captures(line).map(|cap| PrefixLine {
        prefix: Cow::Borrowed(cap.get(1).expect("blockquote prefix capture").as_str()),
        rest: cap
            .get(2)
            .expect("blockquote regex remainder capture")
            .as_str(),
        repeat_prefix: true,
    })
}
```
Replace .expect() calls with propagation or let-else guards.
The prefix_line function uses .expect() on regex capture groups (lines 67-68, 77-82, 91-95). Per coding guidelines, .expect() is forbidden outside tests. Since the function returns Option, return None on capture failure.
🛠️ Proposed fix using let-else guards

```diff
 fn prefix_line(line: &str) -> Option<PrefixLine<'_>> {
     if let Some(cap) = BULLET_RE.captures(line) {
-        let prefix = cap.get(1).expect("bullet regex capture").as_str();
-        let rest = cap.get(2).expect("bullet regex remainder capture").as_str();
+        let prefix = cap.get(1)?.as_str();
+        let rest = cap.get(2)?.as_str();
         return Some(PrefixLine {
             prefix: Cow::Borrowed(prefix),
             rest,
             repeat_prefix: false,
         });
     }
     if let Some(cap) = FOOTNOTE_RE.captures(line) {
-        let prefix = cap.get(1).expect("footnote prefix capture").as_str();
-        let marker = cap.get(2).expect("footnote marker capture").as_str();
-        let rest = cap
-            .get(3)
-            .expect("footnote regex remainder capture")
-            .as_str();
+        let prefix = cap.get(1)?.as_str();
+        let marker = cap.get(2)?.as_str();
+        let rest = cap.get(3)?.as_str();
         return Some(PrefixLine {
             prefix: Cow::Owned(format!("{prefix}{marker}")),
             rest,
             repeat_prefix: false,
         });
     }
-    BLOCKQUOTE_RE.captures(line).map(|cap| PrefixLine {
-        prefix: Cow::Borrowed(cap.get(1).expect("blockquote prefix capture").as_str()),
-        rest: cap
-            .get(2)
-            .expect("blockquote regex remainder capture")
-            .as_str(),
+    let cap = BLOCKQUOTE_RE.captures(line)?;
+    Some(PrefixLine {
+        prefix: Cow::Borrowed(cap.get(1)?.as_str()),
+        rest: cap.get(2)?.as_str(),
         repeat_prefix: true,
     })
 }
```

As per coding guidelines: ".expect() and .unwrap() are forbidden outside of tests. Errors must be propagated."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/wrap.rs` around lines 65 - 98, The prefix_line function currently calls
.expect() on regex capture groups (BULLET_RE, FOOTNOTE_RE, BLOCKQUOTE_RE) which
must be removed; change each branch to use let-else guards or safe captures and
return None when a capture is missing: for the BULLET_RE and FOOTNOTE_RE
branches extract cap.get(1), cap.get(2) (and cap.get(3) for FOOTNOTE_RE) with
let Some(...) = ... else { return None } before building the PrefixLine, and for
the BLOCKQUOTE_RE branch similarly map the capture to a PrefixLine only if
cap.get(1) and cap.get(2) are Some; ensure PrefixLine fields (prefix, rest,
repeat_prefix) are set the same way but without any .expect() calls.
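The `?`-on-`Option` pattern the fix relies on can be shown in a minimal standalone form; the `find_marker` helper is hypothetical, purely for illustration:

```rust
// Hypothetical capture lookup: Some when the line carries a bullet
// marker, None otherwise.
fn find_marker(line: &str) -> Option<&str> {
    line.strip_prefix("- ")
}

// `?` on an Option propagates None to the caller, replacing a
// panicking .expect() call with a silent early return.
fn bullet_rest(line: &str) -> Option<&str> {
    let rest = find_marker(line)?;
    Some(rest.trim_start())
}

fn main() {
    assert_eq!(bullet_rest("- item"), Some("item"));
    assert_eq!(bullet_rest("plain"), None);
}
```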
```rust
fn push_wrapped_segment(&mut self, indent: &str, segment: &str) {
    for line in wrap_preserving_code(segment, self.width - indent.len()) {
        self.out.push(format!("{indent}{line}"));
    }
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
# Read the target function and surrounding context
echo "=== Lines 95-110 of src/wrap/paragraph.rs ==="
sed -n '95,110p' src/wrap/paragraph.rs
echo
echo "=== Search for UnicodeWidthStr usage and imports ==="
rg -n 'UnicodeWidthStr|use.*unicode' src/wrap/paragraph.rs | head -20
echo
echo "=== Search for wrap_preserving_code definition ==="
rg -n 'fn wrap_preserving_code' src/
echo
echo "=== Check the pattern mentioned in comment (saturating_sub.max) ==="
rg -n 'saturating_sub.*max|width.*saturating_sub' src/wrap/paragraph.rs
```

🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
# Read the wrap_preserving_code function signature and context
echo "=== wrap_preserving_code function (lines 195-210) ==="
sed -n '195,210p' src/wrap/inline.rs
echo
echo "=== flush_paragraph to see how push_wrapped_segment is called ==="
rg -n 'flush_paragraph|push_wrapped_segment' src/wrap/paragraph.rs | head -15
echo
echo "=== Check the full append_wrapped_with_prefix for context ==="
sed -n '45,65p' src/wrap/paragraph.rs
```
Use display width for wrapped indent to guard against underflow.
`indent.len()` counts bytes, not display columns, so for non-ASCII indentation it miscalculates the available width. When the indent exceeds `self.width`, the subtraction panics in checked builds and wraps in unchecked builds.
Adopt the pattern already used at line 52 in `append_wrapped_with_prefix`:
Patch

```diff
 fn push_wrapped_segment(&mut self, indent: &str, segment: &str) {
-    for line in wrap_preserving_code(segment, self.width - indent.len()) {
+    let indent_width = UnicodeWidthStr::width(indent);
+    let available = self.width.saturating_sub(indent_width).max(1);
+    for line in wrap_preserving_code(segment, available) {
         self.out.push(format!("{indent}{line}"));
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/wrap/paragraph.rs` around lines 100 - 103, The push_wrapped_segment
implementation uses indent.len() (byte length) causing wrong available width and
potential underflow; change it to compute the display width via
UnicodeWidthStr::width(indent), then compute available =
self.width.saturating_sub(indent_width).max(1) and pass that available value
into wrap_preserving_code instead of self.width - indent.len(); update
references in push_wrapped_segment and keep prefixing lines with the original
indent as before.
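The byte-versus-column distinction behind this fix can be seen with the standard library alone. The real code uses the `unicode-width` crate; this sketch approximates display width with `chars().count()`, an approximation that ignores wide and zero-width characters:

```rust
// Approximate display width by counting chars; enough to show why
// byte length overestimates width for non-ASCII indents.
fn approx_width(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let indent = "→ "; // arrow + space: 4 bytes, roughly 2 columns
    assert_eq!(indent.len(), 4);
    assert_eq!(approx_width(indent), 2);

    // saturating_sub(..).max(1) guards against underflow when the
    // indent is wider than the wrap width.
    let width: usize = 1;
    let available = width.saturating_sub(approx_width(indent)).max(1);
    assert_eq!(available, 1);
}
```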
@coderabbitai please can you give me an assessment of the cargo-binstall support. This is the focus of this PR.
Tip: For best results, initiate chat on the files or code changes.
Cargo-binstall Support Assessment
1.
| Component | Value |
|---|---|
| Workflow produces | mdtablefix-0.3.1-x86_64-unknown-linux-gnu.tar.gz |
| pkg-url resolves to | .../v0.3.1/mdtablefix-0.3.1-x86_64-unknown-linux-gnu.tar.gz |
| Match | ✅ |
3. Release Job Sequencing
The release job correctly:
- Creates the GitHub release (via softprops/action-gh-release@v1)
- Downloads all build artifacts
- Uploads every file via gh release upload

The artifact directory glob artifacts/${{ env.REPO_NAME }}-* correctly enumerates all per-target directories and picks up the binary, its SHA-256, the tar.gz archive, and the archive SHA-256. ✅
4. Documentation
Both README.md and docs/release-process.md accurately document the new installation path and clarify which targets produce binstall archives. The documented target list matches the workflow matrix and the cfg override. ✅
Summary
The implementation is well-structured and internally consistent. The three moving parts — the Cargo.toml override, the workflow archive naming, and the release upload glob — are all aligned. No issues found.
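For context, cargo-binstall metadata of the kind assessed above typically takes the following shape. This is an illustrative sketch, not the crate's actual Cargo.toml: the pkg-url template and pkg-fmt value are assumptions inferred from the archive naming described in the assessment.

```toml
# Illustrative [package.metadata.binstall] override; values are
# assumed from the naming scheme mdtablefix-{version}-{target}.tar.gz.
[package.metadata.binstall]
pkg-url = "{ repo }/releases/download/v{ version }/{ name }-{ version }-{ target }{ archive-suffix }"
pkg-fmt = "tgz"
```

With this template, cargo-binstall would resolve the x86_64-unknown-linux-gnu archive URL shown in the table above for version 0.3.1.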
Summary by Sourcery
Add cargo-binstall packaging support and refactor core Markdown processing utilities while preserving existing behavior.
New Features:
Enhancements:
Build:
Documentation:
Tests: