Conversation
Reviewer's Guide

Enhance wrapping logic to treat Markdown links as indivisible tokens, update README code blocks with language annotations and consistent wrapping, and add a test to verify link wrapping behavior.

Class diagram for updated Markdown wrapping logic

classDiagram
    class Wrap {
        +tokenize_inline(text: &str) -> Vec<String>
    }
    class tokenize_inline {
        +Handles Markdown links as indivisible tokens
        +Handles images as indivisible tokens
        +Handles code spans as indivisible tokens
        +Handles hyphenated words as indivisible tokens
    }
    Wrap --> tokenize_inline : uses
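The atomic-token behaviour the diagram describes can be sketched as follows. This is a hedged, minimal illustration, not the crate's actual `tokenize_inline` (code-span and hyphenated-word handling are omitted): links and images are consumed whole, everything else splits on whitespace.

```rust
// Minimal sketch: split on whitespace, but keep `[text](url)` and
// `![alt](url)` spans intact as single tokens.
fn tokenize_inline(text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '[' || (c == '!' && i + 1 < chars.len() && chars[i + 1] == '[') {
            let start = i;
            let mut j = i;
            if chars[j] == '!' {
                j += 1; // optional image prefix
            }
            j += 1; // skip '['
            while j < chars.len() && chars[j] != ']' {
                j += 1;
            }
            if j + 1 < chars.len() && chars[j + 1] == '(' {
                // Consume the destination, balancing nested parentheses.
                j += 2;
                let mut depth = 1;
                while j < chars.len() && depth > 0 {
                    match chars[j] {
                        '(' => depth += 1,
                        ')' => depth -= 1,
                        _ => {}
                    }
                    j += 1;
                }
                tokens.push(chars[start..j].iter().collect());
                i = j;
            } else {
                // Not a link: fall back to a plain word token.
                while i < chars.len() && !chars[i].is_whitespace() {
                    i += 1;
                }
                tokens.push(chars[start..i].iter().collect());
            }
        } else {
            let start = i;
            while i < chars.len() && !chars[i].is_whitespace() {
                i += 1;
            }
            tokens.push(chars[start..i].iter().collect());
        }
    }
    tokens
}

fn main() {
    let toks = tokenize_inline("see [the docs](https://example.com/a_(b)) here");
    // The link, nested parentheses and all, survives as one token.
    assert_eq!(toks, vec!["see", "[the docs](https://example.com/a_(b))", "here"]);
    println!("{toks:?}");
}
```

A wrapper that places whole tokens on lines can then never break a link mid-URL.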
File-Level Changes
Summary by CodeRabbit
Walkthrough

Extend the inline tokeniser to recognise and handle Markdown links and images as atomic tokens, preventing them from being split during text wrapping. Add a test to verify that links remain unbroken after wrapping. Reformat the README for improved readability, adjusting line breaks and code block annotations without altering content.

Changes
Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Wrapper
    participant Tokenizer
    User->>Wrapper: Request to wrap paragraph with Markdown link/image
    Wrapper->>Tokenizer: Tokenise paragraph content
    Tokenizer->>Tokenizer: Detect link/image pattern ([...](...) or ![...](...))
    Tokenizer-->>Wrapper: Return tokens (links/images as atomic)
    Wrapper-->>User: Return wrapped paragraph (links/images unbroken)
Hey @leynos - I've reviewed your changes and found some issues that need to be addressed.
- Consider extracting the new link-tokenization logic into its own helper function to improve readability and isolate complexity.
- It might be worth adding tests for image links (`![alt](url)`) and other edge cases (e.g. missing or nested parentheses) to ensure the tokenizer handles them correctly.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider extracting the new link-tokenization logic into its own helper function to improve readability and isolate complexity.
- It might be worth adding tests for image links (`![alt](url)`) and other edge cases (e.g. missing or nested parentheses) to ensure the tokenizer handles them correctly.
## Individual Comments
### Comment 1
<location> `src/wrap.rs:75` </location>
<code_context>
tokens.push(chars[start..end].iter().collect());
i = end;
}
+ } else if c == '[' || (c == '!' && i + 1 < chars.len() && chars[i + 1] == '[') {
+ let start = i;
+ if c == '!' {
</code_context>
<issue_to_address>
Consider extracting the link and image parsing logic into a separate function to keep the main loop flat and readable.
Consider extracting the new link/image logic into its own function so the main loop stays flat and readable:
```rust
fn parse_link_or_image(chars: &[char], mut i: usize) -> (String, usize) {
let start = i;
// optional '!' prefix
if chars[i] == '!' { i += 1; }
// opening '['
i += 1;
// find closing ']'
while i < chars.len() && chars[i] != ']' { i += 1; }
if i < chars.len() && chars[i] == ']' {
i += 1;
// expect '('
if i < chars.len() && chars[i] == '(' {
i += 1;
// balance nested parens
let mut depth = 1;
while i < chars.len() && depth > 0 {
match chars[i] {
'(' => depth += 1,
')' => depth -= 1,
_ => {}
}
i += 1;
}
// return the whole slice and new index
let tok: String = chars[start..i].iter().collect();
return (tok, i);
}
}
// fallback: just one char
let tok: String = chars[start..start+1].iter().collect();
(tok, start + 1)
}
```
Then in `tokenize_inline` replace the big if-nest with:
```rust
} else if c == '[' || (c == '!' && i + 1 < chars.len() && chars[i+1] == '[') {
let (tok, new_i) = parse_link_or_image(&chars, i);
tokens.push(tok);
i = new_i;
continue;
} else {
// ...
}
```
This keeps identical behavior but dramatically reduces nesting in your main loop.
</issue_to_address>
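As a concrete check of the suggestion, here is a hedged, self-contained sketch that copies the proposed `parse_link_or_image` helper and exercises the edge cases raised above (images, nested parentheses, a missing destination). It is an illustration of the suggested refactor, not the crate's shipped code.

```rust
// Copy of the suggested helper: parse a `[text](url)` or `![alt](url)` span
// starting at index `i`, returning the token and the index just past it.
fn parse_link_or_image(chars: &[char], mut i: usize) -> (String, usize) {
    let start = i;
    if chars[i] == '!' {
        i += 1; // optional image prefix
    }
    i += 1; // opening '['
    while i < chars.len() && chars[i] != ']' {
        i += 1;
    }
    if i < chars.len() && chars[i] == ']' {
        i += 1;
        if i < chars.len() && chars[i] == '(' {
            i += 1;
            // Balance nested parentheses inside the destination.
            let mut depth = 1;
            while i < chars.len() && depth > 0 {
                match chars[i] {
                    '(' => depth += 1,
                    ')' => depth -= 1,
                    _ => {}
                }
                i += 1;
            }
            return (chars[start..i].iter().collect(), i);
        }
    }
    // Fallback: emit a single character so the caller resumes normal scanning.
    (chars[start..start + 1].iter().collect(), start + 1)
}

fn main() {
    // Image syntax is kept whole.
    let image: Vec<char> = "![alt](img.png)".chars().collect();
    assert_eq!(parse_link_or_image(&image, 0).0, "![alt](img.png)");

    // Nested parentheses in the destination are balanced.
    let nested: Vec<char> = "[x](a(b)c)".chars().collect();
    assert_eq!(parse_link_or_image(&nested, 0).0, "[x](a(b)c)");

    // Missing '(' after ']': only the '[' is consumed.
    let bare: Vec<char> = "[x] y".chars().collect();
    assert_eq!(parse_link_or_image(&bare, 0), ("[".to_string(), 1));
}
```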
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- README.md (6 hunks)
- src/wrap.rs (1 hunk)
- tests/wrap.rs (1 hunk)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
Instructions used from:
Sources:
📄 CodeRabbit Inference Engine
- AGENTS.md
⚙️ CodeRabbit Configuration File
**/*.md
Instructions used from:
Sources:
📄 CodeRabbit Inference Engine
- AGENTS.md
⚙️ CodeRabbit Configuration File
🧬 Code Graph Analysis (1)
tests/wrap.rs (1)
src/process.rs (1)
process_stream(81-81)
🪛 LanguageTool
README.md
[style] ~49-~49: Would you like to use the Oxford spelling “standardize”? The spelling ‘standardise’ is also correct.
Context: ...indentation level. - Use --breaks to standardise thematic breaks to a line of 70 undersc...
(OXFORD_SPELLING_Z_NOT_S)
[style] ~52-~52: Consider using the typographical ellipsis character here instead.
Context: ...ipsis to replace groups of three dots (`...`) with the ellipsis character (`…`). ...
(ELLIPSIS)
[style] ~58-~58: You have used the passive voice repeatedly in nearby sentences. To make your writing clearer and easier to read, consider using active voice.
Context: ...ed, input is read from stdin and output is written to stdout. ### Example: Table Reflow...
(REP_PASSIVE_VOICE)
[style] ~121-~121: Since ownership is already implied, this phrasing may be redundant.
Context: ...for embedding the table reflow logic in your own Rust project: Rust ```rust use mdtabl...
(PRP_OWN)
[style] ~152-~152: Would you like to use the Oxford spelling “recognizes”? The spelling ‘recognises’ is also correct.
Context: ...e. ## HTML table support mdtablefix recognises basic HTML <table> elements embedded ...
(OXFORD_SPELLING_Z_NOT_S)
[style] ~154-~154: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...sing stage using convert_html_tables, prior to reflow. Only simple tables composed of...
(EN_WORDINESS_PREMIUM_PRIOR_TO)
[locale-violation] ~172-~172: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ...th-rstest-fixtures.md) for details. ## License This project is licensed under the ISC...
(LICENCE_LICENSE_NOUN_SINGULAR)
[locale-violation] ~174-~174: License must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ... This project is licensed under the ISC License. See the LICENSE file for fu...
(LICENCE_LICENSE_NOUN_SINGULAR)
[locale-violation] ~174-~174: LICENSE must be spelled with a “c” when used as a noun in British English. Use “licence”.
Context: ...icensed under the ISC License. See the LICENSE file for full details.
(LICENCE_LICENSE_NOUN_SINGULAR)
🔇 Additional comments (14)
README.md (12)
3-5: LGTM! Proper paragraph wrapping applied. The text has been correctly wrapped to 80 columns whilst preserving the original meaning and improving readability.
7-10: Formatting improvement maintained. The paragraph wrapping enhances readability whilst keeping the content intact.
18-18: Correct code block language annotation added. The `bash` annotation improves syntax highlighting and follows the coding guidelines.
26-26: Consistent language annotation applied. The `bash` annotation maintains consistency with other code blocks in the document.
34-34: Language annotation correctly applied.
38-59: Proper formatting applied to command-line options.The bullet points are correctly wrapped to 80 columns, improving readability whilst preserving all content. The use of "standardise" follows British English spelling conventions as required by the coding guidelines.
67-67: Markdown code block properly annotated.
77-77: Consistent markdown annotation applied.
91-117: Example formatting improved with proper annotations.The markdown code blocks are correctly annotated and the content is preserved whilst improving readability.
121-149: Library usage section properly formatted.The text wrapping and rust code block annotation improve readability whilst maintaining all essential information.
152-159: HTML table support section correctly formatted.The text wrapping improves readability. The use of "recognises" follows British English spelling conventions as required.
164-170: Section formatting improved.The text wrapping enhances readability whilst preserving content.
src/wrap.rs (1)
75-104: Robust link tokenization logic implemented. The implementation correctly handles inline Markdown links and images as atomic tokens, preventing them from being split during wrapping. The logic appropriately:

- Detects both link `[text](url)` and image `![alt](url)` syntax
- Uses balanced parentheses tracking to handle nested parentheses in URLs
- Falls back gracefully to single-character tokenization for incomplete patterns

This addresses the PR objective effectively. Note that reference-style links `[text][ref]` and autolinks `<url>` are not handled, but these are separate patterns that could be added in future iterations if needed.

tests/wrap.rs (1)
428-444: Effective test for link preservation implemented. The test correctly validates that Markdown links remain unbroken during paragraph wrapping. The realistic example and specific assertion provide good coverage for the new tokenization functionality and establish regression protection.
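The behaviour that test pins down can be illustrated with a minimal, hypothetical sketch. The `wrap_paragraph` helper below is illustrative only, not the crate's real API: it greedily packs whole tokens onto lines, so an atomic link token is never split even when it exceeds the target width.

```rust
// Illustrative greedy wrapper over pre-tokenised input: tokens are placed
// whole, so a link token longer than `width` simply occupies its own line.
fn wrap_paragraph(tokens: &[&str], width: usize) -> Vec<String> {
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    for tok in tokens {
        // Start a new line if appending this token would overflow.
        if !current.is_empty() && current.len() + 1 + tok.len() > width {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(tok);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

fn main() {
    let link = "[the mdtablefix docs](https://example.com/mdtablefix)";
    let tokens = ["Read", link, "for", "more", "detail"];
    let wrapped = wrap_paragraph(&tokens, 30);
    // The link exceeds the width, so it lands on its own line, intact.
    assert!(wrapped.iter().any(|l| l.contains(link)));
    println!("{wrapped:#?}");
}
```

A regression test in this spirit only needs one long link and an assertion that some output line still contains it verbatim.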
Summary
Testing
- `make fmt`
- `make lint`
- `make test`
- `make markdownlint`
- `make nixie`

https://chatgpt.com/codex/tasks/task_e_687b82f6823c83229077f87bc83e5260
Summary by Sourcery
Improve paragraph wrapping by treating Markdown links as atomic tokens and ensure proper link wrapping behavior, while updating README code fences with language annotations and reflowing long lines.
Enhancements:
Documentation:
Tests: