Conversation
Reviewer's Guide

This PR enriches the `wrap::tokenize` module by adding detailed Rustdoc comments with usage examples to its scanning and tokenization functions, updates the high-level `tokenize_markdown` documentation, and re-exports `tokenize_markdown` from the `wrap` module for external use.
Summary by CodeRabbit
Walkthrough

Expose the `tokenize_markdown` function from the `wrap` module for external use.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Hey @leynos - I've reviewed your changes and found some issues that need to be addressed.
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/wrap/tokenize.rs:120` </location>
<code_context>
+/// # Examples
+///
+/// ```rust,ignore
+/// let tokens = segment_inline("see [link](url) and `code`");
+/// assert_eq!(
+/// tokens,
</code_context>
<issue_to_address>
The example for `segment_inline` omits edge cases like consecutive spaces.
Add an example with consecutive or unusual whitespace to demonstrate the function's handling of these cases.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
///
/// ```rust,ignore
/// let tokens = segment_inline("see [link](url) and `code`");
/// assert_eq!(
/// tokens,
/// vec!["see", " ", "[link](url)", " ", "and", " ", "`code`"]
/// );
/// ```
=======
///
/// ```rust,ignore
/// let tokens = segment_inline("see [link](url) and `code`");
/// assert_eq!(
/// tokens,
/// vec!["see", " ", "[link](url)", " ", "and", " ", "`code`"]
/// );
///
/// // Example with consecutive and unusual whitespace
/// let tokens = segment_inline("foo bar\tbaz `qux`");
/// assert_eq!(
/// tokens,
/// vec!["foo", " ", "bar", "\t", "baz", " ", "`qux`"]
/// );
/// ```
>>>>>>> REPLACE
</suggested_fix>
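For context on what the expanded example asserts, the whitespace-preserving split that `segment_inline` is documented to perform can be sketched as a standalone helper. This is a simplified stand-in for illustration only, not the crate's implementation: it alternates word and whitespace runs and ignores the link and code spans the real function recognises.

```rust
// Simplified sketch: split text into alternating word and whitespace
// tokens, preserving each whitespace run exactly as written.
fn split_preserving_whitespace(text: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut start = 0;
    let mut prev_ws: Option<bool> = None;
    for (i, ch) in text.char_indices() {
        let ws = ch.is_whitespace();
        // Close the current token when switching between word and whitespace.
        if prev_ws.map_or(false, |p| p != ws) {
            tokens.push(&text[start..i]);
            start = i;
        }
        prev_ws = Some(ws);
    }
    if start < text.len() {
        tokens.push(&text[start..]);
    }
    tokens
}

fn main() {
    // Consecutive spaces survive as a single two-character token.
    let tokens = split_preserving_whitespace("foo  bar\tbaz");
    assert_eq!(tokens, vec!["foo", "  ", "bar", "\t", "baz"]);
    println!("{tokens:?}");
}
```

Note that, unlike the suggested doc example, this sketch keeps a run of spaces as one token rather than one token per character; either convention demonstrates that whitespace is not collapsed.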
### Comment 2
<location> `src/wrap/tokenize.rs:186` </location>
<code_context>
+/// # Examples
+///
+/// ```rust,ignore
+/// tokenize_inline("run `cmd`", &mut |t| println!("{:?}", t));
+/// ```
fn tokenize_inline<'a, F>(text: &'a str, emit: &mut F)
</code_context>
<issue_to_address>
The example for `tokenize_inline` does not show the output format.
Consider including the expected output or a description of the callback's input to clarify the example.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
///
/// ```rust,ignore
/// tokenize_inline("run `cmd`", &mut |t| println!("{:?}", t));
/// ```
=======
///
/// ```rust
/// // Prints:
/// // Token::Text("run ")
/// // Token::Code("cmd")
/// tokenize_inline("run `cmd`", &mut |t| println!("{:?}", t));
/// ```
///
/// // The callback receives each token as a `Token<'a>`:
/// // For example, Token::Text(&str) or Token::Code(&str)
>>>>>>> REPLACE
</suggested_fix>
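To illustrate the callback shape the comment asks to document, here is a minimal stand-alone emit-style tokenizer over backtick spans. The `Token` enum and splitting logic are assumptions for illustration only, not the crate's actual `tokenize_inline`.

```rust
// Hypothetical miniature of an emit-style tokenizer: the caller passes a
// closure that receives each token as it is scanned, so the tokenizer
// itself allocates no Vec.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Text(&'a str),
    Code(&'a str),
}

fn tokenize_backticks<'a, F>(text: &'a str, emit: &mut F)
where
    F: FnMut(Token<'a>),
{
    let mut rest = text;
    while let Some(open) = rest.find('`') {
        if open > 0 {
            emit(Token::Text(&rest[..open]));
        }
        let after = &rest[open + 1..];
        match after.find('`') {
            Some(close) => {
                emit(Token::Code(&after[..close]));
                rest = &after[close + 1..];
            }
            None => {
                // Unmatched backtick: treat the remainder as plain text.
                emit(Token::Text(&rest[open..]));
                rest = "";
            }
        }
    }
    if !rest.is_empty() {
        emit(Token::Text(rest));
    }
}

fn main() {
    let mut tokens = Vec::new();
    tokenize_backticks("run `cmd` now", &mut |t| tokens.push(t));
    assert_eq!(
        tokens,
        vec![Token::Text("run "), Token::Code("cmd"), Token::Text(" now")]
    );
}
```

Collecting into a `Vec` via the closure, as above, is also a convenient way to show the expected token sequence in a doc example.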
### Comment 3
<location> `src/wrap/tokenize.rs:225` </location>
<code_context>
+/// returned as [`Token::Text`]. Whitespace is preserved exactly as it appears.
+///
+/// ```rust,no_run
+/// use mdtablefix::wrap::{Token, tokenize_markdown};
+///
+/// let tokens = tokenize_markdown("Example with `code`");
</code_context>
<issue_to_address>
The example for `tokenize_markdown` uses `no_run` but appears runnable.
If the example should be tested, remove `no_run`. Otherwise, add a comment explaining why it shouldn't be run.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
///
/// ```rust,no_run
/// use mdtablefix::wrap::{Token, tokenize_markdown};
///
=======
///
/// ```rust
/// use mdtablefix::wrap::{Token, tokenize_markdown};
///
>>>>>>> REPLACE
</suggested_fix>
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
src/wrap.rs (1 hunks)
src/wrap/tokenize.rs (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit Inference Engine (AGENTS.md)
**/*.rs: Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error, consider using `Arc` to reduce the amount of data returned.
Write unit and behavioural tests for new functionality. Run both before and after making any change.
Every module must begin with a module-level (`//!`) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (`///`) so documentation can be generated with `cargo doc`.
Prefer immutable data and avoid unnecessary `mut` bindings.
Handle errors with the `Result` type instead of panicking where feasible.
Avoid `unsafe` code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not use `return` in single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Prefer `expect` over `allow`.
Prefer `.expect()` over `.unwrap()`.
Use `concat!()` to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums: derive `std::error::Error` (via the `thiserror` crate) for any condition the caller might inspect, retry, or map to an HTTP status.
Use an opaque error only at the app boundary: use `eyre::Report` for human-readable logs; these should not be exposed in public APIs.
Never export the opaque type from a library: convert to domain enums at API boundaries, and to `eyre` only in the main `main()` entrypoint or top-level async task.
Files:
src/wrap.rs, src/wrap/tokenize.rs
⚙️ CodeRabbit Configuration File
**/*.rs:
- Seek to keep the cyclomatic complexity of functions no more than 12.
- Adhere to single responsibility and CQRS.
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Move conditionals with >2 branches into a predicate function.
- Avoid `unsafe` unless absolutely necessary.
- Every module must begin with a `//!` doc comment that explains the module's purpose and utility.
- Comments and docs must follow en-GB-oxendict (-ize / -our) spelling and grammar.
- Lints must not be silenced except as a last resort.
- `#[allow]` is forbidden.
- Only narrowly scoped `#[expect(lint, reason = "...")]` is allowed.
- No lint groups, no blanket or file-wide suppression.
- Include `FIXME:` with link if a fix is expected.
- Use `rstest` fixtures for shared setup and to avoid repetition between tests.
- Replace duplicated tests with `#[rstest(...)]` parameterised cases.
- Prefer `mockall` for mocks/stubs.
- Prefer `.expect()` over `.unwrap()`.
- Ensure that any API or behavioural changes are reflected in the documentation in `docs/`.
- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in `docs/`.
- Files must not exceed 400 lines in length.
- Large modules must be decomposed.
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets.
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
Files:
src/wrap.rs, src/wrap/tokenize.rs
🧬 Code Graph Analysis (1)
src/wrap.rs (1)
src/wrap/tokenize.rs (1)
`tokenize_markdown` (234-268)
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/wrap/tokenize.rs (3)
112-133: Whitespace edge-case example landed – thanks
Matches the earlier review request exactly. Nothing further.
185-202: Inline-token API example now shows output – good
This resolves the previous feedback.
228-246: Crate-relative path used – portable docs achieved
The example will now compile for downstream crates regardless of rename.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
src/html.rs (2 hunks)
src/wrap.rs (3 hunks)
src/wrap/tokenize.rs (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1): **/*.rs — same CodeRabbit Inference Engine (AGENTS.md) guidance and configuration as the previous review.
Files:
src/html.rs, src/wrap.rs, src/wrap/tokenize.rs
🔇 Additional comments (7)
src/html.rs (2)
117-124: Early-return branch reads better – good call

Moving the `is_bold_tag` test out of the guard clarifies the intent and avoids a long `if let` with a guard. No further action required.

86-90: Preserve single-expression form unless a statement block is needed

The new multi-line body is correct but adds two extra branches and indentation for no functional gain. Retain the original single expression for terseness and hot-path readability.

```diff
-fn is_table_cell(handle: &Handle) -> bool {
-    is_element(handle, "td") || is_element(handle, "th")
-}
+fn is_table_cell(handle: &Handle) -> bool {
+    is_element(handle, "td") || is_element(handle, "th")
+}
```

Likely an incorrect or invalid review comment.
src/wrap/tokenize.rs (4)
6-18: Accept `FnMut` for stateful predicates – 👍

Switching the predicate bound to `FnMut` unblocks closures that mutate captured state. The extra `mut cond` parameter is required and correct.

No issues detected.

28-36: Doc comment looks good

The example compiles and showcases the helper succinctly.

58-66: Example clarifies nested-parenthesis handling

The additional sample demonstrates the edge case the parser is built for. Nice.

93-109: `'('` is unlikely to be "trailing" punctuation – verify intent

`segment_inline` only scans after the link/image token. An opening parenthesis cannot appear after `")"` without whitespace (e.g. `"[link](url)("`), so this extension may silently consume a character that belongs to the next token.

```diff
- '.' | ',' | ';' | ':' | '!' | '?' | '(' | ')' | ']' | '"' | '\''
+ '.' | ',' | ';' | ':' | '!' | '?' | ')' | ']' | '"' | '\''
```

Confirm that you have a concrete input where `(` follows the link and should be glued; otherwise drop it to avoid mis-tokenisation.

src/wrap.rs (1)

201-204: `doc(hidden)` hides `is_fence` from public API – approved

This keeps the helper usable internally while keeping the surface tidy.
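To make the trailing-punctuation discussion above concrete, the character set the reviewer proposes (with `'('` dropped) can be expressed as a small predicate. This is an illustrative sketch of the suggested set only, not the crate's code:

```rust
// Sketch of the reviewer's proposed trailing-punctuation set, which
// omits '(' so an opening parenthesis is left for the next token.
fn is_trailing_punctuation(ch: char) -> bool {
    matches!(ch, '.' | ',' | ';' | ':' | '!' | '?' | ')' | ']' | '"' | '\'')
}

fn main() {
    // ')' stays glued to a preceding link token; '(' does not.
    assert!(is_trailing_punctuation(')'));
    assert!(!is_trailing_punctuation('('));
    assert!(!is_trailing_punctuation('a'));
    println!("ok");
}
```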
Removes an unused `PrefixHandler` abstraction and its associated regex captures import from the markdown wrapping logic. Simplifies the code while preserving all functionality. Also improves code clarity in the strong tag detection by using Rust's let-and pattern matching syntax.
Summary

- Document the `wrap::tokenize` scanning and tokenization functions with Rustdoc examples
- Re-export `tokenize_markdown` from the `wrap` module

Testing

- `make fmt`
- `make lint`
- `make test`

https://chatgpt.com/codex/tasks/task_e_688bea9258d8832289b1f52cdd5701ac
Summary by Sourcery
Add thorough documentation and examples for markdown tokenizer utilities and re-export tokenize_markdown from the wrap module