Conversation
Reviewer's Guide

Fix inline code parsing to treat empty and unmatched backtick delimiters as plain text, and add a regression test for unmatched code spans.

Class diagram for updated Tokenizer logic in tokenize.rs:

```mermaid
classDiagram
    class Token {
        <<enum>>
        +Text(&str)
        +Code(&str)
    }
    class Tokenizer {
        +next_token(s: &str) -> Option~(Token, usize)~
    }
    Tokenizer --> Token
```
Summary by CodeRabbit

Walkthrough: Refine the tokenisation logic in `src/wrap/tokenize.rs` so that empty and unmatched backtick delimiters fall back to plain text.
Sequence Diagram(s):

```mermaid
sequenceDiagram
    participant Input as Input String
    participant Tokenizer as next_token
    participant Output as Token Stream
    Input->>Tokenizer: Provide markdown string
    Tokenizer->>Tokenizer: Count leading backticks (delim_len)
    alt delim_len == 0
        Tokenizer->>Tokenizer: Search for next backtick
        alt Backtick found
            Tokenizer->>Output: Emit Token::Text up to backtick
        else No backtick found
            Tokenizer->>Output: Emit Token::Text (whole string)
        end
    else delim_len > 0
        Tokenizer->>Tokenizer: Search for matching closing delimiter of same length
        alt Closing delimiter found
            Tokenizer->>Output: Emit Token::Code with enclosed content
        else No closing delimiter
            Tokenizer->>Output: Emit Token::Text (opening delimiter only)
        end
    end
```
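The flow above can be sketched in Rust as follows. This is a minimal reconstruction for illustration only: the names mirror the diagram, but the crate's actual implementation in `src/wrap/tokenize.rs` may differ in detail.

```rust
// Minimal sketch of the tokenizer flow in the sequence diagram.
// Reconstruction for illustration, not the crate's exact code.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Text(&'a str),
    Code(&'a str),
}

fn next_token(s: &str) -> Option<(Token<'_>, usize)> {
    if s.is_empty() {
        return None;
    }
    // Count leading backticks to find the opening delimiter length.
    let delim_len = s.chars().take_while(|&c| c == '`').count();
    if delim_len == 0 {
        // Plain text: consume up to the next backtick, or the whole string.
        let end = s.find('`').unwrap_or(s.len());
        return Some((Token::Text(&s[..end]), end));
    }
    let delim = &s[..delim_len];
    // Look for a closing delimiter of the same length after the opener.
    if let Some(end) = s[delim_len..].find(delim) {
        let code = &s[delim_len..delim_len + end];
        return Some((Token::Code(code), delim_len + end + delim_len));
    }
    // Unmatched opener: emit the backticks themselves as plain text.
    Some((Token::Text(delim), delim_len))
}

fn main() {
    let mut rest = "a `b` ``bad";
    while let Some((tok, used)) = next_token(rest) {
        println!("{tok:?}");
        rest = &rest[used..];
    }
}
```

Each call returns the token plus the number of bytes consumed, so the caller advances the slice and repeats until the input is exhausted.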
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~7 minutes
Hey @leynos - I've reviewed your changes and found some issues that need to be addressed.
- The delim_len == 0 branch now slices &s[..1], which can panic on multi-byte UTF-8 chars; use the first char’s actual byte length instead of hard-coding 1.
- Rather than emitting a single-byte Text token for non-backtick characters, consider consuming up to the next backtick so you don’t fragment text into one-character tokens.
- You might also add a regression test for multiple unmatched backticks (e.g. "``bad code") to verify the new fallback behavior handles that correctly.
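The first point above can be addressed with `char::len_utf8` rather than a hard-coded byte offset. A small sketch (the helper name `first_char_text` is hypothetical, not from the PR):

```rust
// UTF-8-safe fallback for the delim_len == 0 branch: slice by the first
// character's actual byte length instead of a hard-coded 1, so a
// multi-byte character such as 'é' never splits a char boundary
// (slicing &s[..1] there would panic).
fn first_char_text(s: &str) -> Option<(&str, usize)> {
    let c = s.chars().next()?;
    let len = c.len_utf8();
    Some((&s[..len], len))
}

fn main() {
    assert_eq!(first_char_text("éx"), Some(("é", 2)));
    assert_eq!(first_char_text("a"), Some(("a", 1)));
    assert_eq!(first_char_text(""), None);
}
```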
## Individual Comments
### Comment 1
<location> `src/wrap/tokenize.rs:132` </location>
<code_context>
let delim_len = s.chars().take_while(|&c| c == '`').count();
if delim_len == 0 {
- return Some((Token::Text(s), s.len()));
+ return Some((Token::Text(&s[..1]), 1));
}
let closing = &s[..delim_len];
</code_context>
<issue_to_address>
Returning only the first character as Token::Text may cause unexpected tokenization.
Returning only the first character may result in single-character tokens, which could be inefficient and disrupt consumers expecting larger text segments. Please confirm if this behavior is intentional or consider returning the full string as before.
</issue_to_address>
### Comment 2
<location> `src/wrap/tokenize.rs:139` </location>
<code_context>
let code = &s[delim_len..delim_len + end];
return Some((Token::Code(code), delim_len + end + delim_len));
}
+ return Some((Token::Text(&s[..delim_len]), delim_len));
}
Some((Token::Text(s), s.len()))
</code_context>
<issue_to_address>
Returning Token::Text for unmatched opening backticks may not align with expected Markdown parsing.
Verify that returning only the opening backticks as Token::Text is intentional, as this may differ from standard Markdown handling. Also, consider if this could cause issues with consecutive unmatched backticks.
</issue_to_address>
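A self-contained sketch of the regression test suggested here, using a stand-in `next_token` so it compiles on its own; the real test would call the implementation in `src/wrap/tokenize.rs`:

```rust
// Stand-in types and tokenizer so this sketch is self-contained;
// the actual regression test would use the crate's next_token.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Text(&'a str),
    Code(&'a str),
}

fn next_token(s: &str) -> Option<(Token<'_>, usize)> {
    let delim_len = s.chars().take_while(|&c| c == '`').count();
    if delim_len == 0 {
        return None; // not exercised here; the test inputs start with backticks
    }
    let delim = &s[..delim_len];
    match s[delim_len..].find(delim) {
        Some(end) => {
            let code = &s[delim_len..delim_len + end];
            Some((Token::Code(code), delim_len + end + delim_len))
        }
        // Unmatched opener: fall back to a text token of the backticks only.
        None => Some((Token::Text(delim), delim_len)),
    }
}

fn main() {
    // Multiple unmatched backticks fall back to a text token of the opener.
    let (tok, used) = next_token("``bad code").expect("token");
    assert_eq!(tok, Token::Text("``"));
    assert_eq!(used, 2);
}
```

In the crate itself this would live as a `#[test]` alongside the existing tokenizer tests rather than in a `main` function.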
Summary
Testing
- `RUSTUP_TOOLCHAIN=nightly-2025-07-22 make fmt`
- `RUSTUP_TOOLCHAIN=nightly-2025-07-22 make lint`
- `RUSTUP_TOOLCHAIN=nightly-2025-07-22 make test`
- `RUSTUP_TOOLCHAIN=nightly-2025-07-22 make nixie` (fails: too many arguments)

https://chatgpt.com/codex/tasks/task_e_688d32f6a54483229b3f0e212a78abb2
Summary by Sourcery
Treat unmatched or empty backtick delimiters as plain text when tokenizing markdown and add a regression test to cover this scenario
Bug Fixes:
Tests: