Conversation
Reviewer's Guide

This PR adds full support for parsing functions. Class diagram for new and updated AST types for function parsing:

```mermaid
classDiagram
    class Root {
        +functions() Vec~Function~
    }
    class Function {
        +syntax() SyntaxNode
        +name() Option~String~
        +is_extern() bool
    }
    Root --> "*" Function : functions()
```
File-Level Changes
Summary by CodeRabbit
Walkthrough

The parser has been enhanced to recognise and process function declarations and definitions, including extern functions, within its parsing framework. New parsers and span collectors for functions have been introduced, the CST builder updated, and typed AST wrappers extended to represent functions. Corresponding tests for function parsing have been added.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Source as Source Code
    participant Lexer as Lexer
    participant Parser as Parser
    participant CST as CST Builder
    participant AST as AST Wrapper
    Source->>Lexer: Tokenise input
    Lexer->>Parser: Provide tokens
    Parser->>Parser: Collect spans (import, typedef, relation, index, function)
    Parser->>Parser: Parse functions (params, body, return type)
    Parser->>CST: Build CST with function nodes
    CST->>AST: Wrap CST nodes as typed AST (Function struct)
    AST->>AST: Provide access to function info (name, is_extern, etc.)
```
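The typed AST layer in the diagram wraps untyped CST nodes in thin structs. A minimal standalone sketch of that wrapper pattern is below; the `Kind` and `Node` types are simplified stand-ins invented for illustration, not the crate's actual rowan-based types:

```rust
// Simplified stand-ins for the real CST types; the actual crate wraps
// rowan-style syntax nodes rather than this toy structure.
#[derive(Debug, Clone, PartialEq)]
enum Kind { Function, ExternKw, Name }

#[derive(Debug, Clone)]
struct Node {
    kind: Kind,
    text: String,
    children: Vec<Node>,
}

/// Typed wrapper over an untyped node, mirroring the Function AST type
/// sketched in the class diagram above.
struct Function(Node);

impl Function {
    /// Wrap only nodes of the right kind, as typed AST casts usually do.
    fn cast(node: Node) -> Option<Self> {
        (node.kind == Kind::Function).then(|| Function(node))
    }

    fn name(&self) -> Option<String> {
        self.0
            .children
            .iter()
            .find(|c| c.kind == Kind::Name)
            .map(|c| c.text.clone())
    }

    fn is_extern(&self) -> bool {
        self.0.children.iter().any(|c| c.kind == Kind::ExternKw)
    }
}

/// Build a toy `extern function hash` node and read it back through the wrapper.
fn demo() -> (Option<String>, bool) {
    let leaf = |kind, text: &str| Node { kind, text: text.into(), children: vec![] };
    let node = Node {
        kind: Kind::Function,
        text: String::new(),
        children: vec![leaf(Kind::ExternKw, "extern"), leaf(Kind::Name, "hash")],
    };
    let f = Function::cast(node).expect("kind matches");
    (f.name(), f.is_extern())
}

fn main() {
    let (name, is_extern) = demo();
    println!("name={name:?} extern={is_extern}");
}
```

The real `Function::cast` would check a `SyntaxKind` on a syntax node, but the shape of the accessors is the same.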
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Clippy (1.86.0)

warning: failed to write cache, path: /usr/local/registry/index/index.crates.io-1949cf8c6b5b557f/.cache/ch/um/chumsky, error: Permission denied (os error 13)
Caused by:
Hey @leynos - I've reviewed your changes - here's some feedback:
- The Cell-based `function_params` and `function_body` parsers share nearly identical nesting logic—consider extracting a generic balanced-delimiter parser to reduce duplication.
- The `parse_into_span` usage and span/error handling in `collect_function_spans` is duplicated for extern vs normal functions—factoring out the common logic would simplify that function.
- Since `parse_tokens` now returns six separate vectors, consider defining a dedicated struct for its return value to improve readability and prevent ordering mistakes.
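To make the first point concrete, a shared balanced-delimiter helper could look roughly like the sketch below. It is hypothetical and operates on plain token slices rather than chumsky's combinator API, so the types are invented for illustration:

```rust
/// Consume tokens between a matching `open`/`close` pair, tracking nesting
/// depth. Returns the enclosed tokens (without the outer pair), or None if
/// the input is unbalanced or does not start with `open`.
fn balanced<T: PartialEq + Clone>(tokens: &[T], open: &T, close: &T) -> Option<Vec<T>> {
    let mut iter = tokens.iter();
    if iter.next()? != open {
        return None;
    }
    let mut depth = 1usize;
    let mut inner = Vec::new();
    for tok in iter {
        if tok == open {
            depth += 1;
        } else if tok == close {
            depth -= 1;
            if depth == 0 {
                return Some(inner);
            }
        }
        inner.push(tok.clone());
    }
    None // ran out of input before the matching close
}

fn main() {
    // function_params-style input: parentheses
    let params: Vec<char> = "(a(b)c)".chars().collect();
    let inner = balanced(&params, &'(', &')').expect("balanced input");
    println!("{}", inner.iter().collect::<String>());

    // function_body-style input: braces; same helper, different delimiter pair
    let body: Vec<char> = "{x{y}}".chars().collect();
    let inner = balanced(&body, &'{', &'}').expect("balanced input");
    println!("{}", inner.iter().collect::<String>());
}
```

Both `function_params` (`(...)`) and `function_body` (`{...}`) could then share one implementation, parameterised only by the delimiter pair.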
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The Cell-based `function_params` and `function_body` parsers share nearly identical nesting logic—consider extracting a generic balanced-delimiter parser to reduce duplication.
- The `parse_into_span` usage and span/error handling in `collect_function_spans` is duplicated for extern vs normal functions—factoring out the common logic would simplify that function.
- Since `parse_tokens` now returns six separate vectors, consider defining a dedicated struct for its return value to improve readability and prevent ordering mistakes.
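For the last point, a named return type might look like the sketch below. The field names are guesses based on the span kinds this PR collects (imports, typedefs, relations, indexes, functions), not the crate's actual definitions:

```rust
// Stand-in for the crate's Span type, assumed here for illustration.
type Span = std::ops::Range<usize>;

/// Hypothetical named return type for parse_tokens, replacing a
/// six-element tuple of vectors so call sites name fields instead of
/// relying on tuple ordering.
#[derive(Debug, Default)]
struct ParsedSpans {
    imports: Vec<Span>,
    typedefs: Vec<Span>,
    relations: Vec<Span>,
    indexes: Vec<Span>,
    functions: Vec<Span>,
    errors: Vec<String>, // stand-in for Vec<Simple<SyntaxKind>>
}

fn main() {
    // Call sites read fields by name; ordering mistakes become impossible.
    let spans = ParsedSpans {
        functions: vec![0..24],
        ..Default::default()
    };
    println!("{} function span(s), {} error(s)", spans.functions.len(), spans.errors.len());
}
```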
## Individual Comments
### Comment 1
<location> `src/parser/mod.rs:673` </location>
<code_context>
+) -> (Vec<Span>, Vec<Simple<SyntaxKind>>) {
+ type State<'a> = SpanCollector<'a, Vec<Simple<SyntaxKind>>>;
+
+ fn parse_into_span(
+ st: &mut State<'_>,
+ parser: impl Parser<SyntaxKind, Span, Error = Simple<SyntaxKind>>,
+ start: usize,
+ ) {
+ let iter = st.stream.tokens().iter().skip(st.stream.cursor()).cloned();
+ let sub = Stream::from_iter(start..st.stream.src().len(), iter);
+ let (res, err) = parser.parse_recovery(sub);
+ if let Some(sp) = res {
+ let end = sp.end;
+ st.spans.push(sp);
</code_context>
<issue_to_address>
parse_into_span() skips to the end of the line on error, which may skip too much in multi-line function declarations.
This approach may cause the parser to miss valid code after a multi-line function declaration. Consider a recovery strategy that skips to the end of the function span instead.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
} else {
st.extra.extend(err);
let end = st.stream.line_end(st.stream.cursor());
st.stream.skip_until(end);
}
=======
} else {
st.extra.extend(err);
// Try to find the end of the function (matching closing brace)
let mut depth = 0;
let mut found_end = false;
let mut idx = st.stream.cursor();
let tokens = st.stream.tokens();
while idx < tokens.len() {
match tokens[idx].0 {
SyntaxKind::LBrace => depth += 1,
SyntaxKind::RBrace => {
if depth == 0 {
// Found a closing brace without opening, treat as end
found_end = true;
break;
} else {
depth -= 1;
if depth == 0 {
found_end = true;
break;
}
}
}
_ => {}
}
idx += 1;
}
if found_end {
let end = tokens[idx].1.end;
st.stream.skip_until(end);
} else {
// Fallback: skip to end of line
let end = st.stream.line_end(st.stream.cursor());
st.stream.skip_until(end);
}
}
>>>>>>> REPLACE
</suggested_fix>
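The brace scan in the suggested fix can be exercised in isolation. The following standalone sketch mirrors its logic with a simplified token type (invented here for illustration):

```rust
#[derive(Debug, PartialEq)]
enum Tok { LBrace, RBrace, Other }

/// Mirror of the suggested recovery: scan forward for the brace that closes
/// the current function, treating a stray '}' at depth 0 as an end marker
/// too. Returns the index of that brace, or None if the input runs out.
fn find_function_end(tokens: &[Tok], start: usize) -> Option<usize> {
    let mut depth = 0usize;
    for (i, tok) in tokens.iter().enumerate().skip(start) {
        match tok {
            Tok::LBrace => depth += 1,
            Tok::RBrace => {
                // depth 0: closing brace without an opener; depth 1: the
                // matching close of the outermost brace. Both end the scan.
                if depth <= 1 {
                    return Some(i);
                }
                depth -= 1;
            }
            Tok::Other => {}
        }
    }
    None
}

fn main() {
    use Tok::*;
    // e.g. `function f() { { } }` -> the scan ends at the final RBrace
    let toks = [Other, LBrace, LBrace, RBrace, Other, RBrace];
    println!("{:?}", find_function_end(&toks, 0));
    // No closing brace at all: None, so the fix falls back to end-of-line
    println!("{:?}", find_function_end(&[Other, LBrace, Other], 0));
}
```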
### Comment 2
<location> `tests/parser.rs:114` </location>
<code_context>
" index Idx_User_ws \t on\n User (\n username ) "
}
+#[fixture]
+fn extern_function() -> &'static str {
+ "extern function hash(data: string): u64"
+}
</code_context>
<issue_to_address>
Consider adding tests for malformed or invalid function declarations.
Adding such tests will help verify that the parser properly rejects invalid syntax and reports errors as expected.
Suggested implementation:
```rust
#[fixture]
fn function_with_body() -> &'static str {
    "function to_uppercase(s: string): string { }"
}

#[fixture]
fn malformed_extern_function_missing_parens() -> &'static str {
    "extern function hash data: string): u64"
}

#[fixture]
fn malformed_extern_function_missing_colon() -> &'static str {
    "extern function hash(data string) u64"
}

#[fixture]
fn malformed_extern_function_no_type() -> &'static str {
    "extern function hash(data):"
}

#[test]
fn test_malformed_extern_function_missing_parens() {
    let input = malformed_extern_function_missing_parens();
    let result = parse_function_declaration(input);
    assert!(result.is_err(), "Parser should error on missing parentheses in extern function declaration");
}

#[test]
fn test_malformed_extern_function_missing_colon() {
    let input = malformed_extern_function_missing_colon();
    let result = parse_function_declaration(input);
    assert!(result.is_err(), "Parser should error on missing colon in extern function declaration");
}

#[test]
fn test_malformed_extern_function_no_type() {
    let input = malformed_extern_function_no_type();
    let result = parse_function_declaration(input);
    assert!(result.is_err(), "Parser should error on missing return type in extern function declaration");
}
```
- Ensure that the function `parse_function_declaration` is available in scope and is the correct function to parse function declarations. If your parser uses a different function or module, adjust the test calls accordingly.
- If you use a test framework other than the standard Rust test harness, you may need to adapt the test attribute or structure.
</issue_to_address>
### Comment 3
<location> `tests/parser.rs:131` </location>
<code_context>
+
/// Verifies that parsing and pretty-printing preserves the original input text
/// and produces the expected root node kind.
#[rstest]
</code_context>
<issue_to_address>
Add tests for functions with complex parameter lists and nested types.
Add tests covering functions with multiple parameters, complex or nested types, and edge cases such as empty parameter lists or unusual whitespace to ensure comprehensive parser coverage.
</issue_to_address>
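To illustrate what "complex parameter lists" would exercise: commas that nest inside generic type arguments. The sketch below is a toy top-level splitter for such inputs, invented for illustration; the crate's real parser tracks nesting at the token level:

```rust
/// Split a parameter list at top-level commas only, so that commas inside
/// nested angle brackets or parentheses (generic arguments, grouped types)
/// are preserved within their parameter.
fn split_params(params: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut depth = 0i32;
    let mut current = String::new();
    for ch in params.chars() {
        match ch {
            '<' | '(' => { depth += 1; current.push(ch); }
            '>' | ')' => { depth -= 1; current.push(ch); }
            ',' if depth == 0 => {
                out.push(current.trim().to_string());
                current.clear();
            }
            _ => current.push(ch),
        }
    }
    if !current.trim().is_empty() {
        out.push(current.trim().to_string());
    }
    out
}

fn main() {
    // A nested-generic parameter list of the kind the review asks to cover.
    let parts = split_params("key: string, vals: Map<string, Vec<u64>>");
    println!("{parts:?}");
    // Edge case from the review: the empty parameter list.
    println!("{:?}", split_params(""));
}
```

Parameterised `#[rstest]` cases over inputs like these (plus unusual whitespace variants) would give the coverage the comment asks for.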
```rust
} else {
    st.extra.extend(err);
    let end = st.stream.line_end(st.stream.cursor());
    st.stream.skip_until(end);
}
```
suggestion (bug_risk): parse_into_span() skips to the end of the line on error, which may skip too much in multi-line function declarations.
This approach may cause the parser to miss valid code after a multi-line function declaration. Consider a recovery strategy that skips to the end of the function span instead.
```diff
 } else {
     st.extra.extend(err);
-    let end = st.stream.line_end(st.stream.cursor());
-    st.stream.skip_until(end);
-}
+    // Try to find the end of the function (matching closing brace)
+    let mut depth = 0;
+    let mut found_end = false;
+    let mut idx = st.stream.cursor();
+    let tokens = st.stream.tokens();
+    while idx < tokens.len() {
+        match tokens[idx].0 {
+            SyntaxKind::LBrace => depth += 1,
+            SyntaxKind::RBrace => {
+                if depth == 0 {
+                    // Found a closing brace without opening, treat as end
+                    found_end = true;
+                    break;
+                } else {
+                    depth -= 1;
+                    if depth == 0 {
+                        found_end = true;
+                        break;
+                    }
+                }
+            }
+            _ => {}
+        }
+        idx += 1;
+    }
+    if found_end {
+        let end = tokens[idx].1.end;
+        st.stream.skip_until(end);
+    } else {
+        // Fallback: skip to end of line
+        let end = st.stream.line_end(st.stream.cursor());
+        st.stream.skip_until(end);
+    }
+}
```
```rust
#[fixture]
fn extern_function() -> &'static str {
```
suggestion (testing): Consider adding tests for malformed or invalid function declarations.
Adding such tests will help verify that the parser properly rejects invalid syntax and reports errors as expected.
Suggested implementation:

```rust
#[fixture]
fn function_with_body() -> &'static str {
    "function to_uppercase(s: string): string { }"
}

#[fixture]
fn malformed_extern_function_missing_parens() -> &'static str {
    "extern function hash data: string): u64"
}

#[fixture]
fn malformed_extern_function_missing_colon() -> &'static str {
    "extern function hash(data string) u64"
}

#[fixture]
fn malformed_extern_function_no_type() -> &'static str {
    "extern function hash(data):"
}

#[test]
fn test_malformed_extern_function_missing_parens() {
    let input = malformed_extern_function_missing_parens();
    let result = parse_function_declaration(input);
    assert!(result.is_err(), "Parser should error on missing parentheses in extern function declaration");
}

#[test]
fn test_malformed_extern_function_missing_colon() {
    let input = malformed_extern_function_missing_colon();
    let result = parse_function_declaration(input);
    assert!(result.is_err(), "Parser should error on missing colon in extern function declaration");
}

#[test]
fn test_malformed_extern_function_no_type() {
    let input = malformed_extern_function_no_type();
    let result = parse_function_declaration(input);
    assert!(result.is_err(), "Parser should error on missing return type in extern function declaration");
}
```

- Ensure that the function `parse_function_declaration` is available in scope and is the correct function to parse function declarations. If your parser uses a different function or module, adjust the test calls accordingly.
- If you use a test framework other than the standard Rust test harness, you may need to adapt the test attribute or structure.
```rust
    assert_eq!(idx.relation(), Some("User".into()));
    assert_eq!(idx.columns(), vec![String::from("username")]);
}

#[rstest]
fn extern_function_parsed(extern_function: &str) {
```
suggestion (testing): Add tests for functions with complex parameter lists and nested types.
Add tests covering functions with multiple parameters, complex or nested types, and edge cases such as empty parameter lists or unusual whitespace to ensure comprehensive parser coverage.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- `src/parser/mod.rs` (10 hunks)
- `tests/parser.rs` (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.rs`:

- Seek to keep the cyclomatic complexity of functions no more than 12.
- Adhere to single responsibility and CQRS.
- Place function attributes after doc comments.
- Do not use `return` in single-line functions.
- Move conditionals with >2 branches into a predicate function.
- Avoid `unsafe` unless absolutely necessary.
- Every module must begin with a `//!` doc comment that explains the module's purpose and utility.
- Comments must use en-GB-oxendict spelling and grammar.
- Lints must not be silenced except as a last resort. `#[allow]` is forbidden. Only narrowly scoped `#[expect(lint, reason = "...")]` is allowed. No lint groups, no blanket or file-wide suppression. Include `FIXME:` with link if a fix is expected.
- Use `rstest` fixtures for shared setup and to avoid repetition between tests.
- Replace duplicated tests with `#[rstest(...)]` parameterised cases.
- Prefer `mockall` for mocks/stubs.
- Prefer `.expect()` over `.unwrap()`.
- Ensure that any API or behavioural changes are reflected in the documentation in `docs/`.
- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in `docs/`.

⚙️ Source: CodeRabbit Configuration File

List of files the instruction was applied to:

- `tests/parser.rs`
- `src/parser/mod.rs`
🧬 Code Graph Analysis (2)
tests/parser.rs (1)

- src/parser/mod.rs (8): `parse` (110-131), `errors` (99-101), `name` (1056-1062), `name` (1133-1144), `name` (1322-1333), `name` (1422-1433), `is_extern` (1066-1070), `is_extern` (1437-1441)

src/parser/mod.rs (3)

- src/parser/token_stream.rs (3): `tokens` (105-107), `src` (118-120), `new` (41-47)
- src/parser/span_collector.rs (1): `new` (44-50)
- src/language.rs (1): `kind_to_raw` (185-190)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build-test
🔇 Additional comments (2)
src/parser/mod.rs (1)
1407-1442: LGTM! The `Function` AST implementation follows the established pattern and provides appropriate methods for accessing function properties.

tests/parser.rs (1)

114-571: Well-structured tests for function parsing. The test fixtures and corresponding test functions provide comprehensive coverage of the new function parsing capabilities. The tests follow the established patterns and verify all essential properties.
```rust
}
SyntaxKind::T_RPAREN => {
    if depth.get() == 0 {
        Err(Simple::custom(span, "unexpected ')'"))
```
🧹 Nitpick (assertive)
Clarify the error usage in the parser logic
The error message "unexpected ')'" might be confusing since the closing parenthesis is actually expected to terminate the parameter list. Consider adding a comment to explain that this error is used as a signal to stop parsing rather than indicating an actual syntax error.
```diff
 SyntaxKind::T_RPAREN => {
     if depth.get() == 0 {
+        // This error signals the end of the parameter list, not an actual syntax error
         Err(Simple::custom(span, "unexpected ')'"))
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
SyntaxKind::T_RPAREN => {
    if depth.get() == 0 {
        // This error signals the end of the parameter list, not an actual syntax error
        Err(Simple::custom(span, "unexpected ')'"))
```
🤖 Prompt for AI Agents
In src/parser/mod.rs at line 276, the error message "unexpected ')'" is used to
signal the end of parameter list parsing rather than an actual syntax error. Add
a clarifying comment above this line explaining that this error is intentionally
used as a control flow mechanism to stop parsing parameters, so readers
understand its purpose and it is not a real parsing error.
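The control-flow use of `Err` described above can be shown in a standalone sketch. Here chumsky's `Simple` error is replaced by a plain string and the token type is a bare `char`, both simplifications for illustration:

```rust
/// Consume tokens for a parameter list, using an Err value purely as a
/// control-flow signal: an unmatched ')' is not reported to the user, it
/// simply marks the end of the list for the caller.
fn take_params(tokens: &[char]) -> (Vec<char>, usize) {
    let mut depth = 0usize;
    let mut taken = Vec::new();
    for (i, &t) in tokens.iter().enumerate() {
        let step: Result<(), &str> = match t {
            '(' => { depth += 1; Ok(()) }
            ')' if depth == 0 => Err("unexpected ')'"), // stop signal, not a real error
            ')' => { depth -= 1; Ok(()) }
            _ => Ok(()),
        };
        if step.is_err() {
            return (taken, i); // caller resumes at the terminating ')'
        }
        taken.push(t);
    }
    (taken, tokens.len())
}

fn main() {
    // The parameter tokens end at the unmatched ')' before " rest".
    let input: Vec<char> = "a,(b),c) rest".chars().collect();
    let (params, end) = take_params(&input);
    println!("params={} end={}", params.iter().collect::<String>(), end);
}
```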
```rust
let mut idx = st.stream.cursor() + 1;
while let Some((kind, sp)) = st.stream.tokens().get(idx) {
    if matches!(kind, SyntaxKind::T_WHITESPACE | SyntaxKind::T_COMMENT)
        && !st
            .stream
            .src()
            .get(sp.clone())
            .is_some_and(|t| t.contains('\n'))
    {
        idx += 1;
        continue;
    }
    break;
}
```
🧹 Nitpick (assertive)
Extract lookahead logic into a reusable helper method
The token lookahead logic could be extracted into a helper method to improve reusability and adhere to DRY principles. This pattern might be useful elsewhere in the parser.
Consider extracting this into a helper method on `TokenStream`:

```diff
+    /// Look ahead to find the next non-whitespace/comment token without advancing the cursor
+    fn peek_next_significant(&self, from_idx: usize) -> Option<&(SyntaxKind, Span)> {
+        let mut idx = from_idx;
+        while let Some((kind, sp)) = self.tokens.get(idx) {
+            if matches!(kind, SyntaxKind::T_WHITESPACE | SyntaxKind::T_COMMENT)
+                && !self.src.get(sp.clone()).is_some_and(|t| t.contains('\n'))
+            {
+                idx += 1;
+                continue;
+            }
+            return self.tokens.get(idx);
+        }
+        None
+    }
```

Then use it in `handle_extern`:
```diff
-    let mut idx = st.stream.cursor() + 1;
-    while let Some((kind, sp)) = st.stream.tokens().get(idx) {
-        if matches!(kind, SyntaxKind::T_WHITESPACE | SyntaxKind::T_COMMENT)
-            && !st
-                .stream
-                .src()
-                .get(sp.clone())
-                .is_some_and(|t| t.contains('\n'))
-        {
-            idx += 1;
-            continue;
-        }
-        break;
-    }
-
-    let is_func = st
-        .stream
-        .tokens()
-        .get(idx)
-        .is_some_and(|(kind, _)| *kind == SyntaxKind::K_FUNCTION);
+    let next_token = st.stream.peek_next_significant(st.stream.cursor() + 1);
+    let is_func = next_token.is_some_and(|(kind, _)| *kind == SyntaxKind::K_FUNCTION);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-let mut idx = st.stream.cursor() + 1;
-while let Some((kind, sp)) = st.stream.tokens().get(idx) {
-    if matches!(kind, SyntaxKind::T_WHITESPACE | SyntaxKind::T_COMMENT)
-        && !st
-            .stream
-            .src()
-            .get(sp.clone())
-            .is_some_and(|t| t.contains('\n'))
-    {
-        idx += 1;
-        continue;
-    }
-    break;
-}
+let next_token = st.stream.peek_next_significant(st.stream.cursor() + 1);
+let is_func = next_token.is_some_and(|(kind, _)| *kind == SyntaxKind::K_FUNCTION);
```
🤖 Prompt for AI Agents
In src/parser/mod.rs around lines 693 to 706, the token lookahead logic is
directly implemented in the function, which reduces reusability and violates DRY
principles. Extract this logic into a helper method on the TokenStream struct
that performs the lookahead while skipping whitespace and comments without
newlines. Then replace the existing inline code in handle_extern with a call to
this new helper method to improve code clarity and reuse.
Summary

- `function` definitions
- `extern function` declarations

Testing

- `make lint`
- `make test`

https://chatgpt.com/codex/tasks/task_e_68658f3302608322bc4aa1b58a621378
Summary by Sourcery

Add support for parsing Datalog-style functions, including `extern function` declarations and regular function definitions, and expose them in the CST/AST.

New Features:

- Parse `extern function` declarations and `function` definitions during token parsing.
- Add a `Function` AST node with `name()` and `is_extern()` methods and a `Root::functions()` accessor.

Enhancements:

- Update `parse_tokens` and `build_green_tree` to handle function spans alongside imports, typedefs, relations, and indexes.

Tests: