-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Improve error messages with function name suggestion. #6520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Expr::WindowFunction(expr::WindowFunction::new( | ||
| WindowFunction::AggregateFunction(aggregate_fun), | ||
| args, | ||
| if let Ok(fun) = self.find_window_func(&name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff display on GitHub is not clear, it's just a simple refactor
Before:
if let Some(WindowType::WindowSpec(window)) = function.over.take() {
/* 1 */
let fun = self.find_window_func(&name)?;
/* 2 */
return ...;
}
/* 3 */
// Return error
Err(...)After:
if let Some(WindowType::WindowSpec(window)) = function.over.take() {
/* 1 */
if let Ok(fun) = self.find_window_func(&name) {
/* 2 */
return ...;
}
} else {
/* 3 */
}
// Return error
Err(...)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whitespace blind diff also shows this nicely: https://github.com/apache/arrow-datafusion/pull/6520/files?w=1
|
This looks awesome -- thank you @2010YOUY01 -- I plan to review this carefully tomorrow |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this out locally and it was sooo cool!
DataFusion CLI v25.0.0
❯ select fooo('s');
Error during planning: Invalid function 'fooo'.
Did you mean 'floor'?Really neat -- thank you @2010YOUY01
The only question I have is about the new dependency (but it seems small and perhaps we could simply inline the implementation if we are worried about the implications of using it)
datafusion/expr/Cargo.toml
Outdated
| datafusion-common = { path = "../common", version = "25.0.0" } | ||
| lazy_static = { version = "^1.4.0" } | ||
| sqlparser = "0.34" | ||
| strsim = "0.10.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this crate and it is widely used in the ecosystem (used by clap): https://crates.io/crates/strsim but hasn't been updated for 3 years
However, it does appear to be a new dependency for datafusion,
An alternate if we are worried about this new dependency is we could inline the definition of levenshtein into datafusion as it is not large
| use datafusion_common::{DataFusionError, Result}; | ||
| use std::sync::Arc; | ||
| use std::{fmt, str::FromStr}; | ||
| use strum_macros::EnumIter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Expr::WindowFunction(expr::WindowFunction::new( | ||
| WindowFunction::AggregateFunction(aggregate_fun), | ||
| args, | ||
| if let Ok(fun) = self.find_window_func(&name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whitespace blind diff also shows this nicely: https://github.com/apache/arrow-datafusion/pull/6520/files?w=1
|
cc @comphead / @mingmwang / @mustafasrepo / @ozankabak if you have any thoughts on the new dependency? |
|
If we are only using Levenshtein, inlining sounds good to me as it is only a small piece of code. I suggest using the full package only if we can see use cases where we'd use it more extensively in the future. |
Agree. the entire idea is neat to give a suggestion, however stale packages brings up not only future dependency conflicts but also unresolved security vulnerabilities. Perhaps we can replicate it from https://github.com/wooorm/levenshtein-rs ? |
|
Thank you all for the feedback! Makes sense, the |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @2010YOUY01
| /// | ||
| /// ``` | ||
| /// use strsim::levenshtein; | ||
| /// use datafusion_common::utils::datafusion_strsim::levenshtein; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
|
||
| /// Adopted from strsim-rs for string similarity metrics | ||
| pub mod datafusion_strsim { | ||
| // Source: https://github.com/dguo/strsim-rs/blob/master/src/lib.rs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thank you for the link
|
Thanks again @2010YOUY01 and @ozankabak |
Which issue does this PR close?
Part of #6396
Rationale for this change
When a wrong function name is given (possibly by typo, or missing underscore), suggest a similar valid function name:
What changes are included in this PR?
When binding a function, parser only knows is this function a window function or non-window function.
If it's non-window function, suggest the most similar function name from all
BuiltinScalarFunctionandAggreagteFunctionIf it's window function, suggest the most similar function name from all
AggregateFunctionandBuiltinWindowFunctionEdit distance is used for the metric of similarity.
Are these changes tested?
end-to-end sqllogictests are added.
Are there any user-facing changes?
No