@2010YOUY01
Contributor

@2010YOUY01 2010YOUY01 commented Jun 1, 2023

Which issue does this PR close?

Part of #6396

Rationale for this change

When an invalid function name is given (for example, because of a typo or a missing underscore), suggest a similar valid function name:

DataFusion CLI v25.0.0
❯ select arrowtypeof();
Error during planning: Invalid function 'arrowtypeof'.
Did you mean 'arrow_typeof'?

What changes are included in this PR?

When binding a function, the parser only knows whether it is a window function or a non-window function.
If it is a non-window function, suggest the most similar function name from all BuiltinScalarFunction and AggregateFunction names.
If it is a window function, suggest the most similar function name from all AggregateFunction and BuiltinWindowFunction names.
Edit distance is used as the similarity metric.
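
The selection rule described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the code in this PR: the `SCALAR`, `AGGREGATE`, and `WINDOW` lists and the `edit_distance` and `suggest` helpers are hypothetical stand-ins for the real function enums and planner code.

```rust
/// Textbook dynamic-programming edit distance over `char`s,
/// keeping only the previous row of the table.
fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1)   // deletion
                .min(curr[j] + 1);      // insertion
            curr.push(best);
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

// Hypothetical, heavily shortened name lists (assumptions for illustration).
const SCALAR: &[&str] = &["abs", "floor", "arrow_typeof", "upper"];
const AGGREGATE: &[&str] = &["sum", "count", "avg"];
const WINDOW: &[&str] = &["row_number", "rank", "lead", "lag"];

/// Pick the candidate closest to `input`. Aggregate names are candidates in
/// both cases; the other list depends on whether the call is a window call.
fn suggest(input: &str, is_window: bool) -> Option<&'static str> {
    let first = if is_window { WINDOW } else { SCALAR };
    let input = input.to_lowercase();
    first
        .iter()
        .chain(AGGREGATE.iter())
        .copied()
        .min_by_key(|name| edit_distance(&input, name))
}
```

With these toy lists, `suggest("arrowtypeof", false)` picks `arrow_typeof`, because a single inserted underscore is the smallest edit among all candidates.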

Are these changes tested?

End-to-end sqllogictests are added.

Are there any user-facing changes?

No

@github-actions github-actions bot added labels Jun 1, 2023: core (Core DataFusion crate), logical-expr (Logical plan and expressions), sql (SQL Planner), sqllogictest (SQL Logic Tests (.slt))
Expr::WindowFunction(expr::WindowFunction::new(
WindowFunction::AggregateFunction(aggregate_fun),
args,
if let Ok(fun) = self.find_window_func(&name) {
Contributor Author

The diff display on GitHub is not clear; it's just a simple refactor.
Before:

if let Some(WindowType::WindowSpec(window)) = function.over.take() {
    /* 1 */
    let fun = self.find_window_func(&name)?;
    /* 2 */
    return ...;
}

/* 3 */

// Return error
Err(...)

After:

if let Some(WindowType::WindowSpec(window)) = function.over.take() {
    /* 1 */
    if let Ok(fun) = self.find_window_func(&name) {
        /* 2 */
        return ...;
    }
} else {
    /* 3 */
}

// Return error
Err(...)
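
For readers who want to run the "After" shape, here is a small stand-alone sketch of the same control flow. All names (`Planned`, `find_window_func`, `find_scalar_func`, `plan_function`, and the `has_over` flag) are hypothetical simplifications, not the planner's real API. The point it illustrates is that replacing `?` with `if let Ok(..)` lets a failed window lookup fall through to the single error path at the bottom, which is presumably where a suggestion can then be attached.

```rust
#[derive(Debug, PartialEq)]
enum Planned {
    Window(String),
    Scalar(String),
}

// Hypothetical lookup standing in for the planner's window-function lookup.
fn find_window_func(name: &str) -> Result<String, String> {
    match name {
        "row_number" | "sum" => Ok(name.to_string()),
        _ => Err(format!("There is no window function named {name}")),
    }
}

// Hypothetical lookup standing in for the non-window lookups (step 3).
fn find_scalar_func(name: &str) -> Option<String> {
    matches!(name, "floor" | "abs").then(|| name.to_string())
}

fn plan_function(name: &str, has_over: bool) -> Result<Planned, String> {
    if has_over {
        // Steps 1 and 2: a failed lookup no longer returns early via `?`.
        if let Ok(fun) = find_window_func(name) {
            return Ok(Planned::Window(fun));
        }
    } else {
        // Step 3: non-window lookups.
        if let Some(fun) = find_scalar_func(name) {
            return Ok(Planned::Scalar(fun));
        }
    }
    // Shared error path reached by both branches.
    Err(format!("Invalid function '{name}'."))
}
```

In this sketch, `plan_function("rownumber", true)` reaches the shared error at the end instead of returning the lookup's own error.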

Contributor

The whitespace blind diff also shows this nicely: https://github.com/apache/arrow-datafusion/pull/6520/files?w=1

@alamb
Contributor

alamb commented Jun 1, 2023

This looks awesome -- thank you @2010YOUY01 -- I plan to review this carefully tomorrow

Contributor

@alamb alamb left a comment

I tried this out locally and it was sooo cool!

DataFusion CLI v25.0.0
❯ select fooo('s');
Error during planning: Invalid function 'fooo'.
Did you mean 'floor'?

Really neat -- thank you @2010YOUY01

The only question I have is about the new dependency (but it seems small and perhaps we could simply inline the implementation if we are worried about the implications of using it)

datafusion-common = { path = "../common", version = "25.0.0" }
lazy_static = { version = "^1.4.0" }
sqlparser = "0.34"
strsim = "0.10.0"
Contributor

I reviewed this crate: it is widely used in the ecosystem (for example, by clap) but hasn't been updated for 3 years: https://crates.io/crates/strsim

However, it does appear to be a new dependency for DataFusion.

An alternative, if we are worried about this new dependency, is to inline the definition of levenshtein into DataFusion, as it is not large:

https://github.com/dguo/strsim-rs/blob/65eac453cbd10ba4e13273002c843e95c81ae93f/src/lib.rs#L192-L238
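
As a rough illustration of how small an inlined function of this kind can be, here is a minimal sketch of Levenshtein distance that keeps a single row of the dynamic-programming table. It is written from the standard algorithm and is not a copy of the strsim code linked above; the function name and signature are assumptions.

```rust
/// Levenshtein distance between `a` and `b`, counted in `char`s.
/// Uses one cached row, so memory is proportional to the length of `b`.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let b_len = b.chars().count();
    // cache[j] = distance between the processed prefix of `a`
    // and the first j + 1 chars of `b`.
    let mut cache: Vec<usize> = (1..=b_len).collect();
    // If `a` is empty the loop never runs and the answer is `b_len`.
    let mut result = b_len;

    for (i, ca) in a.chars().enumerate() {
        // Distance from the first i + 1 chars of `a` to the empty prefix of `b`.
        result = i + 1;
        // Diagonal value: distance from the first i chars of `a` to the
        // empty prefix of `b`.
        let mut diag = i;

        for (j, cb) in b.chars().enumerate() {
            let cost = usize::from(ca != cb);
            let substitution = diag + cost;
            // Value from the previous row, same column; it also becomes
            // the diagonal for the next column.
            diag = cache[j];
            result = substitution   // substitute or match
                .min(diag + 1)      // delete from `a`
                .min(result + 1);   // insert into `a`
            cache[j] = result;
        }
    }
    result
}
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.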

use datafusion_common::{DataFusionError, Result};
use std::sync::Arc;
use std::{fmt, str::FromStr};
use strum_macros::EnumIter;

@alamb
Contributor

alamb commented Jun 2, 2023

cc @comphead / @mingmwang / @mustafasrepo / @ozankabak if you have any thoughts on the new dependency?

@alamb alamb changed the title from "Add function name suggestion." to "Improve error messages with function name suggestion." Jun 2, 2023
@ozankabak
Contributor

If we are only using Levenshtein, inlining sounds good to me as it is only a small piece of code. I suggest using the full package only if we can see use cases where we'd use it more extensively in the future.

@comphead
Contributor

comphead commented Jun 2, 2023

> If we are only using Levenshtein, inlining sounds good to me as it is only a small piece of code. I suggest using the full package only if we can see use cases where we'd use it more extensively in the future.

Agreed. The idea of giving a suggestion is neat; however, stale packages bring not only future dependency conflicts but also unresolved security vulnerabilities. Perhaps we can replicate it from https://github.com/wooorm/levenshtein-rs?

@2010YOUY01
Contributor Author

Thank you all for the feedback! Makes sense; the external strsim dependency has been replaced with an inlined implementation.

Contributor

@alamb alamb left a comment

Looks good to me -- thank you @2010YOUY01

///
/// ```
/// use strsim::levenshtein;
/// use datafusion_common::utils::datafusion_strsim::levenshtein;
Contributor

👍


/// Adopted from strsim-rs for string similarity metrics
pub mod datafusion_strsim {
// Source: https://github.com/dguo/strsim-rs/blob/master/src/lib.rs
Contributor

👍 thank you for the link

@alamb alamb merged commit a7970eb into apache:main Jun 5, 2023
@alamb
Contributor

alamb commented Jun 5, 2023

Thanks again @2010YOUY01 and @ozankabak

Labels

core (Core DataFusion crate), logical-expr (Logical plan and expressions), sql (SQL Planner), sqllogictest (SQL Logic Tests (.slt))

4 participants