Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented May 26, 2023

Which issue does this PR close?

related to #6448

Rationale for this change

While reviewing #6448 from @2010YOUY01 I was somewhat worried about the performance impact of doing linear searches for function names.

What changes are included in this PR?

  1. Build HashMaps from function name table and use them in lookups
  2. panic if there is no corresponding entry for displaying a BuiltInFunction

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label May 26, 2023
@alamb alamb changed the title Improve BuiltInFunctionLookup Improve BuiltInFunction Name Lookup May 26, 2023
write!(f, "{}", func_name)
} else {
// Should not be reached
panic!("Internal error: {self:?} not in ALL_FUNCTIONS list");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is great and needed, one nit, do we need to panic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This panic will occur if someone adds a new BuiltInFunction and doesn't add it to the built in list.

I can remove the panic but then if someone doesn't add the function to the new LIST I was worried the error would be masked (the wrong function name would be used or the lookup might fail). But maybe that is preferable 🤔

Copy link
Contributor

@comphead comphead May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking wouldn't be that better to implement a function name

fn name(func: BuiltinScalarFunction) -> Result<String> {
 match func {
   BuiltinScalarFunction::Abs => "abs"
   ....
 }
}

This will ensure on compile time the newly added function gets its string alias

Then we can build the hashmaps the same way over iterating BuiltinScalarFunction enum

    for func in BuiltinScalarFunction::iter() {
        let name = name(func);
        hashmapLookup.insert(name, func);
        hashmapRevLookup.insert(func, name);
    }

I can play with this and create another PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can play with this and create another PR

Thank you 🙏

Note one gotcha you might hit is that the mapping is one-> many

So you might need something like

/// returns all the names that can be used for this function
fn names(func: BuiltinScalarFunction) -> &[&'str]] {
 match func {
   BuiltinScalarFunction::Abs => "abs"
   ....
 }
}

@alamb alamb marked this pull request as draft May 26, 2023 17:26
@2010YOUY01
Copy link
Contributor

I forget to mention in the original PR #6448 : the rationale behind storing the mapping (ALL_FUNCTIONS) separately is for future implementation of 'Did you mean': select powr() -> Error: Did you mean power()?, otherwise copy and paste the mapping in FromStr into Display would be okay.

Also I thought iterating through a small vector several times is not likely to be the performance issue for an analytical system, so the original implementation is trying to make the code more maintainable and avoid over-engineering.
But this speedup implementation is very effective and clean, thank you!

@comphead
Copy link
Contributor

comphead commented May 27, 2023

@alamb lazy_static! limitations brings some complexity in the code, lets merge this PR as is, I'll try to make it better, but dont want to block this PR

UPD: Thanks for waiting. PR #6479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants