Skip to content

Conversation

@seddonm1
Copy link
Contributor

@seddonm1 seddonm1 commented Mar 3, 2021

@alamb This is the second last of the current string functions but I think there may be one after that with new code.

This implements some of the miscellaneous string functions ascii, chr, initcap, repeat, reverse, to_hex. The next PR will have more useful functions (including regex).

A little bit of tidying for consistency to the other functions was applied.

This PR is a child of #9243

@github-actions
Copy link

github-actions bot commented Mar 3, 2021

@alamb
Copy link
Contributor

alamb commented Mar 3, 2021

Thanks @seddonm1 -- I plan to review this tomorrow

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed through this and it looks good. The only nit I have is that we are often repeating this code:

args[0]
        .as_any()
        .downcast_ref::<GenericStringArray<T>>()
        .ok_or_else(|| {
            DataFusionError::Internal("could not cast string to StringArray".to_string())
        })?;

It might be worth considering adding a helper method or macro for this.

@seddonm1
Copy link
Contributor Author

seddonm1 commented Mar 3, 2021

@andygrove no problem. I will create a macro for that.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review all of the code in detail but I am approving this based on the tests (which are really cool) and the fact that this is new functionality, so we can always follow up with bug fixes if issues are found. Thanks for adding the macros - I think that makes the code much easier to review.

@seddonm1
Copy link
Contributor Author

seddonm1 commented Mar 4, 2021

Thanks @andygrove . I have a few more PRs to do to finish this first phase of work. Then I think it's time to tackle type coercion.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through this PR carefully and I like it!

I agree with @andygrove 's comments about the excellent test coverage and new code. 🏅 🏅

Thank you so much @seddonm1 . I'll plan to merge this sometime over the next day or two

test_expression!("character_length('josé')", "4");
test_expression!("character_length(NULL)", "NULL");
test_expression!("chr(CAST(120 AS int))", "x");
test_expression!("chr(CAST(128175 AS int))", "💯");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 indeed!

test_function!(
Ascii,
&[lit(ScalarValue::Utf8(Some("💯".to_string())))],
Ok(Some(128175)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is strange to me that a function called ascii can return something larger than 127. However, it seems that quirkiness / awesomeness came from postgres :)

alamb=# select ascii('💯');
 ascii  
--------
 128175
(1 row)

👍

"could not cast fill to StringArray".to_string(),
)
})?;
let string_array = downcast_string_arg!(args[0], "string", T);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these macros certainly make the code nicer to read. 👍

@alamb alamb closed this in dec5ab9 Mar 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants