-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13626: [R] Bindings for log base b #11133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
nealrichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! If you don't have one already, can you sign up for an ASF Jira account so I can assign the issue to you?
r/tests/testthat/test-dplyr.R
Outdated
| input %>% | ||
| # suppress 'NaNs produced' warning on the first row of df | ||
| # that evaluates to NaN (R raises warning but Arrow does not) | ||
| suppressWarnings(mutate(., y = log(x, base = x))) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the old Windows build failed--perhaps the magrittr improvement that supports this is too new for 3.6 binary packages. So maybe the best option is to move the suppressWarnings and the accompanying comment to be around/outside of expect_dplyr_equal().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I uncovered another corner case in testing this (log(10, base = 1) evaluates to Inf in both R and Arrow). I think the latest commit does this well without silencing any warnings that come out of the other (more common) cases!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log(x) with base 1 is not a reasonable calculation because this implies solving the following equation 1^y = x which is only defined for x = 1. Nevertheless, most math libraries will yield a value of infinity for base 1 logarithms when x != 1 and NaN for x = 1. Arrow C++ follows this scheme.
These cases are missing from the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test in 812f941 that ensures NaN for log(1, base = 1) and Inf for log(10, base = 1) in both dplyr and arrow. Is that test reasonable or should it be rewritten/clarified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is reasonable. I also noticed that these corner cases are not tested in the C++ log tests, but that corresponds to another PR.
|
Sure! I should be 'paleolimbot' in ASF Jira. |
|
Fixed the base with length != 1 issue, although I think my error message could be better since base with length != 1 is supported, just not as a non-expression. Reprex summary of new functionality: library(arrow)
library(dplyr, warn.conflicts = FALSE)
# with base as a column
RecordBatch$create(a = 2:5, b = 2:5) %>%
mutate(log(a, base = b)) %>%
collect()
#> # A tibble: 4 × 3
#> a b `log(a, base = b)`
#> <int> <int> <dbl>
#> 1 2 2 1
#> 2 3 3 1
#> 3 4 4 1
#> 4 5 5 1
# with base as a scalar
RecordBatch$create(a = 2:5) %>%
mutate(log(a, base = 3)) %>%
collect()
#> # A tibble: 4 × 2
#> a `log(a, base = 3)`
#> <int> <dbl>
#> 1 2 0.631
#> 2 3 1
#> 3 4 1.26
#> 4 5 1.46
# errors for base with length != 1
RecordBatch$create(a = 2:5) %>%
mutate(log(a, base = 3:4)) %>%
collect()
#> Warning: In log(a, base = 3:4), base with length != 1 not supported by Arrow;
#> pulling data into R
#> # A tibble: 4 × 2
#> a `log(a, base = 3:4)`
#> <int> <dbl>
#> 1 2 0.631
#> 2 3 0.792
#> 3 4 1.26
#> 4 5 1.16Should # errors
RecordBatch$create(a = 2:5) %>%
mutate(a * log(2)) %>%
collect()
#> Warning: Expression a * log(2) not supported in Arrow; pulling data into R
#> # A tibble: 4 × 2
#> a `a * log(2)`
#> <int> <dbl>
#> 1 2 1.39
#> 2 3 2.08
#> 3 4 2.77
#> 4 5 3.47
# works!
RecordBatch$create(a = 2:5) %>%
mutate(a * log(Expression$scalar(2))) %>%
collect()
#> # A tibble: 4 × 2
#> a `a * log(Expression$scalar(2))`
#> <int> <dbl>
#> 1 2 1.39
#> 2 3 2.08
#> 3 4 2.77
#> 4 5 3.47 |
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
…ions (either can be scalar or Expression)
|
With updated handling of library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
# with base as a column
RecordBatch$create(a = 2:5, b = 2:5) %>%
mutate(log(a, base = b)) %>%
collect()
#> # A tibble: 4 × 3
#> a b `log(a, base = b)`
#> <int> <int> <dbl>
#> 1 2 2 1
#> 2 3 3 1
#> 3 4 4 1
#> 4 5 5 1
# with base as a scalar
RecordBatch$create(a = 2:5) %>%
mutate(log(a, base = 3)) %>%
collect()
#> # A tibble: 4 × 2
#> a `log(a, base = 3)`
#> <int> <dbl>
#> 1 2 0.631
#> 2 3 1
#> 3 4 1.26
#> 4 5 1.46
# with x as a scalar
RecordBatch$create(a = 2:5) %>%
mutate(log(2, base = a)) %>%
collect()
#> # A tibble: 4 × 2
#> a `log(2, base = a)`
#> <int> <dbl>
#> 1 2 1
#> 2 3 0.631
#> 3 4 0.5
#> 4 5 0.431
# errors for base with length != 1
RecordBatch$create(a = 2:5) %>%
mutate(log(a, base = 3:4)) %>%
collect()
#> Warning: In log(a, base = 3:4), base must be a column or a length-1 numeric;
#> other values not supported by Arrow; pulling data into R
#> # A tibble: 4 × 2
#> a `log(a, base = 3:4)`
#> <int> <dbl>
#> 1 2 0.631
#> 2 3 0.792
#> 3 4 1.26
#> 4 5 1.16
# errors for x with length != 1
RecordBatch$create(a = 2:5) %>%
mutate(log(a, base = 3:4)) %>%
collect()
#> Warning: In log(a, base = 3:4), base must be a column or a length-1 numeric;
#> other values not supported by Arrow; pulling data into R
#> # A tibble: 4 × 2
#> a `log(a, base = 3:4)`
#> <int> <dbl>
#> 1 2 0.631
#> 2 3 0.792
#> 3 4 1.26
#> 4 5 1.16Created on 2021-09-14 by the reprex package (v2.0.1) |
nealrichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One note about a test I think we need to add but otherwise this is excellent, thank you!
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Implements `log(x, base = (something other than 2, e, or 10))`. Closes apache#11133 from paleolimbot/arrow-13626-log-base-b Authored-by: Dewey Dunnington <dewey@fishandwhistle.net> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Implements
log(x, base = (something other than 2, e, or 10)).