ARROW-14819: [R] Binding for lubridate::qday #13440

rok · 2022-06-27T14:44:59Z

This adds a lubridate-like qday function. It returns the day of the quarter, counting days from the beginning of the quarter.

github-actions · 2022-06-27T14:58:55Z

https://issues.apache.org/jira/browse/ARROW-14819

github-actions · 2022-06-27T14:58:58Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

r/src/compute.cpp

paleolimbot · 2022-06-27T23:57:36Z

(I'm on vacation this week but look forward to taking a look on Monday!)

r/NEWS.md

thisisnic

Looks good, just one comment needs adding to clarify things, but otherwise looks fine to me. Thanks!

r/R/dplyr-funcs-datetime.R

r/tests/testthat/test-dplyr-funcs-datetime.R

More updates were needed - review no longer valid

r/R/dplyr-funcs-datetime.R

r/tests/testthat/test-dplyr-funcs-datetime.R

rok · 2022-06-29T12:09:34Z

@dragosmg
I added 2 suggestions.

Merged, thank you!

rok · 2022-07-05T18:23:36Z

@thisisnic after the rebase this is now (almost) c++ free :).

paleolimbot

Awesome!

paleolimbot · 2022-07-06T00:26:49Z

r/tests/testthat/test-dplyr-funcs-datetime.R

Rather than test_df, it might make sense for this binding to use a sequence of dates (or datetimes) that spans a year (or maybe that spans a year and a leap year if that matters here). Maybe something like

tibble::tibble(date = seq(as.Date("2000-01-01"), as.Date("2000-12-31"), by = "day"))
#> # A tibble: 366 × 1
#>    date
#>    <date>
#>  1 2000-01-01
#>  2 2000-01-02
#>  3 2000-01-03
#>  4 2000-01-04
#>  5 2000-01-05
#>  6 2000-01-06
#>  7 2000-01-07
#>  8 2000-01-08
#>  9 2000-01-09
#> 10 2000-01-10
#> # … with 356 more rows

tibble::tibble(datetime = seq(as.POSIXct("2000-01-01 00:00:00"), as.POSIXct("2000-12-31 23:00:00"), by = "day"))
#> # A tibble: 366 × 1
#>    datetime
#>    <dttm>
#>  1 2000-01-01 00:00:00
#>  2 2000-01-02 00:00:00
#>  3 2000-01-03 00:00:00
#>  4 2000-01-04 00:00:00
#>  5 2000-01-05 00:00:00
#>  6 2000-01-06 00:00:00
#>  7 2000-01-07 00:00:00
#>  8 2000-01-08 00:00:00
#>  9 2000-01-09 00:00:00
#> 10 2000-01-10 00:00:00
#> # … with 356 more rows

I love the idea of more complete tests! I would actually propose a development-time test suite (it seems overkill for CI) that tests every moment over the past century.

The test you're proposing, however, hits a bug where rounding kernels interpret 32-bit arrays as 64-bit ones (ARROW-16142), so I suppose we really need to fix this now.

You could try that and see how long it takes...it might only be a few ms and then I'd say keep it in the regular test suite. We do have a mechanism for running extra tests but right now it's limited to the large memory tests (via the env var ARROW_LARGE_MEMORY_TESTS). Given that a single real-world poke at this exposed an error, I'd say at least a year is a must in our normal test suite.

Yeah, why not. I'll give it a try.

Ok, ARROW-16142 was resolved so this is now ready.
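For readers following the test-data discussion above, here is a minimal base-R sketch of why a full leap year is a useful input. It is not the test that was added in this PR; the variable names are illustrative, and it assumes only a complete daily sequence so that a running counter within each quarter can serve as an independent reference for the day of quarter.

```r
# Daily sequence covering the leap year 2000, as suggested in the review.
dates <- seq(as.Date("2000-01-01"), as.Date("2000-12-31"), by = "day")

# quarters() labels each date "Q1".."Q4"; a running count within each
# label is the one-based day of quarter for a gap-free daily sequence.
q <- quarters(dates)
day_of_quarter <- ave(seq_along(dates), q, FUN = seq_along)

# Largest day-of-quarter value seen in each quarter.
quarter_lengths <- tapply(day_of_quarter, q, max)
print(quarter_lengths)
```

Because February 2000 has 29 days, the first quarter is one day longer than in a common year, which is exactly the kind of boundary a handful of hand-picked dates can miss.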

r/tests/testthat/test-dplyr-funcs-datetime.R

jonkeane

This looks good. One very minor suggestion for even more comment 🚲 🏠 ing (do feel free to ignore it though, it is really minor)

jonkeane · 2022-07-18T19:59:36Z

r/R/dplyr-funcs-datetime.R

This is extremely minor, and take it or leave it: but it did take me a second to think through why we are adding 1 here. Maybe we could add it to the comment up above?

Suggested change

-    # We calculate day of quarter by flooring timestamp to beginning of quarter and
-    # calculating days between beginning of quarter and timestamp/date in question.
-    floored_x <- build_expr("floor_temporal", x, options = list(unit = 9L))
-    build_expr("days_between", floored_x, x) + Expression$scalar(1L)
+    # We calculate day of quarter by flooring timestamp to beginning of quarter and
+    # calculating days between beginning of quarter and timestamp/date in question.
+    # Since we use one-based numbering we add one.
+    floored_x <- build_expr("floor_temporal", x, options = list(unit = 9L))
+    build_expr("days_between", floored_x, x) + Expression$scalar(1L)

@jonkeane that's a fair question especially since documentation on qday is not really available.
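To make the "floor to the start of the quarter, count days, add one" reasoning concrete, here is a rough base-R sketch of the same arithmetic. It does not use Arrow or lubridate; `qday_sketch` is a hypothetical helper written only for illustration, and it works on dates rather than on Arrow expressions.

```r
# Hypothetical helper: day of quarter computed with base R only.
qday_sketch <- function(x) {
  x <- as.Date(x)
  lt <- as.POSIXlt(x)
  # Zero-based index (0, 3, 6 or 9) of the first month of x's quarter.
  quarter_start_month <- (lt$mon %/% 3L) * 3L
  # "Floor" each date to the first day of its quarter.
  quarter_start <- as.Date(
    sprintf("%04d-%02d-01", lt$year + 1900L, quarter_start_month + 1L),
    format = "%Y-%m-%d"
  )
  # Days between the quarter start and x are zero-based, so add one
  # to get one-based numbering (the first day of a quarter is day 1).
  as.integer(x - quarter_start) + 1L
}

qday_sketch(as.Date(c("2000-01-01", "2000-03-31", "2000-04-01", "2000-12-31")))
```

Without the final `+ 1L`, the first day of every quarter would come out as 0, which is the step the extra comment line is meant to explain.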

paleolimbot

I think one test will be better if you make the timestamp UTC but other than that this looks good to me!

paleolimbot · 2022-07-19T01:01:04Z

r/tests/testthat/test-dplyr-funcs-datetime.R

Would seq(as.POSIXct("1999-12-31", tz = "UTC"), as.POSIXct("2001-01-01", tz = "UTC"), by = "day") let you drop the ignore_attr bit below? As a new reader of this code I'm wondering why the timezone needs to be ignored.

That's a good point @paleolimbot! It's as if calling qday with mutate in compare_dplyr_binding returns an int64 with tzone = "UTC". Meanwhile calling it with transmute returns correctly.

Would this be considered a sharp edge?

I believe that's an error in restoring the R metadata (this PR doesn't touch R attributes to my reading). It would be helpful to file a reprex() in a JIRA so that we can fix this later (an int64 should never have a timezone attribute!), but it's definitely not this PR's problem.

In that case I'll open a Jira and merge this.

https://issues.apache.org/jira/browse/ARROW-17132
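As a rough illustration of the attribute question in this thread, the base-R sketch below shows that a UTC datetime sequence carries a `tzone` attribute while an integer vector derived from it normally does not. It does not reproduce the Arrow behaviour reported in ARROW-17132; the stray attribute is attached by hand, and `check.attributes = FALSE` is used here only as a dependency-free stand-in for the `ignore_attr` option mentioned above.

```r
# The UTC sequence proposed in the review comment.
x <- seq(as.POSIXct("1999-12-31", tz = "UTC"),
         as.POSIXct("2001-01-01", tz = "UTC"),
         by = "day")
attr(x, "tzone")

# An integer vector derived from the datetimes has no attributes at all.
day_number <- as.integer(format(x, "%j"))
attributes(day_number)

# Simulate the symptom by hand: an integer vector that wrongly keeps a tzone.
stray <- structure(day_number, tzone = "UTC")

# Values agree, but a strict comparison fails because of the attribute.
identical(stray, day_number)
isTRUE(all.equal(stray, day_number, check.attributes = FALSE))
```

This is why a comparison that ignores attributes can hide the problem: the values match even though one side has metadata it should not have.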

Co-authored-by: Dragoș Moldovan-Grünfeld <dragos.mold@gmail.com>

ursabot · 2022-07-21T07:41:39Z

Benchmark runs are scheduled for baseline = 39980dc and contender = 0330353. 0330353 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.1% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.36% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Failed] 0330353a ec2-t3-xlarge-us-east-2
[Failed] 0330353a test-mac-arm
[Failed] 0330353a ursa-i9-9960x
[Finished] 0330353a ursa-thinkcentre-m75q
[Failed] 39980dcd ec2-t3-xlarge-us-east-2
[Failed] 39980dcd test-mac-arm
[Failed] 39980dcd ursa-i9-9960x
[Finished] 39980dcd ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions bot added the Component: R label Jun 27, 2022

rok force-pushed the ARROW-14819 branch 3 times, most recently from 77dc96c to 8a8e467 Compare June 27, 2022 17:23

github-actions bot added the Component: Documentation label Jun 27, 2022

rok force-pushed the ARROW-14819 branch from 8a8e467 to 84584e8 Compare June 27, 2022 18:09

rok commented Jun 27, 2022

r/src/compute.cpp (outdated)

rok requested a review from paleolimbot June 27, 2022 18:45

kou reviewed Jun 28, 2022

r/NEWS.md (outdated)

rok force-pushed the ARROW-14819 branch from 84584e8 to e561ad5 Compare June 28, 2022 06:53

rok requested a review from thisisnic June 28, 2022 07:00

thisisnic previously approved these changes Jun 28, 2022

r/R/dplyr-funcs-datetime.R (outdated)

dragosmg reviewed Jun 28, 2022

r/tests/testthat/test-dplyr-funcs-datetime.R (outdated)

rok force-pushed the ARROW-14819 branch 2 times, most recently from 409b3e1 to b1926a3 Compare June 29, 2022 00:11

thisisnic reviewed Jun 29, 2022

r/R/dplyr-funcs-datetime.R (outdated)

dragosmg reviewed Jun 29, 2022

r/tests/testthat/test-dplyr-funcs-datetime.R (outdated, three review threads)

rok force-pushed the ARROW-14819 branch from 003ec65 to da9b739 Compare June 30, 2022 20:03

github-actions bot added the Component: C++ label Jun 30, 2022

rok requested a review from thisisnic June 30, 2022 20:06

rok force-pushed the ARROW-14819 branch 2 times, most recently from 4cb6f6b to 37ee4c1 Compare July 5, 2022 18:20

paleolimbot reviewed Jul 6, 2022

rok force-pushed the ARROW-14819 branch from 37ee4c1 to 5fdd07f Compare July 6, 2022 12:57

rok force-pushed the ARROW-14819 branch 2 times, most recently from e372aec to c0a8da9 Compare July 11, 2022 13:45

rok requested a review from paleolimbot July 11, 2022 13:48

rok force-pushed the ARROW-14819 branch from c0a8da9 to 7154fb6 Compare July 13, 2022 23:20

rok requested a review from jonkeane July 15, 2022 10:23

rok force-pushed the ARROW-14819 branch from 7154fb6 to 5c033d9 Compare July 18, 2022 19:36

jonkeane approved these changes Jul 18, 2022

paleolimbot approved these changes Jul 19, 2022

rok force-pushed the ARROW-14819 branch 3 times, most recently from 3a9fba4 to cbfb599 Compare July 19, 2022 16:27

rok requested a review from paleolimbot July 19, 2022 16:28

rok and others added 7 commits July 20, 2022 01:19

Add qday

3b90e25

Switching away from days_between

e0676b0

Apply suggestions from code review

05c8a24

Co-authored-by: Dragoș Moldovan-Grünfeld <dragos.mold@gmail.com>

Fix timezone bug

b018e3c

Review feedback - expanding test data range

3247ce8

Update r/R/dplyr-funcs-datetime.R

37ebd87

Review feedback

ef84a7a

rok force-pushed the ARROW-14819 branch from cbfb599 to ef84a7a Compare July 19, 2022 23:19

rok merged commit 0330353 into apache:master Jul 20, 2022

rok deleted the ARROW-14819 branch July 20, 2022 07:54

asfimport mentioned this pull request Jul 21, 2022

[R] Binding for lubridate::qday #30351

Closed
