Skip to content

Conversation

@rok
Copy link
Member

@rok rok commented Mar 17, 2022

This implements RoundTemporalOptions.ceil_is_strictly_greater and RoundTemporalOptions.calendar_based_origin parameters in C++ to enable temporal rounding in R (ARROW-14821).

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@rok
Copy link
Member Author

rok commented Mar 17, 2022

@djnavarro this is almost ready for review.

@rok rok force-pushed the ARROW-14821-change_on_boundary branch 4 times, most recently from b25de93 to f65728c Compare March 30, 2022 13:48
@rok rok force-pushed the ARROW-14821-change_on_boundary branch 2 times, most recently from ba00ddb to 92ad204 Compare March 30, 2022 19:27
@rok rok marked this pull request as ready for review March 30, 2022 19:29
@rok
Copy link
Member Author

rok commented Mar 30, 2022

@djnavarro do you think this will cover requirements on R side?
I've tried to use R to create the test data where possible and do sanity checks but I'm slowly developing test blindness :)

@pitrou @jorisvandenbossche could you please review the changes here? The R style rounding demanded some more logic and maybe something I'm missing something that can be simplified.
Also: perhaps we can find better variable names to replace change_on_boundary and calendar_based_origin?

@rok rok changed the title ARROW-14821: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date ARROW-14821: [C++][R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date Apr 1, 2022
@rok
Copy link
Member Author

rok commented Apr 6, 2022

ping :)

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats for working on this @rok. Here are some comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only for ceil, perhaps rename it strict_ceil or something similar? @jorisvandenbossche what do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to strict_ceil. Would still be interesting to hear from @jorisvandenbossche

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know if "strict ceil" is a typical term for this behaviour?
(it could also be "ceil_on_boundary" to make it more explicit about "ceil")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to ceil_on_boundary. @pitrou do you think that'll work?

@rok rok force-pushed the ARROW-14821-change_on_boundary branch 2 times, most recently from 657b3d7 to 3ae71e7 Compare April 21, 2022 13:19
@rok
Copy link
Member Author

rok commented Apr 21, 2022

@pitrou Could you please do another pass here?

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much to add, see below.

@rok rok force-pushed the ARROW-14821-change_on_boundary branch 2 times, most recently from 6a919a8 to d5bc076 Compare April 21, 2022 14:46
@rok
Copy link
Member Author

rok commented Apr 21, 2022

@pitrou Added suggested comments.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll let @jorisvandenbossche do the final review.

@rok rok force-pushed the ARROW-14821-change_on_boundary branch from d5bc076 to 18a0fdc Compare April 21, 2022 18:09
@rok
Copy link
Member Author

rok commented Apr 22, 2022

CI issues seem unrelated.

@rok
Copy link
Member Author

rok commented Apr 22, 2022

@jorisvandenbossche this would unblock #12154 which would be great to have for the release :).

rok and others added 9 commits May 30, 2022 16:16
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@rok
Copy link
Member Author

rok commented May 30, 2022

There was the issue about calendar_based_origin=True potentially breaking sortedness (as discussed in #12528).

Are we fine with this caveat? Maybe yes, since it is also what lubridate does (but mention it in the docs?). Or would it be possible to guard the user from that ever happening?

I think we're ok with calendar_based_origin=True breaking sortedness as it would be an explicitly chosen way of rounding for users outside of lubridate. I'll add a note about this to docs.
We could also introduce perserve_sortedness flag in #12528 and handle calendar_based_origin=True case.

For example, if you round by "x" minutes, the origin unit is an hour, and in that case we could error if one hour is not dividable by "x"? Although for hours and below that can work, but I suppose for days, months, years that is not possible?

Lubridate actually limits to time multiples smaller or equal than the greater time unit - e.g. "61min" is not a valid rounding, but "60min" is. Your proposal makes things safer but we might want to "deregulate" it later. Leaving things as is gives more options but we might need to limit them later. Dunno, I don't have an opinion on this.

@rok rok force-pushed the ARROW-14821-change_on_boundary branch from 8c2d6ad to 422b022 Compare May 30, 2022 19:58
@rok
Copy link
Member Author

rok commented May 30, 2022

@jorisvandenbossche added an example to doc to present the issue with calendar_based_origin=True.

@jorisvandenbossche jorisvandenbossche changed the title ARROW-14821: [C++][R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date ARROW-14821: [C++] Add ceil_is_strictly_greater and calendar_based_origin temporal round options (to mimic lubridate's date rounding) May 31, 2022
@rok
Copy link
Member Author

rok commented May 31, 2022

@pitrou do you think this is ready to merge?

Co-authored-by: Antoine Pitrou <pitrou@free.fr>
@pitrou
Copy link
Member

pitrou commented May 31, 2022

TBH, I'm starting to think that calendar-based rounding is sufficiently different that we might define separate functions for it (e.g. "ceil_calendar" as opposed to "ceil_temporal"). We might also not bother refactoring this again :-)

@rok
Copy link
Member Author

rok commented May 31, 2022

Thanks for the review, added the changes.

TBH, I'm starting to think that calendar-based rounding is sufficiently different that we might define separate functions for it (e.g. "ceil_calendar" as opposed to "ceil_temporal"). We might also not bother refactoring this again :-)

I had that thought yesterday too. Now that tests are worked out it wouldn't be too much work I feel. Perhaps after #12528 is in :).

@rok rok requested a review from pitrou May 31, 2022 13:59
@rok
Copy link
Member Author

rok commented Jun 1, 2022

@pitrou or are you saying we should refactor now? :)

@pitrou
Copy link
Member

pitrou commented Jun 1, 2022

@rok As you prefer. We can merge this as is if you want.

@rok
Copy link
Member Author

rok commented Jun 1, 2022

@pitrou i would prefer merging this and then discuss/do the refactor separately.

@pitrou
Copy link
Member

pitrou commented Jun 2, 2022

Ok, let's do it.

@pitrou pitrou merged commit fc082c5 into apache:master Jun 2, 2022
@rok
Copy link
Member Author

rok commented Jun 2, 2022

Thank you for the reviews and feedback @pitrou & @jorisvandenbossche !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants