Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 10 additions & 3 deletions cpp/src/arrow/compute/api_scalar.cc
Original file line number Diff line number Diff line change
Expand Up @@ -332,7 +332,10 @@ static auto kRoundOptionsType = GetFunctionOptionsType<RoundOptions>(
static auto kRoundTemporalOptionsType = GetFunctionOptionsType<RoundTemporalOptions>(
DataMember("multiple", &RoundTemporalOptions::multiple),
DataMember("unit", &RoundTemporalOptions::unit),
DataMember("week_starts_monday", &RoundTemporalOptions::week_starts_monday));
DataMember("week_starts_monday", &RoundTemporalOptions::week_starts_monday),
DataMember("ceil_is_strictly_greater",
&RoundTemporalOptions::ceil_is_strictly_greater),
DataMember("calendar_based_origin", &RoundTemporalOptions::calendar_based_origin));
static auto kRoundToMultipleOptionsType = GetFunctionOptionsType<RoundToMultipleOptions>(
DataMember("multiple", &RoundToMultipleOptions::multiple),
DataMember("round_mode", &RoundToMultipleOptions::round_mode));
Expand Down Expand Up @@ -491,11 +494,15 @@ RoundOptions::RoundOptions(int64_t ndigits, RoundMode round_mode)
constexpr char RoundOptions::kTypeName[];

RoundTemporalOptions::RoundTemporalOptions(int multiple, CalendarUnit unit,
bool week_starts_monday)
bool week_starts_monday,
bool ceil_is_strictly_greater,
bool calendar_based_origin)
: FunctionOptions(internal::kRoundTemporalOptionsType),
multiple(std::move(multiple)),
unit(unit),
week_starts_monday(week_starts_monday) {}
week_starts_monday(week_starts_monday),
ceil_is_strictly_greater(ceil_is_strictly_greater),
calendar_based_origin(calendar_based_origin) {}
constexpr char RoundTemporalOptions::kTypeName[];

RoundToMultipleOptions::RoundToMultipleOptions(double multiple, RoundMode round_mode)
Expand Down
23 changes: 22 additions & 1 deletion cpp/src/arrow/compute/api_scalar.h
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,9 @@ enum class CalendarUnit : int8_t {
class ARROW_EXPORT RoundTemporalOptions : public FunctionOptions {
public:
explicit RoundTemporalOptions(int multiple = 1, CalendarUnit unit = CalendarUnit::DAY,
bool week_starts_monday = true);
bool week_starts_monday = true,
bool ceil_is_strictly_greater = false,
bool calendar_based_origin = false);
static constexpr char const kTypeName[] = "RoundTemporalOptions";
static RoundTemporalOptions Defaults() { return RoundTemporalOptions(); }

Expand All @@ -117,6 +119,25 @@ class ARROW_EXPORT RoundTemporalOptions : public FunctionOptions {
CalendarUnit unit;
/// What day does the week start with (Monday=true, Sunday=false)
bool week_starts_monday;
/// Enable this flag to return a rounded value that is strictly greater than the input.
/// For example: ceiling 1970-01-01T00:00:00 to 3 hours would yield 1970-01-01T03:00:00
/// if set to true and 1970-01-01T00:00:00 if set to false.
/// This applies for ceiling only.
bool ceil_is_strictly_greater;
/// By default time is rounded to a multiple of units since 1970-01-01T00:00:00.
/// By setting calendar_based_origin to true, time will be rounded to a number
/// of units since the last greater calendar unit.
/// For example: rounding to a multiple of days since the beginning of the month or
/// to hours since the beginning of the day.
/// Exceptions: week and quarter are not used as greater units, therefore days will
/// will be rounded to the beginning of the month not week. Greater unit of week
/// is year.
/// Note that ceiling and rounding might change sorting order of an array near greater
/// unit change. For example rounding YYYY-mm-dd 23:00:00 to 5 hours will ceil and
/// round to YYYY-mm-dd+1 01:00:00 and floor to YYYY-mm-dd 20:00:00. On the other hand
/// YYYY-mm-dd+1 00:00:00 will ceil, round and floor to YYYY-mm-dd+1 00:00:00. This
/// can break the order of an already ordered array.
bool calendar_based_origin;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, (before looking at the implementation) this explanation is not really clear to me what the effect exactly is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By looking at the code and the inline comments, it became clear. One of the comments is

    // Round to a multiple of units since the last greater unit.
    // For example: round to multiple of days since the beginning of the month or
    // to hours since the beginning of the day.

I wanted to suggest using that. Although reading the above again, that's actually already quite similar. But the additional example might help to clarify.

One question I have is how it is exactly defined what the "one less precise calendar unit" / "last greater unit" is? Eg we have "week" between "day" and "month", but we are still using "month" as the greater unit for "day"? (which probably makes sense in practice, but we should maybe document this somewhere more explicitly)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll move the comments around.
Regarding the unit ordering I'm basically implementing for lubridate parity (and dropping bimonth, season, halfyear units). I suppose we can either document behaviour or let user choose the "origin unit". I would prefer to just document for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about round_from_greater_unit ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only relevant if multiple is not 1, right? Because in that case it could also be something like multiple_since_greater_unit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, only for multiple != 1. multiple_since_greater_unit sounds good.

Copy link
Member Author

@rok rok May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to multiple_since_greater_unit.

};

class ARROW_EXPORT RoundToMultipleOptions : public FunctionOptions {
Expand Down
Loading