Consolidate the conversion between Granularity and VirtualColumn, and improve the mapping of granularity usage in projections.#18403
Conversation
| * <li>Period('PT1H') in America/Los_Angeles can be mapped to Period('PT1H') in UTC</li> | ||
| * <li>Period('P1D') in America/Los_Angeles cannot be mapped to Period('P1D') in UTC</li> | ||
| */ | ||
| public boolean canBeMappedTo(PeriodGranularity target) |
There was a problem hiding this comment.
admittedly i'm still digesting how exactly this method works, but it seems kind of expensive to do to match for every projection we consider of every segment when the timezones don't match (at least getUtcMappablePeriodSecondsOrThrow seems expensive). Perhaps we should make a cache of these conversions so we can re-use the work we've done since its likely going to be a lot of the same checks over and over?
There was a problem hiding this comment.
also these methods feel like they could use some additional comments to make it clearer what is going on and why this works
There was a problem hiding this comment.
added some additional comments. also added PERIOD_GRAN_CACHE in Projections.java.
| // we don't allow: | ||
| // 1. virtual column on __time if there's no grouping on __time | ||
| // 2. non-UTC or non-epoch origin period granularity | ||
| throw InvalidInput.exception( | ||
| "cannot use granularity[%s] on column[%s] as projection time column", | ||
| maybeGranularity, | ||
| dimension.getName() | ||
| ); |
There was a problem hiding this comment.
I don't think this needs to be an exception, we just need to not consider it as a time column, the same way as a coarser granularity would not be __time. These columns can still be substituted for query virtual columns, so there is value in being able to pre-compute them, they just cannot serve as a replacement for __time (or time floor on __time) in queries.
There was a problem hiding this comment.
ah i was thinking about some cases when there's a time-like virtual column (non-utc) and another time-like virtual column (utc, can be converted to gran), e.x. PT1H in Pacific and P1D in UTC, if we just ignore, we would just use floor(__time, P1D) as the time column, and the virtual column PT1H in Pacific would just be computed based on floor(__time, P1D), that would be incorrect.
it seems simple and safe to just assume the first time-like grouping column must be gran, and the following must be coarser, e.x. PT1H in UTC and P1D in Pacific is an acceptable config. Another case is PT2H, PT3H, PT1H, it seems a bit more complex (or calculated) if we don't have the FIRST time-like must be gran assumption.
There was a problem hiding this comment.
actually realized non-UTC vc would just be treated as regular grouping column, for PT2H and PT3H, it'd just choose the finer one PT2H.
| // determine the granularity and time column name for the projection, based on the first time-like grouping column. | ||
| // if there are multiple time-like grouping columns, they must be "coarser" than the first time-like grouping column. |
There was a problem hiding this comment.
this is too restrictive, the __time column does not need to be first, we just need to replace any candidate __time column with the finer granularity column, since what we are looking for is the finest granularity column we can use as a substitute for __time in any queries against the base table
There was a problem hiding this comment.
finer is not completely safe, we're looking for the most compatible gran, e.x. between PT2H and PT3H, we can find PT1H as compatible (but this is not currently supported, i'm just using this as an example).
and deciding on the most compatible is tricky since we don't want non-UTC time to sneak in, e.x. PT30M in Pacific time would sort of invalidate our PT1H gran in UTC, handling this just makes things more complicated, thus choose to handle in a strict way here
There was a problem hiding this comment.
actually the compatible is PT6H. i realized projection need to find coarse ones as compatible
There was a problem hiding this comment.
updated this logic to handle the gran, it should never throw exception now, skips non-utc and non-mappable vc
There was a problem hiding this comment.
this comment is stale now since it no longer requires time to be first and additional time groupings to be coarser
| .addReferencedPhysicalColumn(ColumnHolder.TIME_COLUMN_NAME); | ||
| } else if (virtualGranularity.equals(Granularities.NONE) | ||
| || projection.getEffectiveGranularity().equals(Granularities.ALL)) { | ||
| return null; |
There was a problem hiding this comment.
what is this check for none and all separately for? Are there other query virtual column granularities that are ok with an all granularity projection?
There was a problem hiding this comment.
if query granularity is NONE, and projection gran is not (would already been caught by the first if statement), we just can't map, e.x. format(__time, 'YYYY-MM-DD HH:mm:ss') query can't use hourly projection. This exits early.
but i also reformatted this part to be more readable, so return null should be at bottom now
There was a problem hiding this comment.
i should be more specific, previous this is using a simple isFinerThan comparison, which is not safe, because PT2H is finer than PT3H but the former can't be mapped to the latter. now we're switching to canBeMappedTo, but it only applies to period gran, so adding this special handling for Granularities.NONE and Granularities.ALL.
|
|
||
| public class Projections | ||
| { | ||
| private static final Map<byte[], Boolean> PERIOD_GRAN_CACHE = new HashMap<>(); |
There was a problem hiding this comment.
this needs to be a concurrent map or use a lock or something
There was a problem hiding this comment.
updated to use ConcurrentHashMap, no lock should be fine since we're only using computeIfAbsent
| // determine the granularity and time column name for the projection, based on the first time-like grouping column. | ||
| // if there are multiple time-like grouping columns, they must be "coarser" than the first time-like grouping column. |
There was a problem hiding this comment.
this comment is stale now since it no longer requires time to be first and additional time groupings to be coarser
| PeriodGranularity projectionGran = (PeriodGranularity) projection.getEffectiveGranularity(); | ||
| byte[] combinedKey = StringUtils.toUtf8(projectionGran + "->" + virtualGran); |
There was a problem hiding this comment.
it might be better to use getCacheKey() from the granularities instead of toString conversion, this whole bit could be
byte[] combinedKey = new CacheKeyBuilder(0).appendCacheable(projectionGran).appendCacheable(virtualGran).build()
Description
Consolidate the conversion between
GranularityandVirtualColumn, and improve the mapping of granularity usage in projections.Before this PR, projection has issues handling granularity with time zones, examples:
After this PR, the granularity matches as following:
This is done through:
VirtualColumn->Granularityshould be able to handle allTimestampFloorExprregardless of the time zone.Granularity->VirtualColumnshould always be able to convertPeriodGranularitytoTimestampFloorExpr, regardless of the time zone.PeriodGranularityhas a newcanBeMappedTofunction, in a simple way, a finer gran can be mapped to a coarser gran. It also considers timezone, origin, month/year, week, day, hour compatible stuff. E.x.VirtualColumn, unlessGranularities.ALL.canBeMappedToto query __gan, this is more restrictive thanisFinerThan.Additional restrictions on projection granularity:
__timecolumn. Without this requirement, we can't simply maptimeColumnNamein projection to__time.Key changed/added classes in this PR
ProjectionsGranularitiesPeriodGranularityPeriodGranularityTestCursorFactoryProjectionTestThis PR has: