[MSQ] Confirm that the allocated segment's granularity matches the requested granularity #14475
 * Note: ALL granularity isn't aligned to any interval; however, this method defines that ALL granularity matches
 * an interval with boundary ({@code DateTimes.MIN}, {@code DateTimes.MAX})
 */
public static boolean doesIntervalMatchesGranularity(final Interval interval, final Granularity granularity)
Nit: align probably suits better.
Suggested change, from:
public static boolean doesIntervalMatchesGranularity(final Interval interval, final Granularity granularity)
to:
public static boolean isAlignedWithGranularity(final Interval interval, final Granularity granularity)
I left a comment on the alternative implementation which suggests that "aligned" is probably a misnomer, since we want both of the following conditions to hold:
- The interval is aligned with the granularity
- The interval should fit exactly one "unit" of the granularity.
If that comment holds, we shouldn't name it `aligned`. I'm open to alternatives, since `matches` isn't very descriptive either 😥
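Written out directly, the two conditions amount to checking that the interval starts on a bucket boundary and ends exactly at that bucket's end. The following is a minimal sketch, not the PR's code: it assumes a hypothetical `SimpleGranularity` enum (DAY and MONTH only, over UTC `java.time.LocalDate` values with half-open intervals) in place of Druid's `Granularity` and Joda-Time `Interval`.

```java
import java.time.LocalDate;

/** Hypothetical, simplified granularity over UTC dates; not Druid's Granularity class. */
enum SimpleGranularity {
  DAY, MONTH;

  /** Start of the bucket (time chunk) that contains the given date. */
  LocalDate bucketStart(LocalDate date) {
    return this == DAY ? date : date.withDayOfMonth(1);
  }

  /** Exclusive end of the bucket that contains the given date. */
  LocalDate bucketEnd(LocalDate date) {
    LocalDate start = bucketStart(date);
    return this == DAY ? start.plusDays(1) : start.plusMonths(1);
  }
}

final class GranularityMatch {
  /**
   * True if the half-open interval [start, end) satisfies both conditions:
   * 1. its start falls on a bucket boundary of the granularity, and
   * 2. it spans exactly one bucket (its end is that bucket's end).
   */
  static boolean matches(LocalDate start, LocalDate end, SimpleGranularity granularity) {
    boolean startsOnBoundary = granularity.bucketStart(start).equals(start);
    boolean spansOneBucket = granularity.bucketEnd(start).equals(end);
    return startsOnBoundary && spansOneBucket;
  }

  public static void main(String[] args) {
    LocalDate jan1 = LocalDate.of(2022, 1, 1);
    System.out.println(matches(jan1, LocalDate.of(2022, 1, 2), SimpleGranularity.DAY));   // true
    System.out.println(matches(jan1, LocalDate.of(2022, 1, 3), SimpleGranularity.DAY));   // false: two days
    System.out.println(matches(jan1, LocalDate.of(2022, 2, 1), SimpleGranularity.MONTH)); // true
  }
}
```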
return retVal;
}

/**
Maybe add this method to the `Intervals` class instead, since it could be useful outside the MSQ extension too.
I left this method in `IntervalUtils`, since its top-level Javadoc says the methods are specific to MSQ for now. It would make more sense to move it to `Intervals` if MSQ weren't an extension.
// the requested segment is present and that segment completely overlaps the request. Throw an error if the interval
// doesn't match the granularity requested
if (allocation == null
    || !IntervalUtils.doesIntervalMatchesGranularity(allocation.getInterval(), segmentGranularity)) {
I think the error message should be different in the null case and the unexpected interval case.
Noted! I think we can add a new fault class that tells the user about the pre-existing segment, so that they can use REPLACE, change the granularity in the query, or avoid inserting into that time interval.
return (interval.getStartMillis() == DateTimes.MIN.getMillis())
       && (interval.getEndMillis() == DateTimes.MAX.getMillis());
}
return granularity.isAligned(interval) && granularity.bucketEnd(interval.getStart()).equals(interval.getEnd());
I think just isAligned should be enough. This is the code for PeriodGranularity.isAligned:
@Override
public boolean isAligned(Interval interval)
{
return bucket(interval.getStart()).equals(interval);
}
The above suggestion won't hold when the interval is a multiple of the granularity. For example, `doesIntervalMatchesGranularity("2022-01-01/2022-01-03", Granularities.DAY)` would return true with `granularity.isAligned()` alone, but false with the current implementation, which is the desired behavior.
If we specify PARTITIONED BY DAY, we shouldn't allow an allocation of 2 days, so we check the bucket end as well.
WDYT?
Yeah, but Granularity.isAligned already does just that. You can verify it by doing this:
Assert.assertTrue(Granularities.DAY.isAligned(Intervals.of("2011-01-01/2011-01-02")));
Assert.assertFalse(Granularities.DAY.isAligned(Intervals.of("2011-01-01/2011-01-03")));
In fact, it would be nice if you could include this test in this PR so that future devs are not confused.
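The reviewer's point can be checked with a small model of the same shape as `PeriodGranularity.isAligned`: because a bucket is always exactly one unit long, comparing the whole interval against `bucket(start)` already rejects a two-day interval. This sketch covers DAY only and uses hypothetical `DateInterval` and `DayGranularity` classes built on `java.time`; it does not use Druid's classes.

```java
import java.time.LocalDate;

/** Hypothetical half-open UTC date interval [start, end); a stand-in for Joda-Time's Interval. */
final class DateInterval {
  final LocalDate start;
  final LocalDate end;

  DateInterval(LocalDate start, LocalDate end) {
    this.start = start;
    this.end = end;
  }

  /** Parses "yyyy-MM-dd/yyyy-MM-dd". */
  static DateInterval of(String iso) {
    String[] parts = iso.split("/");
    return new DateInterval(LocalDate.parse(parts[0]), LocalDate.parse(parts[1]));
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof DateInterval)) {
      return false;
    }
    DateInterval other = (DateInterval) o;
    return start.equals(other.start) && end.equals(other.end);
  }

  @Override
  public int hashCode() {
    return 31 * start.hashCode() + end.hashCode();
  }
}

final class DayGranularity {
  /** The single DAY bucket containing the given date. */
  static DateInterval bucket(LocalDate date) {
    return new DateInterval(date, date.plusDays(1));
  }

  /** Same shape as PeriodGranularity.isAligned: the interval must equal the bucket of its start. */
  static boolean isAligned(DateInterval interval) {
    return bucket(interval.start).equals(interval);
  }

  public static void main(String[] args) {
    System.out.println(isAligned(DateInterval.of("2011-01-01/2011-01-02"))); // true
    System.out.println(isAligned(DateInterval.of("2011-01-01/2011-01-03"))); // false
  }
}
```

Since `bucket(start)` spans exactly one day, equality implies both that the start is a boundary and that the interval covers a single chunk, so a separate `bucketEnd` comparison adds nothing here.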
You probably find the name `isAligned` a little confusing for this purpose. But if you think about it, alignment here doesn't merely mean that the start and end of the interval fall on the cut marks of this granularity. Rather, it means that the interval must fit exactly into the scheme of this granularity. The Javadoc also says something to this effect:
/**
* Return true if time chunks populated by this granularity includes the given interval time chunk.
*/
public abstract boolean isAligned(Interval interval);
We can update this javadoc to say "Returns true only if time chunks ..." to reduce confusion.
Thanks for the clarification; I now understand why `isAligned` checks that the interval fits exactly into the granularity. Refactored.
Branch updated from 51fc129 to 881e8e5.
Thanks for the review!
kfaraz left a comment
Minor nitpicks; otherwise the changes look good.
 * This is used to check if the granularity allocation made by the overlord is the same as the one requested in the
 * SQL query
 */
public static boolean isEternityOrDoesIntervalAlignWithGranularity(
Nit: you needn't mention eternity, since an eternity interval does satisfy the definition of alignment with `AllGranularity`.
So this method can just be called `isAligned` or something similar, to reflect that it relies on `granularity.isAligned` underneath.
final Granularity granularity
)
{
// AllGranularity needs special handling since AllGranularity#isAligned always returns false
I suppose we could fix AllGranularity.isAligned() to return true when the interval is eternity.
I wasn't sure whether that would break any pre-existing logic. Also, the Javadoc states:
/**
* No interval is aligned with all granularity since it's infinite.
*/
Therefore I found it more suitable to extract the logic into a helper method.
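The helper-method approach amounts to special-casing ALL so that only the eternity interval matches, and deferring to ordinary alignment for every other granularity. Below is a simplified sketch, assuming a hypothetical `Gran` enum of fixed-length UTC buckets in epoch milliseconds (so it ignores time zones and calendar periods such as MONTH). `MIN_MILLIS` and `MAX_MILLIS` are placeholders, not the actual values of `DateTimes.MIN` and `DateTimes.MAX`.

```java
/** Hypothetical granularity model: ALL, or fixed-length UTC buckets measured in epoch milliseconds. */
enum Gran {
  ALL(0L), HOUR(3_600_000L), DAY(86_400_000L);

  final long bucketMillis;

  Gran(long bucketMillis) {
    this.bucketMillis = bucketMillis;
  }
}

final class EternityCheck {
  // Placeholders for DateTimes.MIN / DateTimes.MAX; the real constants have different values.
  static final long MIN_MILLIS = Long.MIN_VALUE / 2;
  static final long MAX_MILLIS = Long.MAX_VALUE / 2;

  /** True if [startMillis, endMillis) is exactly one time chunk of the given granularity. */
  static boolean isAligned(long startMillis, long endMillis, Gran granularity) {
    if (granularity == Gran.ALL) {
      // ALL has a single, infinite time chunk, so only the eternity interval is treated as aligned.
      return startMillis == MIN_MILLIS && endMillis == MAX_MILLIS;
    }
    long size = granularity.bucketMillis;
    // Start on a bucket boundary and span exactly one bucket.
    return Math.floorMod(startMillis, size) == 0 && endMillis - startMillis == size;
  }

  public static void main(String[] args) {
    long day = Gran.DAY.bucketMillis;
    System.out.println(isAligned(MIN_MILLIS, MAX_MILLIS, Gran.ALL)); // true
    System.out.println(isAligned(0L, day, Gran.ALL));                // false
    System.out.println(isAligned(0L, day, Gran.DAY));                // true
    System.out.println(isAligned(0L, 2 * day, Gran.DAY));            // false
  }
}
```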
| <a name="error_ColumnNameRestricted">`ColumnNameRestricted`</a> | The query uses a restricted column name. | `columnName`: The restricted column name. |
| <a name="error_ColumnTypeNotSupported">`ColumnTypeNotSupported`</a> | The column type is not supported. This can be because:<br /> <br /><ul><li>Writing or reading a particular column type is not supported.</li><li>The query attempted to use a column type that is not supported by the frame format. This occurs with ARRAY types, which are not yet implemented for frames.</li></ul> | `columnName`: The column name with an unsupported type.<br /> <br />`columnType`: The unknown column type. |
| <a name="error_InsertCannotAllocateSegment">`InsertCannotAllocateSegment`</a> | The controller task could not allocate a new segment ID due to conflict with existing segments or pending segments. Common reasons for such conflicts:<br /> <br /><ul><li>Attempting to mix different granularities in the same intervals of the same datasource.</li><li>Prior ingestions that used non-extendable shard specs.</li></ul> | `dataSource`<br /> <br />`interval`: The interval for the attempted new segment allocation. |
| <a name="error_InsertCannotAllocateSegment">`InsertAllocatedIncorrectSegment`</a> | The controller task could not allocate a new segment ID of the specified granularity due to conflict with existing segments or pending segments.<br /> <br />This happens when a segment of coarser granularity already overlaps the interval for which the allocation was requested. Either use REPLACE to overwrite the existing overlapping segment, or re-run the INSERT with the pre-existing segment's granularity to append to the interval. | `dataSource`<br /> <br />`requestedInterval`: The interval for the attempted new segment allocation.<br /><br />`allocatedInterval`: The interval allocated for the requested segment. |
Nit: This fault says that the requested segment could not be allocated, but something else could be.
In that sense, it doesn't seem too different from the InsertCannotAllocateSegment.
Do you think just a different error message in the same fault type would work?
…ularity (apache#14475)
Changes:
- Throw an `InsertCannotAllocateSegmentFault` if the allocated segment is not aligned with the requested granularity.
- Tests to verify the new behaviour
Description
Currently, if you run two queries like the ones below:
The second one allocates a segment with MONTH granularity, even though the query specifies DAY partitioning.
This is because the overlord makes a best-effort attempt to allocate a segment with the requested granularity: if a pre-existing segment overlaps the requested interval, the overlord returns that segment instead, even though it might not match the requested granularity.
This is semantically incorrect, since the user requested a segment of DAY granularity and should get an error instead. This PR adds checks that prevent this case.
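A rough sketch of the controller-side check for a DAY-partitioned INSERT, giving the no-allocation case and the wrong-granularity case different messages as suggested during review. The `Allocated` type, `checkDayAllocation`, and the message texts are hypothetical; per the commit message, the merged change reports the misaligned case through `InsertCannotAllocateSegmentFault`.

```java
import java.time.LocalDate;
import java.util.Optional;

final class AllocationCheck {
  /** Hypothetical allocated interval [start, end) returned by the overlord. */
  static final class Allocated {
    final LocalDate start;
    final LocalDate end;

    Allocated(LocalDate start, LocalDate end) {
      this.start = start;
      this.end = end;
    }
  }

  /** Returns an error message if the allocation must be rejected for a DAY-partitioned INSERT, else empty. */
  static Optional<String> checkDayAllocation(LocalDate requestedDay, Allocated allocation) {
    if (allocation == null) {
      // Nothing could be allocated at all.
      return Optional.of("Could not allocate a segment for the day " + requestedDay);
    }
    // With UTC dates every start is a day boundary, so DAY alignment reduces to "spans exactly one day".
    if (!allocation.end.equals(allocation.start.plusDays(1))) {
      // The overlord returned a pre-existing segment of another (e.g. coarser) granularity.
      return Optional.of("Requested DAY granularity for " + requestedDay
          + " but the allocated interval is " + allocation.start + "/" + allocation.end);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    LocalDate day = LocalDate.of(2023, 1, 15);
    Allocated month = new Allocated(LocalDate.of(2023, 1, 1), LocalDate.of(2023, 2, 1));
    System.out.println(checkDayAllocation(day, new Allocated(day, day.plusDays(1))).orElse("ok"));
    System.out.println(checkDayAllocation(day, month).orElse("ok"));
    System.out.println(checkDayAllocation(day, null).orElse("ok"));
  }
}
```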
Release notes
A regression has been fixed in MSQ that could cause INSERT to allocate segments of incorrect granularity when pre-existing segments of coarser granularity are present. Such cases now fail with the `InsertCannotAllocateSegment` fault and an actionable error message.