Skip to content

Add pair Latest/Earliest aggregators#14195

Closed
imply-cheddar wants to merge 1 commit intoapache:masterfrom
imply-cheddar:first-last-pairs
Closed

Add pair Latest/Earliest aggregators#14195
imply-cheddar wants to merge 1 commit intoapache:masterfrom
imply-cheddar:first-last-pairs

Conversation

@imply-cheddar
Copy link
Copy Markdown
Contributor

The current Latest/Earliest aggregators all do
finalization to pretend like they are dealing with their payload type instead of exposing that they are actually dealing with pairs. This adds a STRING_LATEST and STRING_EARLIEST that explicitly deal with pairs and also adds PAIR_LEFT and PAIR_RIGHT functions that can access the two sides of the pairs.

This does not add the numerical versions yet, though that can be an easy iteration on top of this.

The current Latest/Earliest aggregators all do
finalization to pretend like they are dealing with
their payload type instead of exposing that they are
actually dealing with pairs.  This adds a STRING_LATEST
and STRING_EARLIEST that explicitly deal with pairs
and also adds PAIR_LEFT and PAIR_RIGHT functions that
can access the two sides of the pairs.

This does not add the numerical versions yet, though
that can be an easy iteration on top of this.
return ExpressionType.LONG;
}
}
throw new UOE("Invalid input type [%s] to pair_left, expected a complex pair type");

Check failure

Code scanning / CodeQL

Missing format argument

This format call refers to 1 argument(s) but only supplies 0 argument(s).
return ExpressionType.STRING;
}
}
throw new UOE("Invalid input type [%s] to pair_right, expected a complex pair type");

Check failure

Code scanning / CodeQL

Missing format argument

This format call refers to 1 argument(s) but only supplies 0 argument(s).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the missing format arguments need to be introduced


import javax.annotation.Nullable;

public class PairOperatorConversion implements SqlOperatorConversion
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this implement DirectOperatorConversion instead ?

validationHelperCheckArgumentCount(args, 1);
Expr arg = args.get(0);

class PairLeft extends ExprMacroTable.BaseScalarUnivariateMacroFunctionExpr
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. Might call it PairLeftExpression

Also, we can move these to individual classes

@JsonProperty("fieldName") @Nullable final String fieldName,
@JsonProperty("timeColumn") @Nullable final String timeColumn,
@JsonProperty("maxStringBytes") Integer maxStringBytes
@JsonProperty("foldColumn") @Nullable final String foldColumn,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need a separate foldColumn? It seems like instead we could just not auto initialize the timeColumn, and in factorize methods examine the ColumnCapabilities of ColumnSelectorFactory of the fieldName column and factorize into either the folding aggregator or the building aggregator as appropriate

Comment on lines +51 to +52
// Less efficient code path when folding is a possibility (we must read the value selector first just in case
// it's a foldable object).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: stale comment copied from old combined agg?

@JsonProperty("fieldName") @Nullable final String fieldName,
@JsonProperty("timeColumn") @Nullable final String timeColumn,
@JsonProperty("maxStringBytes") Integer maxStringBytes
@JsonProperty("foldColumn") @Nullable final String foldColumn,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about foldColumn vs just using ColumnCapabilities

@Override
public ExpressionType getOutputType(InputBindingInspector inspector)
{
final ExpressionType inputType = inspector.getType(arg.getBindingIfIdentifier());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputType could be null here. You could alternatively use arg.getOutputType(inspector) though it can also still return null.

If the argument must be an identifier, i would recommend pulling arg.getBindingIfIdentifier() and validating that it is not null separately.

return ExpressionType.LONG;
}
}
throw new UOE("Invalid input type [%s] to pair_left, expected a complex pair type");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

macro expression implement an interface called NamedFunction that provide a bunch of validation helper methods to help provide consistent error messaging for expressions, so could do something like

validationFailed("Invalid input type [%s], expected [%s]", inputType, EXPRESSION_TYPE)

@Override
public ExpressionType getOutputType(InputBindingInspector inspector)
{
final ExpressionType inputType = inspector.getType(arg.getBindingIfIdentifier());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about using arg.getOutputType(inspector)

public static final byte KLL_FLOATS_SKETCH_TO_QUANTILE_CACHE_TYPE_ID = 42;
public static final byte KLL_FLOATS_SKETCH_TO_QUANTILES_CACHE_TYPE_ID = 43;
public static final byte KLL_FLOATS_SKETCH_TO_STRING_CACHE_TYPE_ID = 44;
public static final byte PAIRS = 45;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't needed?

Comment on lines 92 to +94
case STRING:
case COMPLEX:
return new StringFirstAggregatorFactory(name, fieldName, timeColumn, maxStringBytes);
return new StringFirstAggregatorFactory(name, fieldName, timeColumn, null, maxStringBytes, true);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this right if we go forward with separate foldColumn? Shouldn't the complex case put fieldName in the foldColumn?

If you drop foldColumn and use the selector factory + capabilities (my preference at least at the time of writing this comment) instead then this doesn't need to change for now I think. Though, this part will also need to change if the PR that adds ingest time support for numeric first/last, since we'll need to get the complex type name and pick the correct folding aggregator...

case STRING:
case COMPLEX:
return new StringLastAggregatorFactory(name, fieldName, timeColumn, maxStringBytes);
return new StringLastAggregatorFactory(name, fieldName, timeColumn, null, maxStringBytes, true);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about COMPLEX case and foldColumn.

Comment on lines +57 to +72
public class EarliestLatestPairSqlAggregator implements SqlAggregator
{
public static final SqlAggregator[] DEFINED_AGGS = new SqlAggregator[]{
new EarliestLatestPairSqlAggregator(AggregatorType.EARLIEST, PayloadType.STRING),
new EarliestLatestPairSqlAggregator(AggregatorType.LATEST, PayloadType.STRING)
};

enum AggregatorType
{
EARLIEST, LATEST
}

enum PayloadType
{
STRING, LONG, DOUBLE
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need new functions to do aggregations on complex types? can't EARLIEST and LATEST detect when their inputs are a pair and do the right thing?

@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions Bot added the stale label Feb 12, 2024
@github-actions
Copy link
Copy Markdown

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions Bot added the stale label May 21, 2024
@github-actions
Copy link
Copy Markdown

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions Bot closed this Jun 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants