No way to get the schema for sliding accumulator state #14701

@mwylde

Description

The AggregateUDFImpl trait includes a function fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<Field>> that returns the fields of the aggregate's intermediate state. This is useful when the states need to be stored, for example for multi-level aggregation.

For our use case we also need to store the accumulator states as part of our checkpointing system. This works as long as we're using the standard accumulators, but breaks down with sliding accumulators, because some aggregates (for example, sum) have different state fields in sliding mode. For sum, the sliding state has an additional "count" field, used to determine when all of the data has been retracted.
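To illustrate why the sliding state needs the extra field, here is a minimal self-contained sketch. It is not DataFusion's implementation: SlidingSum and its methods are hypothetical stand-ins that use plain integers instead of Arrow arrays.

```rust
// Hypothetical, simplified stand-in for a retractable sum accumulator.
// Not DataFusion code; it only mirrors the idea described above.
#[derive(Debug, Default)]
struct SlidingSum {
    sum: i64,
    // Number of values currently inside the window.
    count: u64,
}

impl SlidingSum {
    /// Add values that enter the window.
    fn update_batch(&mut self, values: &[i64]) {
        for v in values {
            self.sum += v;
            self.count += 1;
        }
    }

    /// Remove values that leave the window.
    fn retract_batch(&mut self, values: &[i64]) {
        for v in values {
            self.sum -= v;
            self.count = self.count.saturating_sub(1);
        }
    }

    /// A sum of 0 is ambiguous on its own: it can mean "no rows" or
    /// "rows that cancel out". The count resolves this.
    fn evaluate(&self) -> Option<i64> {
        (self.count > 0).then_some(self.sum)
    }

    /// Intermediate state to checkpoint: two values instead of one.
    fn state(&self) -> (i64, u64) {
        (self.sum, self.count)
    }
}
```

With this sketch, updating with [5, -5] yields a sum of 0 with a count of 2, while retracting both values yields a sum of 0 with a count of 0. A checkpoint that stored only the sum could not tell these two states apart.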

But there doesn't seem to be any way to determine what the state fields will be for a sliding accumulator. A couple of possible options here:

  • Follow the pattern of is_distinct, which can also produce different accumulators. This is passed to the state_fields function as a field on the StateFieldsArgs struct; we could add a similar field for is_sliding.
  • It seems like state_fields is really a property of the accumulator, not of the aggregate (as various aggregates may produce different accumulators depending on the options and which accumulator function is called), so it might be better to have the state_fields function on the accumulator instead of the aggregate.
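A rough sketch of the two options, for comparison. Everything here is a simplified stand-in so that the snippet compiles on its own: DataType, Field, and StateFieldsArgs are cut-down versions of the Arrow/DataFusion types, is_sliding is the proposed (not upstream) flag, and the field names and the accumulator-level state_fields method are assumptions made for illustration.

```rust
// Simplified stand-ins for the Arrow/DataFusion types (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    UInt64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }
}

// Option 1: the args struct carries an `is_sliding` flag next to `is_distinct`.
#[allow(dead_code)]
struct StateFieldsArgs<'a> {
    name: &'a str,
    is_distinct: bool,
    is_sliding: bool, // proposed field, assumed here
}

fn sum_state_fields(args: &StateFieldsArgs) -> Vec<Field> {
    let mut fields = vec![Field::new(&format!("{}[sum]", args.name), DataType::Int64)];
    if args.is_sliding {
        // Sliding mode also tracks how many values are in the window.
        fields.push(Field::new(&format!("{}[count]", args.name), DataType::UInt64));
    }
    fields
}

// Option 2: each accumulator describes its own state.
trait Accumulator {
    fn state_fields(&self, name: &str) -> Vec<Field>;
}

struct SumAccumulator;
struct SlidingSumAccumulator;

impl Accumulator for SumAccumulator {
    fn state_fields(&self, name: &str) -> Vec<Field> {
        vec![Field::new(&format!("{name}[sum]"), DataType::Int64)]
    }
}

impl Accumulator for SlidingSumAccumulator {
    fn state_fields(&self, name: &str) -> Vec<Field> {
        vec![
            Field::new(&format!("{name}[sum]"), DataType::Int64),
            Field::new(&format!("{name}[count]"), DataType::UInt64),
        ]
    }
}
```

In the first shape the caller has to know which mode it asked for and pass the matching flag. In the second the schema comes from whichever accumulator was actually created, so the two cannot drift apart, at the cost of needing an accumulator instance before the schema is known.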

We've gone ahead and implemented the first approach in our fork, but it would be nice to get something upstream that addresses this.
