Skip to content

Trouble getting fancy with ARRAY_AGG (DISTINCT ARRAY_AGG) #6743

@bellwether-softworks

Description

@bellwether-softworks

Describe the bug

ARRAY_AGG usage in SQL results in an error when used in conjunction with DISTINCT parameter alongside another aggregate field.

To Reproduce

  1. Establish example data:
CREATE TABLE example(id INT, parent_id INT, tag VARCHAR) AS VALUES
    (1, 0, 'bob'),
    (2, 0, 'cat'),
    (3, 1, 'tom'),
    (4, 1, 'cat'),
    (5, 1, 'tom');
  1. Execute query using ARRAY_AGG and DISTINCT parameter:
SELECT
        parent_id,
        COUNT(id) AS count_of,
        ARRAY_AGG(DISTINCT tag) AS tags
    FROM example
    GROUP BY parent_id;

Executing the above results in the following message:

ArrowError(ExternalError(Internal("Inconsistent types in ScalarValue::iter_to_array. Expected Utf8, got List([tom,cat])")))

Expected behavior

Desired output should be similar to the following:

parent_id count_of tags
1 3 [tom, cat]
0 2 [bob, cat]

Additional context

Omitting either the COUNT field, or the DISTINCT clause in ARRAY_AGG, allows the query to complete successfully.

The above was initially discovered in v17.0.0 and verified to still be presenting in v26.0.0.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions