A small note on the order of cube result #3179

In the **Description** in `?cube`, we find that [`cube`]
> Reflects SQLs GROUPING SETS

Then, the **Value** section is rather vague: 
>A data.table with various aggregates

...but **References** are provided to [PostgreSQL 7.2.4. GROUPING SETS, CUBE, and ROLLUP](https://www.postgresql.org/docs/9.5/queries-table-expressions.html#QUERIES-GROUPING-SETS).

There we find that:

    CUBE ( a, b, c )
is equivalent to

    GROUPING SETS (
        ( a, b, c ),
        ( a, b    ),
        ( a,    c ),
        ( a       ),
        (    b, c ),
        (    b    ),
        (       c ),
        (         )
    )

It seems like the _first_ grouping variable varies _slowest_, and the last variable varies fastest. **Disclaimer**: I don't have access to PostgreSQL, so I can't confirm that this actually reflects the final output ;) 

In `cube`, it is the other way around: the _last_ grouping variable in `by` varies _slowest_, and the first fastest, like:

    GROUPING SETS (
        ( a, b, c ),
        (    b, c ),
        ( a,    c ),
        (       c ),
        ( a, b    ),
        (    b    ),
        ( a       ),
        (         )
    )


I don't claim that PostgreSQL is "right". But _given_ that the **Value** section in `?cube` does not specify the order of the result, and that the help text instead refers to the PostgreSQL docs, the order of the `cube` output may be considered inconsistent (again, note my disclaimer above). I also find the PostgreSQL ordering more intuitive, though that is perhaps a matter of taste.
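
The two orderings compared above can be enumerated with a few lines of base R. This is only a sketch: `grouping_sets()` and its `first_slowest` argument are hypothetical names made up for illustration, not part of data.table, and the snippet only lists the sets; it says nothing about how `cube` or PostgreSQL build them internally.

```r
# Hypothetical helper (not part of data.table): enumerate all grouping sets
# of `vars`, with either the first or the last variable varying slowest.
grouping_sets <- function(vars, first_slowest = TRUE) {
  # One TRUE/FALSE column per variable; expand.grid() varies its first
  # column fastest and its last column slowest.
  keep <- expand.grid(rep(list(c(TRUE, FALSE)), length(vars)))
  # Reverse the columns so that the first variable varies slowest instead.
  if (first_slowest) keep <- keep[, rev(seq_along(vars)), drop = FALSE]
  lapply(seq_len(nrow(keep)), function(i) vars[unlist(keep[i, ])])
}

# Order listed in the PostgreSQL docs: first variable varies slowest
sapply(grouping_sets(c("a", "b", "c")), paste, collapse = ",")
# [1] "a,b,c" "a,b"   "a,c"   "a"     "b,c"   "b"     "c"     ""

# Order observed from cube: last variable varies slowest
sapply(grouping_sets(c("a", "b", "c"), first_slowest = FALSE), paste, collapse = ",")
# [1] "a,b,c" "b,c"   "a,c"   "c"     "a,b"   "b"     "a"     ""
```

The list returned with `first_slowest = TRUE` could presumably be passed as `sets` to `groupingsets` to get the PostgreSQL order directly.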

---------

An example to illustrate the order of `cube`:

    library(data.table)
    set.seed(1)
    d <- data.table(a = rep(1:2, each = 8),
                    b = rep(1:2, each = 4),
                    c = rep(1:2, each = 2),
                    val = sample(0:1, 16, replace = TRUE))

    all.equal(
      cube(d, j = sum(val), by = c("a", "b", "c")),
      groupingsets(d, j = sum(val), by = c("a", "b", "c"),
                   sets = list(c("a", "b", "c"),
                               c(     "b", "c"),
                               c("a",      "c"),
                               c(          "c"),
                               c("a", "b"     ),
                               c(     "b"     ),
                               c("a"          ),
                               character()))
    )
    # [1] TRUE

-----------

**Update**

When I add `id = TRUE`, it becomes obvious that the `grouping` counter is in fact based on the PostgreSQL order, so in the current output order it is non-consecutive (0, 4, 2, 6, 1, 5, 3, 7). Somewhat odd. It seems to me that the output could just as well follow the PostgreSQL order right away.


    cube(d, j = sum(val), by = c("a", "b", "c"), id = TRUE)
        grouping  a  b  c V1
     1:        0  1  1  1  0 #  ( a, b, c )
     2:        0  1  1  2  2
     3:        0  1  2  1  1
     4:        0  1  2  2  2
     5:        0  2  1  1  1
     6:        0  2  1  2  0
     7:        0  2  2  1  1
     8:        0  2  2  2  1
     9:        4 NA  1  1  1 # (    b, c )
    10:        4 NA  1  2  2
    11:        4 NA  2  1  2
    12:        4 NA  2  2  3
    13:        2  1 NA  1  1 # ( a,    c )
    14:        2  1 NA  2  4
    15:        2  2 NA  1  2
    16:        2  2 NA  2  1
    17:        6 NA NA  1  3 # (       c )
    18:        6 NA NA  2  5
    19:        1  1  1 NA  2 # ( a, b    )
    20:        1  1  2 NA  3
    21:        1  2  1 NA  1
    22:        1  2  2 NA  2
    23:        5 NA  1 NA  3 # (    b    )
    24:        5 NA  2 NA  5
    25:        3  1 NA NA  5 # ( a       )
    26:        3  2 NA NA  3
    27:        7 NA NA NA  8 # (         )
        grouping  a  b  c V1
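
A possible workaround, as a sketch: since the `grouping` id appears to follow the PostgreSQL order, sorting on it should yield that order without changing `cube` itself. This assumes the id stays a bitmask in which the first `by` column gets the highest bit, which is what the output above suggests. The names `res` and `res_pg` are just for illustration.

```r
library(data.table)

set.seed(1)
d <- data.table(a = rep(1:2, each = 8),
                b = rep(1:2, each = 4),
                c = rep(1:2, each = 2),
                val = sample(0:1, 16, replace = TRUE))

res <- cube(d, j = sum(val), by = c("a", "b", "c"), id = TRUE)

# Sort on the id; rows within each grouping set keep their relative order
# because the ordering used inside `[` is stable.
res_pg <- res[order(grouping)]

unique(res$grouping)     # set order as returned by cube
unique(res_pg$grouping)  # consecutive ids after sorting
```

With the output shown above, the second `unique()` call should give 0 through 7, i.e. the sets in the order `(a, b, c)`, `(a, b)`, `(a, c)`, `(a)`, `(b, c)`, `(b)`, `(c)`, `()`.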
