In the Description section of `?cube`, we find that `cube` "Reflects SQLs GROUPING SETS".
The Value section, however, is rather vague ("A data.table with various aggregates"), but the References point to the PostgreSQL docs, section 7.2.4, "GROUPING SETS, CUBE, and ROLLUP".
There we find that `CUBE ( a, b, c )` is equivalent to

```
GROUPING SETS (
    ( a, b, c ),
    ( a, b    ),
    ( a,    c ),
    ( a       ),
    (    b, c ),
    (    b    ),
    (       c ),
    (         )
)
```
Reading each set as a pattern of which variables are present, it seems that the first grouping variable (`a`) varies slowest and the last one (`c`) varies fastest. Disclaimer: I don't have access to PostgreSQL, so I can't confirm that this actually reflects the final output ;)
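To make the "slowest/fastest" reading concrete, here is a small base-R sketch that reproduces the PostgreSQL listing. The helper name `pg_sets()` is my own, hypothetical invention: it counts i = 0, ..., 2^k - 1 in binary, treats the first variable as the most significant bit, and drops a variable whenever its bit is 1.

```r
# Hypothetical helper: list the grouping sets of CUBE(by) in the order shown
# in the PostgreSQL docs. Each set corresponds to a number i in 0..2^k - 1;
# the bit for by[j] (first variable = most significant bit) is 1 if by[j]
# is dropped from the set.
pg_sets <- function(by) {
  k <- length(by)
  weights <- 2^(k - seq_len(k))          # 4, 2, 1 for three variables
  lapply(seq_len(2^k) - 1, function(i) {
    dropped <- (i %/% weights) %% 2 == 1 # extract the k bits of i
    by[!dropped]
  })
}

str(pg_sets(c("a", "b", "c")))
```

Because the most significant bit belongs to `a`, `a` is the last variable to change, i.e. it "varies slowest".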
In `cube`, it is the other way around: the last grouping variable in `by` varies slowest and the first varies fastest, i.e. the sets come out in this order:

```
GROUPING SETS (
    ( a, b, c ),
    (    b, c ),
    ( a,    c ),
    (       c ),
    ( a, b    ),
    (    b    ),
    ( a       ),
    (         )
)
```
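The same counting trick reproduces the `cube` order if the bit significance is simply flipped, so that the *first* variable is the least significant bit. `cube_sets()` is again a hypothetical helper; I have not checked how data.table actually generates the sets internally, only that this reproduces the order observed above.

```r
# Hypothetical helper: same enumeration as for the PostgreSQL order, but
# with the FIRST variable as the least significant bit, so it varies fastest.
cube_sets <- function(by) {
  k <- length(by)
  weights <- 2^(seq_len(k) - 1)          # 1, 2, 4 for three variables
  lapply(seq_len(2^k) - 1, function(i) {
    dropped <- (i %/% weights) %% 2 == 1 # 1-bit = variable is dropped
    by[!dropped]
  })
}

str(cube_sets(c("a", "b", "c")))
```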
I don't claim that PostgreSQL is "right". But since the Value section of `?cube` does not specify the row order, and the help page instead refers to the PostgreSQL docs, the order of the `cube` output may be considered inconsistent with that reference (but again, note my disclaimer above). I also find the PostgreSQL ordering more intuitive, although that is perhaps a matter of taste.
An example to illustrate the order of `cube`:

```r
library(data.table)

set.seed(1)
d <- data.table(a = rep(1:2, each = 8),
                b = rep(1:2, each = 4),
                c = rep(1:2, each = 2),
                val = sample(0:1, 16, replace = TRUE))

# cube() gives the same result as groupingsets() with the sets listed in
# this order: the last `by` variable varies slowest, the first fastest
all.equal(
  cube(d, j = sum(val), by = c("a", "b", "c")),
  groupingsets(d, j = sum(val), by = c("a", "b", "c"),
               sets = list(c("a", "b", "c"),
                           c(     "b", "c"),
                           c("a",      "c"),
                           c(          "c"),
                           c("a", "b"     ),
                           c(     "b"     ),
                           c("a"          ),
                           character()))
)
# [1] TRUE
```
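Since `groupingsets()` appears to return the sets in the order they are given (that is what makes the `all.equal()` check above return `TRUE`), a possible workaround for anyone who wants the PostgreSQL order is to pass the sets explicitly in that order. A sketch, assuming that ordering behaviour holds; with `id = TRUE` the `grouping` column should then run 0, 1, ..., 7 in order of appearance:

```r
library(data.table)

set.seed(1)
d <- data.table(a = rep(1:2, each = 8),
                b = rep(1:2, each = 4),
                c = rep(1:2, each = 2),
                val = sample(0:1, 16, replace = TRUE))

# Same grouping sets as cube(), but listed in the PostgreSQL order
pg <- groupingsets(d, j = sum(val), by = c("a", "b", "c"),
                   sets = list(c("a", "b", "c"), c("a", "b"), c("a", "c"), "a",
                               c("b", "c"), "b", "c", character()),
                   id = TRUE)

# grouping ids in order of first appearance
unique(pg$grouping)
```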
Update
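When I added `id = TRUE`, it became obvious that the `grouping` counter is in fact based on the PostgreSQL order: it would run 0, 1, ..., 7 over the PostgreSQL sequence of sets, but in the current `cube` output order it is non-consecutive (0, 4, 2, 6, 1, 5, 3, 7). Somewhat odd. It seems to me that the output could just as well come in the PostgreSQL order right away.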
```r
cube(d, j = sum(val), by = c("a", "b", "c"), id = TRUE)
    grouping  a  b  c V1
 1:        0  1  1  1  0 # ( a, b, c )
 2:        0  1  1  2  2
 3:        0  1  2  1  1
 4:        0  1  2  2  2
 5:        0  2  1  1  1
 6:        0  2  1  2  0
 7:        0  2  2  1  1
 8:        0  2  2  2  1
 9:        4 NA  1  1  1 # ( b, c )
10:        4 NA  1  2  2
11:        4 NA  2  1  2
12:        4 NA  2  2  3
13:        2  1 NA  1  1 # ( a, c )
14:        2  1 NA  2  4
15:        2  2 NA  1  2
16:        2  2 NA  2  1
17:        6 NA NA  1  3 # ( c )
18:        6 NA NA  2  5
19:        1  1  1 NA  2 # ( a, b )
20:        1  1  2 NA  3
21:        1  2  1 NA  1
22:        1  2  2 NA  2
23:        5 NA  1 NA  3 # ( b )
24:        5 NA  2 NA  5
25:        3  1 NA NA  5 # ( a )
26:        3  2 NA NA  3
27:        7 NA NA NA  8 # ( )
    grouping  a  b  c V1
```
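The `grouping` values above can be reproduced by reading `grouping` as a bit mask, the same convention SQL's `GROUPING()` uses: each `by` variable gets a bit that is 1 when the variable is aggregated over (not part of the set), with `a` as the most significant bit. This is my reading of the output, not something `?cube` documents, and `grouping_id()` is a hypothetical helper:

```r
# Hypothetical helper: bit weight 2^(k - j) for by[j] (so `a` is the highest
# bit); the bit is set when by[j] is NOT part of the grouping set.
grouping_id <- function(set, by) {
  k <- length(by)
  sum(2^(k - seq_len(k))[!by %in% set])
}

by <- c("a", "b", "c")
cube_order <- list(c("a", "b", "c"), c("b", "c"), c("a", "c"), "c",
                   c("a", "b"), "b", "a", character())
pg_order   <- list(c("a", "b", "c"), c("a", "b"), c("a", "c"), "a",
                   c("b", "c"), "b", "c", character())

sapply(cube_order, grouping_id, by = by)
# [1] 0 4 2 6 1 5 3 7
sapply(pg_order, grouping_id, by = by)
# [1] 0 1 2 3 4 5 6 7
```

In the PostgreSQL order the bit mask simply counts upwards, which is why the ids look like a counter there and appear shuffled in the `cube` output.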