groupingsets with autonamed j creates extra column for ungrouped set #3653

@Henrik-P

I stumbled over this when I was rolling up sample size (.N) for different sets of grouping variables, including no grouping.

When groupingsets is called with an unnamed expression in j that gets "autonamed" (e.g. .N), and the sets include an "ungrouped" grouping set, an additional column is created for the ungrouped set.

Some data:

library(data.table)
d <- data.table(x = c("a", "a", "b"))
d
#    x
# 1: a
# 2: a
# 3: b

With a "grouped" grouping set only, an unnamed expression with .N in j works: the result is autonamed N:

groupingsets(d, j = .N, by = "x", sets = list("x"))
#    x N
# 1: a 2
# 2: b 1

However, when also including an "ungrouped" grouping set (character()), an additional column is generated for the ungrouped result:

groupingsets(d, j = .N, by = "x", sets = list("x", character()))
#       x  N V1
# 1:    a  2 NA
# 2:    b  1 NA
# 3: <NA> NA  3

With an ungrouped grouping set only, the N column is still included (all NA), even though it was only populated for the grouped set in the previous example:

groupingsets(d, j = .N, by = "x", sets = list(character()))
#       x  N V1
# 1: <NA> NA  3

Similar examples with named expressions in j, like j = .(N = .N), work as expected.
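
For comparison, a minimal sketch of the named variant on the same toy data. The object name res is only for illustration; with the name supplied explicitly, the counts for both grouping sets should end up in a single N column instead of being split across N and V1.

```r
library(data.table)

# Same toy data as in the report
d <- data.table(x = c("a", "a", "b"))

# Name the j expression explicitly instead of relying on autonaming
res <- groupingsets(d, j = .(N = .N), by = "x", sets = list("x", character()))
res
```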

Additional (admittedly contrived) attempts with .GRP and .I in j give similar results. Thus, it seems like this behaviour may be related to autonaming.
