FR: expansion of melt functionality for handling names of output

I'm trying to summarize some data which I have stored (in wide form) as repeated cross-sections; the way to go would appear to be to summarize and melt, but there's some crucial information lost in the process -- variable names are tossed & it's not clear we can predict the resulting order.

An example:

```
set.seed(10239)
DT <- setnames(as.data.table(replicate(8, runif(10))),
               paste0("meas", rep(1:3, c(3, 3, 2)), 
                      rep(c("_jan", "_feb", "_mar"))))
DT
#        meas1_jan meas1_feb  meas1_mar  meas2_jan
#  1: 0.6600937557 0.3694227 0.04429776 0.02691871
#  2: 0.5562533571 0.2260477 0.99935621 0.62940965
#  3: 0.7800425424 0.9500860 0.05847563 0.90400149
#  4: 0.3372507696 0.1853700 0.96434815 0.23819722
#  5: 0.0008951684 0.2743278 0.33468493 0.55100025
#  6: 0.4064611420 0.6773607 0.76984110 0.03732329
#  7: 0.5057564366 0.6481860 0.46362786 0.10610388
#  8: 0.2232209216 0.3236526 0.94694898 0.32251033
#  9: 0.9359388519 0.1219411 0.79383274 0.52573046
# 10: 0.1367496471 0.0121765 0.80466101 0.86377631
#       meas2_feb  meas2_mar  meas3_jan meas3_feb
#  1: 0.438328012 0.48073709 0.15475894 0.6031853
#  2: 0.951975449 0.38903743 0.41983518 0.1806789
#  3: 0.006685105 0.53661894 0.01102345 0.6490597
#  4: 0.719372826 0.49205557 0.24464973 0.9969787
#  5: 0.318251195 0.98845165 0.75659034 0.3054178
#  6: 0.117685706 0.10667707 0.60810269 0.1472457
#  7: 0.210446290 0.76481919 0.70049390 0.4345142
#  8: 0.900549510 0.68583183 0.93117619 0.3676016
#  9: 0.628168668 0.02526911 0.55742879 0.7729294
# 10: 0.498467443 0.17884387 0.97037215 0.1924867
```

We want to `melt` grouping each month's data, so `patterns` is quite helpful:

```
melt(DT[ , lapply(.SD, mean)],
     measure.vars = patterns("_jan", "_feb", "_mar"))
```

Looks beautiful, but the output less so:

```
#    variable    value1    value2    value3
#1:        1 0.4542663 0.3788571 0.6180074
#2:        2 0.4204972 0.4789930 0.4648342
#3:        3 0.5354431 0.4650098        NA
```

We've lost a lot of info. Does `value1` correspond to `jan`, `feb`, or `mar`? Does `variable == 1` mean `meas1`, `meas2`, or `meas3`?

I hope that `value1` means `jan`, `value2` means `feb`, and `value3` means `mar`, but this is unclear; ditto that `valuek` corresponds to `variable == k`. This is especially true if the column order of the input is potentially unknown.

The `value.name` and `variable.name` fields are pretty useless to help us here.

It seems the robust way to deal with this as of now is to specify the `measure.vars` in a `list`, like so:

```
melt(DT[ , lapply(.SD, mean)],
     measure.vars = list(paste0("meas", 1:3, "_jan"),
                         paste0("meas", 1:3, "_feb"),
                         paste0("meas", 1:2, "_mar")))
```

With this, confidence is restored in the order of output, so we can at least rename the output without worry of mis-naming something. But this seems verbose and ugly, and to make `patterns` seem a lot less helpful as a function.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FR: expansion of melt functionality for handling names of output #1547

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

FR: expansion of melt functionality for handling names of output #1547

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions