diff --git a/vignettes/datatable-reshape.Rmd b/vignettes/datatable-reshape.Rmd
index 68ef57e6cf..ba8758b0bb 100644
--- a/vignettes/datatable-reshape.Rmd
+++ b/vignettes/datatable-reshape.Rmd
@@ -142,18 +142,18 @@ So far we've seen features of `melt` and `dcast` that are implemented efficientl
 However, there are situations we might run into where the desired operation is not expressed in a straightforward manner. For example, consider the `data.table` shown below:
 
 ```{r}
-s2 <- "family_id age_mother dob_child1 dob_child2 dob_child3 gender_child1 gender_child2 gender_child3
-1 30 1998-11-26 2000-01-29 NA 1 2 NA
-2 27 1996-06-22 NA NA 2 NA NA
-3 26 2002-07-11 2004-04-05 2007-09-02 2 2 1
-4 32 2004-10-10 2009-08-27 2012-07-21 1 1 1
-5 29 2000-12-05 2005-02-28 NA 2 1 NA"
+s2 <- "family_id age_mother name_child1 name_child2 name_child3 gender_child1 gender_child2 gender_child3
+ 1 30 Ben Anna NA 1 2 NA
+ 2 27 Tom NA NA 2 NA NA
+ 3 26 Lia Sam Amy 2 2 1
+ 4 32 Max Zoe Joe 1 1 1
+ 5 29 Dan Eva NA 2 1 NA"
 DT <- fread(s2)
 DT
 ## 1 = female, 2 = male
 ```
 
-And you'd like to combine (`melt`) all the `dob` columns together, and `gender` columns together. Using the old functionality, we could do something like this:
+And you'd like to combine (`melt`) all the `name` columns together, and `gender` columns together. Using the old functionality, we could do something like this:
 
 ```{r}
 DT.m1 = melt(DT, id.vars = c("family_id", "age_mother"))
@@ -161,12 +161,12 @@ DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed = TRUE)]
 DT.c1 = dcast(DT.m1, family_id + age_mother + child ~ variable, value.var = "value")
 DT.c1
 
-str(DT.c1) ## gender column is class IDate now!
+str(DT.c1) ## gender column is character type now!
 ```
 
 #### Issues
 
-1. What we wanted to do was to combine all the `dob` and `gender` type columns together respectively. Instead, we are combining *everything* together, and then splitting them again. I think it's easy to see that it's quite roundabout (and inefficient).
+1. What we wanted to do was to combine all the `name` and `gender` type columns together respectively. Instead, we are combining *everything* together, and then splitting them again. I think it's easy to see that it's quite roundabout (and inefficient).
 
 As an analogy, imagine you've a closet with four shelves of clothes and you'd like to put together the clothes from shelves 1 and 2 together (in 1), and 3 and 4 together (in 3). What we are doing is more or less to combine all the clothes together, and then split them back on to shelves 1 and 3!
 
@@ -189,9 +189,9 @@ Since we'd like for `data.table`s to perform this operation straightforward and
 The idea is quite simple. We pass a list of columns to `measure.vars`, where each element of the list contains the columns that should be combined together.
 
 ```{r}
-colA = paste0("dob_child", 1:3)
+colA = paste0("name_child", 1:3)
 colB = paste0("gender_child", 1:3)
-DT.m2 = melt(DT, measure.vars = list(colA, colB), value.name = c("dob", "gender"))
+DT.m2 = melt(DT, measure.vars = list(colA, colB), value.name = c("name", "gender"))
 DT.m2
 
 str(DT.m2) ## col type is preserved
@@ -206,7 +206,7 @@ str(DT.m2) ## col type is preserved
 Usually in these problems, the columns we'd like to melt can be distinguished by a common pattern. We can use the function `patterns()`, implemented for convenience, to provide regular expressions for the columns to be combined together. The above operation can be rewritten as:
 
 ```{r}
-DT.m2 = melt(DT, measure.vars = patterns("^dob", "^gender"), value.name = c("dob", "gender"))
+DT.m2 = melt(DT, measure.vars = patterns("^name", "^gender"), value.name = c("name", "gender"))
 DT.m2
 ```
 
@@ -305,7 +305,7 @@ We can now provide **multiple `value.var` columns** to `dcast` for `data.table`s
 
 ```{r}
 ## new 'cast' functionality - multiple value.vars
-DT.c2 = dcast(DT.m2, family_id + age_mother ~ variable, value.var = c("dob", "gender"))
+DT.c2 = dcast(DT.m2, family_id + age_mother ~ variable, value.var = c("name", "gender"))
 DT.c2
 ```
 