Unexpected results when combining `on` with `keyby` or `by` 

I was working my way through the `data.table` vignette [Secondary indices and auto indexing](https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-secondary-indices-and-auto-indexing.html). To make it easier to compare original data and results, I replaced the non-minimal "flights" data with much smaller datasets throughout.

It worked fine until section 2f "_Aggregation using `by`"_, where `on` is combined with `keyby`. There we find code to "_Get the maximum departure delay for each month corresponding to origin = "JFK". Order the result by month_": 

    flights["JFK", max(dep_delay), keyby = month, on = "origin"]

When I tried the equivalent code on two different data sets, two different issues appeared:

### 1. Label of the `keyby` and `by` variable identical for different levels


    d1 <- data.table(x = rep(c("a", "b"), each = 4), y = 1:0, z = c(3, 6, 8, 5, 4, 1, 2, 7))
    d1
    #    x y z
    # 1: a 1 3
    # 2: a 0 6
    # 3: a 1 8
    # 4: a 0 5
    # 5: b 1 4 <~~ max z for x = b & y = 1
    # 6: b 0 1
    # 7: b 1 2
    # 8: b 0 7 <~~ max z for x = b & y = 0

Translating the code from the vignette: Get the maximum `z` for each `y` corresponding to `x = "b"`. Order the result by `y`
  
    d1["b", max(z), keyby = y, on = "x"]
    #    y V1
    # 1: 1  7 <~~ y should be 0 here
    # 2: 1  4
 
The label of the `keyby` variable `y` is erroneously `1` also for the `0` level.

Also when using `by` instead of `keyby` together with `on` the labels are wrong:

    d1["b", max(z), by = y, on = "x"]
    #    y V1
    # 1: 1  4 
    # 2: 1  7 <~~ y should be 0 here
   
Just to verify that corresponding code without `on` works fine:

    d1[x == "b", max(z), keyby = y]
    #    y V1
    # 1: 0  7
    # 2: 1  4
    
    d1[x == "b", max(z), by = y]
    #    y V1
    # 1: 1  4
    # 2: 0  7

### 2. (a) Label of the `keyby` (or `by`) variable and the resulting value do not match. (b) Result not sorted by the `keyby` variable.
 
    d2 <- as.data.table(mtcars[mtcars$cyl %in% c(6, 8), c("am", "vs", "hp")])

Again, equivalent desired outcome: get the maximum `hp` for each `vs` corresponding to `am = 0`. Order the result by `vs`.


First, just look at the data corresponding to `am = 0` to easier spot the desired result:

    d2[.(0), on = "am"]
    
    #     am vs  hp
    # 1:   0  1 110
    # 2:   0  0 175
    # 3:   0  1 105
    # 4:   0  0 245 <~~ max hp for vs = 0
    # 5:   0  1 123 <~~ max hp for vs = 1 
    # 6:   0  1 123
    # 7:   0  0 180
    # 8:   0  0 180
    # 9:   0  0 180
    # 10:  0  0 205
    # 11:  0  0 215
    # 12:  0  0 230
    # 13:  0  0 150
    # 14:  0  0 150
    # 15:  0  0 245
    # 16:  0  0 175
    
    
When combining `keyby` and `on`, the result is not sorted and the labels don't match the values:
    
    d2[.(0), max(hp), keyby = vs, on = "am"]
    #    vs  V1
    # 1:  1 245
    # 2:  0 123

When combining `by` and `on`, the labels don't match the values:
    
    d2[.(0), max(hp), by = vs, on = "am"]
    #    vs  V1
    # 1:  0 123
    # 2:  1 245

Subsetting without `on` works fine:
    
    d2[am == 0, max(hp), keyby = vs]
    #    vs  V1
    # 1:  0 245
    # 2:  1 123
    
    
    d2[am == 0 , max(hp), by = vs]
    #    vs  V1
    # 1:  1 123
    # 2:  0 245

----
So far I have not been able to discern any particular pattern in the data which generates these results (e.g. any particular order of `on` and/or `keyby` variables in the original and/or subset data).

Can you spot any mistakes in my code or is there something strange going on here?

----

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
data.table_1.9.8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected results when combining `on` with `keyby` or `by` #1943

1. Label of the `keyby` and `by` variable identical for different levels

2. (a) Label of the `keyby` (or `by`) variable and the resulting value do not match. (b) Result not sorted by the `keyby` variable.

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Unexpected results when combining on with keyby or by #1943

Description

1. Label of the keyby and by variable identical for different levels

2. (a) Label of the keyby (or by) variable and the resulting value do not match. (b) Result not sorted by the keyby variable.

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Unexpected results when combining `on` with `keyby` or `by` #1943

1. Label of the `keyby` and `by` variable identical for different levels

2. (a) Label of the `keyby` (or `by`) variable and the resulting value do not match. (b) Result not sorted by the `keyby` variable.