merge.data.table no error when "on" instead of "by" and merge on all common vars empty #7088

@urfos

This works:

library(data.table)
dt1 = data.table(v1 = c("a", "b"), v2 = 1:2, v3 = 3:4)
dt2 = data.table(v1 = c("b", "c"), v4 = 5:6)
dt = merge(dt1, dt2)
dt
# Key: <v1>
#       v1    v2    v3    v4
#  <char> <int> <int> <int>
# 1:      b     2     4     5

But if I add a column v2 to both tables with no values in common, merge() joins on all shared columns (v1 and v2), so an empty data.table is returned without error, which probably makes sense:

dt2 = data.table(v1 = c("b", "c"), v2 = 5:6, v4 = 5:6)
dt = merge(dt1, dt2)
dt
# Empty data.table (0 rows and 4 cols): v1,v2,v3,v4
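To see why the result is empty, it can help to list the columns the two tables share. This is only a diagnostic sketch: it assumes neither table has a key set, in which case merge() falls back to the intersection of the column names when "by" is not given.

```r
library(data.table)

dt1 = data.table(v1 = c("a", "b"), v2 = 1:2, v3 = 3:4)
dt2 = data.table(v1 = c("b", "c"), v2 = 5:6, v4 = 5:6)

# Columns present in both tables; with no keys set and no `by`,
# these are the columns merge() joins on.
shared = intersect(names(dt1), names(dt2))
shared

# No (v1, v2) pair occurs in both tables, so the inner join has no rows.
res_default = merge(dt1, dt2)
nrow(res_default)
```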

The problem is that the same thing happens when "on" is supplied instead of "by" (an easy mistake to make, given that dt1[dt2, on = "v1"] is an alternative join syntax): the "on" argument is ignored, the merge still uses all common columns, and only a warning is issued. An error would be desirable.

dt = merge(dt1, dt2, on = "v1")
# Warning message:
# In merge.data.table(dt1, dt2, on = "v1") :
#   Unknown argument 'on' has been passed.
dt
# Empty data.table (0 rows and 4 cols): v1,v2,v3,v4
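For comparison, here is a sketch of what was probably intended, plus a possible workaround until the behaviour changes. The first part uses the documented "by" argument and the bracket join with "on". The second part defines strict_merge, a hypothetical helper (not part of data.table) that turns any warning raised by merge() into an error.

```r
library(data.table)

dt1 = data.table(v1 = c("a", "b"), v2 = 1:2, v3 = 3:4)
dt2 = data.table(v1 = c("b", "c"), v2 = 5:6, v4 = 5:6)

# Intended inner join on v1 only: name the column with `by`.
# The clashing v2 columns get the default suffixes .x and .y.
res_by = merge(dt1, dt2, by = "v1")

# The bracket syntax does accept `on`; nomatch = NULL makes it an inner join.
res_on = dt1[dt2, on = "v1", nomatch = NULL]

# Hypothetical helper (an assumption for illustration, not data.table API):
# promote any warning from merge(), such as the unknown-argument one, to an error.
strict_merge = function(...) {
  tryCatch(
    merge(...),
    warning = function(w) stop(conditionMessage(w), call. = FALSE)
  )
}

# Passing `on` by mistake now fails instead of silently returning an empty table.
err = tryCatch(strict_merge(dt1, dt2, on = "v1"), error = function(e) e)
inherits(err, "error")
```

Note that the helper converts every warning into an error, which may be stricter than wanted in scripts where merge() can warn for other reasons.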

Session info:

> sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=French_France.utf8  LC_CTYPE=French_France.utf8   
[3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C                  
[5] LC_TIME=French_France.utf8    

time zone: Europe/Paris
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.16.0

loaded via a namespace (and not attached):
[1] compiler_4.4.1    cli_3.6.3         tools_4.4.1       rstudioapi_0.16.0
