
Arrow write_parquet removes .internal.selfref, data.table warning message not helpful #6737

@nicki-dese


The bug can be reproduced as follows (on Windows 11, using R 4.4.2). The behaviour is new as of arrow 17.0.

library(arrow)          # version 18.1.0.1 
library(data.table)   # version 1.16.4

dt <- data.table(x = 1:3)

names(attributes(dt))

# returns
# "names"             "row.names"         "class"             ".internal.selfref"

# works, creating a new column by reference
dt[, y := letters[1:3]]

# save file using write_parquet
write_parquet(dt, "test.parquet")

# read file back in using read_parquet
dt_after_parquet <- read_parquet("test.parquet")

# this has stripped away the .internal.selfref attribute
names(attributes(dt_after_parquet))
# returns
# "names"             "row.names"         "class" 

# this means the following still works, but raises a warning:
dt_after_parquet[, z := 4:6]

# Warning message:
# In `[.data.table`(dt_after_parquet, , `:=`(z, 4:6)) :
#   Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the 
# data.table so that := can add this new column by reference. At an earlier point, 
# this data.table has been copied by R (or was created manually using structure() 
# or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy 
# the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames 
# and ?setattr. If this message doesn't help, please report your use case to the 
# data.table issue tracker so the root cause can be fixed or this message improved.

This took some effort to track down, because it was not obvious to me that writing a data.table to file and reading it back was one of the cases covered by the warning message. It was further complicated by my use of targets, which called write_parquet/read_parquet in the background because I had chosen to store my targets as parquet files.

I have reported the bug to arrow here. I debated whether to cross-post but, given the request in the warning message itself, decided to. Please delete or close this issue if cross-posting was ill-advised.
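A possible workaround, sketched under the assumption that data.table's `setDT()` re-establishes a missing `.internal.selfref` when it is given an object that already carries the data.table class (this is how data.table 1.16.x appears to behave, but treat it as an assumption rather than documented guarantee). To stay self-contained, the sketch does not use arrow. It builds a data.table with `structure()`, which the warning message names as another way to end up without a valid self-reference, then repairs it with `setDT()` before adding a column with `:=`. The object name `dt_manual` is hypothetical.

```r
library(data.table)

# A data.table built "manually" with structure(): it has the data.table
# class but, like dt_after_parquet above, no .internal.selfref attribute.
dt_manual <- structure(
  list(x = 1:3, y = letters[1:3]),
  class = c("data.table", "data.frame"),
  row.names = c(NA, -3L)
)
print(".internal.selfref" %in% names(attributes(dt_manual)))  # FALSE

# Assumption: setDT() on an object that is already a data.table restores the
# self-reference (over-allocating column slots) and assigns the result back
# to dt_manual in the calling environment.
setDT(dt_manual)

# Record whether := raises any warning now that the self-reference is valid.
warned <- FALSE
withCallingHandlers(
  dt_manual[, z := 4:6],
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
print(warned)
```

For the parquet case, the same idea would be to call `setDT(dt_after_parquet)` after reading, or to wrap the read directly as `dt_after_parquet <- setDT(read_parquet("test.parquet"))`, so that later `:=` calls modify the table by reference without the warning.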

Labels: message (Messages, warnings, errors)
