.SD locked error using foverlaps after ordering by variable #2099

@jrderuiter

Description

I am trying to work out why some code from an existing R package (not my own) runs fine on data.table 1.9.4 but gives an error with later versions of data.table.

The package code looks roughly like the example below, which uses foverlaps to determine the overlap between a set of copy number segments and the corresponding probes (seg.cna holds the segments, markers holds the probes):

library(data.table)

seg.cna <- data.table(
    Sample=c('S001', 'S002', 'S003', 'S004', 'S005', 'S006'), 
    Chromosome=rep('1', 6), Start=rep(3252007, 6), 
    End=c(3252007, 3714033, 3714033, 4083031, 4083031, 4214828), 
    LogRatio=c(-0.166154, -0.660141, 0.404224, -0.375556, 0.0, -0.354481), 
    Segment=c('Seg1', 'Seg2', 'Seg3', 'Seg4', 'Seg5', 'Seg6'))

markers <- data.table(
    Name=c('P0001', 'P0002', 'P0003'),
    Chromosome=rep('1', 3), 
    Position=c(3252007, 3714033, 4083031))

# Without this conversion to an ordered factor, the example runs fine.
seg.cna <- seg.cna[,Chromosome:=ordered(toupper(Chromosome), c('1'))]

setkey(seg.cna, Chromosome, Start, End)

seg.markers <- seg.cna[, 
    foverlaps(markers[, .(Name, Chromosome, Position, Position2=Position)],
             .SD,
             by.x=c('Chromosome', 'Position','Position2'),
             type='within', nomatch=0L)[, .(ProbesNo=.N), by=Segment],
             .SDcols=c('Chromosome', 'Start', 'End', 'Segment'),
    by=Sample][, Sample:=NULL]
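
For context, foverlaps expects an interval (two columns) on each side, which is why the single probe Position is duplicated into Position2 in the call above. A minimal sketch of just that step, using the same markers table (the name probe.intervals is only for illustration and does not appear in the package):

```r
library(data.table)

markers <- data.table(
    Name=c('P0001', 'P0002', 'P0003'),
    Chromosome=rep('1', 3),
    Position=c(3252007, 3714033, 4083031))

# Duplicate Position so that each probe becomes a zero-width interval
# [Position, Position2]; probe.intervals is an illustrative name.
probe.intervals <- markers[, .(Name, Chromosome, Position, Position2=Position)]

print(names(probe.intervals))
print(probe.intervals[, all(Position == Position2)])
```

With type='within', a probe is then counted for a segment when its zero-width interval lies inside the segment's Start and End.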

Running this code in data.table > 1.9.4 gives the following error:

Error in set(i, j = lc, value = newfactor) : 
  .SD is locked. Updating .SD by reference using := or set are reserved for future use. Use := in j directly. Or use copy(.SD) as a (slow) last resort, until shallow() is exported.

With the following traceback:

7: set(i, j = lc, value = newfactor)
6: bmerge(xx, ii, seq_along(xx), seq_along(xx), haskey(xx), integer(0), 
       mult = mult, ops = rep(1L, length(xx)), integer(0), 1L, verbose = verbose, 
       ...)
5: matches(x, y, intervals[2L], rollends = rep(type == "any", 2L), 
       ...)
4: indices(uy, y, yintervals, nomatch = 0L, roll = roll)
3: foverlaps(markers[, list(Name, Chromosome, Position, Position2 = Position)], 
       .SD, by.x = c("Chromosome", "Position", "Position2"), type = "within", 
       nomatch = 0L)
2: `[.data.table`(seg.cna, , foverlaps(markers[, .(Name, Chromosome, 
       Position, Position2 = Position)], .SD, by.x = c("Chromosome", 
       "Position", "Position2"), type = "within", nomatch = 0L)[, 
       .(ProbesNo = .N), by = Segment], .SDcols = c("Chromosome", 
       "Start", "End", "Segment"), by = Sample)
1: seg.cna[, foverlaps(markers[, .(Name, Chromosome, Position, Position2 = Position)], 
       .SD, by.x = c("Chromosome", "Position", "Position2"), type = "within", 
       nomatch = 0L)[, .(ProbesNo = .N), by = Segment], .SDcols = c("Chromosome", 
       "Start", "End", "Segment"), by = Sample]

This seems similar to the error encountered in #1341; however, the error we observe still persists in data.table >= 1.9.8. The following line in the code above appears to trigger it, since the example runs fine without it:

seg.cna <- seg.cna[,Chromosome:=ordered(toupper(Chromosome), c('1'))]
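
To make explicit what this line changes, here is a small sketch on cut-down versions of the two tables (only the Chromosome column is kept). It shows that the column becomes an ordered factor in seg.cna while it stays character in markers. I am only assuming this class mismatch is relevant, based on the `value = newfactor` argument in the traceback; I have not confirmed that it is the actual cause.

```r
library(data.table)

# Cut-down versions of the tables from the example (Chromosome only).
seg.cna <- data.table(Chromosome=rep('1', 6))
markers <- data.table(Chromosome=rep('1', 3))

# Class before the conversion.
print(class(seg.cna$Chromosome))

# The triggering line, written as a plain update by reference.
seg.cna[, Chromosome:=ordered(toupper(Chromosome), c('1'))]

# Classes after the conversion: seg.cna and markers now differ.
print(class(seg.cna$Chromosome))
print(class(markers$Chromosome))
```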

Any idea whether this is related to that earlier issue?

R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.5

loaded via a namespace (and not attached):
[1] tools_3.3.2
