Fread: possible segfault when reading a file with extra sep on out-of-sample row #2523

@st-pasha

A test file can be generated with the following Python script:

nrows = 2100       # need >2000 in order to trigger nJumps==10
ncols = 1024 * 16  # "round" number increases probability that
                   # `sizes` array will not be overallocated
row = ",".join(list("abcdefghijklmnop") * (ncols // 16))
rows = [row] * nrows
rows[111] += ","   # extra separator on a row outside the sampled rows
src = "\n".join(rows)
with open("test.txt", "w") as f:
    f.write(src)
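
As a quick sanity check before running the reproduction, the rows can be scanned for a field count that differs from the first row. This is a minimal sketch: the helper name `find_ragged_rows` is hypothetical and is not part of fread or data.table, and it assumes plain unquoted fields, as in the generated file.

```python
def find_ragged_rows(lines, sep=","):
    """Return (row_index, field_count) for rows whose field count differs from row 0."""
    if not lines:
        return []
    expected = lines[0].count(sep) + 1  # fields = separators + 1
    ragged = []
    for i, line in enumerate(lines):
        nfields = line.count(sep) + 1
        if nfields != expected:
            ragged.append((i, nfields))
    return ragged


# Small in-memory stand-in for test.txt: 5 rows of 4 fields, extra sep on row 2.
rows = ["a,b,c,d"] * 5
rows[2] += ","
print(find_ragged_rows(rows))  # prints [(2, 5)]
```

Applied to the real file, for example via `find_ragged_rows(open("test.txt").read().split("\n"))`, the only row reported should be index 111.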

Then

> for(b in 1:100){ fread("test.txt", header=F) -> DT }

 *** caught bus error ***
address 0x7fd839704e40, cause 'non-existent physical address'

The primary cause is the following lines in the "TEPID" section of freadMain:

         else if (eol(&tch)) {
            int8_t thisSize = size[j];
            ((char **) targets)[thisSize] += thisSize;

Here j can be equal to ncol, which causes an out-of-bounds (OOB) read of the array size. Even if that read succeeds by accident, the retrieved value thisSize can be anything, which in turn causes an OOB access on the array targets.
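
The indexing problem can be modelled in a few lines of Python. This is a simplified illustration, not the fread implementation: it assumes the parser advances j once per separator, and both function names are hypothetical.

```python
def column_index_at_eol(line, sep=","):
    """Column index j held by a field-by-field parser when it reaches end of line."""
    j = 0
    for ch in line:
        if ch == sep:
            j += 1  # passing a separator moves on to the next column
    return j


def size_lookup_is_safe(line, ncol, sep=","):
    """True if reading size[j] at end of line stays inside an array of length ncol."""
    return column_index_at_eol(line, sep) < ncol


ncol = 3
for line in ["a,b,c", "a,b,c,"]:
    j = column_index_at_eol(line)
    print(repr(line), "j =", j, "safe =", size_lookup_is_safe(line, ncol))
```

Under this model a well-formed row ends with j equal to ncol - 1, while the row with the trailing separator ends with j equal to ncol, which is one past the last valid index.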
