data.table::fread CSV logic fails with complex field #2051

@scarrascoso

Hi:

I'm trying to load a 1.6 GB CSV file with data.table::fread. The read fails partway through, complaining about a specific line:

Read 76.2% of 5288107 rows
Error in fread("2015-03.csv") :
Expecting 50 cols, but line 4128650 contains text after processing all cols. 
Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' 
and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. 
If quote='' doesn't help, please file an issue to figure out if the logic could be improved.

I have checked that the offending line is valid CSV (with csvfix and also https://csvlint.io/). I have included this line in the following example file, which contains the header, a non-failing line and then the failing line:

test.txt

As you can see, it has some non-trivial quoting and escaping.
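
In case it helps with reproducing, here is a minimal sketch of how the options suggested by the error message (fill=TRUE and quote='') could be compared on the small example file. `try_fread_options` is a hypothetical helper written for this report, not part of data.table; it only assumes the standard `file`, `fill`, `quote` and `showProgress` arguments of fread.

```r
library(data.table)

# Hypothetical helper (not part of data.table): run fread() on `path` with the
# default settings and with each option the error message suggests.
# Returns the number of columns parsed per attempt, or NA if fread() errored.
try_fread_options <- function(path) {
  attempts <- list(
    default    = list(),
    fill       = list(fill = TRUE),   # pad short rows instead of failing
    no_quoting = list(quote = "")     # treat quote characters as plain text
  )
  vapply(attempts, function(extra) {
    args <- c(list(file = path, showProgress = FALSE), extra)
    tryCatch(
      ncol(suppressWarnings(do.call(fread, args))),
      error = function(e) NA_integer_
    )
  }, integer(1))
}
```

Calling `try_fread_options("test.txt")` on the attached file should show which of the three attempts get through and whether they yield the expected 50 columns. An attempt that succeeds with a different column count would suggest that the quoting is being misread rather than handled.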

Do you think the fread CSV logic could be extended to deal with input like this?

Thanks a lot, best regards!
