Hi,
how do I properly encode / import string fields with double quotes in them?
The docs say:
character columns can be quoted (...,2,"Joe Bloggs",3.14,...) or not quoted (...,2,Joe Bloggs,3.14,...).
Because of the restrictions on unquoted character columns, my columns are always quoted (a separator can appear in them).
Thus, unescaped quotes may be present in a quoted field (...,2,"Joe, "Bloggs"",3.14,...) as well as
escaped quotes (...,2,"Joe \",Bloggs\"",3.14,...). If an embedded quote is followed by the separator
inside a quoted field, the embedded quotes up to that point in that field must be balanced; e.g.
...,2,"www.blah?x="one",y="two"",3.14,....
Because of the restrictions on plain double quotes inside a quoted string, I have to escape them.
The odd part is that a file with unescaped inner quotes is imported correctly, exactly as I want it:
"a", "b"
"x", "my name is "joe""
See the fread call and its output here:
d = fread("test_dt.csv", header = FALSE, sep = ",", stringsAsFactors = FALSE, data.table = FALSE)
V1 V2
1 a "b"
2 x "my name is "joe""
However, this is the form I actually have to use, and here the backslashes that escape the inner double quotes get doubled.
File:
"a", "b"
"x", "my name is \"joe\""
fread output:
V1 V2
1 a "b"
2 x "my name is \\"joe\\""
Note: I have some control over the csv files, as I am already preprocessing them a bit. But I need a routine that works on general files, so in my string columns I have to expect arbitrary input.
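One thing worth ruling out is whether the backslashes are really doubled in the data or only in the printed output, since R's print() displays each backslash in a string as \\. Below is a minimal sketch using base R only. The string v is a hypothetical stand-in for the value in row 2 of V2, assuming the field holds the characters exactly as they appear in the file; it is not taken from an actual fread run.

```r
# Hypothetical stand-in for d$V2[2], assuming fread kept the field verbatim:
# the characters are  "my name is \"joe\""  with single backslashes.
v <- "\"my name is \\\"joe\\\"\""

# print() escapes backslashes for display, cat() writes the raw characters.
print(v, quote = FALSE)
cat(v, "\n")

# nchar() counts the characters actually stored, so a doubled backslash
# in the data would add to this count.
nchar(v)

# Strip the backslash in front of each double quote (literal match, no regex).
unescaped <- gsub("\\\"", "\"", v, fixed = TRUE)
cat(unescaped, "\n")
```

If cat() shows single backslashes for the real column, the import itself is fine and the doubling is only a display effect; the gsub() call is then one possible way to drop the escapes after reading.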