A new parameter bad.lines (or similar) is proposed. This parameter controls fread's strategy for handling "broken" lines, i.e. lines that have fewer or more fields than the required number of columns. It may take the following values:
"error" (default) -- stop scanning the file and raise an exception.
"fill" (currently achieved with fill=TRUE) -- any lines having too few fields are padded with NAs. Here "too few" means fewer than the maximum number of fields observed across all rows in the file.
"skip" -- broken lines are simply ignored.
"extract" -- any broken lines are placed into a separate datatable, whereas the "main" datatable retains empty rows in their place. The extra datatable will have at least the following fields: lineno (line number in the original data file), rowno (corresponding row number in the "main" datatable), line (the text of the line), nfields (number of fields detected on that line).
Additionally, there should be a parameter report (default FALSE), which applies to the "fill" and "skip" strategies and instructs fread to report to the user the line numbers that were filled or skipped.
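To make the intended semantics concrete, here is a minimal Python sketch of the four strategies. It is a toy model only, not fread's implementation: the function name parse_lines, the underscore spelling bad_lines, and the return shape are hypothetical, and it assumes that the first line defines the required number of columns and is itself treated as a data row.

```python
def parse_lines(text, sep=",", bad_lines="error", report=False):
    """Toy model of the proposed bad.lines strategies (hypothetical API)."""
    if bad_lines not in ("error", "fill", "skip", "extract"):
        raise ValueError(f"unknown bad_lines value: {bad_lines!r}")

    lines = text.splitlines()
    rows = [line.split(sep) for line in lines]
    # Assumption: the first line sets the required number of columns.
    ncols = len(rows[0]) if rows else 0
    main, extra, touched = [], [], []

    if bad_lines == "fill":
        # "Too few" is measured against the widest row in the file.
        width = max((len(r) for r in rows), default=0)
        for lineno, r in enumerate(rows, start=1):
            if len(r) < width:
                touched.append(lineno)
            main.append(r + [None] * (width - len(r)))
    else:
        for lineno, (line, r) in enumerate(zip(lines, rows), start=1):
            if len(r) == ncols:
                main.append(r)
            elif bad_lines == "error":
                raise ValueError(
                    f"line {lineno}: expected {ncols} fields, found {len(r)}"
                )
            elif bad_lines == "skip":
                touched.append(lineno)
            else:  # "extract"
                main.append([None] * ncols)  # empty row kept in place
                extra.append({
                    "lineno": lineno,      # line number in the input
                    "rowno": len(main),    # 1-based row in the main table
                    "line": line,          # raw text of the line
                    "nfields": len(r),     # fields detected on the line
                })

    if report and touched:
        verb = "filled" if bad_lines == "fill" else "skipped"
        print(f"{verb} lines: {touched}")
    return main, extra


sample = "a,b,c\n1,2\n3,4,5\n6,7,8,9"
main, extra = parse_lines(sample, bad_lines="extract")
```

With "extract", the main table keeps one row per input line, so rowno and lineno coincide in this sketch; they would differ in a real reader as soon as header or skipped lines are excluded from the row count.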