Description
I've noticed that the csv gem produces a placeholder error message when attempting to parse a TSV with ambiguous parsing options.
If one were to parse the string
foo\t\tbar
when directed to strip whitespace (i.e., to treat whitespace as insignificant) and use tabs as field separators (i.e., to treat whitespace as significant), then should one parse it as ["foo\t\tbar"], ["foo", "bar"], or ["foo", "", "bar"]? There's no way to choose a parsing strategy that wouldn't cause a reasonable surprise to someone.
I think an error should be produced that informs the user that this is a problem; at the moment, this doesn't happen. With Ruby 3.0.2 and csv 3.2.1, the file
require "csv"
sep = "\t"
tsv_file = "example.tsv"
File.open(tsv_file, "w") { |f| f.puts("foo#{sep}bar") }
CSV.read(tsv_file, col_sep: sep, strip: true)
produces the error
csv-3.2.1/lib/csv/parser.rb:935:in `parse_quotable_robust': TODO: Meaningful message in line 1. (CSV::MalformedCSVError)
I think this is misleading; the file is not really "malformed".
Do you agree with the above? If so, I'm happy to offer up a merge request around this (e.g. to check the options before beginning the parsing, to make sure that if strip is set to true then col_sep must not be whitespace).
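To make the proposal concrete, here is a rough sketch of the kind of option check I have in mind. The method name `validate_strip_and_col_sep` and the error message are made up for illustration and are not part of the csv gem. It only covers `strip: true`, and it treats any `col_sep` that starts or ends with whitespace as a conflict, which is my assumption about where the ambiguity arises.

```ruby
# Hypothetical pre-parse check; the method name and error message are
# illustrative only and do not exist in the csv gem.
def validate_strip_and_col_sep(col_sep:, strip:)
  # Only strip: true is covered here, as in the proposal above.
  return unless strip == true

  # Assumption: a separator that starts or ends with whitespace
  # conflicts with whitespace stripping.
  if col_sep.match?(/\A\s|\s\z/)
    raise ArgumentError,
          "strip: true cannot be combined with a whitespace col_sep " \
          "(#{col_sep.inspect})"
  end
end

validate_strip_and_col_sep(col_sep: ",", strip: true)    # no error
validate_strip_and_col_sep(col_sep: "\t", strip: false)  # no error

begin
  validate_strip_and_col_sep(col_sep: "\t", strip: true)
rescue ArgumentError => e
  puts e.message
end
```

Raising `ArgumentError` before parsing starts, rather than `CSV::MalformedCSVError` during parsing, would make it clear that the options are at fault and not the file.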