
A placeholder error message is printed when parsing files with ambiguous options #225

@adamroyjones


I've noticed that the csv gem produces a placeholder error message when attempting to parse a TSV with ambiguous parsing options.

Suppose one parses the string

foo\t\tbar

while directed both to strip whitespace (i.e., to treat whitespace as insignificant) and to use tabs as field separators (i.e., to treat whitespace as significant). Should it be parsed as `["foo\t\tbar"]`, `["foo", "bar"]`, or `["foo", "", "bar"]`? Any parsing strategy one could choose would reasonably surprise someone.
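To make the ambiguity concrete, here is a small sketch in plain Ruby. It uses only `String` methods, not the csv gem, and shows how each of the three candidate results above corresponds to a different reading of the options. It illustrates the interpretations. It does not describe what the csv parser actually does.

```ruby
# Three plain-Ruby readings of the same line; none of this is the csv gem's API.
line = "foo\t\tbar"

# Reading 1: whitespace is insignificant padding, so nothing is split on tabs;
# String#strip only trims the ends, leaving the interior tabs in one field.
as_one_field = [line.strip]

# Reading 2: a run of consecutive tabs counts as a single separator.
collapse_runs = line.split(/\t+/)

# Reading 3: every tab is a separator, so the two tabs enclose an empty field.
every_tab = line.split("\t")

p as_one_field   # => ["foo\t\tbar"]
p collapse_runs  # => ["foo", "bar"]
p every_tab      # => ["foo", "", "bar"]
```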

I think an error should be raised that tells the user about this conflict; at the moment, that doesn't happen. With Ruby 3.0.2 and csv 3.2.1, the following script, which writes and then reads a single tab-separated line,

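```ruby
require "csv"

sep = "\t"
tsv_file = "example.tsv"

# Write the single line "foo<TAB>bar", then read it back with tab
# separators and whitespace stripping both enabled.
File.open(tsv_file, "w") { |f| f.puts("foo#{sep}bar") }
CSV.read(tsv_file, col_sep: sep, strip: true)
```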

produces the error

```
csv-3.2.1/lib/csv/parser.rb:935:in `parse_quotable_robust': TODO: Meaningful message in line 1. (CSV::MalformedCSVError)
```

I think this is misleading; the file is not really "malformed".

Do you agree with the above? If so, I'm happy to offer up a merge request for this (e.g., one that checks the options before parsing begins and ensures that `col_sep` is not whitespace when `strip` is `true`).
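A minimal sketch of the proposed up-front check follows, assuming validation runs before `CSV.parse` is called. `check_strip_and_col_sep!` and `parse_checked` are hypothetical names and are not part of the csv gem. The sketch follows the proposal literally: it only rejects `strip: true` combined with a `col_sep` made up entirely of whitespace, and it does not attempt to cover any other form of the `strip` option.

```ruby
require "csv"

# Hypothetical pre-flight check (not part of the csv gem): reject the
# ambiguous combination of strip: true and an all-whitespace col_sep.
def check_strip_and_col_sep!(options)
  col_sep = options.fetch(:col_sep, ",")
  return unless options[:strip] == true && col_sep.match?(/\A\s+\z/)

  raise ArgumentError,
        "strip: true cannot be combined with a whitespace col_sep " \
        "(#{col_sep.inspect}): field boundaries would be ambiguous"
end

# Hypothetical wrapper: validate the options first, then delegate to CSV.parse.
def parse_checked(text, **options)
  check_strip_and_col_sep!(options)
  CSV.parse(text, **options)
end

p parse_checked("foo,bar\n", strip: true) # => [["foo", "bar"]]

begin
  parse_checked("foo\tbar\n", col_sep: "\t", strip: true)
rescue ArgumentError => e
  puts e.message # names the conflicting options instead of "TODO: ..."
end
```

Checking the options before any input is read means the user gets an `ArgumentError` that describes the conflict. Under this design, the mid-parse `CSV::MalformedCSVError` is never reached, which avoids implying that the file itself is malformed.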
