Merged
9 changes: 9 additions & 0 deletions lib/csv.rb
@@ -854,6 +854,15 @@ def initialize(message, line_number)
     end
   end

+  # The error thrown when the parser encounters invalid encoding in CSV.
+  class InvalidEncodingError < MalformedCSVError
+    attr_reader :encoding
+    def initialize(encoding, line_number)
Member

How about adding a reader for the target encoding?

Suggested change
-    def initialize(encoding, line_number)
+    attr_reader :encoding
+    def initialize(encoding, line_number)
+      @encoding = encoding

Member @kou, Sep 13, 2023

We can't get the target encoding if we don't set it to @encoding. (See the above suggestion.)

Contributor, Author

oops!
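
The point raised in this thread can be sketched in isolation. The classes below are hypothetical stand-ins, not the csv library's own classes; the parent's message format is an assumption inferred from the test expectations in this PR. The sketch only shows that `attr_reader :encoding` returns `nil` unless `initialize` assigns `@encoding`.

```ruby
# Hypothetical stand-in for CSV::MalformedCSVError (not the library class).
# The "in line N." suffix is assumed from the messages asserted in the tests.
class SketchMalformedError < RuntimeError
  attr_reader :line_number

  def initialize(message, line_number)
    @line_number = line_number
    super("#{message} in line #{line_number}.")
  end
end

# Reader declared, but the instance variable is never assigned.
class SketchWithoutAssignment < SketchMalformedError
  attr_reader :encoding

  def initialize(encoding, line_number)
    super("Invalid byte sequence in #{encoding}", line_number)
  end
end

# Reader declared and the instance variable assigned, as the review suggests.
class SketchWithAssignment < SketchMalformedError
  attr_reader :encoding

  def initialize(encoding, line_number)
    @encoding = encoding
    super("Invalid byte sequence in #{encoding}", line_number)
  end
end

missing = SketchWithoutAssignment.new(Encoding::EUC_JP, 1)
present = SketchWithAssignment.new(Encoding::EUC_JP, 1)
```

Here `missing.encoding` is `nil` while `present.encoding` is `Encoding::EUC_JP`; both objects carry the same message text.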

+      @encoding = encoding
+      super("Invalid byte sequence in #{encoding}", line_number)
+    end
+  end

   #
   # A FieldInfo Struct contains details about a field's position in the data
   # source it was read from. CSV will pass this Struct to some blocks that make
6 changes: 2 additions & 4 deletions lib/csv/parser.rb
@@ -414,8 +414,7 @@ def parse(&block)
           else
             lineno = @lineno + 1
           end
-          message = "Invalid byte sequence in #{@encoding}"
-          raise MalformedCSVError.new(message, lineno)
+          raise InvalidEncodingError.new(@encoding, lineno)
         rescue UnexpectedError => error
           if @scanner
             ignore_broken_line
@@ -876,8 +875,7 @@ def build_scanner
               !line.valid_encoding?
             end
             if index
-              message = "Invalid byte sequence in #{@encoding}"
-              raise MalformedCSVError.new(message, @lineno + index + 1)
+              raise InvalidEncodingError.new(@encoding, @lineno + index + 1)
             end
           end
           Scanner.new(string)
6 changes: 3 additions & 3 deletions test/csv/interface/test_read.rb
@@ -113,11 +113,11 @@ def test_open_encoding_invalid
       file << "\u{1F600},\u{1F601}"
     end
     CSV.open(@input.path, encoding: "EUC-JP") do |csv|
-      error = assert_raise(CSV::MalformedCSVError) do
+      error = assert_raise(CSV::InvalidEncodingError) do
Member

Could you also change the below assert_equal(...) to the following?

assert_equal([Encoding::EUC_JP, "Invalid byte sequence in EUC-JP in line 1."],
             [error.encoding, error.message])

         csv.shift
       end
-      assert_equal("Invalid byte sequence in EUC-JP in line 1.",
-                   error.message)
+      assert_equal([Encoding::EUC_JP, "Invalid byte sequence in EUC-JP in line 1."],
+                   [error.encoding, error.message])
     end
   end

6 changes: 3 additions & 3 deletions test/csv/test_encodings.rb
@@ -280,12 +280,12 @@ def test_row_separator_detection_with_invalid_encoding
   def test_invalid_encoding_row_error
     csv = CSV.new("valid,x\rinvalid,\xF8\r".force_encoding("UTF-8"),
                   encoding: "UTF-8", row_sep: "\r")
-    error = assert_raise(CSV::MalformedCSVError) do
+    error = assert_raise(CSV::InvalidEncodingError) do
Member

ditto.

       csv.shift
       csv.shift
     end
-    assert_equal("Invalid byte sequence in UTF-8 in line 2.",
-                 error.message)
+    assert_equal([Encoding::UTF_8, "Invalid byte sequence in UTF-8 in line 2."],
+                 [error.encoding, error.message])
   end

   def test_string_input_transcode
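
Because `InvalidEncodingError` subclasses `MalformedCSVError`, existing `rescue CSV::MalformedCSVError` clauses should keep working after this change. The following is a rough caller-side sketch of that, reusing the input from `test_invalid_encoding_row_error`. The `report` hash and the `respond_to?` guard are illustrative additions, not part of the PR; the guard is there only so the sketch also runs on csv versions without the `encoding` reader.

```ruby
require "csv"

# Input borrowed from the test in this diff: the second row contains the
# byte 0xF8, which is not valid UTF-8.
data = String.new("valid,x\rinvalid,\xF8\r", encoding: "UTF-8")
csv = CSV.new(data, encoding: "UTF-8", row_sep: "\r")

report = nil
begin
  csv.shift
  csv.shift
rescue CSV::MalformedCSVError => error
  # Rescuing the parent class also catches CSV::InvalidEncodingError.
  # On csv versions that predate this PR there is no #encoding reader,
  # so fall back to nil there (hypothetical caller-side handling).
  encoding = error.respond_to?(:encoding) ? error.encoding : nil
  report = { message: error.message, encoding: encoding }
end
```

Callers that only care about encoding failures could rescue `CSV::InvalidEncodingError` directly once they depend on a csv release that includes this change.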