Skip to content

Conversation

@mikicz
Copy link
Collaborator

@mikicz mikicz commented Feb 27, 2024

Basically we came across csvs which had patterns equal to patterns in csv_gp_python/tests/fixtures/quote_and_newline.csv, which are valid (get loaded fine by excel & pandas), but we didn't handle correctly

@mikicz mikicz force-pushed the bugfix/incorrect-handling-quotes branch 2 times, most recently from 954496d to 4207d4b Compare February 27, 2024 15:17
@mikicz mikicz requested a review from blaginin February 27, 2024 15:18
@mikicz mikicz requested a review from blaginin February 28, 2024 11:10
@blaginin
Copy link

now this one fails as invalid

a,b,c
"lll","""",""",
"

@mikicz
Copy link
Collaborator Author

mikicz commented Feb 28, 2024

fuuuuuuuuck

@mikicz mikicz force-pushed the bugfix/incorrect-handling-quotes branch from 9a3c236 to dc1980f Compare February 28, 2024 15:46
@mikicz
Copy link
Collaborator Author

mikicz commented Feb 28, 2024

Ok, throw some more cases at me! 😄

@blaginin
Copy link

Ok, throw some more cases at me! 😄

well if you're asking 😉

@blaginin
Copy link

blaginin commented Feb 28, 2024

I think this one is also not correctly handled

a,b,c
1,2 "and three",3

quotes might be part of the cell, and not necessarily wrap the whole cell. This reads correctly by pandas or excel. But your version marks it as incorrect

image

Not sure if it's the same bug / worth fixing in the same pr tho

@blaginin
Copy link

blaginin commented Feb 28, 2024

if you're going to fix this here: maybe check that that you won't accidentally make this valid

a,b,c
1,2 "and
three",3

@mikicz
Copy link
Collaborator Author

mikicz commented Feb 28, 2024

Those do get detected as invalid, see the test I just added

@blaginin
Copy link

blaginin commented Feb 28, 2024

but the first one really should be valid?

@mikicz
Copy link
Collaborator Author

mikicz commented Feb 29, 2024

Hmmm according to RFC 4180 it's valid but it it's not what we ask for when we ask for csvs - all cells to be quoted and enclosed in a quoted cell

@mikicz
Copy link
Collaborator Author

mikicz commented Feb 29, 2024

I messed that sentence up in the rewrite

  • all quotes to be escaped, and enclosed in quoted cells

@mikicz mikicz merged commit 31993c8 into main Feb 29, 2024
@mikicz mikicz deleted the bugfix/incorrect-handling-quotes branch February 29, 2024 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

6 participants