Describe the bug
I would like to read a .tsv file using the datafusion-cli. However, the file isn't recognised because the file extension is .tsv instead of the default .csv. In vanilla datafusion, I can specify CsvReadOptions::new().file_extension(".tsv"), but from what I can see there is no similar option available in the datafusion-cli (correct me if I'm wrong).
To Reproduce
In bash:
echo "col1, col2" > test.tsv
echo "1, 2" >> test.tsv
In datafusion-cli:
create external table test stored as csv with header row location "test.tsv";
select * from test;
gives:
0 rows in set. Query took 0.001 seconds.
Expected behavior
Technically this is the expected behaviour, but it would be nice if there was a way to read the .tsv file and return the rows from it.
It would also be nice if the file_extension was only needed if the specified location is a directory. I.e. if I specify a file, I don't see why there's a need to separately specify the extension.
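The rule suggested here could be sketched roughly as below. This is only an illustration using the Rust standard library, not DataFusion's actual listing code: the helper name should_read and its parameters are hypothetical, and the check simply tests whether the file name ends with the configured extension.

```rust
use std::path::Path;

/// Hypothetical helper (not part of DataFusion): decide whether `file` should
/// be read for a table, given whether the table location is a directory.
fn should_read(location_is_dir: bool, file: &Path, file_extension: &str) -> bool {
    if !location_is_dir {
        // The user pointed at one specific file, so accept it whatever its extension.
        return true;
    }
    // The location is a directory: keep only files whose name ends with the extension.
    file.file_name()
        .and_then(|name| name.to_str())
        .map(|name| name.ends_with(file_extension))
        .unwrap_or(false)
}

fn main() {
    // A single .tsv file is accepted even though the configured extension is .csv.
    println!("{}", should_read(false, Path::new("test.tsv"), ".csv"));
    // Inside a directory, the extension filter still applies.
    println!("{}", should_read(true, Path::new("data/test.tsv"), ".csv"));
    println!("{}", should_read(true, Path::new("data/test.csv"), ".csv"));
}
```

With a rule like this, the test.tsv example above would be read without any extra option, while directory locations would keep the current filtering.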
Additional context
I'd like to work on this, but I don't know what the best approach is.
E.g. a few ways I can think of are:
- making this specifiable in the SQL statement itself, as is the case with delimiter x and with header row
- adding a global option datafusion.catalog.file_extension
- (if possible) using a method like Hive's tblproperties
Let me know what you think, cheers
P.S. I have another issue reading .tsv files using the datafusion-cli here: #6397.