No way to specify file extension in datafusion-cli #6403

@casperhart

Describe the bug

I would like to read a .tsv file using the datafusion-cli. However, the file isn't recognised because the file extension is .tsv instead of the default .csv. In vanilla datafusion, I can specify CsvReadOptions::new().file_extension(".tsv"), but from what I can see there is no similar option available in the datafusion-cli (correct me if I'm wrong).

To Reproduce

In bash:

echo "col1, col2" > test.tsv
echo "1, 2" >> test.tsv

In datafusion-cli:

create external table test stored as csv with header row location "test.tsv";
select * from test;

gives:

0 rows in set. Query took 0.001 seconds.

Expected behavior

Technically this is the expected behaviour, but it would be nice if there was a way to read the .tsv file and return the rows from it.

It would also be nice if the file_extension was only needed if the specified location is a directory. I.e. if I specify a file, I don't see why there's a need to separately specify the extension.
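To make the second point concrete, here is a rough sketch of the rule I have in mind. It uses only the Rust standard library; `should_scan` and its parameters are hypothetical names for illustration, not an existing DataFusion API, and the caller is assumed to already know whether the location is a directory.

```rust
use std::path::Path;

/// Hypothetical helper (not part of DataFusion): decides whether `candidate`
/// should be read for a table registered at `location`.
fn should_scan(
    location: &Path,
    location_is_dir: bool,
    candidate: &Path,
    file_extension: &str,
) -> bool {
    if !location_is_dir {
        // The location names a single file: accept exactly that file,
        // whatever its extension is.
        return candidate == location;
    }
    // The location is a directory: keep only files under it whose
    // name ends with the configured extension.
    candidate.starts_with(location)
        && candidate
            .file_name()
            .and_then(|name| name.to_str())
            .map_or(false, |name| name.ends_with(file_extension))
}
```

With this rule, a table located at the file test.tsv would be read even when the extension is left at the default .csv, while a table located at a directory would still be filtered by extension.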

Additional context

I'd like to work on this, but I don't know what the best approach is.

E.g. a few ways I can think of are:

  • making this specifiable in the sql statement itself, as is the case with delimiter x and with header row
  • adding a global option datafusion.catalog.file_extension
  • (if possible) using a method like Hive's tblproperties

Let me know what you think, cheers

P.S. I have another issue reading .tsv files using the datafusion-cli here: #6397.

Labels: bug
