Description
Describe the bug
I am trying to read parquet files from a Delta table. The parquet files are snappy compressed. My Delta table has 3 partition columns, so the folder structure of the Delta table looks something like this:
table
-- Col1=ABC
---- Col2=123
------ Col3=abc
-------- part-0000-xxxx-asdf.c00.snappy.parquet
-- Col1=XYZ
---- Col2=789
------ Col3=xyz
-------- part-0000-xyzj-qplg.c00.snappy.parquet
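For reference, the layout above encodes each partition value in a `key=value` directory name. A minimal std-only sketch of pulling those values out of a file path is shown here. The function name `partition_values` is hypothetical and not part of DataFusion or Delta, and the sketch assumes `/` separators and that data file names end in `.parquet`.

```rust
/// Collect `key=value` directory segments from a Hive-style partitioned path.
/// Segments without `=` (such as `table`) are skipped, and so is any segment
/// ending in `.parquet`, which is assumed to be the data file itself.
fn partition_values(path: &str) -> Vec<(String, String)> {
    path.split('/')
        .filter(|segment| !segment.ends_with(".parquet"))
        .filter_map(|segment| segment.split_once('='))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}
```

Called on the first file in the tree above, this would yield the pairs for Col1, Col2, and Col3 in directory order.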
I was originally trying to read a list of files (I need to be able to read an arbitrary list of files), but while debugging I narrowed the problem down to reading a single parquet file. My code for this is as follows:
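use datafusion::error::DataFusionError;
use datafusion::prelude::{ParquetReadOptions, SessionContext};

async fn read_parquet(path: String) -> Result<(), DataFusionError> {
    let session = SessionContext::new();
    // The path is hard-coded for this reproduction; the `path` argument is unused.
    let df = session.read_parquet("/Users/me/Downloads/table/Col1=ABC/Col2=123/Col3=abc/part-0000-xxxx-asdf.c00.snappy.parquet", ParquetReadOptions::default()).await?;
    Ok(())
}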
When I run this, I get:
thread 'main' panicked at src/main.rs:7:35:
called `Result::unwrap()` on an `Err` value: ObjectStore(NotFound { path: "/Users/me/Downloads/table/Col1=ABC/Col2=123/Col3=abc/part-0000-xxxx-asdf.c00.snappy.parquet", source: Os { code: 2, kind: NotFound, message: "No such file or directory" } })
However, if I run the same code but pass only the directory containing the file ("/Users/me/Downloads/table/Col1=ABC/Col2=123/Col3=abc") instead of the full path to the parquet file, I get no such error and the parquet file is read successfully.
I'm not sure if I'm doing something wrong, or if this is some kind of bug.
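To make the comparison above easier to repeat, here is a minimal std-only sketch of two checks: whether the exact file path exists on disk according to the standard library, and what the containing directory (the path that does work) is. The helper names `exists_as_file` and `containing_dir` are hypothetical, and nothing here calls DataFusion.

```rust
use std::path::Path;

/// True if the standard library can see `file` as a regular file on disk.
fn exists_as_file(file: &str) -> bool {
    Path::new(file).is_file()
}

/// The directory that contains `file`, or `None` for a root or empty path.
/// This only manipulates the path string; it does not touch the filesystem.
fn containing_dir(file: &str) -> Option<&Path> {
    Path::new(file).parent()
}
```

If `exists_as_file` returns true for the full path while reading that same path still reports NotFound, the file itself is present and the difference would lie in how the path is handled rather than in the filesystem.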
To Reproduce
Try to read a single parquet file from a local, partitioned Delta table, using SessionContext::read_parquet.
Expected behavior
I expect the file to be read into DataFusion as a DataFrame.
Additional context
No response