@glfeng318
Contributor

Which issue does this PR close?

Rationale for this change

This allows users to specify the file extension in ParquetReadOptions:

let opt = ParquetReadOptions::new().file_extension("").table_partition_cols(vec![("dt".to_string(), DataType::Utf8)]);

What changes are included in this PR?

Adds a new fn file_extension to ParquetReadOptions.

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Nov 11, 2024
Contributor

@alamb alamb left a comment


Thank you @glfeng318 -- this looks like a nice improvement to me

@glfeng318 glfeng318 requested a review from alamb November 13, 2024 07:39
@glfeng318
Contributor Author

The cargo fmt check failed, so I removed the whitespace on line 270.

Member

@findepi findepi left a comment


Makes sense.

Curious, is the only use-case to set this to ""?

@glfeng318
Contributor Author

No.
In most cases, file_extension is left at the default, .parquet.
However, some frameworks and apps (in my case, Hive) may generate Parquet files without an extension, so there it needs to be set to "".
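To illustrate why "" is a useful value, here is a minimal, standalone sketch of extension filtering. It assumes the reader selects files by a plain suffix match on the path, which this thread does not confirm. The helper filter_by_extension and the file names are hypothetical and are not part of DataFusion.

```rust
/// Hypothetical helper (not a DataFusion API): keep only the paths that end
/// with `ext`. Assumes file selection is a plain suffix match on the path.
fn filter_by_extension<'a>(paths: &[&'a str], ext: &str) -> Vec<&'a str> {
    paths.iter().copied().filter(|p| p.ends_with(ext)).collect()
}

fn main() {
    // Illustrative names: one file without an extension, one with `.parquet`.
    let files = ["dt=2024-11-11/000000_0", "dt=2024-11-11/part-0.parquet"];

    // With the default extension, the extension-less file is skipped.
    let with_default = filter_by_extension(&files, ".parquet");
    println!("{:?}", with_default);

    // Every string ends with the empty string, so "" selects every file.
    let with_empty = filter_by_extension(&files, "");
    println!("{:?}", with_empty);
}
```

Under that assumption, "" acts as "match everything", so a directory read this way should contain only Parquet files.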

@alamb alamb merged commit de450d4 into apache:main Nov 14, 2024
@alamb
Contributor

alamb commented Nov 14, 2024

Thanks again @glfeng318
