Draft Parquet Docs #140

Open
bandle wants to merge 2 commits into master from bandle/docs
@bandle (Contributor) commented Feb 12, 2026

No description provided.

They can be used to import information from different data systems into CedarDB.

{{<callout type="info">}}
While interactive querying of Parquet files is also possible, CedarDB is optimized for it's own storage engine.
Member

Suggested change
- While interactive querying of Parquet files is also possible, CedarDB is optimized for it's own storage engine.
+ While interactive querying of Parquet files is also possible, CedarDB is optimized for its own storage engine.

## Examples
Read a parquet file
```sql
SELECT * FROM 'test.parquet';
```
Member

Can you also show the non-shortened syntax with parquet_view?
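
A sketch of what the requested long form might look like next to the shorthand. It assumes `parquet_view` is a table function that takes the file path as its only argument; this thread names the function but does not show its signature, so the call below should be checked against the implementation before it goes into the docs.

```sql
-- Shorthand from the draft: query the file directly by its path.
SELECT * FROM 'test.parquet';

-- Assumed long form: parquet_view called with the path as its single argument.
-- The argument list is an assumption, not confirmed anywhere in this thread.
SELECT * FROM parquet_view('test.parquet');
```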


## Creating a Table from parquet

Either you can load parquet data directly into a table:
Member

Suggested change
- Either you can load parquet data directly into a table:
+ You can either load parquet data directly into a table:
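
For context, loading a file directly into a table could look like the following sketch. It combines the path shorthand from the Examples section with standard `CREATE TABLE ... AS` and `INSERT ... SELECT` statements. The table name `trips` is hypothetical, and this excerpt does not show whether the draft uses exactly these statements.

```sql
-- Create a new table whose schema and rows come from the parquet file.
CREATE TABLE trips AS SELECT * FROM 'test.parquet';

-- Or append the file's rows to a table that already exists.
INSERT INTO trips SELECT * FROM 'test.parquet';
```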


CedarDB's parquet scan is optimized for full parquet file imports.
The scan is fully multi-threaded and only reads the columns that are queried by the user.
We do not yet push-down filters into the parquet rowgroups to prune based on parquet statistics and metadata.
Member

I would phrase this more positively for us, something like this:

Suggested change
- We do not yet push-down filters into the parquet rowgroups to prune based on parquet statistics and metadata.
+ Evaluating filter predicates is most efficient in CedarDB's native data format.

And then you could add a "Not yet supported" thing in the "Enhanced Features" table below saying "Using Parquet statistics for better filtering" or something in that direction
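
The trade-off discussed in this comment can be sketched in SQL. The column names `id` and `amount` and the table name `sales` are hypothetical. Going by the draft text, the first query reads only the two referenced columns but does not use row group statistics to skip data, while the second pattern imports the file once and then filters on the native format.

```sql
-- Only the columns id and amount are read from the file (column pruning),
-- but the filter is evaluated after the scan rather than pushed into row groups.
SELECT id, amount FROM 'test.parquet' WHERE amount > 100;

-- Import once, then evaluate the filter in CedarDB's native storage format.
CREATE TABLE sales AS SELECT * FROM 'test.parquet';
SELECT id, amount FROM sales WHERE amount > 100;
```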

3 participants