2 changes: 1 addition & 1 deletion docs/content/append-table/bucketed.md
@@ -196,4 +196,4 @@ The `spark.sql.sources.v2.bucketing.enabled` config is used to enable bucketing
Spark will recognize the specific distribution reported by a V2 data source through SupportsReportPartitioning, and
will try to avoid shuffle if necessary.

-The costly join shuffle will be avoided if two tables have same bucketing strategy and same number of buckets.
+The costly join shuffle will be avoided if two tables have the same bucketing strategy and same number of buckets.
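
A minimal Spark SQL sketch of the idea being documented here (table names, columns, and the bucket count of 8 are illustrative; it assumes a Paimon catalog is already configured):

```sql
-- Enable storage-partitioned joins for V2 data sources in Spark.
SET spark.sql.sources.v2.bucketing.enabled=true;

-- Two append tables with the same bucket key and the same number of buckets.
CREATE TABLE orders (order_id BIGINT, user_id BIGINT, amount DOUBLE)
TBLPROPERTIES ('bucket' = '8', 'bucket-key' = 'user_id');

CREATE TABLE users (user_id BIGINT, name STRING)
TBLPROPERTIES ('bucket' = '8', 'bucket-key' = 'user_id');

-- Both sides report the same distribution through SupportsReportPartitioning,
-- so Spark can plan this join without a shuffle.
SELECT o.order_id, u.name
FROM orders o JOIN users u ON o.user_id = u.user_id;
```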
4 changes: 2 additions & 2 deletions docs/content/append-table/query-performance.md
@@ -35,7 +35,7 @@ filtering, if the filtering effect is good, the query would have been minutes of
milliseconds to complete the execution.

Often the data distribution is not always effective filtering, so if we can sort the data by the field in `WHERE` condition?
-You can take a look to [Flink COMPACT Action]({{< ref "maintenance/dedicated-compaction#sort-compact" >}}) or
+You can take a look at [Flink COMPACT Action]({{< ref "maintenance/dedicated-compaction#sort-compact" >}}) or
[Flink COMPACT Procedure]({{< ref "flink/procedures" >}}) or [Spark COMPACT Procedure]({{< ref "spark/procedures" >}}).
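
As a sketch of the sort-compact idea via the Spark COMPACT procedure (the table name, columns, and the chosen `zorder` strategy are illustrative; check the procedure reference linked above for the exact arguments in your version):

```sql
-- Rewrite the data so files are clustered by the columns that appear in WHERE
-- conditions, which makes min/max based file skipping effective.
CALL sys.compact(
  table => 'default.my_table',
  order_strategy => 'zorder',
  order_by => 'event_date,user_id'
);
```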

## Data Skipping By File Index
@@ -54,7 +54,7 @@ file is too small, it will be stored directly in the manifest, otherwise in the
corresponds to an index file, which has a separate file definition and can contain different types of indexes with
multiple columns.

-Different file index may be efficient in different scenario. For example bloom filter may speed up query in point lookup
+Different file indexes may be efficient in different scenarios. For example bloom filter may speed up query in point lookup
scenario. Using a bitmap may consume more space but can result in greater accuracy.

`Bloom Filter`:
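
Below is an illustrative sketch of declaring file indexes as table options (the `file-index.*` keys follow the pattern in the Paimon docs, but the exact option names, table, and columns are assumptions to verify against your version):

```sql
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  tag STRING
) TBLPROPERTIES (
  -- Bloom filter index: compact, speeds up point lookups on user_id.
  'file-index.bloom-filter.columns' = 'user_id',
  -- Bitmap index: uses more space but gives exact matching, suited to
  -- low-cardinality columns such as tag.
  'file-index.bitmap.columns' = 'tag'
);
```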
2 changes: 1 addition & 1 deletion docs/content/primary-key-table/compaction.md
@@ -89,7 +89,7 @@ Paimon also provides a configuration that allows for regular execution of Full C

1. 'compaction.optimization-interval': Implying how often to perform an optimization full compaction, this
configuration is used to ensure the query timeliness of the read-optimized system table.
-2. 'full-compaction.delta-commits': Full compaction will be constantly triggered after delta commits. its disadvantage
+2. 'full-compaction.delta-commits': Full compaction will be constantly triggered after delta commits. Its disadvantage
is that it can only perform compaction synchronously, which will affect writing efficiency.
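
As a hedged illustration of setting these two options as table properties in Spark SQL (the table name and the values are made up; tune them to your workload):

```sql
ALTER TABLE my_pk_table SET TBLPROPERTIES (
  -- Run an optimizing full compaction periodically so the read-optimized
  -- system table stays reasonably fresh.
  'compaction.optimization-interval' = '30 min',
  -- Or trigger a full compaction after every 10 delta commits; this runs
  -- synchronously and can slow down writing.
  'full-compaction.delta-commits' = '10'
);
```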

## Compaction Options
2 changes: 1 addition & 1 deletion docs/content/primary-key-table/overview.md
@@ -56,6 +56,6 @@ Records within a data file are sorted by their primary keys. Within a sorted run

{{< img src="/img/sorted-runs.png">}}

-As you can see, different sorted runs may have overlapping primary key ranges, and may even contain the same primary key. When querying the LSM tree, all sorted runs must be combined and all records with the same primary key must be merged according to the user-specified [merge engine]({{< ref "primary-key-table/merge-engine/overview" >}}) and the timestamp of each record.
+As you can see, different sorted runs may have overlapped primary key ranges, and may even contain the same primary key. When querying the LSM tree, all sorted runs must be combined and all records with the same primary key must be merged according to the user-specified [merge engine]({{< ref "primary-key-table/merge-engine/overview" >}}) and the timestamp of each record.

New records written into the LSM tree will be first buffered in memory. When the memory buffer is full, all records in memory will be sorted and flushed to disk. A new sorted run is now created.
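
A small Flink SQL sketch of the merge behavior described above (the table, values, and the default `deduplicate` merge engine are assumptions for illustration):

```sql
CREATE TABLE city_population (
  city STRING,
  population BIGINT,
  PRIMARY KEY (city) NOT ENFORCED
);

INSERT INTO city_population VALUES ('Berlin', 3600000);
-- Written later, so it lands in a newer sorted run whose key range
-- overlaps the older one on the key 'Berlin'.
INSERT INTO city_population VALUES ('Berlin', 3700000);

-- Reading combines all sorted runs; with the default 'deduplicate' merge
-- engine only the latest record per primary key survives: ('Berlin', 3700000).
SELECT * FROM city_population;
```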
2 changes: 1 addition & 1 deletion docs/content/primary-key-table/query-performance.md
@@ -34,7 +34,7 @@ For Merge On Read table, the most important thing you should pay attention to is
the concurrency of reading data.

For MOW (Deletion Vectors) or COW table or [Read Optimized]({{< ref "concepts/system-tables#read-optimized-table" >}}) table,
-There is no limit to the concurrency of reading data, and they can also utilize some filtering conditions for non-primary-key columns.
+there is no limit to the concurrency of reading data, and they can also utilize some filtering conditions for non-primary-key columns.
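
A hedged Spark SQL sketch of the two read paths mentioned above (the table, columns, and values are illustrative; `deletion-vectors.enabled` and the `$ro` system table come from the Paimon docs):

```sql
-- Merge-on-write: deletion vectors are maintained at write time, so reads
-- need no merging, read parallelism is not limited, and filters on
-- non-primary-key columns can help prune data.
CREATE TABLE my_pk_table (
  id BIGINT,
  category STRING,
  payload STRING
) TBLPROPERTIES (
  'primary-key' = 'id',
  'deletion-vectors.enabled' = 'true'
);

SELECT * FROM my_pk_table WHERE category = 'news';

-- Alternatively, the read-optimized system table only reads files that are
-- already fully compacted, so it behaves like a COW table for reads.
SELECT * FROM `my_pk_table$ro` WHERE category = 'news';
```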

## Data Skipping By Primary Key Filter
