34 changes: 33 additions & 1 deletion docs/connectors/warehouses-and-lake/paimon.md
@@ -103,4 +103,36 @@ Add a [Type Modification Processor](../../data-transformation/process-node.md#ty

If the test fails, follow the on-screen hints to fix the issue.

:::



## Advanced Node Features

When Paimon is used as the target node in a data replication or transformation task, you can further configure table creation and write-related settings in the node's advanced settings to better balance write performance, table layout, and storage cost.


![Paimon Node Advanced Settings](../../images/paimon_node_advanced_settings.png)


:::tip

The table creation settings below mainly take effect when the target table does not exist and is created automatically by Tapdata. If the target table already exists, Tapdata keeps the existing table schema and table options instead of overwriting them.

:::

| Configuration | Description |
| --- | --- |
| **Hash Key** | When enabled, Tapdata automatically adds an `_hash_key` field to the Paimon table and uses it as the primary key, reducing write overhead when the table has many primary key or update-condition fields. Enable it only when a large number of key fields is degrading write performance. |
| **Partition Key** | Specifies the partition fields for the target table. Leaving it empty means partitioning is disabled. It is recommended for large tables or scenarios where data needs to be organized by date or business dimension. |
| **Bucket Mode** | Supports **Dynamic** and **Fixed** modes. Dynamic mode is suitable for general scenarios and lets the system assign buckets automatically. Fixed mode usually provides more stable write performance, but it should be used together with **Bucket Count**. |
| **Bucket Count** | Takes effect only when **Bucket Mode** is set to **Fixed**. It defines the number of buckets, with a default value of **4**. Set it based on data volume, write concurrency, and small-file control requirements. |
| **File Format** | Specifies the underlying file format used when the table is created. Supported formats include ORC, Parquet, Avro, CSV, JSON, Lance, and Blob. Choose the format based on query engine compatibility, compression ratio, and read/write performance requirements. |
| **Compression Format** | Specifies the compression format for data files. Supported options include None, Snappy, LZ4, ZSTD, GZIP, and BZIP2. This setting usually involves a trade-off between compression ratio, CPU usage, and read/write performance. |
| **Table Properties** | Lets you append Paimon table properties in key-value form for finer-grained control over table behavior. This is useful when you need to customize table creation parameters further. |
| **Write Buffer Size (MB)** | Controls the in-memory buffer size used for writes. The default value is **256 MB**. Increasing it can improve throughput, but it also increases memory consumption. |
| **Write Threads** | Controls the number of parallel write threads. The default value is **4**. Increasing it can improve write concurrency when enough resources are available, but it also increases resource usage. |
| **Enable Auto Compaction** | Controls whether automatic compaction is enabled. When enabled, it helps reduce small files and improve query performance. When disabled, it reduces compaction overhead but may result in more small files. |
| **Compaction Interval (minutes)** | Takes effect after **Enable Auto Compaction** is enabled. It controls how often automatic compaction runs, with a default value of **30** minutes. |
| **Target File Size (MB)** | Controls the target size of data files. The default value is **128 MB**. Increasing it can help reduce the number of small files, but it also increases the processing cost of each individual file. |
| **Enable Primary Key Update Detection** | When enabled, if a primary key value changes, Tapdata converts the update into a delete of the old record followed by an insert of the new record. This feature requires the source to provide the before-update image of each record; if that image is unavailable, the task fails. Enabling this feature also reduces update performance. |
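To make the **Hash Key** behavior concrete, the sketch below shows one way a single surrogate key could be derived from multiple key fields. The delimiter and the choice of MD5 are assumptions for illustration; the documentation does not specify the exact algorithm Tapdata uses to populate `_hash_key`.

```python
import hashlib

def compute_hash_key(record: dict, key_fields: list[str]) -> str:
    """Collapse the values of all key fields into one surrogate key,
    analogous to the _hash_key column described above.

    Assumption: MD5 over a delimiter-joined concatenation; the real
    implementation may differ.
    """
    # Use a non-printable delimiter to avoid collisions between
    # ("ab", "c") and ("a", "bc").
    raw = "\x1f".join(str(record[field]) for field in key_fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

row = {"order_id": 1001, "region": "eu", "sku": "A-7", "ts": "2024-05-01"}
print(compute_hash_key(row, ["order_id", "region", "sku", "ts"]))
```

Because the hash is deterministic, upserts for the same logical row always target the same `_hash_key`, so Paimon only has to compare one short key instead of every wide key field.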
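The **Enable Primary Key Update Detection** conversion described above can be sketched as follows. The event shape (operation tag plus row dict) is a simplification for illustration, not Tapdata's internal event model.

```python
def convert_update(before: dict, after: dict, pk_fields: list[str]) -> list[tuple[str, dict]]:
    """If the primary key changed between the before- and after-images,
    rewrite the update as a delete of the old row plus an insert of the
    new row; otherwise pass the update through unchanged."""
    old_key = tuple(before[f] for f in pk_fields)
    new_key = tuple(after[f] for f in pk_fields)
    if old_key != new_key:
        # Primary key moved: the target must drop the old row, because a
        # plain upsert on the new key would leave the old row behind.
        return [("delete", before), ("insert", after)]
    return [("update", after)]

print(convert_update({"id": 1, "v": "a"}, {"id": 2, "v": "a"}, ["id"]))
```

This also makes the documented constraint visible: the rewrite needs `before`, i.e. the source's before-update image, which is why the task fails when that image is unavailable.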
Binary file added docs/images/paimon_node_advanced_settings.png
30 changes: 30 additions & 0 deletions docs/release-notes-on-prem.md
@@ -14,6 +14,36 @@
<TabItem value="Version 4.x" default>
```

## 4.15.0

### New Features

- The [Paimon connector](./connectors/warehouses-and-lake/paimon.md#advanced-node-features) now supports per-table partitioning configuration, allowing customization of partition keys, buckets, compression formats, and other attributes to improve data management flexibility.

### Enhancements

- Significantly improved the data writing performance of the Paimon connector under multiple update conditions.
- The StarRocks connector now adapts to High Availability (HA) architectures, ensuring continuous and stable data writing.
- Added WSS (WebSocket Secure) protocol support for the communication port (8246) between the engine and the management node, enhancing communication security.

### Bug Fixes

- Fixed an issue where Avro-formatted data with a Debezium structure in Kafka could not be parsed correctly.
- Fixed an issue where the MariaDB to Doris synchronization task failed during the CDC phase after the full sync was completed.
- Fixed a synchronization failure in MySQL tasks caused by a field rename processor exception.
- Fixed a task execution error caused by `BSONRegExp` type data when syncing from MongoDB to MySQL.
- Fixed an issue where PostgreSQL CDC tasks could not be recovered after an engine restart due to cleaned-up WAL logs.
- Fixed a task error when enabling the full breakpoint resume function for TDSQL.
- Fixed an issue where the API Server audit log still contained exception information when the request was successful (Code 200).
- Fixed an issue where duplicate name validation was missing when creating an API.
- Fixed an issue where published API interfaces could not correctly handle parameter filtering.
- Fixed a failure when publishing an API based on MySQL.
- Fixed an issue where the generated API could not query data by primary key after the `_id` field was deleted in the synchronization task.
- Fixed a task startup failure when using MongoDB 8.0 as intermediate storage.
- Fixed a performance degradation in full data batch writing after enabling multi-table concurrent reading.
- Fixed a false alarm stating the tag did not exist when starting a task after modifying the connection's engine tag.
- Fixed the loss of query filter status when returning to the list page from the edit page on the connection management page.
- Fixed an issue where scheduled validation tasks abnormally triggered secondary validations in a loop.

## 4.14.0
