[Proposal] Doris support version column for REPLACE aggregate type #3930

### Background
Doris currently uses the REPLACE aggregate type to update data, but it cannot guarantee the replacement order among rows with the same key that are imported in the same batch. To get a deterministic result, the user must ensure that no two rows in a batch share the same key columns, which is very inconvenient. To solve this problem, we can use a **version** column to specify the replacement order.

### Goal
The user specifies a **version column** when creating the table, and Doris relies on this column when updating REPLACE-type columns. A row with a larger version value can replace the data of a row with a smaller version value, but a row with a smaller version value can never replace the data of a row with a larger one.
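The rule above can be illustrated with a small Python sketch. It is not Doris code: the `(version, value)` row layout and the `replace_by_version` name are hypothetical, and keeping the current row on equal versions is an assumption, since the proposal does not say how ties are handled.

```python
from functools import reduce
from itertools import permutations

def replace_by_version(current, incoming):
    """Each row is (version, value); keep the row with the larger version.

    Assumption: on equal versions the current row is kept.
    """
    return incoming if incoming[0] > current[0] else current

# Three rows for the same key, arriving in an unknown order within one batch.
batch = [(1, "old"), (3, "newest"), (2, "newer")]

# Whatever the arrival order, the row with the largest version wins.
results = {reduce(replace_by_version, order) for order in permutations(batch)}
assert results == {(3, "newest")}
```

With distinct versions the result no longer depends on the order in which rows of a batch are processed, which is the property the proposal is after.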

### Create Table Interface
```
CREATE TABLE `test` (
`id` bigint(20) NOT NULL,
`date` date NOT NULL,
`group_id` bigint(20) NOT NULL,
`version` int MAX NOT NULL,
`keyword` varchar(128) REPLACE NOT NULL,
`clicks` bigint(20) SUM NULL DEFAULT "0" ,
`cost` bigint(20) SUM NULL DEFAULT "0" 
) ENGINE=OLAP
AGGREGATE KEY(`id`, `date`, `group_id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 16
PROPERTIES (
  "replace_version_column" = "version"
);
```
When creating a table, the user simply adds the **replace_version_column** property in PROPERTIES to identify the version column. The version column must use the MAX aggregation type, so that only the largest version value is retained for rows with the same key columns.
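A possible check for this constraint at table-creation time is sketched below. The function name, the dictionary-based schema representation, and the error messages are all hypothetical; the proposal does not describe how FE would validate the property.

```python
def check_version_column(columns, properties):
    """Validate the replace_version_column property (hypothetical helper).

    columns: mapping of column name -> aggregate type (None for key columns).
    """
    name = properties.get("replace_version_column")
    if name is None:
        return  # property not set, nothing to check
    if name not in columns:
        raise ValueError(f"version column {name!r} is not defined")
    if columns[name] != "MAX":
        raise ValueError(f"version column {name!r} must use MAX aggregation")

# Schema of the `test` table from the CREATE TABLE statement above.
columns = {
    "id": None, "date": None, "group_id": None,
    "version": "MAX", "keyword": "REPLACE", "clicks": "SUM", "cost": "SUM",
}
check_version_column(columns, {"replace_version_column": "version"})  # passes
```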

### Query
When a query does not contain a REPLACE column, the original logic applies. When a query contains REPLACE columns, BE needs to add the version column on which the REPLACE columns depend to the columns it reads, and compare the version values when the REPLACE columns are aggregated. This can be done by extending the **Reader return columns**. In FE, **isPreAggregation** is OFF because a REPLACE column is a value column in the StorageEngine, which means the storage engine must aggregate the data before returning it to the scan node. We can therefore guarantee that rows with the same key columns are aggregated in Reader.
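The column extension described above might look roughly like the following Python sketch. `extend_return_columns` and its parameters are hypothetical names used for illustration only; the real Reader in BE is not written this way.

```python
def extend_return_columns(requested, replace_columns, version_column):
    """Return the columns the Reader should read for a query (sketch).

    The version column is appended only when the query touches at least
    one REPLACE column and does not already request the version column.
    """
    columns = list(requested)
    needs_version = any(c in replace_columns for c in requested)
    if needs_version and version_column not in columns:
        columns.append(version_column)
    return columns

# Query on a REPLACE column: the version column is read as well.
assert extend_return_columns(["id", "keyword"], {"keyword"}, "version") == [
    "id", "keyword", "version",
]
# Query without REPLACE columns: the original column list is kept.
assert extend_return_columns(["id", "clicks"], {"keyword"}, "version") == [
    "id", "clicks",
]
```

The extra column is only needed inside the Reader for the comparison; it would be dropped again before results are returned for the columns the query asked for.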


### Compaction
Base and Cumulative Compaction use Reader to aggregate data, and they use all tablet columns as return columns. So, similar to query processing, we can rely on Reader to perform the replacement based on the version column.
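As a rough illustration of version-aware aggregation during a merge, the sketch below combines two key-sorted rowsets. The tuple layout `(key, version, keyword, clicks)` and the `compact` function are assumptions made for this example and do not mirror the actual compaction code.

```python
import heapq
from itertools import groupby

# Hypothetical row layout: (key, version, keyword, clicks)
def compact(*rowsets):
    """Merge key-sorted rowsets and aggregate rows that share a key."""
    merged = heapq.merge(*rowsets, key=lambda r: r[0])
    out = []
    for key, rows in groupby(merged, key=lambda r: r[0]):
        rows = list(rows)
        winner = max(rows, key=lambda r: r[1])  # row with the largest version
        clicks = sum(r[3] for r in rows)        # SUM columns are unaffected
        out.append((key, winner[1], winner[2], clicks))
    return out

older_rowset = [(1, 5, "a", 10), (2, 1, "x", 1)]
newer_rowset = [(1, 3, "b", 7), (3, 1, "z", 2)]  # written later, smaller version
result = compact(older_rowset, newer_rowset)
# result == [(1, 5, "a", 17), (2, 1, "x", 1), (3, 1, "z", 2)]
```

For key `1` the keyword from the later rowset does not win, because its version (3) is smaller than the version already stored (5), while `clicks` is still summed across both rows.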

### Load
A single batch of loaded data is written to one or more **MemTable**s. Within one MemTable, we need to ensure that, for rows with the same key columns, REPLACE-type columns are replaced according to the version column. The replacement order across different MemTables does not need to be guaranteed during load, because Query and Compaction guarantee it.
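The split of responsibilities between load and read can be sketched as follows. The `MemTable` class here is a toy dictionary-backed stand-in, and `read` is a hypothetical placeholder for the merging done later by Query and Compaction.

```python
class MemTable:
    """Toy in-memory table mapping key -> (version, keyword)."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, version, keyword):
        current = self.rows.get(key)
        # Replace only when the incoming version is larger.
        if current is None or version > current[0]:
            self.rows[key] = (version, keyword)

def read(*memtables):
    """Stand-in for the read-time merge across flushed MemTables."""
    merged = MemTable()
    for mt in memtables:
        for key, (version, keyword) in mt.rows.items():
            merged.insert(key, version, keyword)
    return merged.rows

first, second = MemTable(), MemTable()
first.insert(1, 3, "c")
first.insert(1, 2, "b")   # same MemTable, smaller version: ignored
second.insert(1, 1, "a")  # different MemTable: resolved at read time

assert first.rows == {1: (3, "c")}
assert read(first, second) == read(second, first) == {1: (3, "c")}
```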

### RollUp
If a rollup contains a REPLACE-type column, we need either the user to add the version column to the rollup, or Doris to add it automatically.
