@CalvinKirs
Member

@CalvinKirs CalvinKirs commented Aug 18, 2023

Affected version: 2.0

  • How to reproduce:
CREATE TABLE `user` (
  `id` bigint(20) NOT NULL COMMENT 'User ID',
  `user_name` text NULL,
  `password` text NULL,
   ...
) ENGINE=OLAP
UNIQUE KEY(`id`)
COMMENT 'OLAP'
PARTITION BY RANGE(`id`)
(PARTITION i1 VALUES [("-9223372036854775808"), ("5000000")),
PARTITION i2 VALUES [("5000000"), ("10000000")))
DISTRIBUTED BY HASH(`id`) BUCKETS 16
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"is_being_synced" = "false",
"storage_format" = "V2",
"enable_unique_key_merge_on_write" = "true",
"light_schema_change" = "true",
"function_column.sequence_col" = "order_id",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);

Routine Load SQL

CREATE ROUTINE LOAD test_load ON user
WITH MERGE
COLUMNS TERMINATED BY ",",
DELETE ON `y_n` = -1,
ORDER BY order_id
PROPERTIES
(
"desired_concurrent_number" = "1",
"max_error_number" = "0",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"format" = "json",
"strip_outer_array" = "false",
"num_as_string" = "false",
"fuzzy_parse" = "false",
"strict_mode" = "false",
"timezone" = "Asia/Shanghai",
"exec_mem_limit" = "2147483648"
)
FROM kafka(
...
)

Error log

org.apache.doris.common.UserException: errCode = 2, detailMessage = Table t_user has sequence column, need to specify the sequence column
	at org.apache.doris.planner.external.LoadScanProvider.initColumns(LoadScanProvider.java:197) ~[classes/:?]
	at org.apache.doris.planner.external.LoadScanProvider.createContext(LoadScanProvider.java:104) ~[classes/:?]
	at org.apache.doris.planner.FileLoadScanNode.initParamCreateContexts(FileLoadScanNode.java:131) ~[classes/:?]
	at org.apache.doris.planner.FileLoadScanNode.init(FileLoadScanNode.java:125) ~[classes/:?]
	at org.apache.doris.planner.StreamLoadPlanner.plan(StreamLoadPlanner.java:254) ~[classes/:?]
	at org.apache.doris.planner.StreamLoadPlanner.plan(StreamLoadPlanner.java:115) ~[classes/:?]
	at org.apache.doris.load.routineload.RoutineLoadJob.plan(RoutineLoadJob.java:872) ~[classes/:?]
	at org.apache.doris.load.routineload.KafkaTaskInfo.rePlan(KafkaTaskInfo.java:129) ~[classes/:?]
	at org.apache.doris.load.routineload.KafkaTaskInfo.createRoutineLoadTask(KafkaTaskInfo.java:98) ~[classes/:?]
	at org.apache.doris.load.routineload.RoutineLoadTaskScheduler.scheduleOneTask(RoutineLoadTaskScheduler.java:184) ~[classes/:?]
	at org.apache.doris.load.routineload.RoutineLoadTaskScheduler.process(RoutineLoadTaskScheduler.java:111) ~[classes/:?]
	at org.apache.doris.load.routineload.RoutineLoadTaskScheduler.runAfterCatalogReady(RoutineLoadTaskScheduler.java:84) ~[classes/:?]
	at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) ~[classes/:?]
	at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[classes/:?]
  • Fix
    When no columns are specified for the load job, the sequence column is now added to the column list automatically.
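
The behavior described in the fix can be sketched roughly as follows. This is an illustrative sketch only, not the code merged in this PR: the class `SequenceColumnSketch`, the helpers `resolveLoadColumns` and `checkSequenceColumn`, and the hidden column name used in `SEQUENCE_COL` are all assumptions made for the example. The check mirrors the error message in the log above, and the resolver shows the sequence column being appended when the job specifies no columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SequenceColumnSketch {
    // Assumed name for the hidden sequence column; a placeholder for this sketch.
    static final String SEQUENCE_COL = "__DORIS_SEQUENCE_COL__";

    // Hypothetical helper: builds the column list a load plan would use.
    // If the job specifies no columns, fall back to the table's columns and,
    // when the table has a sequence column, append it automatically.
    static List<String> resolveLoadColumns(List<String> specified,
                                           List<String> tableColumns,
                                           boolean tableHasSequenceCol) {
        List<String> resolved = new ArrayList<>();
        if (specified == null || specified.isEmpty()) {
            resolved.addAll(tableColumns);
            if (tableHasSequenceCol && !resolved.contains(SEQUENCE_COL)) {
                resolved.add(SEQUENCE_COL);
            }
        } else {
            // Explicitly specified columns are left untouched in this sketch.
            resolved.addAll(specified);
        }
        return resolved;
    }

    // Hypothetical check modeled on the error message in the log:
    // a table with a sequence column needs that column in the load columns.
    static void checkSequenceColumn(String tableName,
                                    List<String> loadColumns,
                                    boolean tableHasSequenceCol) {
        if (tableHasSequenceCol && !loadColumns.contains(SEQUENCE_COL)) {
            throw new IllegalStateException("Table " + tableName
                    + " has sequence column, need to specify the sequence column");
        }
    }

    public static void main(String[] args) {
        List<String> tableColumns = Arrays.asList("id", "user_name", "password");
        // No columns specified by the job: the sequence column is appended.
        List<String> resolved = resolveLoadColumns(null, tableColumns, true);
        checkSequenceColumn("user", resolved, true); // passes, no exception
        System.out.println(resolved);
    }
}
```

With an empty column list and no automatic append, `checkSequenceColumn` would throw, which corresponds to the failure reported above.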

@CalvinKirs
Member Author

run buildall

@CalvinKirs
Member Author

run buildall

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.64 seconds
stream load tsv: 545 seconds loaded 74807831229 Bytes, about 130 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17162206104 Bytes

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.11 seconds
stream load tsv: 545 seconds loaded 74807831229 Bytes, about 130 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162066403 Bytes

Contributor

@zhannngchen zhannngchen left a comment

LGTM

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Aug 18, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@morningman morningman left a comment

lgtm

@morningman morningman merged commit 6847592 into apache:master Aug 18, 2023
@CalvinKirs CalvinKirs deleted the master-load-seq branch August 18, 2023 13:49
xiaokang pushed a commit that referenced this pull request Aug 21, 2023
[Fix](RoutineLoad)Fix when Unique (MoW) routineload imports unspecified Sequence column (#23167)
@xiaokang xiaokang mentioned this pull request Aug 30, 2023

Labels

approved (Indicates a PR has been approved by one committer.), dev/2.0.1-merged, p0_w, reviewed

5 participants