Kafka backend #799
Closed
Conversation
1. Limit the number of tablets selected for balancing; it should be less than the number of low-load paths. 2. Limit the max number of balance tasks to 500.
1. The balance task does not take storage medium into account. 2. When repairing a tablet with an incomplete version, a tablet with replicas (2-xx), (2-xx), (2-0) can't be handled. 3. The show proc stmt may throw a null pointer exception when all replicas are missing.
add_pending_version() is not idempotent upon RPC retry, so the transaction may be falsely garbage-collected.
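The fix amounts to making the operation idempotent so that an RPC retry cannot register the same pending version twice. A minimal sketch of the idea, with hypothetical names rather than Doris's actual C++ API:

```python
class Tablet:
    """Toy model of a tablet that tracks pending versions.

    Hypothetical stand-in for the real Doris BE structure; it only
    illustrates why add_pending_version must tolerate RPC retries.
    """

    def __init__(self):
        self._pending = {}  # version -> transaction id

    def add_pending_version(self, version, txn_id):
        # Idempotent: a retry with the same (version, txn_id) is a no-op
        # instead of an error, so the transaction is not garbage-collected
        # as if it had never been registered.
        existing = self._pending.get(version)
        if existing is not None:
            if existing == txn_id:
                return True   # duplicate RPC, already registered
            return False      # conflicting transaction, reject
        self._pending[version] = txn_id
        return True
```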
1. Some previous Doris versions may leave an invalid replica last-failed version. 2. Also update the CREATE TABLE help doc: remove row storage type and random distribution.
SchemaChange converts segment groups in reverse order, so the SegmentGroup with segment_group_id = 1 may be handled before the one with segment_group_id = 0. This leads to acquiring a delta that has not been allocated, causing a core dump with SIGSEGV.
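The ordering fix can be sketched as sorting the groups by id before converting, so group 0's delta is always allocated before group 1 touches it. Names below are illustrative, not Doris's C++ API:

```python
def convert_segment_groups(segment_groups):
    """Convert segment groups in ascending segment_group_id order.

    Toy sketch of the ordering fix: converting group 1 before group 0
    meant the delta for group 0 had not been allocated yet, which is
    the SIGSEGV described above. Sorting first guarantees group 0 is
    handled before group 1.
    """
    converted = []
    for group in sorted(segment_groups, key=lambda g: g["segment_group_id"]):
        converted.append(group["segment_group_id"])  # conversion stub
    return converted
```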
In streaming ingestion, the segment group's reference count is set to one at creation. Upon closing, the reference should be released; otherwise the file descriptor and the in-memory segment group object can never be freed.
* Modify the logic of setting passwords: 1. A user can set the password for current_user(), or for other users if it has the GRANT privilege 2. Add support for the USER() function
1. Use submit_routine_load_task instead of agentTaskQueue 2. Remove the Thrift dependency in StreamLoadPlanner and StreamLoadScanNode
1. Add a batch submit interface 2. Add a Kafka event callback to catch Kafka events
1. Fix the nested locking of db and txn 2. The txn of a task will be initialized in the task scheduler before the task is taken from the queue
1. Check if properties is null before checking routine load properties 2. Change the transactionStateChange reason to a string 3. Calculate the current task num by beId 4. Add Kafka offset properties 5. Prefer to reuse the previous BE id 6. Add a before-commit listener to the txn: if the txn is committed after the task is aborted, the commit will be aborted 7. The queryId of the stream load plan equals the taskId
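The before-commit listener in item 6 can be sketched as a check that runs inside commit and rejects it if the owning task was already aborted. A toy model with illustrative names, not Doris's actual classes:

```python
class Task:
    """Minimal stand-in for a routine load task."""

    def __init__(self):
        self.aborted = False


class Txn:
    """Toy transaction with a before-commit check, as described above.

    If the task that produced the transaction has been aborted, the
    commit is rejected, so an aborted task can never be committed.
    """

    def __init__(self, task):
        self.task = task

    def commit(self):
        # The before-commit listener: abort the commit if the task
        # was aborted after the txn was created.
        if self.task.aborted:
            raise RuntimeError("task aborted; commit rejected")
        return "committed"
```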
1. Init the cmt offset in the stream load context 2. Init the default max error num to 5000 rows per 10000 rows 3. Add a log builder for routine load jobs and tasks 4. Clone the plan fragment param for every task 5. The BE does not throw "too many filter rows" while the initial max error ratio is 1
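The error-row rule above can be sketched as a simple ratio check: the default of 5000 error rows per 10000 total rows corresponds to a ratio of 0.5, and a configured ratio of 1 disables the check entirely. Function and parameter names are illustrative:

```python
def too_many_filtered_rows(error_rows, total_rows, max_error_ratio):
    """Decide whether a load task should fail for excessive bad rows.

    Illustrative sketch only. With the default of 5000 error rows per
    10000 total rows the threshold ratio is 0.5; when the configured
    ratio is 1, the backend never raises "too many filter rows".
    """
    if max_error_ratio >= 1:
        return False  # a ratio of 1 disables the check
    if total_rows == 0:
        return False  # nothing loaded yet, nothing to judge
    return error_rows / total_rows > max_error_ratio
```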
1. The stream load executor will abort the txn when the task contains no correct data 2. Change the txn label to DebugUtil.print(UUID), which is the same as the task id printed by the BE 3. Change the printed UUID to hi-lo format
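The "hi-lo" format in item 3 splits the 128-bit UUID into its high and low 64-bit halves. A sketch of the idea; the exact formatting Doris uses may differ:

```python
import uuid


def print_uuid_hi_lo(u):
    """Render a UUID as its 64-bit high and low halves in hex, joined
    by '-'.

    Illustrative sketch of the hi-lo label format mentioned above, so
    the FE txn label can match the task id the BE logs.
    """
    hi = (u.int >> 64) & 0xFFFFFFFFFFFFFFFF
    lo = u.int & 0xFFFFFFFFFFFFFFFF
    return f"{hi:x}-{lo:x}"
```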
1. Stopped and cancelled jobs will be cleaned after the clean interval has elapsed 2. A job is cleaned when current timestamp - end timestamp exceeds the clean interval in seconds * 1000 3. If a job cannot fetch topic metadata when it needs to be scheduled, it will be cancelled 4. Fix the deadlock between job and txn: the txn lock must be acquired before the job lock 5. The job will be paused or cancelled depending on the abort reason of the txn 6. The job will be cancelled immediately if the abort reason is "offsets out of range"
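The deadlock fix in item 4 is a lock-ordering discipline: every code path that needs both locks must take the txn lock first. If one thread took job-then-txn while another took txn-then-job, they could each hold the lock the other wants. A toy sketch with illustrative names:

```python
import threading

# Illustrative locks; in Doris these are per-txn and per-job locks.
txn_lock = threading.Lock()
job_lock = threading.Lock()


def update_job_on_txn_change(apply_change):
    """Apply a job update triggered by a txn state change.

    The rule from the fix above: acquire the txn lock before the job
    lock, everywhere. A single consistent acquisition order makes the
    circular wait required for deadlock impossible.
    """
    with txn_lock:
        with job_lock:
            apply_change()
```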
1. Add job id and cluster name to the task info 2. Simplify the logic of getting beIdToMaxConcurrentTaskNum
1. ShowRoutineLoadStmt works as its class description says: it does not support showing all routine load jobs across all dbs 2. ShowRoutineLoadTaskStmt works as its class description says: it does not support showing all routine load tasks across all jobs 3. Init partitionIdsToOffset in the constructor of KafkaProgress 4. Change Create/Pause/Resume/Stop routine load job to use a LabelName such as [db.]name 5. Exclude final-state jobs when updating jobs 6. Catch all exceptions when scheduling one job, so an exception does not block the other jobs
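The [db.]name form in item 4 is a label with an optional database qualifier that falls back to the session's current database. A minimal parsing sketch (hypothetical function, not Doris's FE code):

```python
def parse_label_name(label, current_db=None):
    """Parse a routine load job label of the form [db.]name.

    Illustrative sketch of the LabelName change above: if the database
    qualifier is omitted, the session's current database is assumed.
    Returns a (db, name) tuple; db may be None if neither is available.
    """
    if "." in label:
        db, name = label.split(".", 1)
        return db, name
    return current_db, label
```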
1. Preserve the column order in the load stmt. 2. Fix some replay bugs of the routine load task.
* Add metrics for routine load * Limit the max number of routine load tasks per backend to 10 * Fix a bug that some partitions will not be assigned
mrhhsg pushed a commit to mrhhsg/doris that referenced this pull request on Oct 4, 2022:
* topn detail query optimization: runtime predicate for rows
* add missing new files runtime_predicate.h/cpp
* topn detail query opt: runtime predicate for segments and pages
* topn detail query opt: fe check query to enable opt
* do not use topn opt for order by string type
* fix nullpointer when check type
* support more types and using TypeIndex in runtime_predicate
* [enhance-WIP](topn-two-phase) implement topn two phase read (apache#783)
* [fix](topn) fix conjunct_expr_root nullptr (apache#794)
* [enhance](topn-two-phase) support VOlapScanNode (apache#799)
* [enhance](topn) trick setParallelExecNum(1) when using topn optimization (apache#802)
* use HeapSorter when _use_topn_opt or no var length field
* remove debug log

Co-authored-by: Kang <kxiao.tiger@gmail.com>
liaoxin01 pushed a commit to liaoxin01/doris that referenced this pull request on Nov 18, 2022:
* Opt perf topn (apache#805)
* topn detail query optimization: runtime predicate for rows
* add missing new files runtime_predicate.h/cpp
* topn detail query opt: runtime predicate for segments and pages
* topn detail query opt: fe check query to enable opt
* do not use topn opt for order by string type
* fix nullpointer when check type
* support more types and using TypeIndex in runtime_predicate
* [enhance-WIP](topn-two-phase) implement topn two phase read (apache#783)
* [fix](topn) fix conjunct_expr_root nullptr (apache#794)
* [enhance](topn-two-phase) support VOlapScanNode (apache#799)
* [enhance](topn) trick setParallelExecNum(1) when using topn optimization (apache#802)
* use HeapSorter when _use_topn_opt or no var length field
* remove debug log
* [chore](topn-optimize) make code more readable

Co-authored-by: Kang <kxiao.tiger@gmail.com>