Kafka backend #799
Closed
Conversation
1. Limit the number of tablets selected for balancing; it should be less than the number of low-load paths. 2. Limit the max number of balance tasks to 500.
1. The balance task does not take storage medium into account. 2. When repairing a tablet with an incomplete version, a tablet with replicas (2-xx), (2-xx), (2-0) can't be handled. 3. The show proc stmt may throw a null pointer exception when all replicas are missing.
add_pending_version() is not idempotent upon RPC retry, so the transaction may be falsely garbage-collected.
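The fix amounts to making the operation idempotent so that an RPC retry cannot register the same pending version twice. A minimal sketch of the idea, with hypothetical names rather than Doris's actual C++ API:

```python
class Tablet:
    """Toy model of a tablet that tracks pending versions.

    Hypothetical stand-in for the real Doris BE structure; it only
    illustrates why add_pending_version must tolerate RPC retries.
    """

    def __init__(self):
        self._pending = {}  # version -> transaction id

    def add_pending_version(self, version, txn_id):
        # Idempotent: a retry with the same (version, txn_id) is a no-op
        # instead of an error, so the transaction is not garbage-collected
        # as if it had never been registered.
        existing = self._pending.get(version)
        if existing is not None:
            if existing == txn_id:
                return True   # duplicate RPC, already registered
            return False      # conflicting transaction, reject
        self._pending[version] = txn_id
        return True
```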
1. Some previous Doris versions may leave an invalid replica last-failed version. 2. Also update the CREATE TABLE help doc: remove row storage type and random distribution.
SchemaChange converts segment groups in reverse order, so the SegmentGroup with segment_group_id = 1 may be handled before the one with segment_group_id = 0. This leads to acquiring a delta that has not been allocated, causing a core dump with SIGSEGV.
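The ordering fix can be sketched as sorting the groups by id before converting, so group 0's delta is always allocated before group 1 touches it. Names below are illustrative, not Doris's C++ API:

```python
def convert_segment_groups(segment_groups):
    """Convert segment groups in ascending segment_group_id order.

    Toy sketch of the ordering fix: converting group 1 before group 0
    meant the delta for group 0 had not been allocated yet, which is
    the SIGSEGV described above. Sorting first guarantees group 0 is
    handled before group 1.
    """
    converted = []
    for group in sorted(segment_groups, key=lambda g: g["segment_group_id"]):
        converted.append(group["segment_group_id"])  # conversion stub
    return converted
```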
In streaming ingestion, the segment group's reference count is set to one at creation. Upon closing, the reference should be released; otherwise the file descriptor and the in-memory segment group object can never be freed.
* Modify the logic of setting passwords: 1. A user can set the password for current_user(), or for other users if it has the GRANT privilege 2. Add support for the USER() function
1. Use submit_routine_load_task instead of agentTaskQueue 2. Remove the Thrift dependency in StreamLoadPlanner and StreamLoadScanNode
1. Add a batch submit interface 2. Add a Kafka event callback to catch Kafka events
1. Fix the nested locking of db and txn 2. The txn of a task will be initialized in the task scheduler before the task is taken from the queue
1. Check if properties is null before checking routine load properties 2. Change the transactionStateChange reason to a string 3. Calculate the current task num by beId 4. Add Kafka offset properties 5. Prefer to reuse the previous BE id 6. Add a before-commit listener to the txn: if the txn is committed after the task is aborted, the commit will be aborted 7. The queryId of the stream load plan equals the taskId
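The before-commit listener in item 6 can be sketched as a check that runs inside commit and rejects it if the owning task was already aborted. A toy model with illustrative names, not Doris's actual classes:

```python
class Task:
    """Minimal stand-in for a routine load task."""

    def __init__(self):
        self.aborted = False


class Txn:
    """Toy transaction with a before-commit check, as described above.

    If the task that produced the transaction has been aborted, the
    commit is rejected, so an aborted task can never be committed.
    """

    def __init__(self, task):
        self.task = task

    def commit(self):
        # The before-commit listener: abort the commit if the task
        # was aborted after the txn was created.
        if self.task.aborted:
            raise RuntimeError("task aborted; commit rejected")
        return "committed"
```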
1. Init the cmt offset in the stream load context 2. Init the default max error num to 5000 rows per 10000 rows 3. Add a log builder for routine load jobs and tasks 4. Clone the plan fragment param for every task 5. The BE does not throw "too many filter rows" while the initial max error ratio is 1
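The error-row rule above can be sketched as a simple ratio check: the default of 5000 error rows per 10000 total rows corresponds to a ratio of 0.5, and a configured ratio of 1 disables the check entirely. Function and parameter names are illustrative:

```python
def too_many_filtered_rows(error_rows, total_rows, max_error_ratio):
    """Decide whether a load task should fail for excessive bad rows.

    Illustrative sketch only. With the default of 5000 error rows per
    10000 total rows the threshold ratio is 0.5; when the configured
    ratio is 1, the backend never raises "too many filter rows".
    """
    if max_error_ratio >= 1:
        return False  # a ratio of 1 disables the check
    if total_rows == 0:
        return False  # nothing loaded yet, nothing to judge
    return error_rows / total_rows > max_error_ratio
```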
1. The stream load executor will abort the txn when the task contains no correct data 2. Change the txn label to DebugUtil.print(UUID), which is the same as the task id printed by the BE 3. Change the printed UUID to hi-lo format
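The "hi-lo" format in item 3 splits the 128-bit UUID into its high and low 64-bit halves. A sketch of the idea; the exact formatting Doris uses may differ:

```python
import uuid


def print_uuid_hi_lo(u):
    """Render a UUID as its 64-bit high and low halves in hex, joined
    by '-'.

    Illustrative sketch of the hi-lo label format mentioned above, so
    the FE txn label can match the task id the BE logs.
    """
    hi = (u.int >> 64) & 0xFFFFFFFFFFFFFFFF
    lo = u.int & 0xFFFFFFFFFFFFFFFF
    return f"{hi:x}-{lo:x}"
```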
1. Stopped and cancelled jobs will be cleaned after the clean interval has elapsed 2. A job is cleaned when current timestamp - end timestamp exceeds the clean interval in seconds * 1000 3. If a job cannot fetch topic metadata when it needs to be scheduled, it will be cancelled 4. Fix the deadlock between job and txn: the txn lock must be acquired before the job lock 5. The job will be paused or cancelled depending on the abort reason of the txn 6. The job will be cancelled immediately if the abort reason is "offsets out of range"
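The deadlock fix in item 4 is a lock-ordering discipline: every code path that needs both locks must take the txn lock first. If one thread took job-then-txn while another took txn-then-job, they could each hold the lock the other wants. A toy sketch with illustrative names:

```python
import threading

# Illustrative locks; in Doris these are per-txn and per-job locks.
txn_lock = threading.Lock()
job_lock = threading.Lock()


def update_job_on_txn_change(apply_change):
    """Apply a job update triggered by a txn state change.

    The rule from the fix above: acquire the txn lock before the job
    lock, everywhere. A single consistent acquisition order makes the
    circular wait required for deadlock impossible.
    """
    with txn_lock:
        with job_lock:
            apply_change()
```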
1. Add job id and cluster name to the task info 2. Simplify the logic of getting beIdToMaxConcurrentTaskNum
1. ShowRoutineLoadStmt works as its class description says: it does not support showing all routine load jobs across all dbs 2. ShowRoutineLoadTaskStmt works as its class description says: it does not support showing all routine load tasks across all jobs 3. Init partitionIdsToOffset in the constructor of KafkaProgress 4. Change Create/Pause/Resume/Stop routine load job to use a LabelName such as [db.]name 5. Exclude final-state jobs when updating jobs 6. Catch all exceptions when scheduling one job, so an exception does not block the other jobs
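The [db.]name form in item 4 is a label with an optional database qualifier that falls back to the session's current database. A minimal parsing sketch (hypothetical function, not Doris's FE code):

```python
def parse_label_name(label, current_db=None):
    """Parse a routine load job label of the form [db.]name.

    Illustrative sketch of the LabelName change above: if the database
    qualifier is omitted, the session's current database is assumed.
    Returns a (db, name) tuple; db may be None if neither is available.
    """
    if "." in label:
        db, name = label.split(".", 1)
        return db, name
    return current_db, label
```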
1. Preserve the column order in the load stmt. 2. Fix some replay bugs of the routine load task.
* Add metrics for routine load * Limit the max number of routine load tasks per backend to 10 * Fix a bug that some partitions will not be assigned
mrhhsg pushed a commit to mrhhsg/doris that referenced this pull request on Oct 4, 2022:
* topn detail query optimization: runtime predicate for rows
* add missing new files runtime_predicate.h/cpp
* topn detail query opt: runtime predicate for segments and pages
* topn detail query opt: fe check query to enable opt
* do not use topn opt for order by string type
* fix nullpointer when check type
* support more types and using TypeIndex in runtime_predicate
* [enhance-WIP](topn-two-phase) implement topn two phase read (apache#783)
* [fix](topn) fix conjunct_expr_root nullptr (apache#794)
* [enhance](topn-two-phase) support VOlapScanNode (apache#799)
* [enhance](topn) trick setParallelExecNum(1) when using topn optimization (apache#802)
* use HeapSorter when _use_topn_opt or no var length field
* remove debug log

Co-authored-by: Kang <kxiao.tiger@gmail.com>
liaoxin01 pushed a commit to liaoxin01/doris that referenced this pull request on Nov 18, 2022:
* Opt perf topn (apache#805)
* topn detail query optimization: runtime predicate for rows
* add missing new files runtime_predicate.h/cpp
* topn detail query opt: runtime predicate for segments and pages
* topn detail query opt: fe check query to enable opt
* do not use topn opt for order by string type
* fix nullpointer when check type
* support more types and using TypeIndex in runtime_predicate
* [enhance-WIP](topn-two-phase) implement topn two phase read (apache#783)
* [fix](topn) fix conjunct_expr_root nullptr (apache#794)
* [enhance](topn-two-phase) support VOlapScanNode (apache#799)
* [enhance](topn) trick setParallelExecNum(1) when using topn optimization (apache#802)
* use HeapSorter when _use_topn_opt or no var length field
* remove debug log
* [chore](topn-optimize) make code more readable

Co-authored-by: Kang <kxiao.tiger@gmail.com>