@wsjz wsjz commented Jun 6, 2024

Proposed changes

from #34990

w41ter and others added 30 commits May 23, 2024 14:39
* [chore](binlog) Add logs about binlog gc (#34359)

* [feature](binlog) Support gc binlogs by history nums and size (#34888)
Followup to #35241.
In #35241, we updated the doris-shade version to 2.1.0, which already contains the dlf dependencies.

Pick part of #34749 to remove the dlf dependencies from fe/pom.xml.
* [improvement](mtmv) Split the expression mapping in LogicalCompatibilityContext for performance (#34646)

The query-to-view expression mapping is needed when checking whether the logic of two hyper graphs is equal. Getting all expression mappings at once may hurt performance, so the expressions are split into three types (JOIN_EDGE, NODE, FILTER_EDGE) and retrieved step by step.

* fix code style
Support single-table query rewrite without group by.
This is useful for complex filters or expressions.

The mv definition and query below illustrate a pair that can be query rewritten.

mv def:
```
select *
from lineitem where l_comment like '%xx%'
```

query:
```
select l_linenumber, l_receiptdate
from lineitem where l_comment like '%xx%'
```
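
For context, a minimal sketch of how such an mv might be created as an async materialized view (the BUILD/REFRESH clauses, distribution column, and properties here are illustrative assumptions, not part of this patch):

```
-- Illustrative only: refresh policy and distribution column are assumptions.
CREATE MATERIALIZED VIEW mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY HASH(l_orderkey) BUCKETS 2
PROPERTIES ("replication_num" = "1")
AS
select * from lineitem where l_comment like '%xx%';
```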

Co-authored-by: zfr9527 <qhu15zhang3294197@163.com>
…oid> (#34873)

Followup #34797
`static_cast<void>` silently ignores an error status; some of those statuses should finish the query with an error, so replace `static_cast<void>` with `RETURN_IF_ERROR`.

The following three scenarios need to be handled separately and cannot be simply replaced:
1. The outer function returns void;
2. The status-returning function is called inside a constructor or destructor;
3. The status-returning function is called on a best-effort basis, and the error status should be ignored.
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
…rflow (#35206)

Co-authored-by: Luennng <luennng@gmail.com>
pick from master #35200

Description:
SQL executes much more slowly when the literal values in an `in predicate` are in string format while the real data is an integral type.
```
mysql> set enable_nereids_planner = false;
Query OK, 0 rows affected (0.03 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+---------------+
| id         | sum(`clicks`) |
+------------+---------------+
|  787934713 |          2838 |
|  306960695 |           339 |
+------------+---------------+
2 rows in set (1.81 sec)

mysql> set enable_nereids_planner = true;
Query OK, 0 rows affected (0.02 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+-------------+
| id         | sum(clicks) |
+------------+-------------+
|  787934713 |        2838 |
|  306960695 |         339 |
+------------+-------------+
2 rows in set (28.14 sec)
```

Reason:
The legacy planner converts the string literal to an integral value, but the Nereids planner does not do this conversion, so string matching is done in the BE.

Solution:
Process string literals against numeric values in an `in predicate` the same way as in a `comparison predicate`.
test table:
```
create table a_table(
    k1 BIGINT NOT NULL,
    k2 VARCHAR(100) NOT NULL,
    v1 INT SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(k1,k2)
distributed BY hash(k1) buckets 2
properties("replication_num" = "1");
insert into a_table values (10, 'name1', 10),(20, 'name2', 10);
explain plan select * from a_table where k1 in ('10', '20001');
```
before this change:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 1ms) ==========                                                                                        |
| UnboundResultSink[4] (  )                                                                                                            |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy (  )                                                                                                     |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 2ms) ==========                                                                                      |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                      |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 6ms) ==========                                                                                     |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                         |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 6ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0.33, distributionSpec=DistributionSpecGather )                                                  |
|    +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                       |
|       +--PhysicalOlapScan[a_table]@0 ( stats=1 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
after this change:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 15ms) ==========                                                                                       |
| UnboundResultSink[4] (  )                                                                                                            |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy (  )                                                                                                     |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 11ms) ==========                                                                                     |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) )                                                                        |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 12ms) ==========                                                                                    |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) )                                                                           |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 4ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0, distributionSpec=DistributionSpecGather )                                                     |
|    +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) )                                                            |
|       +--PhysicalOlapScan[a_table]@0 ( stats=2 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
Support count(*) used as a window function.

```
CREATE TABLE `t1` (
  `id` INT NULL,
  `dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

select *, count(*) over() from t1;
```


Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Add a FunctionSignature for `if` to support JsonType as the return type.
Co-authored-by: Luennng <luennng@gmail.com>
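
A hypothetical illustration of what the new signature allows (the table and values here are made up for the example):

```
-- Assumed example: both branches are JSON, so if() can resolve to a
-- signature that returns JsonType directly.
select if(id = 1, cast('{"a": 1}' as json), cast('{"b": 2}' as json)) as j
from t1;
```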
* Issue: Doris occasionally encounters an issue where memory usage becomes exceptionally high and does not decrease. The leaked memory is occupied by Bloom filters stored in memory.

Reason: The segment cache stores segment objects read from files into memory. It functions as an LRU cache with an eviction strategy: when the number of segments exceeds the maximum, or the total memory size of the segment objects in the cache exceeds the maximum usage, it evicts the older segments.

However, one code path first reads a segment object into memory (say it occupies memory size A), then places it into the cache, at which point the cache records the segment's size as A. It then reads the segment's Bloom filter from the file and assigns it to the segment's Bloom filter member variable (say the Bloom filter occupies memory size B). The total size of the segment object is now A+B, but the cache never updates its record, so the actual size of each cached segment (A+B) is larger than the size the cache accounts for (A). As the number of cached segment objects grows, used memory surges, yet the cache does not perceive its size as reaching the eviction limit and therefore does not evict segments. This is the memory leak.

Solution: Since each segment object reads its Bloom filter only once, the issue can be resolved by changing the order of operations from "read the segment, place it into the cache, then read the Bloom filter" to "read the segment, read the Bloom filter, then place it into the cache".
ByteYue and others added 26 commits June 3, 2024 23:22
…35826)

## Proposed changes

Issue Number: close #xxx


This reverts commit #35641 because it does not compile successfully on the ARM platform.
… query (#35734)

Support the ipv4/ipv6 data types with inverted indexes, so that conjunct expressions such as `>`, `<`, `>=`, `<=`, and `in`/`not in` on IP columns can be sped up by the inverted index.
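
A sketch of the kind of table and queries this enables (the table name, columns, and values are illustrative):

```
-- Illustrative table: an inverted index on an IPV4 column.
create table ip_events (
    ip ipv4 not null,
    msg varchar(100) null,
    index idx_ip (ip) using inverted
) duplicate key(ip)
distributed by hash(ip) buckets 1
properties ("replication_num" = "1");

-- Range and in/not in conjuncts on the ip column can now be accelerated
-- by the inverted index (assuming string literals cast to IPv4).
select * from ip_events where ip >= '10.0.0.1' and ip < '10.0.1.0';
select * from ip_events where ip in ('192.168.1.1', '192.168.1.2');
```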
… fold to null literal (#35842)

pick from master #35811

## Proposed changes

Issue Number: close #xxx

cherry-pick #34313 to branch-2.1

MergePercentileToArray performs a transformation of the following kind:

```
select ss_item_sk, percentile(ss_quantity,0.9), percentile(ss_quantity,0.6), percentile(ss_quantity,0.3)
from store_sales group by ss_item_sk;
==>
select ss_item_sk, percentile_array(ss_quantity,[0.3,0.6,0.9]) from store_sales group by ss_item_sk;
```
Previously, FE logs were written to files. The main FE logs include
fe.log, fe.warn.log, fe.audit.log, fe.out, and fe.gc.log.
In a K8s deployment environment, logs usually need to be output to
standard output, and then other components process the log stream.

This PR made the following changes:

1. Modified the log4j configuration template

- When started with `--daemon`, logs are still written to various files,
and the format remains unchanged.
- When started with `--console`, all logs are output to standard output
and marked with different prefixes:

  - `StdoutLogger`: logs for standard output
  - `StderrLogger`: logs for standard error output
  - `RuntimeLogger`: logs for fe.log or fe.warn.log
  - `AuditLogger`: logs for fe.audit.log
  - No prefix: logs for fe.gc.log

  An example is as follows:

  ```
  RuntimeLogger 2024-06-03 14:54:51,229 INFO (binlog-gcer|62) [BinlogManager.gc():359] begin gc binlog
  ```

2. Added a new FE config: `enable_file_logger`

Defaults to true, meaning logs are written to files regardless of the startup method. For example, if FE is started with `--console`, logs go to both the files and standard output. If set to `false`, logs are not written to files regardless of the startup method (see the sketch after this list for how to set it).

3. Optimized the log format of standard output

The byte streams of stdout and stderr are captured; logs previously written with `System.out` are now captured in fe.log for unified management.
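
A sketch of disabling file logging (whether `enable_file_logger` is mutable at runtime is an assumption; if it is not, it must be set in fe.conf before startup):

```
-- Assumption: enable_file_logger is runtime-mutable like other FE configs;
-- if not, add `enable_file_logger = false` to fe.conf and restart.
ADMIN SET FRONTEND CONFIG ("enable_file_logger" = "false");
```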
add logs for partial update

the master PR is #35802

…ion (#35859) (#35895)

## Proposed changes

This PR enables `delete sub predicate v2` for compaction; legacy versions of delete predicates are still processed in the original way.
This pull request changes the index_id type in inverted index storage format v2 to int64_t; previously the index_id was stored in the inverted index file using 4 bytes.
Included here are some bugfixes for array with inverted index; see also:
#34766
#35086
#34683
#34076
bp #35686

Co-authored-by: zhangdong <493738387@qq.com>
…, but the partition storage medium for the mtmv is still HDD (#35644) (#35955)

pick from master: #35644
Avro scanner is deprecated. Remove related test suites.
#35977)

Skip null partitions when getting base tablets for each BE (for further use in deduplicating updated row counts in MV); otherwise publish may fail.

cherry pick master #35475
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@wsjz wsjz closed this Jun 6, 2024