Behavior Changes
- Adjust the permission requirements for `show frontends` and `show backends` to align with the corresponding RESTful API, i.e., requiring the `SELECT_PRIV` permission on the `information_schema` database ([fix](restapi) Unify Permission Requirements for Executing SHOW FRONTENDS/BACKENDS And NODE RestAPI #50140)
- Admin and root users with specified domains are no longer considered system users ([fix](auth)Only treat admin@% and root@% as system users. #50904)
- Storage: The default number of concurrent transactions per database is adjusted to 10000 ([chore](conf) change max_running_txn_num_per_db to 10000 #51367, branch-3.0: [chore](conf) change max_running_txn_num_per_db to 10000 #51367 #52380)
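As a rough illustration of the first item above, an account that still needs to run `show frontends` or `show backends` after upgrading could be granted the required privilege along these lines. The account name `monitor` is hypothetical, and the `internal` catalog prefix assumes the default internal catalog; check the GRANT syntax for your version before relying on it.

```sql
-- Hypothetical monitoring account; replace with a real user.
GRANT SELECT_PRIV ON internal.information_schema.* TO 'monitor'@'%';

-- With the privilege in place, these statements should be permitted for that user.
SHOW FRONTENDS;
SHOW BACKENDS;
```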
New Features
Lakehouse
(No specific content)
Asynchronous Materialized Views
(No specific content)
Query Optimizer
- Support MySQL's aggregate roll-up syntax `GROUP BY ... WITH ROLLUP` ([improvement](nereids)Support GROUP BY ... WITH ROLLUP syntax #51948)
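A minimal sketch of the roll-up syntax, assuming a hypothetical `sales` table with `region`, `product`, and `amount` columns. In MySQL, `WITH ROLLUP` adds super-aggregate rows (per-region subtotals and a grand total) in which the rolled-up columns are NULL; the second query assumes the existing `ROLLUP(...)` form gives the same grouping.

```sql
-- Hypothetical table and columns, for illustration only.
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY region, product WITH ROLLUP;

-- Assumed equivalent using the function-style ROLLUP grouping:
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY ROLLUP(region, product);
```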
Query Execution
- `Like` statement supports `escape` syntax ([Feature](function) support like with escape clause (#52146) #52540)
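A short sketch of a `LIKE` predicate with an escape clause, written in the standard SQL form. The table `coupons` and column `code` are hypothetical.

```sql
-- '!' is declared as the escape character, so '!%' matches a literal percent sign
-- while the trailing '%' remains a wildcard.
-- This is intended to match codes that start with the text "50%".
SELECT code
FROM coupons
WHERE code LIKE '50!%%' ESCAPE '!';
```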
Semi-structured Data Management
- Support building non-tokenized inverted indexes and ngram bloomfilter indexes only for new data by setting the session variable `enable_add_index_for_new_data=true` ([feature](index change)Support light index change for inverted index without parser #52251, [feature](index change)Support light index change for ngram bf index #48461)
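One way this might look in practice, as a sketch only: the table `access_log`, the column `status`, and the index name are hypothetical, and the exact behavior for existing data should be confirmed against the linked PRs.

```sql
-- Session-level switch named in the release note.
SET enable_add_index_for_new_data = true;

-- Hypothetical table and column. No parser property is given,
-- so the inverted index is non-tokenized.
CREATE INDEX idx_status ON access_log (status) USING INVERTED;
```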
Storage
(No new features; relevant changes see Behavior Changes)
New Functions
- Added math functions: `cot`/`sec`/`cosec` ([Feature](function) support function cot/sec/cosec #52872)
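The three functions correspond to the usual trigonometric identities, so a quick sanity check could compare each one with its definition. This sketch assumes the functions take a single numeric argument in radians, like `sin` and `cos`.

```sql
-- cot(x) = cos(x) / sin(x), sec(x) = 1 / cos(x), cosec(x) = 1 / sin(x)
-- Each pair of columns is expected to agree up to floating-point error.
SELECT cot(1.0),   cos(1.0) / sin(1.0),
       sec(1.0),   1 / cos(1.0),
       cosec(1.0), 1 / sin(1.0);
```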
Improvements
Data Ingestion
- Optimize error messages for `SHOW CREATE LOAD` ([chore](load) optimize show create load error message #53694, branch-3.0: [chore](load) optimize show create load error message #53694 #53730)
Primary Key Model
- Add segment key bounds truncation capability to avoid single large import failures ([opt](rowset meta) truncate segments key bounds if too large to avoid `RowsetMetaCloudPB` exceeds fdb's 100KB limits #45287, branch-3.0: [opt](rowset meta) truncate segments key bounds if too large to avoid `RowsetMetaCloudPB` exceeds fdb's 100KB limits (#45287) #51595)
Storage
- Enhance the reliability of compaction and imported data ([fix](cloud) compaction and schema change potential data race when retrying prepare rowset #51048, [fix](cloud) compaction and schema change potential data race when retrying prepare rowset (#51048) #51852, [fix](cloud) potential data race when retrying prepare/commit rowset for load #51129, branch-3.0: [fix](cloud) potential data race when retrying prepare/commit rowset for load #51129 #51483)
- Optimize balance speed ([opt](cloud) Optimize balance speed by reducing the complexity of the rebalance algorithm #51733, branch-3.0: [opt](cloud) Optimize balance speed by reducing the complexity of the rebalance algorithm #51733 #52813, [fix](cloud) Fix `ConcurrentModificationException` in cloud rebalance #52013, branch-3.0: [fix](cloud) Fix `ConcurrentModificationException` in cloud rebalance #52013 #52309)
- Optimize table creation speed ([opt](create table) Fixed table creation becomes slower as the number of tablets increases #52688, branch-3.0: [opt](create table) Fixed table creation becomes slower as the number of tablets increases #52688 #52918)
- Optimize compaction default parameters and observability ([enhance](compaction) limit time series table max version using maximum of current backend #53244, [enhance](compaction) limit time series table max version using maximum of current backend (#53244) #53562, [enhance](compaction) optimize mow base compaction parameters #52321, branch-3.0: [enhance](compaction) optimize mow base compaction parameters #52321 #52605, [Enhancement](Compaction) Support auto set cumu compaction threads num base on cpu num #53133, branch-3.0: [Enhancement](Compaction) Support auto set cumu compaction threads num base on cpu num #53133 #53215, [Enhancement](Compaction) Make base compaction use the same tablet selection strategy as cumulative compaction #51649, branch-3.0: [Enhancement](Compaction) Make base compaction use the same tablet selection strategy as cumulative compaction #51649 #52389, [bvar](cloud-mow) Add bvar for mow compaction get delete bitmap lock backoff sleep time #52044, branch-3.0: [bvar](cloud-mow) Add bvar for mow compaction get delete bitmap lock backoff sleep time (#52044) #52297)
- Optimize the issue of query error -230 ([opt](rowset) Remote fetch rowsets to avoid -230 error when capturing rowsets (#52995) #52440, [fix](replica) Get tablet replica infos should return all primary backends except for warmup jobs #54131)
- Add system table `backend_tablets` (branch-3.0: [Enhancement] add information_schema backend_tablets table #52195)
- Optimize the performance of querying `information_schema.tables` from follower nodes in cloud mode ([opt](query) accelerate query information_schema.tables from follower node in cloud mode #51240, [opt](query) accelerate query information_schema.tables from follower node in cloud mode (#51240) #51405)
Storage-Compute Decoupled
- Enhance observability of Meta-service recycler ([enhance](meta-service)add bvar for fdb process status #52882, branch-3.0: [enhance](meta-service)add bvar for fdb process status #52882 #53100, branch-3.0: [Enhancement](systable)add information_schema backend_configuration table #51542, [enhance](meta-service)collect fdb kv meta range info as metrics #52430, branch-3.0: [enhance](meta-service)collect fdb kv meta range info as metrics #52430 #53116, [enhance](meta-service)add bytes for kv stats #52729, branch-3.0 [enhance](meta-service)add bytes for kv stats (#52729) #53351, [enhance](meta-service) add real request ip for be rpc #53114, branch-3.0 [enhance](meta-service) add real request ip for be rpc #53114. #53320, branch-3.0: [enhance](meta-service)add bvar for ms kv get del put count #52714, [enhance](metrics)add some compaction metrics #50910, branch-3.0: [enhance](metrics)add some compaction metrics #50910 #51487, [Feature](recycler) Add recycler metrics for recycler layer #51409, branch-3.0: [Feature](recycler) Add recycler metrics for recycler layer #51409 #51884, [feat](recycler) Add http api for statistics recycler metrics #52523, branch-3.0: [feat](recycler) Add http api for statistics recycler metrics #52523 #53117)
- Support cross-compute group incremental preheating during import compaction ([feature](cloud) support event driven or periodic warm up #52370, branch-3.0: [feature](cloud) support event driven or periodic warm up #52370 #52514, [fix](warmup) fix show warm up tables #53406, [fix](warmup) fix warm up jobs missing last batch #53860, [fix](warmup) fix warm up jobs missing last batch (#53860) #53861, [fix](load) fix bad load id in injection #52339, branch-3.0: [fix](load) fix bad load id in injection #52339 #52426, [test](p2) allow protocol prefix in S3 endpoint format in test_broker_load #53525, [test](p2) allow protocol prefix in S3 endpoint format in test_broker_load (#53525) #53530, [fix](warmup) avoid calling recycle_cache after rebalance #53339, [fix](warmup) avoid calling recycle_cache after rebalance (#53339) #53523, [metrics](warmup) add some metrics for warmup jobs #52991, [fix](warmup) prevent NPE when upgrading from older versions #53555, branch-3.0: [fix](warmup) prevent NPE when upgrading from older versions #53555 #53666, [feat](warmup) display tables in SHOW WARM UP JOB results #51594, [feat](warmup) display tables in SHOW WARM UP JOB results (#51594) #52291, [fix](vcg) use "vcg cancel" as cancel message for warm up jobs #53752)
- Optimize Storage vault connectivity check ([feat](storage vault) Check storage vault connectivity for BE when starting #51175, branch-3.0: [feat](storage vault) Check storage vault connectivity for BE when starting #51175 #52319, [fix](be) Fix `check_storage_vault` deadlock #52541, branch-3.0: [fix](be) Fix `check_storage_vault` deadlock #52541 #52602, [fix](cloud) Sync storage resource once in read path when rowset._storage_resource.fs is null #53075, branch-3.0: [fix](cloud) Sync storage resource once in read path when rowset._storage_resource.fs is null #53075 #53227)
- Support updating storage backend information via MS API ([feat](cloud) Support alter operation for obj_info and s3 vault obj_info #51162, branch-3.0: [feat](cloud) Support alter operation for obj_info and s3 vault obj_info #51162 #51685)
Lakehouse
- Optimize ORC zlib decompression performance in x86 environment and fix potential issues ([fix & opt](orc) ORC-1525: Fix bad read in RleDecoderV2::readByte and Decompress zlib by libdeflate. #51775)
- Optimize the default number of concurrent threads for external table reading ([opt](multi-catalog) Optimize remote scan concurrency. #51415)
- Optimize error messages for Catalogs that do not support DDL operations ([fix](iceberg)Table operations are not supported for catalogs of the dlf type. #50696)
Asynchronous Materialized Views
- Optimize the performance of transparent rewriting planning ([opt](mtmv) optimize mtmv rewrite performance #49514)
Query Optimizer
- The `group_concat` function now allows parameters of non-string types ([opt](group_concat) allow args be types other than string #52805)
- The `sum` and `avg` functions allow parameters of non-numeric types ([opt](Nereids) aggregate function sum support string type as parameter #49954)
- Expand the scope of support for delayed materialization in TOP-N queries, enabling delayed materialization when querying partial columns (branch-3.0: [opt](Nereids) support defer materialization with project #52522)
- When creating partitions, list partitions allow inclusion of `MAX_VALUE` ([fix](nereids)allow in partition contains MAX_VALUE #46076)
- Optimize the performance of sampling and collecting statistical information for aggregate model tables ([improvement](statistics)Agg table set preagg on when doing sample analyzing. #49918)
- Optimize the accuracy of NDV values when sampling and collecting statistical information ([improvement](statistics)Eliminate null values while sample analyzing ndv. #50574)
Inverted Index
- Unify the order of properties displayed for inverted indexes in `show create table` ([enhancement](inverted index) ensure consistent ordering of properties for inverted index show #51467)
- Add per-condition profile metrics (such as hit rows and execution time) for inverted index filter conditions to facilitate performance analysis ([feature](inverted index) Add profile statistics for each condition in inverted index filters #47504)
- Enhance the display of inverted index-related information in profiles ([opt](inverted index) Enhance I/O statistics collection for the inverted index in file cache scenarios #48950, [opt](inverted index) uniform profile naming convention #48826, [fix](inverted index) enhance inverted index profile #51495)
Permissions
- Ranger supports setting permissions for storage vault and compute group ([enhance](auth)ranger support storage vault and compute group #47925)
Bug Fixes
Data Ingestion
- Fix the correctness issue that may occur when importing CSV files with multi-character separators ([fix](csv reader) fix data loss when concurrency read using multi char line delimiter #53374, branch-3.0: [fix](csv reader) fix data loss when concurrency read using multi char line delimiter (#53374) #53634)
- Fix the issue where the `ROUTINE LOAD` task display shows incorrect results after task properties are modified ([fix](job) fix show routine load job result incorrect after alter job property #53038, branch-3.0: [fix](job) fix show routine load job result incorrect after alter job property #53038 #53098)
- Fix the issue where the one-stream multi-table import plan becomes invalid after primary node restart or Leader switch ([fix](load) fix multi table load plan fail after restart master Fe or leader change #53799, branch-3.0: [fix](load) fix multi table load plan fail after restart master Fe or leader change (#53799) #53829)
- Fix the issue where all scheduling tasks are blocked because `ROUTINE LOAD` tasks cannot find available BE nodes ([fix](job) fix routine load task scheduler block for one job can not find any BE #52654, branch-3.0: [fix](job) fix routine load task scheduler block for one job can not find any BE (#52654) #52791)
- Fix the concurrent read-write conflict issue of `runningTxnIds` ([fix](load) fix concurrent read and write to runningTxnIds #51615, branch-3.0: [fix](load) fix concurrent read and write to runningTxnIds #51615 #51639)
Primary Key Model
- Optimize the import performance of mow tables under high-frequency concurrent imports ([Opt](cloud-mow) Skip MS RPC retry's backoff when encounter fdb txn conflict when mow load get ms delete bitmap lock #52360, branch-3.0: [Opt](cloud-mow) Skip MS RPC retry's backoff when encounter fdb txn conflict when mow load get ms delete bitmap lock #52360 #52439, [improve](cloud-mow) batch get tablet stats when get_delete_bitmap_update_lock #47281, branch-3.0: [improve](cloud-mow) batch get tablet stats when get_delete_bitmap_update_lock (#47281) #52225)
- mow table full compaction releases space of deleted data ([Opt](compaction) Prune rows with delete sign=1 in full compaction #51874, branch-3.0: [Opt](compaction) Prune rows with delete sign=1 in full compaction #51874 #52256)
- Fix the potential import failure issue of mow tables in extreme scenarios ([Opt](cloud-mow) Retry to commit txn when encounter stale calc delete bitmap response regardless of status code #52547, branch-3.0: [Opt](cloud-mow) Retry to commit txn when encounter stale calc delete bitmap response regardless of status code (#52547) #52848)
- Optimize the compaction performance of mow tables ([Opt](cloud-mow) Do fast retry when commit compaction job for mow tablet #52476, branch-3.0: [Opt](cloud-mow) Do fast retry when commit compaction job for mow tablet (#52476) #52952)
- Fix the potential correctness issue of mow tables during concurrent imports and schema changes ([Fix](mow) Fix `DeleteBitmap`'s assignment operator and constructor #52582, branch-3.0: [Fix](mow) Fix `DeleteBitmap`'s assignment operator and constructor #52582 #52974)
- Fix the issue where schema change on empty mow tables may cause import stuck or schema change failure ([fix](mow) fix update delete bitmap lock not removed if schema change for empty tablet #51780, branch-3.0: [fix](mow) fix update delete bitmap lock not removed if schema change for empty tablet (#51780) #52166)
- Fix the memory leak issue of mow delete bitmap cache ([fix](mow) fix potential mem leak for DeleteBitmap::get_agg #52718, [fix](chore) fix cache release core #52756, branch-3.0:[fix](mow) fix potential mem leak for DeleteBitmap::get_agg (#52718, #52756) #52931)
- Fix the potential correctness issue of mow tables after schema change ([Fix](cloud-mow) Remove potential existing split delete bitmap KVs before update them in schema change #51353, branch-3.0: [Fix](cloud-mow) Remove potential existing split delete bitmap KVs before update them in schema change (#51353) #51531)
Storage
- Fix the missing rowset issue in clone process caused by compaction ([Fix](Compaction) Fix full clone failure when rowset missing #53984, branch-3.0: [Fix](Compaction) Fix full clone failure when rowset missing #53984 #54162, [Fix](Clone) Fix compaction and mow failure when missing rowset #52812, [Cherry-Pick](branch-3.0) Pick "[Fix](Clone) Fix compaction and mow failure when missing rowset (#52812)" #53497, [Enhancement](Log) Missing rowset clone should not print stack log #53193, [Cherry-Pick](branch-3.0) Pick "[Enhancement](Log) Missing rowset clone should not print stack log (#53193)" #53527)
- Fix the issue of inaccurate size calculation and default value for autobucket ([fix](auto bucket)Set the estimated partition size to 5G in non cloud #51258, branch-3.0: [fix](auto bucket)Set the estimated partition size to 5G in non cloud #51258 #51682, [fix](auto bucket)Fix auto bucket calc bucketnum err when partition size is invalid #52801, branch-3.0: [fix](auto bucket)Fix auto bucket calc bucketnum err when partition size is invalid #52801 #53250)
- Fix the potential correctness issue caused by bucket columns ([fix](schema-change) Forbid dropping distribution columns (branch-3.0) #54037, [fix](schema-change) Rebuild distribution info according to original order #54024, [fix](schema-change) Rebuild distribution info should update default distribution info #54072, [fix](schema-change) Rebuild distribution info according to original order (#54024) (#54072) #54109)
- Fix the issue where single-column tables cannot be renamed ([fix](schema-change) Fix single column table could not rename columns #47275, branch-3.0: [fix](schema-change) Fix single column table could not rename columns #47275 #52340)
- Fix the potential memory leak issue of memtable ([fix](load) Convert RowInBlock* to shared_ptr to fix potential memory leaks in MemTable #52902, branch-3.0: [fix](load) Convert RowInBlock* to shared_ptr to fix potential memory leaks in MemTable (#52902) #52965)
- Fix the inconsistent error reporting issue for unsupported operations in empty table transaction writes ([fix](txn load) fix delete in txn load #52133, branch-3.0: [fix](txn load) fix delete in txn load (#52133) #52635)
Storage-Compute Decoupled
- Several fixes for File cache ([enhancement](cloud) monitor evict size of file cache active gc #51197, branch-3.0: [enhancement](cloud) monitor evict size of file cache active gc #51197 #51793, [regression](filecache) fix regression failures part2 #53783, [regression](filecache) fix regression failures part2 (#53783) #53915, [fix](filecache) fix load_cache_info_into_memory crash #51684, branch-3.0: [fix](filecache) fix load_cache_info_into_memory crash #51684 #51904, [optimization](filecache) speed up filecache warm up #51776, [optimization](filecache) speed up filecache warm up #51776 #52556, [fix](cloud) fix file cache types priority order #51463, branch-3.0: [fix](cloud) fix file cache types priority order #51463 #51603, [enhancement](filecache) fix default capacity and add reset_capacity validation #51711, branch-3.0: [enhancement](filecache) fix default capacity and add reset_capacity validation #51711 #52152)
- Fix the issue where the cumulative point may roll back during schema change ([fix](cloud) Fix roll-backed cumulative point of new tablet when doing schema change #53402, branch-3.0: [fix](cloud) Fix roll-backed cumulative point of new tablet when doing schema change #53402 #53446)
- Fix the issue where background tasks affect automatic restart ([fix](cloud)Fix auto start affected by daemon jobs #51729, branch-3.0: [fix](cloud)Fix auto start affected by daemon jobs #51729 #52519)
- Fix the unhandled exception issue in the data recycling process in Azure environments ([fix](recycler) Fix two errors for recycler #53042, branch-3.0: [fix](recycler) Fix two errors for recycler #53042 #53224)
- Fix the issue where file cache is not cleaned up in time when compacting a single rowset ([feat](cloud) Add unused rowset state for CloudTablet #51674)
Lakehouse
- Fix the transaction commit failure issue for Iceberg table writes in Kerberos environment ([fix](iceberg)Fix the thread pool issue used for commit. #51508)
- Fix the Kerberos authentication error when querying Hudi tables in Kerberos environments ([fix](hudi catalog) Fix the Kerberos authentication error when querying hudi table #51713)
- Fix the potential deadlock issue in multi-Catalog scenarios ([fix](catalog) fix deadlock of catalog and database #53626)
- Fix the metadata inconsistency issue caused by concurrent Catalog refresh in some cases ([fix](catalog) synchronize reset methods in catalog classes and remove Lombok annotations #51787)
- Fix the issue where ORC footer is read multiple times in some cases ([Optimize](orc-reader) Optimize stripe footer multiple reads in orc reader. #51277)
- Fix the issue where Table Valued Function cannot read compressed json files ([fix](tvf) support compressed json file for tvf and refactor code #51983)
- SQL Server Catalog supports identifying IDENTITY column information ([improvement](jdbc catalog) Optimize the acquisition of indentity type in SQLServer #51285)
- SQL Convertor supports specifying multiple URLs for high availability ([Enhancement](sql-dialect) Support multiple sql-converter service urls #52636)
Asynchronous Materialized Views
- Fix the issue where partition compensation may be performed incorrectly when the query is optimized to an empty result set ([fix](mtmv) Fix compensate union wrongly when direct query is empty relation #51700)
Query Optimizer
- Fix the issue where `sql_select_limit` affects DML execution results ([fix](sql_select_limit) sql_select_limit should not affect DML #53379)
- Fix the issue where materialized CTEs may report errors in extreme cases when local shuffle is enabled ([fix](coordinator) fix cte with local shuffle throw exception #52870)
- Fix the issue where prepared insert statements cannot be executed on non-master nodes ([fix](Prepared Statment) Fix exec prepared insert stmt in non master error #48689)
- Fix the incorrect result when casting `ipv4` to string ([fix](nereids) fix cast ipv4 to string #51546)
Permissions
- When a user has multiple roles, the permissions of the multiple roles will be merged before authorization (branch-3.0:[fix](auth)fix when authentication, the permissions of multiple roles… #52948)
Query Execution
- Fix issues with some json functions ([fix](json) Add . after in JSON path to support correct token parsing (#52543) #52744, [fix](test) Avoid the configuration item disable_datev1 affecting other test cases #52915, [fix](json) incorrect results of json_contains (#53291) #53364)
- Fix the potential BE Core issue when the asynchronous thread pool is full ([fix](pipeline) premature exit causing core dump during concurrent pr… #52365)
- Fix the incorrect result issue of `hll_to_base64` ([Bug](function) fix wrong result of hll_to_base64 #51831)
- Fix the incorrect result when casting `decimal256` to float ([fix](decimal256) fix casting decimal256 to float #54140)
- Fix two memory leak issues ([bugfix](memleak) fix memory leak for tabletschema and result cache (#51931) #51952, [bugfix](memleak) fix memleak in arrow input stream #51929, branch-3.0: [Fix](field) Fix potential memory leak and wrong binary reading about JsonbField (#50174) #52542)
- Fix the BE core issue caused by `bitmap_from_base64` ([Bug](function) fix bitmap_from_base64 function cause heap-buffer-overflow error #53018)
- Fix the potential BE core issue caused by the `array_map` function ([bug](function) fix array_map cause coredump as NULL #51618)
- Fix the potential error issue of the `split_by_regexp` function ([bug](function) fix split_by_regexp meet empty string return error #51293)
- Fix the potential incorrect result of the `bitmap_union` function under extremely large data volumes ([Bug](aggregate) fix bitmap_union return error result in query sql #52033)
- Fix the potential core issue of the `format round` function under some boundary values ([branch-3.0](core) Fix format round would core under boundary conditions #53855)
Inverted Index
- Fix the memory leak issue of inverted indexes in abnormal situations ([fix](inverted index) fix index memeory leak for inverted index #53235)
- Fix the error reporting issue when writing and querying empty index files ([fix](inverted index) fix error when writing empty index file #51984, [fix](inverted index) fix error when open empty index file #51393)
- Capture IO exceptions in inverted index string reading to avoid process crash due to exceptions ([fix](inverted index) catch IO exception to avoid coredump in inverted index string reader #51844)
Complex Data Types
- Fix the potential type inference error when Variant Nested data types conflict ([fix](variant)fix variant type conflicts in nested types #52696)
- Fix the parameter type inference error of the `map` function ([fix](variant)fix variant type conflicts in nested types #52696)
- Fix the issue where data is incorrectly converted to NULL when specifying `'$.'` as the path in jsonpath ([Fix](JsonReader) Fix the issue where the null bitmap of the JSON reader was not initialized when the JSON path is specified as '$.’ #52211)
- Fix the issue where the serialization format cannot be restored when a subfield of Variant contains `.` (brach-3.0 cherry-pick [Fix](Variant) fix serialize with json key contains `.` as name #51930)
Others
- Fix the insufficient length issue of the IP field in the auditlog table ([opt](auditlog) Use varchar(1024) for column frontend_ip of audit log table #52762, [opt](auditlog) Use varchar(1024) for column frontend_ip of audit log table #52762 #52984)
- Fix the issue where the query id recorded in the audit log is that of the previous query when SQL parsing fails ([fix](audit)Fixed an issue that the audit log would record the previous queryId when parseSQL fails. #53107)