2.1.5 Release Notes

# Behavior changes
- The default connection pool size for JDBC Catalog has been adjusted from 10 to 30 (#37023). When creating a JDBC Catalog, the default value of the connection_pool_max_size parameter is now 30, which helps avoid connection pool exhaustion in high-concurrency scenarios.
- The minimum value of the system's reserved memory, also known as the low water mark, has been adjusted to min(6.4G, MemTotal * 5%) to better prevent BE OOM (Out-Of-Memory) issues.
- The processing logic for multiple statements in a single request has been modified. When the client does not set the CLIENT_MULTI_STATEMENTS flag, only the result of the last statement will be returned instead of all statements.
- Direct modification of data in asynchronous materialized views is no longer allowed. (https://github.com/apache/doris/pull/37129)
- A session variable use_max_length_of_varchar_in_ctas has been added to control the behavior of varchar and char type length generation during CTAS (Create Table As Select). The default value is true. When set to false, the derived varchar length is used instead of the maximum length. (https://github.com/apache/doris/pull/37284)
- Statistics collection now defaults to enabling the functionality of estimating the number of rows in Hive tables based on file size. (https://github.com/apache/doris/pull/37694)
- The transparent rewrite mechanism for asynchronous materialized views is now enabled by default. (https://github.com/apache/doris/pull/35897)
- Transparent rewrite utilizes partitioned materialized views. If some partitions of the partitioned materialized view fail, the default behavior is to union all the base tables with the materialized view to ensure the correctness of query data. (https://github.com/apache/doris/pull/35897)
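
The reserved-memory rule min(6.4G, MemTotal * 5%) listed above can be sketched as a small calculation. This is only an illustration, not the BE's actual code: the function name low_water_mark_bytes is hypothetical, and the sketch assumes that "6.4G" means 6.4 GiB and that MemTotal is given in bytes.

```python
GIB = 1024 ** 3  # assumption: "G" in the release note is read as GiB


def low_water_mark_bytes(mem_total_bytes: int) -> int:
    """Return min(6.4G, MemTotal * 5%) in bytes (hypothetical helper)."""
    cap = int(6.4 * GIB)                       # fixed upper bound of 6.4G
    five_percent = mem_total_bytes * 5 // 100  # 5% of total memory
    return min(cap, five_percent)


if __name__ == "__main__":
    for total_gib in (32, 128, 256):
        mark = low_water_mark_bytes(total_gib * GIB)
        print(f"MemTotal={total_gib} GiB -> low water mark={mark / GIB:.2f} GiB")
```

Under this reading, the 5% term applies on machines with up to 128 GiB of memory, and the 6.4G cap applies above that.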

# New features
### Lakehouse
- The session variable read_csv_empty_line_as_null can be used to control whether empty lines are ignored when reading CSV format files. (#37153) By default, empty lines are ignored. When set to true, empty lines will be read as rows where all columns are null.
- Added compatibility with Presto's complex type output format (#37253). Executing set serde_dialect="presto" makes the output format of complex types consistent with Presto, which helps when migrating workloads from Presto.
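
The effect of read_csv_empty_line_as_null can be modeled with a few lines of Python. This is a simplified sketch of the semantics described above, not Doris's CSV reader: the function read_csv_lines is hypothetical, and it assumes a naive comma split with no quoting or escaping.

```python
from typing import List, Optional

Row = List[Optional[str]]


def read_csv_lines(lines: List[str], num_columns: int,
                   empty_line_as_null: bool = False) -> List[Row]:
    """Model how empty lines are treated when reading CSV text."""
    rows: List[Row] = []
    for line in lines:
        if line == "":
            if empty_line_as_null:
                # Empty line becomes a row in which every column is null.
                rows.append([None] * num_columns)
            # Otherwise the empty line is ignored (the default behavior).
            continue
        rows.append(line.split(","))
    return rows


if __name__ == "__main__":
    data = ["1,a", "", "2,b"]
    print(read_csv_lines(data, num_columns=2))
    print(read_csv_lines(data, num_columns=2, empty_line_as_null=True))
```

With the default setting the sample yields two rows. With the variable enabled it yields three, and the middle row contains only nulls.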

### Multi Table Materialized View
- Support for using non-deterministic functions in building materialized views: https://github.com/apache/doris/pull/37651

- Support for atomically replacing the definition of asynchronous materialized views: https://github.com/apache/doris/pull/37147
- Support for viewing the creation statement of asynchronous materialized views via show create materialized view: https://github.com/apache/doris/pull/37125
- Support for transparent rewriting of multi-dimensional aggregation queries: https://github.com/apache/doris/pull/37436
- Support for transparent rewriting of aggregation queries using non-aggregate materialized views: https://github.com/apache/doris/pull/37497
- Support for transparent rewriting of DISTINCT aggregations in queries using key columns: https://github.com/apache/doris/pull/37651
- Support for partitioning materialized views to roll up partitions using date_trunc: https://github.com/apache/doris/pull/31812, https://github.com/apache/doris/pull/35562
- Support for partition TVF (Table-Valued Functions): https://github.com/apache/doris/pull/36479
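
The idea behind rolling up partitions with date_trunc can be illustrated conceptually. The sketch below is not Doris code: it assumes a base table partitioned by day whose partitions are grouped into monthly materialized view partitions, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List


def date_trunc_month(d: date) -> date:
    """Truncate a date to the first day of its month."""
    return d.replace(day=1)


def roll_up_partitions(base_partitions: List[date]) -> Dict[date, List[date]]:
    """Group daily base-table partitions under their monthly partition."""
    rolled: Dict[date, List[date]] = defaultdict(list)
    for day in base_partitions:
        rolled[date_trunc_month(day)].append(day)
    return dict(rolled)


if __name__ == "__main__":
    days = [date(2024, 1, 30), date(2024, 1, 31), date(2024, 2, 1)]
    for month, members in roll_up_partitions(days).items():
        print(month, "<-", members)
```

In this model, a change to one daily partition only affects the single monthly partition it maps to, which is what makes partition-level refresh possible.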

### Semi-Structured Data Management
- Tables using the VARIANT type now support partial column updates: https://github.com/apache/doris/pull/34925
- PreparedStatement support is now enabled by default: https://github.com/apache/doris/pull/36581
- The VARIANT type now supports export to CSV format: https://github.com/apache/doris/pull/37857
- Support for the explode_json_object function to transpose JSON Object rows into columns: https://github.com/apache/doris/pull/36887
- The ES Catalog now maps ES nested or object types to the Doris JSON type: https://github.com/apache/doris/pull/37101
- By default, support_phrase is enabled for inverted indexes with specified analyzers to improve the performance of match_phrase series queries: https://github.com/apache/doris/pull/37949

### Query optimizer
- Support for explaining DELETE FROM statements: https://github.com/apache/doris/pull/37100
- Support for hint form of constant expression parameters: https://github.com/apache/doris/pull/37988

### Memory Management
- Added an HTTP API to clear the cache. https://github.com/apache/doris/pull/36599

### Permissions
- Support for authorization of resources within Table-Valued Functions (TVFs): https://github.com/apache/doris/pull/37132

# Improvements

### Lakehouse
- Upgraded Paimon to version 0.8.1
- Fixed an issue where querying Paimon tables sometimes resulted in a ClassNotFound error for org.apache.commons.lang.StringUtils (#37512)
- Added support for Tencent Cloud LakeFS: https://github.com/apache/doris/pull/36891
- Optimized the timeout duration when fetching file lists for external table queries (#36842). The timeout is configurable via the session variable fetch_splits_max_wait_time_ms.
- Improved default connection logic for SQLServer JDBC Catalog (#36971). By default, Doris no longer changes the connection encryption settings. Only when force_sqlserver_jdbc_encrypt_false is set to true is encrypt=false forcibly added to the JDBC URL to reduce authentication errors. This allows more flexible control over encryption behavior, which can be turned on or off as needed.
- Added serde properties to the show create table statement for Hive tables (#37096)
- Changed the default cache time for Hive table lists on the FE from 1 day to 4 hours
- Data export (Export/Outfile) now supports specifying compression formats for Parquet and ORC
- When creating a table using CTAS+TVF, partition columns in the TVF are automatically mapped to Varchar(65533) instead of String, allowing them to be used as partition columns for internal tables (#37161)
- Optimized the number of metadata accesses for Hive write operations (#37127)
- ES Catalog now supports mapping nested/object types to Doris's Json type (#37182)
- Improved error messages when connecting to Oracle using older versions of the ojdbc driver (#37634)
- When Hudi tables return an empty set during Incremental Read, Doris now also returns an empty set instead of an error (#37636)
- Fixed an issue where join queries between internal and external tables could lead to FE timeouts in some cases (#37757)
- Fixed an issue with FE metadata replay errors during upgrades from older versions to newer versions when the Hive metastore event listener is enabled (#37757)
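
The SQLServer JDBC Catalog change in the list above (force_sqlserver_jdbc_encrypt_false) can be pictured as a conditional URL rewrite. This is a rough sketch rather than the FE implementation: the function name is hypothetical, the sample URL is made up, and how an existing encrypt property in the URL is handled is not covered by the release note, so the sketch does not address it.

```python
def apply_sqlserver_encrypt_setting(jdbc_url: str,
                                    force_encrypt_false: bool) -> str:
    """Append encrypt=false only when the force flag is set (hypothetical helper)."""
    if not force_encrypt_false:
        # Default: leave the user's encryption settings untouched.
        return jdbc_url
    # SQL Server JDBC URLs separate connection properties with semicolons.
    return jdbc_url.rstrip(";") + ";encrypt=false"


if __name__ == "__main__":
    url = "jdbc:sqlserver://127.0.0.1:1433;databaseName=demo"
    print(apply_sqlserver_encrypt_setting(url, force_encrypt_false=False))
    print(apply_sqlserver_encrypt_setting(url, force_encrypt_false=True))
```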

### Multi Table Materialized View
- Support for automatically selecting key columns when creating asynchronous materialized views: https://github.com/apache/doris/pull/36601
- Asynchronous materialized view partition refresh now supports using the date_trunc function in definitions: https://github.com/apache/doris/pull/35562
- In nested materialized views, when the lower level hits a roll-up rewrite for aggregation, the upper level can now continue with transparent rewrites: https://github.com/apache/doris/pull/37651
- Asynchronous materialized views remain available when schema changes do not affect the correctness of their data: https://github.com/apache/doris/pull/37122
- Improved planning speed for transparent rewrites: https://github.com/apache/doris/pull/37935
- When calculating the availability of asynchronous materialized views, the current refresh status is no longer taken into account: https://github.com/apache/doris/pull/36617

### Semi-Structured Data Management
- Optimized DESC performance for viewing VARIANT sub-columns through sampling: https://github.com/apache/doris/pull/37217
- Support for special JSON data with empty keys in the JSON type: https://github.com/apache/doris/pull/36762

### Inverted Index
- Reduced latency by minimizing calls to the inverted index exists check, avoiding delays when accessing object storage: https://github.com/apache/doris/pull/36945
- Optimized the overhead of the inverted index query process: https://github.com/apache/doris/pull/35357
- Inverted indexes are no longer created in materialized views: https://github.com/apache/doris/pull/36869

### Query optimizer
- When both sides of a comparison expression are literals, the string literal will attempt to convert to the type of the other side: https://github.com/apache/doris/pull/36921
- Refactored the sub-path pushdown functionality for the variant type, now better supporting complex pushdown scenarios: https://github.com/apache/doris/pull/36923
- Optimized the logic for calculating the cost of materialized views, enabling more accurate selection of lower-cost materialized views: https://github.com/apache/doris/pull/37098
- Improved the SQL cache planning speed when using user variables in SQL: https://github.com/apache/doris/pull/37119
- Optimized the row estimation logic for NOT NULL expressions, resulting in better performance when NOT NULL is present in queries: https://github.com/apache/doris/pull/37498
- Optimized the null rejection derivation logic for LIKE expressions: https://github.com/apache/doris/pull/37864
- Improved error messages when querying a specific partition fails, making it clearer which table is causing the issue: https://github.com/apache/doris/pull/37280

### Query Execution
- Improved the performance of the bitmap_union operator by up to 3 times in certain scenarios.
- Enhanced the reading performance of Arrow Flight in ARM environments.
- Optimized the execution performance of the explode, explode_map, and explode_json functions.

### Data Loading
- Support setting max_filter_ratio for INSERT INTO ... FROM TABLE VALUE FUNCTION
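
A filter-ratio check of the kind max_filter_ratio implies can be sketched as follows. This is a simplified model, not Doris's loading code: the function load_succeeds is hypothetical, and the rule that a load fails once the share of filtered rows exceeds the ratio is an assumption about the semantics.

```python
def load_succeeds(total_rows: int, filtered_rows: int,
                  max_filter_ratio: float) -> bool:
    """Return True if the share of filtered rows stays within the allowed ratio."""
    if total_rows == 0:
        return True  # nothing was read, so nothing exceeded the limit
    return filtered_rows / total_rows <= max_filter_ratio


if __name__ == "__main__":
    # 5 of 1000 rows filtered, with 1% tolerated
    print(load_succeeds(1000, 5, max_filter_ratio=0.01))
    # 20 of 1000 rows filtered, with 1% tolerated
    print(load_succeeds(1000, 20, max_filter_ratio=0.01))
```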

# Bug fixes

### Lakehouse
- Fixed an issue that caused BE crashes in some cases when querying Parquet format (#37086)
- Fixed an issue where BE printed excessive logs when querying Parquet format (#37012)
- Fixed an issue where the FE side created a large number of duplicate FileSystem objects in some cases (#37142)
- Fixed an issue where transaction information was not cleaned up after writing to Hive in some cases (#37172)
- Fixed a thread leak issue caused by Hive table write operations in some cases (#37247)
- Fixed an issue where Hive Text format row and column delimiters could not be correctly obtained in some cases (#37188)
- Fixed a concurrency issue when reading lz4 compressed blocks in some cases (#37187)
- Fixed an issue where count(*) on Iceberg tables returned incorrect results in some cases (#37810)
- Fixed an issue where creating a Paimon catalog based on MinIO caused FE metadata replay errors in some cases (#37249)
- Fixed an issue where using Ranger to create a catalog caused the client to hang in some cases (https://github.com/apache/doris/pull/37551)

### Multi Table Materialized View
- Fixed an issue where adding new partitions to the base table could lead to incorrect results after partition aggregation roll-up rewrites. https://github.com/apache/doris/pull/37651
- Fixed an issue where the materialized view partition status was not set to out-of-sync after deleting associated base table partitions. https://github.com/apache/doris/pull/36602
- Fixed an occasional deadlock issue during asynchronous materialized view builds. https://github.com/apache/doris/pull/37133
- Fixed an occasional "nereids cost too much time" error when refreshing a large number of partitions in a single asynchronous materialized view refresh. https://github.com/apache/doris/pull/37589
- Fixed an issue where an asynchronous materialized view could not be created if the final select list contained a null literal. https://github.com/apache/doris/pull/37281
- Fixed an issue with single-table materialized views where, even though the aggregation materialized view was successfully rewritten, the CBO did not select it. https://github.com/apache/doris/pull/35721, https://github.com/apache/doris/pull/36058
- Fixed an issue where partition derivation failed when building a partitioned materialized view with both join inputs being aggregations. https://github.com/apache/doris/pull/34781

### Semi-Structured Data Management
- Fixed issues with VARIANT in special cases such as concurrency and abnormal data. https://github.com/apache/doris/pull/37976, https://github.com/apache/doris/pull/37839, https://github.com/apache/doris/pull/37794, https://github.com/apache/doris/pull/37674, https://github.com/apache/doris/pull/36997
- Fixed coredump issues when using VARIANT in unsupported SQL. https://github.com/apache/doris/pull/37640
- Fixed coredump issues related to MAP data type when upgrading from 1.x to 2.x or higher versions. https://github.com/apache/doris/pull/36937
- Improved ES Catalog support for Array types. https://github.com/apache/doris/pull/36936

### Inverted Index
- Fixed an issue where DROP INDEX for Inverted Index v2 did not delete metadata. https://github.com/apache/doris/pull/37646
- Fixed query accuracy issues when string length exceeded the "ignore above" threshold. https://github.com/apache/doris/pull/37679
- Fixed issues with index size statistics. https://github.com/apache/doris/pull/37232, https://github.com/apache/doris/pull/37564

### Query optimizer
- Fixed an issue that prevented import operations from executing due to the use of reserved keywords. https://github.com/apache/doris/pull/35938
- Fixed a type error where char(255) was incorrectly recorded as char(1) when creating a table. https://github.com/apache/doris/pull/37671
- Fixed incorrect results when the join expression in a correlated subquery was a complex expression. https://github.com/apache/doris/pull/37683
- Fixed a potential issue with incorrect bucket pruning for decimal types. https://github.com/apache/doris/pull/38013
- Fixed incorrect aggregation operator results when pipeline local shuffle was enabled in certain scenarios. https://github.com/apache/doris/pull/38016
- Fixed planning errors that could occur when equal expressions existed in aggregation operators. https://github.com/apache/doris/pull/36622
- Fixed planning errors that could occur when lambda expressions were present in aggregation operators. https://github.com/apache/doris/pull/37285
- Fixed an issue where a window function optimized into a literal produced a literal of the wrong type, preventing execution. https://github.com/apache/doris/pull/37283
- Fixed an issue with the null attribute being incorrectly output by the aggregate function foreach combinator. https://github.com/apache/doris/pull/37980
- Fixed an issue where the acos function could not be planned when its parameter was a literal out of range. https://github.com/apache/doris/pull/37996
- Fixed planning errors when specifying partitions for a query on a synchronized materialized view. https://github.com/apache/doris/pull/36982
- Fixed occasional Null Pointer Exceptions (NPEs) during planning. https://github.com/apache/doris/pull/38024

### Query Execution
- Fixed an error in DELETE ... WHERE statements when using decimal data types as conditions. https://github.com/apache/doris/pull/37801
- Fixed an issue where BE memory was not released after query execution ended. https://github.com/apache/doris/pull/37792, https://github.com/apache/doris/pull/37297
- Fixed a problem where audit logs occupied too much FE memory under high QPS scenarios. https://github.com/apache/doris/pull/37786
- Fixed BE core dumps when the sleep function received illegal input values. https://github.com/apache/doris/pull/37681
- Fixed an error encountered during sync filter size execution. https://github.com/apache/doris/pull/37103
- Fixed incorrect results when using time zones during execution. https://github.com/apache/doris/pull/37062
- Fixed incorrect results when casting strings to integers. https://github.com/apache/doris/pull/36788
- Fixed query errors when using the Arrow Flight protocol with pipelinex enabled. https://github.com/apache/doris/pull/35804
- Fixed errors when casting strings to dates/datetimes. https://github.com/apache/doris/pull/35637
- Fixed BE core dumps during large table join queries using <=>. https://github.com/apache/doris/pull/36263

### Storage Management
- Fixed the issue of invisible DELETE SIGN data encountered during column update and write operations (#36755)
- Optimized FE's memory usage during schema changes (#36756)
- Fixed the issue where BE would hang during restart due to transactions not being aborted (#36437)
- Fixed occasional errors when changing from NOT NULL to NULL data types (#36389)
- Optimized replica repair scheduling when BE goes down (#36897)
- Supported round-robin disk selection for tablet creation on a single BE (#36900)
- Fixed query error -230 caused by slow publishing (#36222)
- Improved the speed of partition balancing (#36976)
- Controlled segment cache using the number of file descriptors (FDs) and memory to avoid FD exhaustion (#37035)
- Fixed potential replica loss caused by concurrent clone and alter operations (#36858)
- Fixed the issue of not being able to adjust column order (#37226)
- Prohibited certain schema change operations on auto-increment columns (#37331)
- Fixed inaccurate error reporting for DELETE operations (#37374)
- Adjusted the trash expiration time on BE side to one day (#37409)
- Optimized compaction memory usage and scheduling (#37491, #37496)
- Checked for potential oversized backups causing FE restarts (#37466)
- Restored dynamic partition deletion policies and cross-partition behaviors to 2.1.3 (#37570, #37506, #37964)
- Fixed errors related to decimal types in DELETE predicates (#37710)
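
Round-robin disk selection, as mentioned for tablet creation on a single BE (#36900), can be illustrated with a minimal sketch. This is not the BE's scheduling code: the function name and the disk paths are hypothetical, and the real logic may weigh other factors.

```python
from itertools import cycle
from typing import List


def assign_tablets_round_robin(disks: List[str], num_tablets: int) -> List[str]:
    """Pick a disk for each new tablet by cycling through the disks in order."""
    picker = cycle(disks)
    return [next(picker) for _ in range(num_tablets)]


if __name__ == "__main__":
    # Hypothetical storage paths on one BE
    disks = ["/data1/doris", "/data2/doris", "/data3/doris"]
    for i, disk in enumerate(assign_tablets_round_robin(disks, 5)):
        print(f"tablet {i} -> {disk}")
```

Cycling through the disks spreads consecutive tablets evenly instead of placing them repeatedly on the same disk.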

### Data Loading

- Fixed data invisibility issues caused by race conditions in error handling during imports (#36744, #37527, #37536)
- Added support for hll_from_base64 in streamload imports (#36819)
- Fixed potential FE OOM issues when importing very large numbers of tablets for a single table (#36944)
- Fixed possible auto-increment column duplication during FE master-slave switchovers (#36961)
- Fixed errors when inserting into select with auto-increment columns (#37029)
- Reduced the number of data flush threads to optimize memory usage (#37092)
- Improved automatic recovery and error messaging for routineload tasks (#37371, #37372, #37373, #37391)
- Increased the default batch size for routineload (#37388)
- Fixed routineload task stoppage due to Kafka EOF expiration (#37983)
- Fixed coredump issues in multi-table streaming (#37370)
- Fixed premature backpressure caused by inaccurate memory estimation in groupcommit (#37379)
- Optimized BE-side thread usage in groupcommit (#37380)
- Fixed the issue of no error URL when data was not partitioned (#37401)
- Fixed potential memory misoperations during imports (#38021, #37939)


### Merge on Write Unique Key

- Reduced memory usage during compaction for primary key tables (#36968)
- Fixed potential duplicate data issues when primary key replica cloning fails (#37229)

### Permissions
- Fixed the issue of missing authorization when a table-valued function references a resource. (#37132)
- Fixed the issue where the SHOW ROLE statement did not include workload group permissions. https://github.com/apache/doris/pull/36032
- Fixed the issue where executing two statements simultaneously when creating a row policy could cause FE to fail to restart. (https://github.com/apache/doris/pull/37342)
- Fixed the issue where, in some cases, upgrading from an older version could result in FE metadata replay failures due to row policies. (#37342)

### Others

- Fixed the issue of compute nodes participating in internal table creation. (#37961)
- Fixed the read lag issue when enable_strong_read_consistency is set to true. (#37641)



