Release Note 1.2.0 #14461

@morningman

[Chinese Version. See below]

Feature

Highlight

  1. Full vectorized engine support, greatly improved performance

    In the standard ssb-100-flat benchmark, 1.2 is 2 times faster than 1.1; in the more complex TPC-H 100 benchmark, 1.2 is 3 times faster than 1.1.

  2. Merge-on-Write Unique Key

    The Unique Key model now supports Merge-on-Write. In this mode, rows that need to be deleted or updated are marked at write time, which avoids the Merge-on-Read overhead at query time and greatly improves read efficiency on the updatable data model.
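
As a minimal sketch of how a Merge-on-Write table might be declared, the statement below creates a Unique Key table with the enable_unique_key_merge_on_write property. The database, table, and column names are hypothetical, and the bucket and replica counts are placeholder values.

```sql
-- Hypothetical table; the last property enables Merge-on-Write at creation time.
CREATE TABLE example_db.orders_mow (
    order_id BIGINT NOT NULL,
    status   VARCHAR(16),
    amount   DECIMAL(12, 2)
)
UNIQUE KEY(order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 8
PROPERTIES (
    "replication_num" = "1",
    "enable_unique_key_merge_on_write" = "true"
);
```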

  3. Multi Catalog

    The multi-catalog feature lets Doris connect to external data sources quickly. Users connect to an external data source with the CREATE CATALOG command, and Doris automatically maps the database and table information of that source. After that, users can query the external data just like ordinary tables, without manually creating an external mapping for each table.

    Currently this feature supports the following data sources:

    1. Hive Metastore: You can access data tables including Hive, Iceberg, and Hudi. It can also be connected to data sources compatible with Hive Metastore, such as Alibaba Cloud's DataLake Formation. Supports data access on both HDFS and object storage.
    2. Elasticsearch: Access ES data sources.
    3. JDBC: Access MySQL through the JDBC protocol.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/multi-catalog

    Note: The corresponding permission level will also be changed automatically, see the "Upgrade Notes" section for details.
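
The following sketch shows what connecting to a Hive Metastore through a catalog could look like. The catalog name, metastore address, and the database and table names are placeholders chosen for illustration; check the linked documentation for the properties your data source requires.

```sql
-- Assumed values: catalog name and metastore address are placeholders.
CREATE CATALOG hive_catalog PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- Switch to the new catalog and browse the automatically mapped metadata.
SWITCH hive_catalog;
SHOW DATABASES;

-- Fully qualified access: catalog.database.table (hypothetical names).
SELECT * FROM hive_catalog.example_db.example_tbl LIMIT 10;
```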

  4. Light table structure changes

In the new version, adding or dropping columns no longer requires changing the data files synchronously; only the metadata in FE needs to be updated, which makes Schema Change a millisecond-level operation. This also makes it possible to synchronize DDL from upstream CDC data. For example, users can use Flink CDC to synchronize both DML and DDL from an upstream database to Doris.

Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE

When creating a table, set "light_schema_change"="true" in properties.
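
A minimal sketch of the property in use, assuming a hypothetical table example_db.events with placeholder bucket and replica settings. The ALTER statements illustrate the kind of column changes that light schema change is meant to speed up.

```sql
CREATE TABLE example_db.events (
    event_id BIGINT,
    payload  VARCHAR(256)
)
DUPLICATE KEY(event_id)
DISTRIBUTED BY HASH(event_id) BUCKETS 4
PROPERTIES (
    "replication_num" = "1",
    "light_schema_change" = "true"
);

-- With light schema change enabled, adding or dropping a column
-- only needs a metadata update in FE.
ALTER TABLE example_db.events ADD COLUMN source VARCHAR(32) DEFAULT "unknown";
ALTER TABLE example_db.events DROP COLUMN source;
```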

  1. JDBC external table

    Users can connect to external data sources through JDBC. Currently supported:

    • MySQL
    • PostgreSQL
    • Oracle
    • SQL Server
    • ClickHouse

    Documentation: https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/jdbc-of-doris/

    Note: The ODBC feature will be removed in a later version. Please switch to JDBC where possible.
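
As a rough sketch, a JDBC external table is typically defined in two steps: a JDBC resource that holds the connection details, then an external table that refers to it. Every value below (resource name, credentials, URLs, driver jar, table and column names) is a placeholder, and the exact set of supported properties should be confirmed against the linked documentation.

```sql
-- Step 1: a resource describing the JDBC connection (placeholder values).
CREATE EXTERNAL RESOURCE jdbc_resource
PROPERTIES (
    "type" = "jdbc",
    "user" = "example_user",
    "password" = "example_password",
    "jdbc_url" = "jdbc:mysql://mysql-host:3306/example_db",
    "driver_url" = "http://driver-host/mysql-connector-java.jar",
    "driver_class" = "com.mysql.jdbc.Driver"
);

-- Step 2: an external table mapped to a table in the remote database.
CREATE EXTERNAL TABLE example_db.remote_users (
    id   INT,
    name VARCHAR(64)
) ENGINE = JDBC
PROPERTIES (
    "resource" = "jdbc_resource",
    "table" = "users",
    "table_type" = "mysql"
);
```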

  2. Java UDF

    Supports writing UDFs and UDAFs in Java, which makes it easy to use custom functions from the Java ecosystem. In addition, technologies such as off-heap memory and zero copy greatly improve the efficiency of cross-language data access.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/ecosystem/udf/java-user-defined-function

    Example: https://github.com/apache/doris/tree/master/samples/doris-demo
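
Once a UDF jar has been built, registering it might look like the sketch below. The jar path, class name, and function name are assumptions made for illustration; the class is expected to expose an evaluate method as described in the linked documentation.

```sql
-- Hypothetical jar path and class name.
CREATE FUNCTION java_udf_add_one(INT) RETURNS INT PROPERTIES (
    "file" = "file:///path/to/java-udf-demo-jar-with-dependencies.jar",
    "symbol" = "org.apache.doris.udf.AddOne",
    "always_nullable" = "true",
    "type" = "JAVA_UDF"
);

-- The function can then be called like any built-in function.
SELECT java_udf_add_one(1);
```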

  3. Remote UDF

    Supports accessing remote user-defined function services through RPC, thus completely eliminating language restrictions for users to write UDFs. Users can use any programming language to implement custom functions to complete complex data analysis work.

    Documentation: https://doris.apache.org/zh-CN/docs/ecosystem/udf/remote-user-defined-function

    Example: https://github.com/apache/doris/tree/master/samples/doris-demo
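
A sketch of registering a remote function, assuming an RPC service is already listening at the given address and implements a function named add_int. The address, symbol, and function name are placeholders.

```sql
-- OBJECT_FILE is the host:port of the assumed RPC service.
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
    "SYMBOL" = "add_int",
    "OBJECT_FILE" = "127.0.0.1:9114",
    "TYPE" = "RPC"
);

SELECT rpc_add(1, 2);
```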

  4. More data types support

    • Array type

      Array types are supported, including nested arrays. In scenarios such as user profiling and tagging, the Array type maps more naturally onto the business data. The new version also implements a large number of array-related functions to better support the type in real-world use.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Types/ARRAY

    Related functions: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/array-functions/array_max
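
For the tagging scenario mentioned above, an array column could be used roughly as follows. The table and column names are hypothetical; size and array_contains are two of the array functions listed in the documentation.

```sql
CREATE TABLE example_db.user_tags (
    user_id BIGINT,
    tags    ARRAY<VARCHAR(32)>
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO example_db.user_tags VALUES
    (1, ['sports', 'music']),
    (2, ['travel']);

-- Count tags per user and check for a specific tag.
SELECT user_id,
       size(tags) AS tag_count,
       array_contains(tags, 'music') AS likes_music
FROM example_db.user_tags;
```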

    • Jsonb type

      The binary JSON data type Jsonb is supported. It provides a more compact JSON encoding and allows data to be accessed directly in that encoded form. Compared with JSON data stored as strings, it can deliver a several-fold performance improvement.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Types/JSONB

    Related functions: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/json-functions/jsonb_parse
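
A minimal sketch of storing and reading Jsonb data, using a hypothetical table and JSON document. The jsonb_extract_* functions are taken from the JSON function family referenced above.

```sql
CREATE TABLE example_db.json_events (
    id BIGINT,
    j  JSONB
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO example_db.json_events VALUES (1, '{"k1": "v1", "k2": 200}');

-- Extract typed fields by JSON path.
SELECT id,
       jsonb_extract_string(j, '$.k1') AS k1,
       jsonb_extract_int(j, '$.k2') AS k2
FROM example_db.json_events;
```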

    • Date V2

      Scope of impact:

      1. Users must specify datev2 and datetimev2 explicitly when creating a table; date and datetime columns in existing tables are not affected.
      2. When datev2 and datetimev2 values are evaluated together with the original date and datetime types (for example, in an equi-join), the original types are cast to the new types for the computation.
      3. Examples can be found in the documentation.

      Documentation: https://doris.apache.org/docs/dev/sql-manual/sql-reference/Data-Types/DATEV2
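
As point 1 notes, the new types have to be named explicitly in the table definition. A sketch with a hypothetical table, assuming a millisecond precision of 3 for the datetimev2 column:

```sql
CREATE TABLE example_db.visits (
    user_id    BIGINT,
    visit_date DATEV2,
    visit_time DATETIMEV2(3)
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 4
PROPERTIES ("replication_num" = "1");
```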

More

  1. A new memory management framework

    Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/maint-monitor/memory-management/memory-tracker

  2. Table Valued Function

    Doris implements a set of Table Valued Functions (TVFs). A TVF can be regarded as an ordinary table and can appear anywhere a table can appear in SQL.

    For example, we can use S3 TVF to implement data import on object storage:

    insert into tbl select * from s3("s3://bucket/file.*", "ak" = "xx", "sk" = "xxx") where c1 > 2;
    

    Or directly query data files on HDFS:

    select * from hdfs("hdfs://bucket/file.*") where c1 > 2;
    

    TVF can help users make full use of the rich expressiveness of SQL and flexibly process various data.

    Documentation:

    https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/table-functions/s3

    https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/table-functions/hdfs

  3. A more convenient way to create partitions

    Support for creating multiple partitions within a time range via the FROM TO command.
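
A sketch of the FROM ... TO ... form inside a range partition clause, assuming a hypothetical table partitioned by day. The date range and interval are placeholder values.

```sql
CREATE TABLE example_db.daily_metrics (
    dt    DATE,
    value BIGINT
)
DUPLICATE KEY(dt)
PARTITION BY RANGE(dt) (
    -- One partition per day between the two dates.
    FROM ("2022-12-01") TO ("2022-12-08") INTERVAL 1 DAY
)
DISTRIBUTED BY HASH(dt) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```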

  4. Column renaming

    For tables with Light Schema Change enabled, column renaming is supported.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-RENAME
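
For illustration, renaming a column on a hypothetical table that was created with "light_schema_change" = "true" might look like this:

```sql
-- Renames column payload to body (table and column names are hypothetical).
ALTER TABLE example_db.events RENAME COLUMN payload body;
```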

  5. Richer permission management

  6. Import

  7. Support viewing the contents of the catalog recycle bin through the SHOW CATALOG RECYCLE BIN statement.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Show-Statements/SHOW-CATALOG-RECYCLE-BIN

  8. Support SELECT * EXCEPT syntax.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/data-table/basic-usage
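
A short sketch of the syntax on a hypothetical table: every column is returned except the ones listed after EXCEPT.

```sql
-- Returns all columns of example_db.events except payload.
SELECT * EXCEPT (payload) FROM example_db.events;
```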

  9. OUTFILE supports exporting in ORC format and supports multi-byte delimiters.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE

  10. Support configuring the number of Query Profiles that can be saved.

    Documentation: search for the FE configuration item max_query_profile_num.

  11. The DELETE statement supports IN predicates and partition pruning.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/DELETE
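
A sketch of a DELETE with an IN predicate, assuming a hypothetical table example_db.orders whose key column is order_id:

```sql
DELETE FROM example_db.orders
WHERE order_id IN (1001, 1002, 1003);
```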

  12. The default value of the time column supports using CURRENT_TIMESTAMP

    Search for "CURRENT_TIMESTAMP" in the documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE
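
A minimal sketch with a hypothetical table: when the created_at column is omitted from an insert, it is filled with the current timestamp.

```sql
CREATE TABLE example_db.audit_log (
    id         BIGINT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- created_at is not listed, so the default applies.
INSERT INTO example_db.audit_log (id) VALUES (1);
```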

  13. Add two system tables: backends, rowsets

    Documentation:

    https://doris.apache.org/zh-CN/docs/dev/admin-manual/system-table/backends

    https://doris.apache.org/zh-CN/docs/dev/admin-manual/system-table/rowsets
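
The new system tables can be queried like ordinary tables. The sketch below assumes they live in the information_schema database; see the linked pages for the available columns.

```sql
-- Assumption: both system tables are exposed under information_schema.
SELECT * FROM information_schema.backends;
SELECT * FROM information_schema.rowsets LIMIT 10;
```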

  14. Backup and restore

    • The Restore job supports the reserve_replica parameter, so that the number of replicas of the restored table is the same as that of the backup.

    • The Restore job supports reserve_dynamic_partition_enable parameter, so that the restored table keeps the dynamic partition enabled.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Backup-and-Restore/RESTORE

    • Support backup and restore operations through the built-in libhdfs, without relying on a broker.

    Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Backup-and-Restore/CREATE-REPOSITORY
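
A sketch of a restore job using the two new parameters. The database, snapshot, repository, and table names and the backup timestamp are all placeholders.

```sql
RESTORE SNAPSHOT example_db.snapshot_1
FROM example_repo
ON (backup_tbl)
PROPERTIES (
    "backup_timestamp" = "2022-12-01-10-00-00",
    -- Keep the replica count recorded in the backup.
    "reserve_replica" = "true",
    -- Keep dynamic partitioning enabled on the restored table.
    "reserve_dynamic_partition_enable" = "true"
);
```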

  15. Support data balancing across multiple disks on the same machine

    Documentation:

    https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-REBALANCE-DISK

    https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-CANCEL-REBALANCE-DISK
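
The two statements might be used as follows. The backend addresses are placeholders; each entry is assumed to be a backend host with its heartbeat port.

```sql
-- Rebalance disks on all backends.
ADMIN REBALANCE DISK;

-- Rebalance disks only on the listed backends (placeholder addresses).
ADMIN REBALANCE DISK ON ("192.168.1.1:1234", "192.168.1.2:1234");

-- Cancel an ongoing disk rebalance.
ADMIN CANCEL REBALANCE DISK;
```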

  16. Routine Load supports subscribing to Kerberos-authenticated Kafka services.

    Search for kerberos in the documentation: https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/routine-load-manual

  17. New built-in functions

    Added the following built-in functions:

    • cbrt
    • sequence_match/sequence_count
    • mask/mask_first_n/mask_last_n
    • elt
    • any/any_value
    • group_bitmap_xor
    • ntile
    • nvl
    • uuid
    • initcap
    • regexp_replace_one/regexp_extract_all
    • multi_search_all_positions/multi_match_any
    • domain/domain_without_www/protocol
    • running_difference
    • bitmap_hash64
    • murmur_hash3_64
    • to_monday
    • not_null_or_empty
    • window_funnel
    • group_bit_and/group_bit_or/group_bit_xor
    • outer combine
    • and all array functions
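
A few of the functions above can be tried directly in a query. The arguments are arbitrary sample values.

```sql
SELECT cbrt(27)               AS cube_root,
       nvl(NULL, 'fallback')  AS first_non_null,
       initcap('hello world') AS capitalized,
       elt(2, 'a', 'b', 'c')  AS second_item,
       uuid()                 AS random_id;
```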

Upgrade Notice

Known Issues

  • Using JDK 11 causes BE to crash. Please use JDK 8 instead.

Behavior Changed

During Upgrade

  1. Upgrade preparation

    • Need to replace: lib, bin directory (start/stop scripts have been modified)

    • JAVA_HOME must also be configured for BE, because BE now supports JDBC Table and Java UDF.

    • The default JVM Xmx parameter in fe.conf is changed to 8GB.

  2. Possible errors during the upgrade process

    The above errors will return to normal after a full upgrade.

Performance Impact

API Change

Big Thanks

Thanks to ALL who contributed to this release! (alphabetically)

@924060929
@a19920714liou
@adonis0147
@Aiden-Dong
@aiwenmo
@AshinGau
@b19mud
@BePPPower
@BiteTheDDDDt
@bridgeDream
@ByteYue
@caiconghui
@CalvinKirs
@cambyzju
@caoliang-web
@carlvinhust2012
@catpineapple
@ccoffline
@chenlinzhong
@chovy-3012
@coderjiang
@cxzl25
@dataalive
@dataroaring
@dependabot[bot]
@dinggege1024
@DongLiang-0
@Doris-Extras
@eldenmoon
@EmmyMiao87
@englefly
@FreeOnePlus
@Gabriel39
@gaodayue
@geniusjoe
@gj-zhang
@gnehil
@GoGoWen
@HappenLee
@hello-stephen
@Henry2SS
@hf200012
@huyuanfeng2018
@jacktengg
@jackwener
@jeffreys-cat
@Jibing-Li
@JNSimba
@Kikyou1997
@Lchangliang
@LemonLiTree
@lexoning
@liaoxin01
@lide-reed
@link3280
@liutang123
@liuyaolin
@LOVEGISER
@lsy3993
@luozenglin
@luzhijing
@madongz
@morningman
@morningman-cmy
@morrySnow
@mrhhsg
@Myasuka
@myfjdthink
@nextdreamblue
@pan3793
@pangzhili
@pengxiangyu
@platoneko
@qidaye
@qzsee
@SaintBacchus
@SeekingYang
@smallhibiscus
@sohardforaname
@song7788q
@spaces-X
@ssusieee
@stalary
@starocean999
@SWJTU-ZhangLei
@TaoZex
@timelxy
@Wahno
@wangbo
@wangshuo128
@wangyf0555
@weizhengte
@weizuo93
@wsjz
@wunan1210
@xhmz
@xiaokang
@xiaokangguo
@xinyiZzz
@xy720
@yangzhg
@Yankee24
@yeyudefeng
@yiguolei
@yinzhijian
@yixiutt
@yuanyuan8983
@Yulei-Yang
@zbtzbtzbt
@zenoyang
@zhangboya1
@zhangstar333
@zhannngchen
@ZHbamboo
@zhengshiJ
@zhenhb
@zhqu1148980644
@zuochunwei
@zy-kkk
