Description
Resource Division
In early versions, Doris supported the multi-cluster feature, which aimed to manage multiple BE node groups within a single Doris cluster for unified management, and to achieve node-level resource isolation.
However, due to several design problems, this feature has been deprecated. These problems mainly include:
- The design of the code itself led to tight coupling between various pieces of metadata, making the code too heavy to maintain.
- Clusters were too independent: data was isolated between clusters, and data migration costs were high.
Therefore, we plan to implement a relatively lightweight node-level resource isolation mechanism to meet the following requirements:
- Reduce the burden on users of maintaining multiple clusters, and enable unified management within a single cluster.
- Achieve node-level resource isolation, so that different users can use different node groups, for example to isolate online business from offline business, or to isolate different departments.
- Support storing the replicas of a tablet in different node groups, so that users can query the same data with different node resources: resources are isolated while data is shared.
- The code is loosely coupled and easy to maintain.
Design
The overall design ideas are as follows:
- Support setting a resource tag for each BE node; BEs are grouped by resource tag into resource groups.
- Support specifying the allocation of replicas across different BE nodes.
- When querying, a user can access only the replicas in the resource groups their privileges allow, and the query uses only the computing resources of those resource groups.
- Support resource group permission verification.
The detailed design plan is as follows:
- Set the resource tag
A user can specify a resource tag for a BE when adding it, or modify the tag at runtime. For simplicity, each BE currently supports only a single tag. At the same time, to remain compatible with the original logic and to unify subsequent processing, each BE gets a default tag unless one is specified.
```sql
alter system add backend "1272:9050, 1212:9050" properties("tag.location" = "zoneA");
alter system modify backend "1272:9050, 1212:9050" set ("tag.location" = "zoneB");
```
- Specify the allocation of replicas
Here we discuss the following situations:
- Specify the replica distribution in the CREATE TABLE statement
We can specify the replica allocation in the properties of the CREATE TABLE statement, for example, two of the three replicas in resource group A and one replica in resource group B. This requires the user to have access to the corresponding resource groups, and the resource groups must contain enough nodes.
```sql
CREATE TABLE example_db.table_hash
(
    k1 TINYINT
)
DISTRIBUTED BY HASH(k1) BUCKETS 32
PROPERTIES
(
    "replication_allocation" = "tag.location.zone1:2, tag.location.zone2:1"
);
```
- Resource division at the database level
Specifying the replica allocation when creating a table is flexible, but the operation is cumbersome: the user has to specify a distribution every time a table is created. In some scenarios the business is divided by database, and the resource distribution of the tables under a database stays consistent. Therefore, we can also support specifying the resource division at the database level: a db specifies its own resources, and the tables in that db then all use this resource setting. A setting in the CREATE TABLE statement can still override the db-level setting. To simplify the design, db-level resource division does not support multiple tags; that is, with db-level resource division, all replicas are in a single resource group.
```sql
alter database db1 set ("replication_allocation" = "tag.location.zone1:3");
```
- Resource division at the partition level
In fact, a table's replica allocation is stored at the partition level; that is, each partition saves its own replica allocation information. So although we specify the resource division at the db and table levels, the information is ultimately stored per partition. Therefore, we can also support modifying the replica allocation at the partition level, via the ADD PARTITION and MODIFY PARTITION operations, as sketched below.
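As a minimal sketch of what this might look like, assuming the same "replication_allocation" property is accepted at the partition level (the table and partition names here are made up for illustration):
```sql
-- Hypothetical: add a partition whose three replicas are pinned to
-- specific resource groups via the same "replication_allocation" property.
ALTER TABLE example_db.table_range ADD PARTITION p2021 VALUES LESS THAN ("2021-01-01")
("replication_allocation" = "tag.location.zone1:2, tag.location.zone2:1");
```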
- Modify allocation information
To simplify the design, in this issue we only support modifying the allocation information at the partition level, for example as sketched below. As a follow-up, we will consider how to support modifying the distribution information of an entire table or an entire database.
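A possible form of such a statement, again assuming the property name carries over to MODIFY PARTITION (hypothetical syntax, same made-up table as above):
```sql
-- Hypothetical: move an existing partition's replicas to another resource group.
ALTER TABLE example_db.table_range MODIFY PARTITION p2021
SET ("replication_allocation" = "tag.location.zone2:3");
```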
- Support specifying resources when querying
This permission can be set in the user property. A user can only query replicas of data on resources for which they have permission, and the query runs only in the corresponding resource groups. A user can be granted permission for multiple resource groups, as sketched below.
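The design does not fix a syntax for this; one plausible form, assuming a user-level property such as resource_tags.location (a hypothetical name here), might be:
```sql
-- Hypothetical: allow user1 to query replicas in (and run on) zoneA and zoneB.
set property for 'user1' 'resource_tags.location' = 'zoneA, zoneB';
```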
- Replica scheduling
Replica scheduling includes replica repair and replica balance.
- Replica repair
Replica repair covers missing replicas, missing versions, offline nodes, redundant replicas, replicas that are not in the corresponding resource group, Colocation distribution, and so on.
- Replica balance
Replica balancing is performed within a resource group: first collect load statistics for the nodes in a resource group, then migrate tablets from high-load nodes to low-load nodes.
Schedule
- Step 1: Support setting tags for BE nodes, and support creating tables with specified tags.
- Step 2: Support tablet repair and balance based on tags.
- Step 3: Support privilege checking for tags.