@@ -32,7 +32,7 @@ This document focuses on how to create an index job, as well as some considerati
* bitmap index:a fast data structure that speeds up queries

## Basic Principles
Creating and droping index is essentially a schema change job. For details, please refer to
Creating and dropping index is essentially a schema change job. For details, please refer to
[Schema Change](alter-table-schema-change.html)。

## Syntax
@@ -53,12 +53,12 @@ create/drop index syntax
Please refer to [DROP INDEX](../../sql-reference/sql-statements/Data%20Definition/DROP%20INDEX.html) or [ALTER TABLE](../../sql-reference/sql-statements/Data%20Definition/ALTER%20TABLE.html)

## Create Job
Please refer to [Scheam Change](alter-table-schema-change.html)
Please refer to [Schema Change](alter-table-schema-change.html)
## View Job
Please refer to [Scheam Change](alter-table-schema-change.html)
Please refer to [Schema Change](alter-table-schema-change.html)

## Cancel Job
Please refer to [Scheam Change](alter-table-schema-change.html)
Please refer to [Schema Change](alter-table-schema-change.html)

## Notice
* Currently only index of bitmap type is supported.
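For context, a minimal sketch of the create/drop index statements this file documents. The database, table, and column names are hypothetical; the exact options are described in the CREATE INDEX / DROP INDEX references linked above.

```
-- Create a bitmap index; this runs asynchronously as a schema change job.
CREATE INDEX idx_city ON example_db.example_tbl (city) USING BITMAP COMMENT 'bitmap index on city';

-- List the indexes defined on the table.
SHOW INDEX FROM example_db.example_tbl;

-- Drop the index.
DROP INDEX idx_city ON example_db.example_tbl;
```
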
@@ -1,6 +1,6 @@
---
{
"title": "Scheam Change",
"title": "Schema Change",
"language": "en"
}
---
@@ -24,17 +24,17 @@ specific language governing permissions and limitations
under the License.
-->

# Scheam Change
# Schema Change

Users can modify the schema of existing tables through the Scheam Change operation. Doris currently supports the following modifications:
Users can modify the schema of existing tables through the Schema Change operation. Doris currently supports the following modifications:

* Add and delete columns
* Modify column type
* Adjust column order
* Add and modify Bloom Filter
* Add and delete bitmap index

This document mainly describes how to create a Scheam Change job, as well as some considerations and frequently asked questions about Scheam Change.
This document mainly describes how to create a Schema Change job, as well as some considerations and frequently asked questions about Schema Change.
## Glossary

* Base Table:When each table is created, it corresponds to a base table. The base table stores the complete data of this table. Rollups are usually created based on the data in the base table (and can also be created from other rollups).
@@ -68,9 +68,9 @@ The basic process of executing a Schema Change is to generate a copy of the inde
Before starting the conversion of historical data, Doris will obtain a latest transaction ID. And wait for all import transactions before this Transaction ID to complete. This Transaction ID becomes a watershed. This means that Doris guarantees that all import tasks after the watershed will generate data for both the original Index and the new Index. In this way, when the historical data conversion is completed, the data in the new Index can be guaranteed to be complete.
## Create Job

The specific syntax for creating a Scheam Change can be found in the description of the Scheam Change section in the help `HELP ALTER TABLE`.
The specific syntax for creating a Schema Change can be found in the description of the Schema Change section in the help `HELP ALTER TABLE`.

The creation of Scheam Change is an asynchronous process. After the job is submitted successfully, the user needs to view the job progress through the `SHOW ALTER TABLE COLUMN` command.
The creation of Schema Change is an asynchronous process. After the job is submitted successfully, the user needs to view the job progress through the `SHOW ALTER TABLE COLUMN` command.
## View Job

`SHOW ALTER TABLE COLUMN` You can view the Schema Change jobs that are currently executing or completed. When multiple indexes are involved in a Schema Change job, the command displays multiple lines, each corresponding to an index. For example:
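A minimal sketch of the job lifecycle described above (submit, view, cancel), using hypothetical database, table, and column names:

```
-- Submit a Schema Change job (asynchronous).
ALTER TABLE example_db.example_tbl ADD COLUMN new_col INT DEFAULT "0" AFTER col1;

-- View the progress of running or finished Schema Change jobs.
SHOW ALTER TABLE COLUMN FROM example_db;

-- Cancel a job that has not finished converting historical data.
CANCEL ALTER TABLE COLUMN FROM example_db.example_tbl;
```
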
4 changes: 2 additions & 2 deletions docs/en/administrator-guide/backup-restore.md
@@ -121,7 +121,7 @@ The commands related to the backup recovery function are as follows. The followi
* Snapshot Finished Time: Snapshot completion time.
* Upload Finished Time: Snapshot upload completion time.
* FinishedTime: The completion time of this assignment.
* Unfinished Tasks: In the `SNAPSHOTTING', `UPLOADING'and other stages, there will be multiple sub-tasks at the same time, the current stage shown here, the task ID of the unfinished sub-tasks.
* Unfinished Tasks: In the `SNAPSHOTTING`, `UPLOADING` and other stages, there will be multiple sub-tasks at the same time, the current stage shown here, the task ID of the unfinished sub-tasks.
* TaskErrMsg: If there is a sub-task execution error, the error message corresponding to the sub-task will be displayed here.
* Status: It is used to record some status information that may appear during the whole operation.
* Timeout: The timeout time of a job in seconds.
@@ -139,7 +139,7 @@ The commands related to the backup recovery function are as follows. The followi
* Database: The database corresponding to backup.
* Details: Shows the complete data directory structure of the backup.

5. RESTOR
5. RESTORE

Perform a recovery operation.

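As a rough illustration of the RESTORE operation mentioned above, a hedged sketch with a hypothetical repository, snapshot label, and table name; the `backup_timestamp` value is assumed to come from `SHOW SNAPSHOT`.

```
-- Restore one table from a snapshot in a remote repository,
-- then check progress with SHOW RESTORE.
RESTORE SNAPSHOT example_db.snapshot_label1
FROM example_repo
ON (backup_tbl)
PROPERTIES
(
    "backup_timestamp" = "2020-05-04-16-45-08",
    "replication_num" = "1"
);

SHOW RESTORE;
```
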
6 changes: 3 additions & 3 deletions docs/en/administrator-guide/colocation-join.md
@@ -57,7 +57,7 @@ In order for a table to have the same data distribution, the table in the same C

Tables in the same CG do not require consistency in the number, scope, and type of partition columns.

After fixing the number of bucket columns and buckets, the tables in the same CG will have the same Buckets Sequnce. The number of replicas determines the number of replicas of Tablets in each bucket, which BE they are stored on. Suppose that Buckets Sequnce is `[0, 1, 2, 3, 4, 5, 6, 7] `, and that BE nodes have `[A, B, C, D] `4. A possible distribution of data is as follows:
After fixing the number of bucket columns and buckets, the tables in the same CG will have the same Buckets Sequence. The number of replicas determines the number of replicas of Tablets in each bucket, which BE they are stored on. Suppose that Buckets Sequence is `[0, 1, 2, 3, 4, 5, 6, 7] `, and that BE nodes have `[A, B, C, D] `4. A possible distribution of data is as follows:

```
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
@@ -141,7 +141,7 @@ SHOW PROC '/colocation_group/10005.10008';
* BucketIndex: Subscript to the bucket sequence.
* Backend Ids: A list of BE node IDs where data fragments are located in buckets.

> The above commands require AMDIN privileges. Normal user view is not supported at this time.
> The above commands require ADMIN privileges. Normal user view is not supported at this time.

### Modify Colocate Group

@@ -172,7 +172,7 @@ Copies can only be stored on specified BE nodes. So when a BE is unavailable (do

### Duplicate Equilibrium

Doris will try to distribute the fragments of the Collocation table evenly across all BE nodes. For the replica balancing of common tables, the granularity is single replica, that is to say, it is enough to find BE nodes with lower load for each replica alone. The equilibrium of the Colocation table is at the Bucket level, where all replicas within a Bucket migrate together. We adopt a simple equalization algorithm, which distributes Buckets Sequnce evenly on all BEs, regardless of the actual size of the replicas, but only according to the number of replicas. Specific algorithms can be referred to the code annotations in `ColocateTableBalancer.java`.
Doris will try to distribute the fragments of the Collocation table evenly across all BE nodes. For the replica balancing of common tables, the granularity is single replica, that is to say, it is enough to find BE nodes with lower load for each replica alone. The equilibrium of the Colocation table is at the Bucket level, where all replicas within a Bucket migrate together. We adopt a simple equalization algorithm, which distributes Buckets Sequence evenly on all BEs, regardless of the actual size of the replicas, but only according to the number of replicas. Specific algorithms can be referred to the code annotations in `ColocateTableBalancer.java`.

> Note 1: Current Colocation replica balancing and repair algorithms may not work well for heterogeneous deployed Oris clusters. The so-called heterogeneous deployment, that is, the BE node's disk capacity, number, disk type (SSD and HDD) is inconsistent. In the case of heterogeneous deployment, small BE nodes and large BE nodes may store the same number of replicas.
>
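A minimal sketch of how a table joins a Colocation Group through the `colocate_with` property. The schema, bucket count, and group name are hypothetical; as noted above, tables in the same CG must use the same bucket column types and bucket count.

```
-- Tables created with the same "colocate_with" value share a Buckets Sequence.
CREATE TABLE example_db.colocate_tbl
(
    k1 INT,
    k2 INT,
    v1 INT SUM
)
AGGREGATE KEY (k1, k2)
DISTRIBUTED BY HASH (k1, k2) BUCKETS 8
PROPERTIES
(
    "colocate_with" = "group1"
);
```
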
4 changes: 2 additions & 2 deletions docs/en/administrator-guide/config/be_config.md
@@ -199,7 +199,7 @@ Similar to `base_compaction_trace_threshold`.
* Description: Configure the merge policy of the cumulative compaction stage. Currently, two merge policy have been implemented, num_based and size_based.
* Default value: size_based

In detail, ordinary is the initial version of the cumulative compaction merge policy. After a cumulative compaction, the base compaction process is directly performed. The size_based policy is an optimized version of the ordinary strategy. Versions are merged only when the disk volume of the rowset is of the same order of magnitude. After the compaction, the output rowset which satifies the conditions is promoted to the base compaction stage. In the case of a large number of small batch imports: reduce the write magnification of base compact, trade-off between read magnification and space magnification, and reducing file version data.
In detail, ordinary is the initial version of the cumulative compaction merge policy. After a cumulative compaction, the base compaction process is directly performed. The size_based policy is an optimized version of the ordinary strategy. Versions are merged only when the disk volume of the rowset is of the same order of magnitude. After the compaction, the output rowset which satisfies the conditions is promoted to the base compaction stage. In the case of a large number of small batch imports: reduce the write magnification of base compact, trade-off between read magnification and space magnification, and reducing file version data.

### `cumulative_size_based_promotion_size_mbytes`

@@ -337,7 +337,7 @@ The default value is `false`.
* Default: false

The merged expired rowset version path will be deleted after half an hour. In abnormal situations, deleting these versions will result in the problem that the consistent path of the query cannot be constructed. When the configuration is false, the program check is strict and the program will directly report an error and exit.
When configured as true, the program will run normally and ignore this error. In general, ignoring this error will not affect the query, only when the merged version is dispathed by fe, -230 error will appear.
When configured as true, the program will run normally and ignore this error. In general, ignoring this error will not affect the query, only when the merged version is dispatched by fe, -230 error will appear.

### inc_rowset_expired_sec

2 changes: 1 addition & 1 deletion docs/en/administrator-guide/config/fe_config.md
@@ -63,7 +63,7 @@ There are two ways to configure FE configuration items:

2. Dynamic configuration

After the FE starts, you can set the configuration items dynamically through the following commands. This command requires administrator priviledge.
After the FE starts, you can set the configuration items dynamically through the following commands. This command requires administrator privilege.

`ADMIN SET FRONTEND CONFIG (" fe_config_name "=" fe_config_value ");`

10 changes: 5 additions & 5 deletions docs/en/administrator-guide/dynamic-partition.md
@@ -26,7 +26,7 @@ under the License.

# Dynamic Partition

Dynamic partition is a new feature introduced in Doris verion 0.12. It's designed to manage partition's Time-to-Life (TTL), reducing the burden on users.
Dynamic partition is a new feature introduced in Doris version 0.12. It's designed to manage partition's Time-to-Life (TTL), reducing the burden on users.

At present, the functions of dynamically adding partitions and dynamically deleting partitions are realized.

@@ -302,11 +302,11 @@ mysql> SHOW DYNAMIC PARTITION TABLES;

Whether to enable Doris's dynamic partition feature. The default value is false, which is off. This parameter only affects the partitioning operation of dynamic partition tables, not normal tables. You can modify the parameters in `fe.conf` and restart FE to take effect. You can also execute the following commands at runtime to take effect:

MySQL protocal
MySQL protocol

`ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true")`

HTTP protocal
HTTP protocol

`curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_enable=true`

@@ -316,11 +316,11 @@ mysql> SHOW DYNAMIC PARTITION TABLES;

The execution frequency of dynamic partition threads defaults to 3600 (1 hour), that is, scheduling is performed every 1 hour. You can modify the parameters in `fe.conf` and restart FE to take effect. You can also modify the following commands at runtime:

MySQL protocal
MySQL protocol

`ADMIN SET FRONTEND CONFIG ("dynamic_partition_check_interval_seconds" = "7200")`

HTTP protocal
HTTP protocol

`curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_check_interval_seconds=432000`

6 changes: 3 additions & 3 deletions docs/en/administrator-guide/export_manual.md
@@ -73,7 +73,7 @@ The overall mode of dispatch is as follows:
1. The user submits an Export job to FE.
2. FE's Export scheduler performs an Export job in two stages:
1. PENDING: FE generates Export Pending Task, sends snapshot command to BE, and takes a snapshot of all Tablets involved. And generate multiple query plans.
2. EXPORTING: FE generates Export ExporingTask and starts executing the query plan.
2. EXPORTING: FE generates Export ExportingTask and starts executing the query plan.

### query plan splitting

@@ -122,7 +122,7 @@ WITH BROKER "hdfs"
* `timeout`: homework timeout. Default 2 hours. Unit seconds.
* `tablet_num_per_task`: The maximum number of fragments allocated per query plan. The default is 5.

After submitting a job, the job status can be imported by querying the `SHOW EXPORT'command. The results are as follows:
After submitting a job, the job status can be imported by querying the `SHOW EXPORT` command. The results are as follows:

```
JobId: 14008
@@ -141,7 +141,7 @@ FinishTime: 2019-06-25 17:08:34
* JobId: The unique ID of the job
* State: Job status:
* PENDING: Jobs to be Scheduled
* EXPORING: Data Export
* EXPORTING: Data Export
* FINISHED: Operation Successful
* CANCELLED: Job Failure
* Progress: Work progress. The schedule is based on the query plan. Assuming a total of 10 query plans have been completed, the progress will be 30%.
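For reference, a hedged sketch of an EXPORT job that uses the `timeout` and `tablet_num_per_task` properties described above, followed by the `SHOW EXPORT` check. Paths, credentials, and table names are placeholders.

```
EXPORT TABLE example_db.example_tbl
PARTITION (p1, p2)
TO "hdfs://host:port/dir/export/"
PROPERTIES
(
    "column_separator" = ",",
    "timeout" = "7200",
    "tablet_num_per_task" = "5"
)
WITH BROKER "hdfs"
(
    "username" = "hdfs_user",
    "password" = "hdfs_passwd"
);

-- The job runs asynchronously; poll its State and Progress here.
SHOW EXPORT FROM example_db;
```
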
@@ -71,4 +71,4 @@ To get FE log via HTTP

## Notification

Need ADMIN priviledge.
Need ADMIN privilege.
@@ -24,7 +24,7 @@ specific language governing permissions and limitations
under the License.
-->

# Conection Action
# Connection Action

## Request

4 changes: 2 additions & 2 deletions docs/en/administrator-guide/load-data/broker-load-manual.md
@@ -164,7 +164,7 @@ The following is a detailed explanation of some parameters of the data descripti

+ negative

```data_desc``` can also set up data fetching and anti-importing. This function is mainly used when aggregated columns in data tables are of SUM type. If you want to revoke a batch of imported data. The `negative'parameter can be used as a batch of data. Doris automatically retrieves this batch of data on aggregated columns to eliminate the same batch of data.
```data_desc``` can also set up data fetching and anti-importing. This function is mainly used when aggregated columns in data tables are of SUM type. If you want to revoke a batch of imported data. The `negative` parameter can be used as a batch of data. Doris automatically retrieves this batch of data on aggregated columns to eliminate the same batch of data.

+ partition

@@ -377,7 +377,7 @@ The following configurations belong to the Broker load system-level configuratio

+ min\_bytes\_per\_broker\_scanner/max\_bytes\_per\_broker\_scanner/max\_broker\_concurrency

The first two configurations limit the minimum and maximum amount of data processed by a single BE. The third configuration limits the maximum number of concurrent imports for a job. The minimum amount of data processed, the maximum number of concurrencies, the size of source files and the number of BEs in the current cluster **together determine the concurrency of this import**.
The first two configurations limit the minimum and maximum amount of data processed by a single BE. The third configuration limits the maximum number of concurrent imports for a job. The minimum amount of data processed, the maximum number of concurrency, the size of source files and the number of BEs in the current cluster **together determine the concurrency of this import**.

```
The number of concurrent imports = Math. min (source file size / minimum throughput, maximum concurrency, current number of BE nodes)
4 changes: 2 additions & 2 deletions docs/en/administrator-guide/load-data/delete-manual.md
@@ -59,14 +59,14 @@ The following describes the parameters used in the delete statement:

* WHERE

The conditiona of the delete statement. All delete statements must specify a where condition.
The condition of the delete statement. All delete statements must specify a where condition.

Explanation:

1. The type of `OP` in the WHERE condition can only include `=, >, <, >=, <=, !=, in, not in`.
2. The column in the WHERE condition can only be the `key` column.
3. Cannot delete when the `key` column does not exist in any rollup table.
4. Each condition in WHERE condition can only be realated by `and`. If you want `or`, you are suggested to write these conditions into two delete statements.
4. Each condition in WHERE condition can only be connected by `and`. If you want `or`, you are suggested to write these conditions into two delete statements.
5. If the specified table is a range partitioned table, `PARTITION` must be specified unless the table is a single partition table,.
6. Unlike the insert into command, delete statement cannot specify `label` manually. You can view the concept of `label` in [Insert Into] (./insert-into-manual.md)

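A small illustration of the WHERE rules listed above: key columns only, conditions joined with `and`, and `PARTITION` specified for a range-partitioned table. All names are hypothetical.

```
DELETE FROM example_db.example_tbl PARTITION p1
WHERE k1 = 3 AND k2 = "abc";
```
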
@@ -221,7 +221,7 @@ Insert Into itself is a SQL command, and the return result is divided into the f
## Best Practices

### Application scenarios
1. Users want to import only a few false data to verify the functionality of Doris system. The grammar of INSERT INTO VALUS is suitable at this time.
1. Users want to import only a few false data to verify the functionality of Doris system. The grammar of INSERT INTO VALUES is suitable at this time.
2. Users want to convert the data already in the Doris table into ETL and import it into a new Doris table, which is suitable for using INSERT INTO SELECT grammar.
3. Users can create an external table, such as MySQL external table mapping a table in MySQL system. Or create Broker external tables to map data files on HDFS. Then the data from the external table is imported into the Doris table for storage through the INSERT INTO SELECT grammar.

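The two grammars referred to in the scenarios above, sketched with hypothetical tables:

```
-- Scenario 1: a few hand-written rows to verify functionality.
INSERT INTO example_db.example_tbl VALUES (1, "2020-01-01", 30), (2, "2020-01-02", 40);

-- Scenarios 2 and 3: ETL from an existing (or external) table into a Doris table.
INSERT INTO example_db.target_tbl
SELECT k1, k2, SUM(v1) FROM example_db.example_tbl GROUP BY k1, k2;
```
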
@@ -62,7 +62,7 @@ Usually used to troubleshoot network problems.

### `doris_be_snmp{name="tcp_in_segs"}`

Value of the `Tcp: InSegs` field in `/proc/net/snmp`. Represents the number of receivied TCP packets.
Value of the `Tcp: InSegs` field in `/proc/net/snmp`. Represents the number of received TCP packets.

Use `(NEW_tcp_in_errs - OLD_tcp_in_errs) / (NEW_tcp_in_segs - OLD_tcp_in_segs)` can calculate the error rate of received TCP packets.

@@ -62,7 +62,7 @@ Usually used to troubleshoot network problems.

### `doris_fe_snmp{name="tcp_in_segs"}`

Value of the `Tcp: InSegs` field in `/proc/net/snmp`. Represents the number of receivied TCP packets.
Value of the `Tcp: InSegs` field in `/proc/net/snmp`. Represents the number of received TCP packets.

Use `(NEW_tcp_in_errs - OLD_tcp_in_errs) / (NEW_tcp_in_segs - OLD_tcp_in_segs)` can calculate the error rate of received TCP packets.

2 changes: 1 addition & 1 deletion docs/en/community/committer-guide.md
@@ -76,7 +76,7 @@ and you will be able to manage issues and pull request directly through our Gith

5. Once a reviewer has commented on a PR, they need to keep following up on subsequent changes to that PR.

6. A PR must get at least a +1 appove from committer who is not the author.
6. A PR must get at least a +1 approved from committer who is not the author.

7. After the first +1 to the PR, wait at least one working day before merging. The main purpose is to wait for the rest of the community to come to review.
