Merged
6 changes: 3 additions & 3 deletions docs/en/community/design/doris_storage_optimization.md
@@ -113,7 +113,7 @@ We generate a sparse index of short key every N rows (configurable) with the con

### Column's other indexes ###

The format design supports the subsequent expansion of other index information, such as bitmap index, spatial index, etc. It only needs to write the required data to the existing column data, and add the corresponding metadata fields to FileFooterPB.
The format design supports the subsequent expansion of other index information, such as inverted index, spatial index, etc. It only needs to write the required data to the existing column data, and add the corresponding metadata fields to FileFooterPB.

### Metadata Definition ###
SegmentFooterPB is defined as:
@@ -210,7 +210,7 @@ Relevant issues:
1. Read the magic of the file and judge the type and version of the file.
2. Read FileFooterPB and check sum
3. Read short key index and data ordinal index information of corresponding columns according to required columns
4. Use start key and end key, locate the row number to be read through short key index, then determine the row ranges to be read through ordinal index, and filter the row ranges to be read through statistics, bitmap index and so on.
4. Use start key and end key, locate the row number to be read through short key index, then determine the row ranges to be read through ordinal index, and filter the row ranges to be read through statistics, inverted index and so on.
5. Then read row data through ordinal index according to row ranges

Relevant issues:
@@ -232,4 +232,4 @@ It implements a scalable compression framework, supports a variety of compressio

## TODO ##
1. How to implement nested types? How to locate line numbers in nested types?
2. How to optimize the downstream bitmap and column statistics caused by ScanRange splitting?
2. How to optimize the downstream inverted index and column statistics caused by ScanRange splitting?
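The segment read path in the storage design doc above (locate rows with the sparse short key index, then narrow the row ranges with per-block statistics) can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Doris code: keys are already sorted, `INTERVAL` stands in for the configurable "every N rows", a per-block max stands in for the zone map, and the helper names `build_short_key_index`, `locate_rows` and `zone_map_filter` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy model of a segment's read path; not Doris code.
# Keys are assumed to be sorted, as rows in a segment are sorted by short key.
INTERVAL = 4  # stands in for the configurable "every N rows" of the sparse index


def build_short_key_index(keys, interval=INTERVAL):
    """Sparse index: the first key and row ordinal of every block of `interval` rows."""
    return [(keys[i], i) for i in range(0, len(keys), interval)]


def locate_rows(keys, index, start_key, end_key):
    """Return the half-open ordinal range [first, last) of rows with start_key <= key <= end_key."""
    if not index:
        return 0, 0
    firsts = [k for k, _ in index]
    # Back up one block: rows equal to start_key may sit at the end of the previous block.
    b = max(bisect_left(firsts, start_key) - 1, 0)
    lo = index[b][1]
    # Rows from the first block whose first key exceeds end_key onward are all out of range.
    e = bisect_right(firsts, end_key)
    hi = index[e][1] if e < len(index) else len(keys)
    # Only rows in [lo, hi) have to be read and searched.
    window = keys[lo:hi]
    return lo + bisect_left(window, start_key), lo + bisect_right(window, end_key)


def zone_map_filter(values, row_range, threshold, interval=INTERVAL):
    """Keep the parts of row_range whose block may hold a value >= threshold.

    The max of each block plays the role of a per-block zone map (min/max statistics).
    """
    first, last = row_range
    if first >= last:
        return []
    kept = []
    for block_start in range(first - first % interval, last, interval):
        block_max = max(values[block_start:block_start + interval])
        if block_max >= threshold:
            kept.append((max(block_start, first), min(block_start + interval, last)))
    return kept


keys = [1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8, 9, 9]
values = [10, 20, 30, 40, 1, 2, 3, 4, 50, 60, 70, 80, 5, 6]
index = build_short_key_index(keys)
rows = locate_rows(keys, index, 3, 5)
print(rows)                               # (3, 9)
print(zone_map_filter(values, rows, 25))  # [(3, 4), (8, 9)]
```

Backing up one block in `locate_rows` matters when duplicate keys straddle a block boundary: the sparse index only records each block's first key, so rows equal to `start_key` can sit at the end of the preceding block.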
8 changes: 4 additions & 4 deletions docs/en/docs/admin-manual/query-profile.md
@@ -221,7 +221,7 @@ OLAP_SCAN_NODE (id=0):(Active: 1.2ms,% non-child: 0.00%)
- ScanTime: 39.24us # The time returned from ScanNode to the upper node.
- ShowHintsTime_V1: 0ns # V2 has no meaning. Read part of the data in V1 to perform ScanRange segmentation.
SegmentIterator:
- BitmapIndexFilterTimer: 779ns # Use bitmap index to filter data time-consuming.
- InvertedIndexFilterTimer: 779ns # Use inverted index to filter data time-consuming.
- BlockLoadTime: 415.925us # SegmentReader(V1) or SegmentIterator(V2) gets the time of the block.
- BlockSeekCount: 12 # The number of block seeks when reading Segment.
- BlockSeekTime: 222.556us # It takes time to block seek when reading Segment.
@@ -234,7 +234,7 @@ OLAP_SCAN_NODE (id=0):(Active: 1.2ms,% non-child: 0.00%)
- NumSegmentFiltered: 0 # When generating Segment Iterator, the number of Segments that are completely filtered out through column statistics and query conditions.
- NumSegmentTotal: 6 # Query the number of all segments involved.
- RawRowsRead: 7 # The number of raw rows read in the storage engine. See below for details.
- RowsBitmapIndexFiltered: 0 # Only in V2, the number of rows filtered by the Bitmap index.
- RowsInvertedIndexFiltered: 0 # Only in V2, the number of rows filtered by the Inverted index.
- RowsBloomFilterFiltered: 0 # Only in V2, the number of rows filtered by BloomFilter index.
- RowsKeyRangeFiltered: 0 # In V2 only, the number of rows filtered out by SortkeyIndex index.
- RowsStatsFiltered: 0 # In V2, the number of rows filtered by the ZoneMap index, including the deletion condition. V1 also contains the number of rows filtered by BloomFilter.
@@ -248,8 +248,8 @@ OLAP_SCAN_NODE (id=0):(Active: 1.2ms,% non-child: 0.00%)
The predicate push down and index usage can be inferred from the related indicators of the number of data rows in the profile. The following only describes the profile in the reading process of segment V2 format data. In segment V1 format, the meaning of these indicators is slightly different.

- When reading a segment V2, if the query has key_ranges (the query range composed of prefix keys), first filter the data through the SortkeyIndex index, and the number of filtered rows is recorded in `RowsKeyRangeFiltered`.
- After that, use the Bitmap index to perform precise filtering on the columns containing the bitmap index in the query condition, and the number of filtered rows is recorded in `RowsBitmapIndexFiltered`.
- After that, according to the equivalent (eq, in, is) condition in the query condition, use the BloomFilter index to filter the data and record it in `RowsBloomFilterFiltered`. The value of `RowsBloomFilterFiltered` is the difference between the total number of rows of the Segment (not the number of rows filtered by the Bitmap index) and the number of remaining rows after BloomFilter, so the data filtered by BloomFilter may overlap with the data filtered by Bitmap.
- After that, use the Inverted index to perform precise filtering on the columns containing the inverted index in the query condition, and the number of filtered rows is recorded in `RowsInvertedIndexFiltered`.
- After that, according to the equivalent (eq, in, is) condition in the query condition, use the BloomFilter index to filter the data and record it in `RowsBloomFilterFiltered`. The value of `RowsBloomFilterFiltered` is the difference between the total number of rows of the Segment (not the number of rows filtered by the Inverted index) and the number of remaining rows after BloomFilter, so the data filtered by BloomFilter may overlap with the data filtered by inverted index.
- After that, use the ZoneMap index to filter the data according to the query conditions and delete conditions and record it in `RowsStatsFiltered`.
- `RowsConditionsFiltered` is the number of rows filtered by various indexes, including the values of `RowsBloomFilterFiltered` and `RowsStatsFiltered`.
- So far, the Init phase is completed, and the number of rows filtered by the condition to be deleted in the Next phase is recorded in `RowsDelFiltered`. Therefore, the number of rows actually filtered by the delete condition are recorded in `RowsStatsFiltered` and `RowsDelFiltered` respectively.
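The overlap between `RowsInvertedIndexFiltered` and `RowsBloomFilterFiltered` described in the list above can be made concrete with a toy model. This is a hypothetical sketch, not the Doris implementation: Python sets stand in for the segment's row bitmap, and `profile_counters` only encodes the two counter definitions as the text above states them.

```python
def profile_counters(total_rows, inverted_hits, bloom_maybe):
    """Hypothetical model of two SegmentIterator counters, following the text above.

    total_rows: number of rows in the segment
    inverted_hits: row ids the inverted index reports as matching
    bloom_maybe: row ids the BloomFilter index cannot rule out
    """
    rows = set(range(total_rows))
    after_inverted = rows & inverted_hits
    after_bloom = after_inverted & bloom_maybe
    inverted_filtered = total_rows - len(after_inverted)
    # Measured against the whole segment, not against the rows left by the inverted
    # index, so rows already removed by the inverted index are counted again here.
    bloom_filtered = total_rows - len(after_bloom)
    actually_removed = total_rows - len(after_bloom)
    return {
        "RowsInvertedIndexFiltered": inverted_filtered,
        "RowsBloomFilterFiltered": bloom_filtered,
        "overlap": inverted_filtered + bloom_filtered - actually_removed,
    }


print(profile_counters(10, set(range(6)), {0, 1, 2, 3, 8, 9}))
# {'RowsInvertedIndexFiltered': 4, 'RowsBloomFilterFiltered': 6, 'overlap': 4}
```

In this example the two counters sum to 10 while only 6 rows were removed: the 4 rows dropped by the inverted index also show up in `RowsBloomFilterFiltered`.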
3 changes: 1 addition & 2 deletions docs/en/docs/advanced/alter-table/schema-change.md
@@ -31,8 +31,7 @@ Users can modify the schema of existing tables through the Schema Change operati
* Add and delete columns
* Modify column type
* Adjust column order
* Add and modify Bloom Filter
* Add and delete bitmap index
* Add and delete index

This document mainly describes how to create a Schema Change job, as well as some considerations and frequently asked questions about Schema Change.
## Glossary
1 change: 0 additions & 1 deletion docs/en/docs/advanced/cold-hot-separation.md
@@ -167,7 +167,6 @@ The supported schema change types after data cooling are as follows:
* Modify column type
* Adjust column order
* Add and modify Bloom Filter
* Add and delete bitmap index

## cold data Garbage collection
The garbage data of cold data refers to the data that is not used by any Replica. Object storage may have garbage data generated by the following situations:
3 changes: 1 addition & 2 deletions docs/en/docs/data-table/best-practice.md
@@ -177,7 +177,6 @@ Users can modify the Schema of an existing table through the Schema Change opera
- Adding and deleting columns
- Modify column types
- Reorder columns
- Adding or modifying Bloom Filter
- Adding or removing bitmap index
- Adding or removing index

For details, please refer to [Schema Change](../advanced/alter-table/schema-change.md)
83 changes: 0 additions & 83 deletions docs/en/docs/data-table/index/bitmap-index.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/en/docs/data-table/index/bloomfilter.md
@@ -121,7 +121,7 @@ You can consider establishing a Bloom Filter index for a column when the followi

1. First, BloomFilter is suitable for non-prefix filtering.
2. The query will be filtered according to the high frequency of the column, and most of the query conditions are in and = filtering.
3. Unlike Bitmap, BloomFilter is suitable for high cardinality columns. Such as UserID. Because if it is created on a low-cardinality column, such as a "gender" column, each Block will almost contain all values, causing the BloomFilter index to lose its meaning.
3. BloomFilter is suitable for high cardinality columns. Such as UserID. Because if it is created on a low-cardinality column, such as a "gender" column, each Block will almost contain all values, causing the BloomFilter index to lose its meaning.
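Item 3 above (BloomFilter loses its meaning on a low-cardinality column) can be checked with a small Python sketch that builds one Bloom filter per block of rows and lists the blocks a point lookup cannot skip. The block size (100 rows), the filter size (`m=1024` bits, `k=3` hashes) and the SHA-256-based hashing are illustrative assumptions, not Doris's actual parameters; `BloomFilter` and `candidate_blocks` are hypothetical names.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter: m bits, k bit positions per value from SHA-256."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, value):
        digest = hashlib.sha256(str(value).encode()).digest()
        # Use k non-overlapping 4-byte slices of the digest as independent hash values.
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m for i in range(self.k)]

    def add(self, value):
        for p in self._positions(value):
            self.bits[p] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p] for p in self._positions(value))


def candidate_blocks(column, value, block_rows=100):
    """Build one Bloom filter per block and return the ids of blocks that cannot be skipped."""
    candidates = []
    for b, start in enumerate(range(0, len(column), block_rows)):
        bf = BloomFilter()
        for v in column[start:start + block_rows]:
            bf.add(v)
        if bf.might_contain(value):
            candidates.append(b)
    return candidates


user_ids = list(range(5000))  # high cardinality: every value appears once
genders = ["M", "F"] * 2500   # low cardinality: every block holds both values
print(candidate_blocks(user_ids, 1234))     # includes block 12; any others are false positives
print(len(candidate_blocks(genders, "F")))  # 50: not a single block can be skipped
```

Because a Bloom filter has no false negatives, every block of the `genders` column answers "possibly present", so the index reads every block, while on `user_ids` it rules out nearly all of them.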

### **Doris BloomFilter Use Precautions**

2 changes: 1 addition & 1 deletion docs/en/docs/data-table/index/index-overview.md
@@ -30,7 +30,7 @@ Indexes are used to help quickly filter or find data.

Doris currently supports two main types of indexes:
1. built-in smart indexes, including prefix indexes and ZoneMap indexes.
2. User-created secondary indexes, including the [inverted index](./inverted-index.md), [bloomfilter index](./bloomfilter.md)[ngram bloomfilter index](./ngram-bloomfilter-index.md) and [bitmap index](./bitmap-index.md).
2. User-created secondary indexes, including the [inverted index](./inverted-index.md), [bloomfilter index](./bloomfilter.md) and [ngram bloomfilter index](./ngram-bloomfilter-index.md).

The ZoneMap index is the index information automatically maintained for each column in the column storage format, including Min/Max, the number of Null values, and so on. This index is transparent to the user.

4 changes: 2 additions & 2 deletions docs/en/docs/data-table/index/inverted-index.md
@@ -24,7 +24,7 @@ specific language governing permissions and limitations
under the License.
-->

# [Experimental] Inverted Index
# Inverted Index

<version since="2.0.0">

@@ -53,7 +53,7 @@ The features for inverted index is as follows:
- MATCH_ALL matches all keywords, MATCH_ANY matches any keywords
- support fulltext on array of text field
- support english, chinese and mixed unicode word parser
- accelerate normal equal, range query, replacing bitmap index in the future
- accelerate normal equal, range query, replacing bitmap index
- suport =, !=, >, >=, <, <= on text, numeric, datetime types
- suport =, !=, >, >=, <, <= on array of text, numeric, datetime types
- complete suport for logic combination
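The MATCH_ANY and MATCH_ALL semantics listed above can be sketched with a toy inverted index that maps each term to the set of row ids containing it. This is a hypothetical illustration, not Doris's implementation: the regex tokenizer stands in for the english, chinese and unicode parsers, and the function names are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Hypothetical stand-in for a word parser: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(rows):
    """Map each term to the set of row ids (posting list) whose text contains it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for term in tokenize(text):
            index[term].add(row_id)
    return index


def match_any(index, query):
    """Rows containing at least one query keyword: union of the posting lists."""
    result = set()
    for term in tokenize(query):
        result |= index.get(term, set())
    return result


def match_all(index, query):
    """Rows containing every query keyword: intersection of the posting lists."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the posting list is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


rows = ["Apache Doris is fast", "Doris supports inverted index", "bitmap index removed"]
index = build_inverted_index(rows)
print(sorted(match_any(index, "doris bitmap")))  # [0, 1, 2]
print(sorted(match_all(index, "doris index")))   # [1]
```

Both queries touch only the posting lists of the query terms, which is why an inverted index can answer them without scanning every row.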
2 changes: 1 addition & 1 deletion docs/en/docs/data-table/index/ngram-bloomfilter-index.md
@@ -24,7 +24,7 @@ specific language governing permissions and limitations
under the License.
-->

# [Experimental] NGram BloomFilter Index
# NGram BloomFilter Index

<version since="2.0.0">
</version>

This file was deleted.

@@ -36,18 +36,17 @@ This statement is used to create an index
grammar:

```sql
CREATE INDEX [IF NOT EXISTS] index_name ON table_name (column [, ...],) [USING BITMAP] [COMMENT 'balabala'];
CREATE INDEX [IF NOT EXISTS] index_name ON table_name (column [, ...],) [USING INVERTED] [COMMENT 'balabala'];
````
Notice:
- Currently only supports bitmap indexes
- BITMAP indexes are only created on a single column
- INVERTED indexes are only created on a single column

### Example

1. Create a bitmap index for siteid on table1
1. Create a inverted index for siteid on table1

```sql
CREATE INDEX [IF NOT EXISTS] index_name ON table1 (siteid) USING BITMAP COMMENT 'balabala';
CREATE INDEX [IF NOT EXISTS] index_name ON table1 (siteid) USING INVERTED COMMENT 'balabala';
````


@@ -165,14 +165,14 @@ Index list definition:
Index definition:

```sql
INDEX index_name (col_name) [USING BITMAP] COMMENT'xxxxxx'
INDEX index_name (col_name) [USING INVERTED] COMMENT'xxxxxx'
```

Example:

```sql
INDEX idx1 (k1) USING BITMAP COMMENT "This is a bitmap index1",
INDEX idx2 (k2) USING BITMAP COMMENT "This is a bitmap index2",
INDEX idx1 (k1) USING INVERTED COMMENT "This is a inverted index1",
INDEX idx2 (k2) USING INVERTED COMMENT "This is a inverted index2",
...
```

@@ -582,7 +582,7 @@ Set table properties. The following attributes are currently supported:
);
```

7. Create a table with bitmap index and bloom filter index
7. Create a table with inverted index and bloom filter index

```sql
CREATE TABLE example_db.table_hash
@@ -591,7 +591,7 @@ Set table properties. The following attributes are currently supported:
k2 DECIMAL(10, 2) DEFAULT "10.5",
v1 CHAR(10) REPLACE,
v2 INT SUM,
INDEX k1_idx (k1) USING BITMAP COMMENT'my first index'
INDEX k1_idx (k1) USING INVERTED COMMENT'my first index'
)
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH(k1) BUCKETS 32
@@ -32,7 +32,7 @@ DROP INDEX

### Description

This statement is used to delete the index of the specified name from a table. Currently, only bitmap indexes are supported.
This statement is used to delete the index of the specified name from a table.
grammar:

```sql
@@ -32,7 +32,7 @@ SHOW INDEX

### Description

This statement is used to display information about indexes in a table. Currently, only bitmap indexes are supported.
This statement is used to display information about indexes in a table.

grammar:

4 changes: 1 addition & 3 deletions docs/sidebars.json
@@ -60,8 +60,7 @@
"data-table/index/index-overview",
"data-table/index/inverted-index",
"data-table/index/bloomfilter",
"data-table/index/ngram-bloomfilter-index",
"data-table/index/bitmap-index"
"data-table/index/ngram-bloomfilter-index"
]
}
]
@@ -939,7 +938,6 @@
"sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-DATABASE",
"sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN",
"sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION",
"sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-BITMAP",
"sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-ROLLUP",
"sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-RENAME",
"sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-REPLACE",