diff --git a/TOC.md b/TOC.md
index ce4643a991335..339a2ef3de2d3 100644
--- a/TOC.md
+++ b/TOC.md
@@ -8,7 +8,7 @@
 + About TiDB
   + [TiDB Introduction](/overview.md)
   + [What's New in TiDB 4.0](/whats-new-in-tidb-4.0.md)
-  + [Key Features](/key-features.md)
+  + [Basic Features](/basic-features.md)
   + Compatibility
     + [MySQL Compatibility](/mysql-compatibility.md)
     + [TiDB Limitations](/tidb-limitations.md)
diff --git a/_index.md b/_index.md
index 3b740480731e5..2f94b184be7ce 100644
--- a/_index.md
+++ b/_index.md
@@ -1,20 +1,22 @@
 ---
 title: TiDB Introduction
-summary: Learn how to quickly start a TiDB cluster.
+summary: Learn about the NewSQL database TiDB that supports HTAP workloads.
 category: introduction
 aliases: ['/docs/dev/']
 ---
 
 # TiDB Introduction
 
-[TiDB](https://github.com/pingcap/tidb) ("Ti" stands for Titanium) is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability. TiDB can be deployed on-premise or in-cloud.
+[TiDB](https://github.com/pingcap/tidb) ("Ti" stands for Titanium) is an open-source, distributed, NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability. TiDB can be deployed on-premise or in-cloud.
+
+Designed for the cloud, TiDB provides flexible scalability, reliability and security on the cloud platform. Users can elastically scale TiDB to meet the requirements of their changing workloads. [TiDB Operator](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/tidb-operator-overview) helps manage TiDB on Kubernetes and automates operating tasks, which makes TiDB easier to deploy on any cloud that provides managed Kubernetes. [TiDB Cloud](https://pingcap.com/products/tidbcloud) (Beta), the fully-managed TiDB service, is the easiest, most economical, and most resilient way to unlock the full power of [TiDB in the cloud](https://docs.pingcap.com/tidbcloud/beta), allowing you to deploy and run TiDB clusters with just a few clicks.
 
 About TiDB
 
 - [TiDB Introduction](/overview.md)
-- [Key Features](/key-features.md)
+- [Basic Features](/basic-features.md)
 - [Compatibility with MySQL](/mysql-compatibility.md)
 - [Usage Limitations](/tidb-limitations.md)
diff --git a/basic-features.md b/basic-features.md
new file mode 100644
index 0000000000000..3bcc4f50a5717
--- /dev/null
+++ b/basic-features.md
@@ -0,0 +1,80 @@
+---
+title: TiDB Basic Features
+summary: Learn about the basic features of TiDB.
+category: introduction
+---
+
+# TiDB Basic Features
+
+This document introduces the basic features of TiDB.
+
+## Data types
+
+- Numeric types: `BIT`, `BOOL|BOOLEAN`, `SMALLINT`, `MEDIUMINT`, `INT|INTEGER`, `BIGINT`, `FLOAT`, `DOUBLE`, `DECIMAL`.
+
+- Date and time types: `DATE`, `TIME`, `DATETIME`, `TIMESTAMP`, `YEAR`.
+
+- String types: `CHAR`, `VARCHAR`, `TEXT`, `TINYTEXT`, `MEDIUMTEXT`, `LONGTEXT`, `BINARY`, `VARBINARY`, `BLOB`, `TINYBLOB`, `MEDIUMBLOB`, `LONGBLOB`, `ENUM`, `SET`.
+
+- The `JSON` type.
+
+## Operators
+
+- Arithmetic operators, bit operators, comparison operators, logical operators, date and time operators, and so on.
+
+## Character sets and collations
+
+- Character sets: `UTF8`, `UTF8MB4`, `BINARY`, `ASCII`, `LATIN1`.
+
+- Collations: `UTF8MB4_GENERAL_CI`, `UTF8MB4_GENERAL_BIN`, `UTF8_GENERAL_CI`, `UTF8_GENERAL_BIN`, `BINARY`.
+
+## Functions
+
+- Control flow functions, string functions, date and time functions, bit functions, data type conversion functions, data encryption and decryption functions, compression and decompression functions, information functions, JSON functions, aggregation functions, window functions, and so on.
+
+## SQL statements
+
+- Fully supports standard Data Definition Language (DDL) statements, such as `CREATE`, `DROP`, `ALTER`, `RENAME`, `TRUNCATE`, and so on.
+
+- Fully supports standard Data Manipulation Language (DML) statements, such as `INSERT`, `REPLACE`, `SELECT`, subqueries, `UPDATE`, `LOAD DATA`, and so on.
+
+- Fully supports standard transactional and locking statements, such as `START TRANSACTION`, `COMMIT`, `ROLLBACK`, `SET TRANSACTION`, and so on.
+
+- Fully supports standard database administration statements, such as `SHOW`, `SET`, and so on.
+
+- Fully supports standard utility statements, such as `DESCRIBE`, `EXPLAIN`, `USE`, and so on.
+
+- Fully supports the `GROUP BY` and `ORDER BY` clauses.
+
+- Fully supports the standard `LEFT OUTER JOIN` and `RIGHT OUTER JOIN` SQL statements.
+
+- Fully supports the standard SQL table and column aliases.
+
+## Partitioning
+
+- Supports Range partitioning
+- Supports Hash partitioning
+
+## Views
+
+- Supports general views
+
+## Constraints
+
+- Supports non-empty constraints
+- Supports primary key constraints
+- Supports unique constraints
+
+## Security
+
+- Supports privilege management based on RBAC (role-based access control)
+- Supports password management
+- Supports communication and data encryption
+- Supports IP allowlist
+- Supports audit
+
+## Tools
+
+- Supports fast backup
+- Supports data migration from MySQL to TiDB using tools
+- Supports deploying and maintaining TiDB using tools
diff --git a/key-features.md b/key-features.md
deleted file mode 100644
index 89317a34ec5a7..0000000000000
--- a/key-features.md
+++ /dev/null
@@ -1,100 +0,0 @@
----
-title: Key Features
-summary: Key features of the TiDB database platform.
-category: concepts
-aliases: ['/docs/dev/key-features/']
----
-
-# Key Features
-
-## Horizontal scalability
-
-TiDB expands both SQL processing and storage by simply adding new nodes. This makes infrastructure capacity planning both easier and more cost-effective than traditional relational databases which only scale vertically.
-
-## MySQL compatible syntax
-
-TiDB acts like it is a MySQL 5.7 server to your applications. You can continue to use all of the existing MySQL client libraries, and in many cases, you will not need to change a single line of code in your application.
-
-TiDB does not have 100% MySQL compatibility because we built the layer from scratch in order to maximize the performance advantages inherent to a distributed system. We believe in being transparent about the level of MySQL compatibility that TiDB provides. Please check out the list of [known compatibility differences](/mysql-compatibility.md).
-
-## Replicate from and to MySQL
-
-TiDB supports the ability to replicate from a MySQL or MariaDB installation, using its Data Migration (DM) toolchain. Replication is also possible in the direction of TiDB to MySQL using the TiDB Binlog.
-
-We believe that being able to replicate in both directions lowers the risk when either evaluating or migrating to TiDB from MySQL.
-
-## Distributed transactions with strong consistency
-
-TiDB internally shards table into small range-based chunks that we refer to as "Regions". Each Region defaults to approximately 100MiB in size, and TiDB uses a Two-phase commit internally to ensure that Regions are maintained in a transactionally consistent way.
-
-Transactions in TiDB are strongly consistent, with snapshot isolation level consistency. For more information, see transaction [behavior and performance differences](/transaction-isolation-levels.md). This makes TiDB more comparable to traditional relational databases in semantics than some of the newer NoSQL systems using eventual consistency.
-
-These behaviors are transparent to your application(s), which only need to connect to TiDB using a MySQL 5.7 compatible client library.
-
-## Cloud native architecture
-
-TiDB is designed to work in the cloud -- public, private, or hybrid -- making deployment, provisioning, operations, and maintenance simple.
-
-The storage layer of TiDB, called TiKV, [became](https://www.cncf.io/blog/2018/08/28/cncf-to-host-tikv-in-the-sandbox/) a [Cloud Native Computing Foundation](https://www.cncf.io/) member project in 2018. The architecture of the TiDB platform also allows SQL processing and storage to be scaled independently of each other in a very cloud-friendly manner.
-
-## Minimize ETL with HTAP
-
-TiDB is designed to support both transaction processing (OLTP) and analytical processing (OLAP) workloads. This means that while you may have traditionally transacted on MySQL and then Extracted, Transformed and Loaded (ETL) data into a column store for analytical processing, this step is no longer required.
-
-With trends in business such as moving from two-day delivery to instant, it is important to be able to run analytics with minimal delay. The future is in HTAP databases which can perform the _hybrid_ of Transactional and Analytical processing.
-
-## Fault tolerance & recovery with Raft
-
-TiDB uses the Raft consensus algorithm to ensure that data is safely replicated throughout storage in Raft groups. In the event of failure, a Raft group will automatically elect a new leader for the failed member, and self-heal the TiDB cluster without any required manual intervention.
-
-Failure and self-healing operations are also transparent to applications. TiDB servers will retry accessing the data after the leadership change, with the only impact being slightly higher latency for queries attempting to access this specific data in between when the failure is detected and fixed.
-
-## Automatic rebalancing
-
-The storage in TiKV is automatically rebalanced to match changes in your workload. For example, if part of your data is more frequently accessed, this hotspot will be detected and may trigger the data to be rebalanced among other TiKV servers. Chunks of data ("Regions" in TiDB terminology) will automatically be split or merged as needed.
-
-This helps remove some of the headaches associated with maintaining a large database cluster and also leads to better utilization over traditional master-slave read-write splitting that is commonly used with MySQL deployments.
-
-## Deployment and orchestration with Ansible, Kubernetes, Docker
-
-TiDB supports several deployment and orchestration methods, like Ansible, Kubernetes, and Docker. Whether your environment is bare metal, virtualized or containerized, TiDB can be deployed, upgraded, operated, and maintained using the best toolset most suited to your needs.
-
-## JSON support
-
-TiDB supports a built-in JSON data type and set of built-in functions to search, manipulate and create JSON data. This enables you to build your application without enforcing a strict schema up front.
-
-## Spark integration
-
-TiDB natively supports an Apache Spark plug-in, called TiSpark, with a SparkSQL interface that enables users to run analytical workloads using Spark directly on TiKV, where the data is stored. This plug-in does not interfere with transactional processing in the TiDB server. This integration takes advantage of TiDB’s modular architecture to support HTAP workloads.
-
-## Read historical data without restoring from backup
-
-Many restore-from-backup events are the result of accidental deletion or modification of the wrong data. With TiDB, you can access the older versions from MVCC by specifying a timestamp in the past from when you would like to access the data.
-
-Your session will be placed in read-only mode while reading the earlier versions of rows, but you can use this to export the data and reload it to the current time if required.
-
-## Fast import and restore of data
-
-TiDB supports the ability to fast-import both Mydumper and .csv formatted data using an optimized insert mode that disables redo logging, and applies a number of optimizations.
-
-With TiDB Lightning, you can import data into TiDB at over 100GiB/hour using production-grade hardware.
-
-## Hybrid of column and row storage
-
-TiDB supports the ability to store data in both row-oriented and (coming soon) column-oriented storage. This enables a wide spectrum of both transactional and analytical queries to be executed efficiently in TiDB and TiSpark. The TiDB optimizer is also able to determine which queries are best served by column storage, and route the queries appropriately.
-
-## SQL plan management
-
-In both MySQL and TiDB, optimizer hints are available to override the default query execution plan with a better known plan. The problem with this approach is that it requires an application developer to make modifications to query text to inject the hint. This can also be undesirable in the case that an ORM is used to generate the query.
-
-In TiDB 3.0, you will be able to bind queries to a specific execution plan directly within the TiDB server. This method is entirely transparent to application code.
-
-## Open source
-
-TiDB has been released under the Apache 2.0 license since its initial launch in 2015. The TiDB server has (to our knowledge) the highest contributor count on GitHub of any relational database project.
-
-## Online schema changes
-
-TiDB implements the _Online, Asynchronous Schema Change_ algorithm first described in [Google's F1 paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41376.pdf).
-
-In simplified terms, this means that TiDB is able to make changes to the schema across its distributed architecture without blocking either read or write operations. There is no need to use an external schema change tool or flip between masters and slaves as is common in large MySQL deployments.
diff --git a/mysql-compatibility.md b/mysql-compatibility.md
index 1f860e10abb58..723e8e16fb5bf 100644
--- a/mysql-compatibility.md
+++ b/mysql-compatibility.md
@@ -205,4 +205,4 @@ The following column types are supported by MySQL, but **NOT** by TiDB:
 + FLOAT4/FLOAT8
 + FIXED (alias for DECIMAL)
 + SERIAL (alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE)
-+ SQL_TSI_* (including SQL_TSI_YEAR, SQL_TSI_MONTH, SQL_TSI_WEEK, SQL_TSI_DAY, SQL_TSI_HOUR, SQL_TSI_MINUTE and SQL_TSI_SECOND)
++ `SQL_TSI_*` (including SQL_TSI_YEAR, SQL_TSI_MONTH, SQL_TSI_WEEK, SQL_TSI_DAY, SQL_TSI_HOUR, SQL_TSI_MINUTE and SQL_TSI_SECOND)
diff --git a/overview.md b/overview.md
index 147535ad0aea0..3978576f0ff5e 100644
--- a/overview.md
+++ b/overview.md
@@ -1,51 +1,50 @@
 ---
 title: TiDB Introduction
-summary: Learn how to quickly start a TiDB cluster.
+summary: Learn about the key features and usage scenarios of TiDB.
 category: introduction
-aliases: ['/docs/dev/overview/']
+aliases: ['/docs/dev/key-features/','/tidb/dev/key-features','/docs/dev/overview/']
 ---
 
 # TiDB Introduction
 
-[TiDB](https://github.com/pingcap/tidb) ("Ti" stands for Titanium) is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.
+[TiDB](https://github.com/pingcap/tidb) ("Ti" stands for Titanium) is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability. The goal of TiDB is to provide users with a one-stop database solution that covers OLTP (Online Transactional Processing), OLAP (Online Analytical Processing), and HTAP services. TiDB is suitable for various use cases that require high availability and strong consistency with large-scale data.
 
-TiDB can be deployed on-premise or in-cloud. The following deployment options are officially supported by PingCAP:
+## Key features
 
-- [TiUP Deployment](/production-deployment-using-tiup.md): This guide describes how to deploy TiDB using [TiUP](https://github.com/pingcap-incubator/tiup). It is strongly recommended for production deployment.
-- [Docker Deployment](/test-deployment-using-docker.md): This guide describes how to deploy TiDB using Docker.
-- [Docker Compose Deployment](/deploy-test-cluster-using-docker-compose.md): This guide describes how to deploy TiDB using Docker compose. You can follow this guide to quickly deploy a TiDB cluster for testing and development on your local drive.
-- Kubernetes Deployment:
+- **Horizontally scaling out or scaling in with only one click**
 
-    You can use [TiDB Operator](https://github.com/pingcap/tidb-operator) to deploy TiDB on:
+    The TiDB architecture design of separating computing from storage enables you to separately scale out or scale in the computing or storage capacity online as needed. The scaling process is transparent to application operations and maintenance staff.
 
-    - [AWS EKS (Elastic Kubernetes Service)](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/deploy-on-aws-eks)
-    - [GKE (Google Kubernetes Engine)](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/deploy-on-gcp-gke)
-    - [Google Cloud Shell](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/deploy-tidb-from-kubernetes-gke)
-    - [Alibaba Cloud ACK (Container Service for Kubernetes)](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/deploy-on-alibaba-cloud)
+- **Financial-grade high availability**
 
-    Or deploy TiDB locally using:
+    The data is stored in multiple replicas. Data replicas obtain the transaction log using the Multi-Raft protocol. A transaction can be committed only when data has been successfully written into the majority of replicas. This can guarantee strong consistency, and availability when a minority of replicas go down. To meet the requirements of different disaster tolerance levels, you can configure the geographic location and number of replicas as needed.
 
-    - [kind](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/get-started#create-a-kubernetes-cluster-using-kind)
-    - [Minikube](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/get-started#create-a-kubernetes-cluster-using-minikube)
+- **Real-time HTAP**
 
-- [Binary Tarball Deployment](/production-deployment-from-binary-tarball.md): This guide describes how to deploy TiDB from a binary tarball in production. Guides for [development](/deploy-tidb-from-binary.md) and [testing](/test-deployment-from-binary-tarball.md) environments are also available.
+    TiDB provides two storage engines: [TiKV](https://tikv.org/), a row-based storage engine, and [TiFlash](/tiflash/tiflash-overview.md), a columnar storage engine. TiFlash uses the Multi-Raft Learner protocol to replicate data from TiKV in real time, ensuring that the data between the TiKV row-based storage engine and the TiFlash columnar storage engine are consistent. TiKV and TiFlash can be deployed on different machines as needed to solve the problem of HTAP resource isolation.
 
-## Community provided blog posts & tutorials
+- **Cloud-native distributed database**
 
-The following list collects deployment guides and tutorials from the community. The content is subject to change by the contributors.
+    TiDB is a distributed database designed for the cloud, providing flexible scalability, reliability and security on the cloud platform. Users can elastically scale TiDB to meet the requirements of their changing workloads. In TiDB, each piece of data has at least 3 replicas, which can be scheduled in different cloud availability zones to tolerate the outage of a whole data center. [TiDB Operator](https://docs.pingcap.com/tidb-in-kubernetes/v1.1/tidb-operator-overview) helps manage TiDB on Kubernetes and automates tasks related to operating the TiDB cluster, which makes TiDB easier to deploy on any cloud that provides managed Kubernetes. [TiDB Cloud](https://pingcap.com/products/tidbcloud) (Beta), the fully-managed TiDB service, is the easiest, most economical, and most resilient way to unlock the full power of [TiDB in the cloud](https://docs.pingcap.com/tidbcloud/beta), allowing you to deploy and run TiDB clusters with just a few clicks.
 
-- [How To Spin Up an HTAP Database in 5 Minutes with TiDB + TiSpark](https://pingcap.com/blog/how_to_spin_up_an_htap_database_in_5_minutes_with_tidb_tispark/)
-- [Developer install guide (single machine)](http://www.tocker.ca/this-blog-now-powered-by-wordpress-tidb.html)
-- [TiDB Best Practices](https://pingcap.com/blog/2017-07-24-tidbbestpractice/)
+- **Compatible with the MySQL 5.7 protocol and MySQL ecosystem**
 
-_Your contribution is also welcome! Feel free to open a [pull request](https://github.com/pingcap/docs/blob/master/overview.md) to add additional links._
+    TiDB is compatible with the MySQL 5.7 protocol, common features of MySQL, and the MySQL ecosystem. To migrate your applications to TiDB, in many cases you do not need to change a single line of code, or you only need to modify a small amount of code. In addition, TiDB provides a series of [data migration tools](/migration-overview.md) to help migrate application data easily into TiDB.
 
-## Source code
+## Use cases
 
-Source code for [all components of the TiDB platform](https://github.com/pingcap) is available on GitHub.
+- **Financial industry scenarios with high requirements for data consistency, reliability, availability, scalability, and disaster tolerance**
 
-- [TiDB](https://github.com/pingcap/tidb)
-- [TiKV](https://github.com/tikv/tikv)
-- [PD](https://github.com/pingcap/pd)
-- [TiSpark](https://github.com/pingcap/tispark)
-- [TiDB Operator](https://github.com/pingcap/tidb-operator)
+    As we all know, the financial industry has high requirements for data consistency, reliability, availability, scalability, and disaster tolerance. The traditional solution is to provide services in two data centers in the same city, and provide data disaster recovery but no services in a third data center located in another city. This solution has the disadvantages of low resource utilization, high maintenance cost, and the fact that RTO (Recovery Time Objective) and RPO (Recovery Point Objective) cannot meet expectations. TiDB uses multiple replicas and the Multi-Raft protocol to schedule data to different data centers, racks, and machines. When some machines fail, the system can automatically switch to ensure that the system RTO ≦ 30s and RPO = 0.
+
+- **Massive data and high concurrency scenarios with high requirements for storage capacity, scalability, and concurrency**
+
+    As applications grow rapidly, the data surges. Traditional standalone databases cannot meet the data capacity requirements. The solution is to use sharding middleware or a NewSQL database (like TiDB), and the latter is more cost-effective. TiDB adopts a separate computing and storage architecture, which enables you to scale out or scale in the computing or storage capacity separately. The computing layer supports a maximum of 512 nodes, each node supports a maximum of 1,000 concurrencies, and the maximum cluster capacity is at the PB (petabytes) level.
+
+- **Real-time HTAP scenarios**
+
+    With the fast growth of 5G, Internet of Things, and artificial intelligence, the data generated by a company keeps increasing tremendously, reaching a scale of hundreds of TB (terabytes) or even the PB level. The traditional solution is to process online transactional applications using an OLTP database and use an ETL (Extract, Transform, Load) tool to replicate the data into an OLAP database for data analysis. This solution has multiple disadvantages such as high storage costs and poor real-time performance. TiDB introduces the TiFlash columnar storage engine in v4.0, which combines with the TiKV row-based storage engine to build TiDB as a true HTAP database. With a small amount of extra storage cost, you can handle both online transactional processing and real-time data analysis in the same system, which greatly reduces cost.
+
+- **Data aggregation and secondary processing scenarios**
+
+    The application data of most companies is scattered across different systems. As the application grows, the decision-making leaders need to understand the business status of the entire company to make decisions in time. In this case, the company needs to aggregate the scattered data into the same system and execute secondary processing to generate a T+0 or T+1 report. The traditional solution is to use ETL and Hadoop, but the Hadoop system is complicated, with high operations, maintenance, and storage costs. Compared with Hadoop, TiDB is much simpler. You can replicate data into TiDB using ETL tools or data migration tools provided by TiDB. Reports can be directly generated using SQL statements.