From df246eaa94a6e30f799dd8376f9eca378f2e31fe Mon Sep 17 00:00:00 2001 From: Angerszhuuuu Date: Wed, 21 Apr 2021 11:06:07 +0800 Subject: [PATCH 1/8] [SPARK-35159][SQL][DOCS] Extract hive format doc --- ...-ref-syntax-ddl-create-table-hiveformat.md | 56 +------------ docs/sql-ref-syntax-hive-format.md | 83 +++++++++++++++++++ docs/sql-ref-syntax-qry-select-transform.md | 48 +---------- 3 files changed, 89 insertions(+), 98 deletions(-) create mode 100644 docs/sql-ref-syntax-hive-format.md diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index 11ec2f1d9ea85..caf8c3c3bd156 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -39,14 +39,6 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ LOCATION path ] [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] [ AS select_statement ] - -row_format: - : SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ] - | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ] - [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ] - [ MAP KEYS TERMINATED BY map_key_terminated_char ] - [ LINES TERMINATED BY row_terminated_char ] - [ NULL DEFINED AS null_char ] ``` Note that, the clauses between the columns definition clause and the AS SELECT clause can come in @@ -82,50 +74,6 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI * **INTO num_buckets BUCKETS** Specifies buckets numbers, which is used in `CLUSTERED BY` clause. - -* **row_format** - - Use the `SERDE` clause to specify a custom SerDe for one table. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on. - -* **SERDE** - - Specifies a custom SerDe for one table. - -* **serde_class** - - Specifies a fully-qualified class name of a custom SerDe. 
- -* **SERDEPROPERTIES** - - A list of key-value pairs that is used to tag the SerDe definition. - -* **DELIMITED** - - The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on. - -* **FIELDS TERMINATED BY** - - Used to define a column separator. - -* **COLLECTION ITEMS TERMINATED BY** - - Used to define a collection item separator. - -* **MAP KEYS TERMINATED BY** - - Used to define a map key separator. - -* **LINES TERMINATED BY** - - Used to define a row separator. - -* **NULL DEFINED AS** - - Used to define the specific value for NULL. - -* **ESCAPED BY** - - Used for escape mechanism. * **STORED AS** @@ -147,6 +95,10 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI The table is populated using the data from the select statement. +* **row_format** + + All descriptions about syntax in `row_format` can refer to [HIVE FORMAT](sql-ref-syntax-hive-format.html) + ### Examples ```sql diff --git a/docs/sql-ref-syntax-hive-format.md b/docs/sql-ref-syntax-hive-format.md new file mode 100644 index 0000000000000..bdeb5a8a8eae7 --- /dev/null +++ b/docs/sql-ref-syntax-hive-format.md @@ -0,0 +1,83 @@ +--- +layout: global +title: Data Retrieval +displayTitle: Data Retrieval +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ See the License for the specific language governing permissions and + limitations under the License. +-- + +### Description + +Spark support Hive format in `CREATE TABLE` clause and `TRANSFORM` clause, Hive format support +`SERDE` and native `DELIMITED`. + +### Syntax + +```sql +row_format: + SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ] + | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ] + [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ] + [ MAP KEYS TERMINATED BY map_key_terminated_char ] + [ LINES TERMINATED BY row_terminated_char ] + [ NULL DEFINED AS null_char ] +``` + +### Parameters + +* **row_format** + + Use the `SERDE` clause to specify a custom SerDe for one table or processing inputs and outputs data. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on. + +* **SERDE** + + Specifies a custom SerDe for one table or processing inputs and outputs data. + +* **serde_class** + + Specifies a fully-qualified class name of a custom SerDe. + +* **SERDEPROPERTIES** + + A list of key-value pairs that is used to tag the SerDe definition. + +* **DELIMITED** + + The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on. + +* **FIELDS TERMINATED BY** + + Used to define a column separator. + +* **COLLECTION ITEMS TERMINATED BY** + + Used to define a collection item separator. + +* **MAP KEYS TERMINATED BY** + + Used to define a map key separator. + +* **LINES TERMINATED BY** + + Used to define a row separator. + +* **NULL DEFINED AS** + + Used to define the specific value for NULL. + +* **ESCAPED BY** + + Used for escape mechanism. 
diff --git a/docs/sql-ref-syntax-qry-select-transform.md b/docs/sql-ref-syntax-qry-select-transform.md index 814bd01ec2cfc..8e5b098689c96 100644 --- a/docs/sql-ref-syntax-qry-select-transform.md +++ b/docs/sql-ref-syntax-qry-select-transform.md @@ -33,14 +33,6 @@ SELECT TRANSFORM ( expression [ , ... ] ) USING command_or_script [ AS ( [ col_name [ col_type ] ] [ , ... ] ) ] [ ROW FORMAT row_format ] [ RECORDREADER record_reader_class ] - -row_format: - SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ] - | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ] - [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ] - [ MAP KEYS TERMINATED BY map_key_terminated_char ] - [ LINES TERMINATED BY row_terminated_char ] - [ NULL DEFINED AS null_char ] ``` ### Parameters @@ -49,45 +41,9 @@ row_format: Specifies a combination of one or more values, operators and SQL functions that results in a value. -* **row_format** - - Otherwise, uses the `DELIMITED` clause to specify the native SerDe and state the delimiter, escape character, null character and so on. - -* **SERDE** - - Specifies a custom SerDe for one table. - -* **serde_class** - - Specifies a fully-qualified class name of a custom SerDe. - -* **DELIMITED** - - The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on. - -* **FIELDS TERMINATED BY** - - Used to define a column separator. - -* **COLLECTION ITEMS TERMINATED BY** - - Used to define a collection item separator. +* **row_format** -* **MAP KEYS TERMINATED BY** - - Used to define a map key separator. - -* **LINES TERMINATED BY** - - Used to define a row separator. - -* **NULL DEFINED AS** - - Used to define the specific value for NULL. - -* **ESCAPED BY** - - Used for escape mechanism. 
+ All descriptions about syntax in `row_format` can refer to [HIVE FORMAT](sql-ref-syntax-hive-format.html) * **RECORDWRITER** From 5d5397681f42f7a8a499cb5d2aa63abbd8da140a Mon Sep 17 00:00:00 2001 From: Angerszhuuuu Date: Wed, 21 Apr 2021 11:10:30 +0800 Subject: [PATCH 2/8] Update sql-ref-syntax-ddl-create-table-hiveformat.md --- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index caf8c3c3bd156..1ef9a7c2e65a3 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -75,6 +75,10 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI Specifies buckets numbers, which is used in `CLUSTERED BY` clause. +* **row_format** + + All descriptions about syntax in `row_format` can refer to [HIVE FORMAT](sql-ref-syntax-hive-format.html) + * **STORED AS** File format for table storage, could be TEXTFILE, ORC, PARQUET, etc. @@ -95,10 +99,6 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI The table is populated using the data from the select statement. 
-* **row_format** - - All descriptions about syntax in `row_format` can refer to [HIVE FORMAT](sql-ref-syntax-hive-format.html) - ### Examples ```sql From a49f1e5bb44e7b3066a7455f3ec99606456aa48a Mon Sep 17 00:00:00 2001 From: Angerszhuuuu Date: Wed, 21 Apr 2021 11:12:13 +0800 Subject: [PATCH 3/8] Update sql-ref-syntax-ddl-create-table-hiveformat.md --- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index 1ef9a7c2e65a3..38953212d2471 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -76,7 +76,7 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI Specifies buckets numbers, which is used in `CLUSTERED BY` clause. * **row_format** - + All descriptions about syntax in `row_format` can refer to [HIVE FORMAT](sql-ref-syntax-hive-format.html) * **STORED AS** From d98f82521a505b14d04bafd63191c033a3443bb3 Mon Sep 17 00:00:00 2001 From: Angerszhuuuu Date: Wed, 21 Apr 2021 11:12:27 +0800 Subject: [PATCH 4/8] Update sql-ref-syntax-qry-select-transform.md --- docs/sql-ref-syntax-qry-select-transform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-qry-select-transform.md b/docs/sql-ref-syntax-qry-select-transform.md index 8e5b098689c96..55949017ded1b 100644 --- a/docs/sql-ref-syntax-qry-select-transform.md +++ b/docs/sql-ref-syntax-qry-select-transform.md @@ -42,7 +42,7 @@ SELECT TRANSFORM ( expression [ , ... ] ) Specifies a combination of one or more values, operators and SQL functions that results in a value. 
* **row_format** - + All descriptions about syntax in `row_format` can refer to [HIVE FORMAT](sql-ref-syntax-hive-format.html) * **RECORDWRITER** From 203f544d637cf9668998cbb018eb38bfd585ebdb Mon Sep 17 00:00:00 2001 From: Angerszhuuuu Date: Wed, 21 Apr 2021 18:11:24 +0800 Subject: [PATCH 5/8] Update sql-ref-syntax-hive-format.md --- docs/sql-ref-syntax-hive-format.md | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/docs/sql-ref-syntax-hive-format.md b/docs/sql-ref-syntax-hive-format.md index bdeb5a8a8eae7..eefc62b0dbc2b 100644 --- a/docs/sql-ref-syntax-hive-format.md +++ b/docs/sql-ref-syntax-hive-format.md @@ -21,8 +21,10 @@ license: | ### Description -Spark support Hive format in `CREATE TABLE` clause and `TRANSFORM` clause, Hive format support -`SERDE` and native `DELIMITED`. +Spark supports Hive format in `CREATE TABLE` clause and `TRANSFORM` clause, +to specify serde or text delimeter. In `row_format`, uses the `SERDE` clause to specify a custom SerDe +for one table or processing inputs and outputs data. Otherwise, use the `DELIMITED` clause +to use the native SerDe and specify the delimiter, escape character, null character and so on. ### Syntax @@ -37,18 +39,10 @@ row_format: ``` ### Parameters + +* **SERDE serde_class** -* **row_format** - - Use the `SERDE` clause to specify a custom SerDe for one table or processing inputs and outputs data. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on. - -* **SERDE** - - Specifies a custom SerDe for one table or processing inputs and outputs data. - -* **serde_class** - - Specifies a fully-qualified class name of a custom SerDe. + Specifies a fully-qualified class name of custom SerDe for one table or processing inputs and outputs data. 
* **SERDEPROPERTIES** From 5fe64b575b8aed7d050e1e9959dbfc134f8b9345 Mon Sep 17 00:00:00 2001 From: Angerszhuuuu Date: Thu, 22 Apr 2021 16:34:40 +0800 Subject: [PATCH 6/8] Update sql-ref-syntax-hive-format.md --- docs/sql-ref-syntax-hive-format.md | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/docs/sql-ref-syntax-hive-format.md b/docs/sql-ref-syntax-hive-format.md index eefc62b0dbc2b..126c572681566 100644 --- a/docs/sql-ref-syntax-hive-format.md +++ b/docs/sql-ref-syntax-hive-format.md @@ -22,9 +22,9 @@ license: | ### Description Spark supports Hive format in `CREATE TABLE` clause and `TRANSFORM` clause, -to specify serde or text delimeter. In `row_format`, uses the `SERDE` clause to specify a custom SerDe -for one table or processing inputs and outputs data. Otherwise, use the `DELIMITED` clause -to use the native SerDe and specify the delimiter, escape character, null character and so on. +to specify serde or text delimeter. In `row_format` There are two ways to specify the `row_format`: + 1. Use the `SERDE` clause to specify a custom SerDe class + 2. Use the `DELIMITED` clause to specify the delimiter, escape character, null character and so on for the native text Serde. ### Syntax @@ -42,16 +42,12 @@ row_format: * **SERDE serde_class** - Specifies a fully-qualified class name of custom SerDe for one table or processing inputs and outputs data. + Specifies a fully-qualified class name of custom SerDe. * **SERDEPROPERTIES** A list of key-value pairs that is used to tag the SerDe definition. - -* **DELIMITED** - The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on. - * **FIELDS TERMINATED BY** Used to define a column separator. 
From be29ae834ca378435384c55a5cdd88907ab00390 Mon Sep 17 00:00:00 2001 From: Angerszhuuuu Date: Fri, 23 Apr 2021 08:44:59 +0800 Subject: [PATCH 7/8] follow comment --- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-hive-format.md | 12 ++++++------ docs/sql-ref-syntax-qry-select-transform.md | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index 38953212d2471..b2f5957416a80 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -77,7 +77,7 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI * **row_format** - All descriptions about syntax in `row_format` can refer to [HIVE FORMAT](sql-ref-syntax-hive-format.html) + Specifies the row format for input and output. See [HIVE FORMAT](sql-ref-syntax-hive-format.html) for more syntax details. * **STORED AS** diff --git a/docs/sql-ref-syntax-hive-format.md b/docs/sql-ref-syntax-hive-format.md index 126c572681566..b6292d5b09f7b 100644 --- a/docs/sql-ref-syntax-hive-format.md +++ b/docs/sql-ref-syntax-hive-format.md @@ -1,7 +1,7 @@ --- layout: global -title: Data Retrieval -displayTitle: Data Retrieval +title: Hive Row Format +displayTitle: Hive Row Format license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -21,10 +21,10 @@ license: | ### Description -Spark supports Hive format in `CREATE TABLE` clause and `TRANSFORM` clause, -to specify serde or text delimeter. In `row_format` There are two ways to specify the `row_format`: - 1. Use the `SERDE` clause to specify a custom SerDe class - 2. Use the `DELIMITED` clause to specify the delimiter, escape character, null character and so on for the native text Serde. 
+Spark supports a Hive row format in `CREATE TABLE` and `TRANSFORM` clause, to specify serde or text delimiter. +There are two ways to define a row format in `row_format` of `CREATE TABLE` and `TRANSFORM` clauses. + 1. `SERDE` clause to specify a custom SerDe class. + 2. `DELIMITED` clause to specify a delimiter, an escape character, a null character, and so on for the native SerDe. ### Syntax diff --git a/docs/sql-ref-syntax-qry-select-transform.md b/docs/sql-ref-syntax-qry-select-transform.md index 55949017ded1b..21966f2e1cc34 100644 --- a/docs/sql-ref-syntax-qry-select-transform.md +++ b/docs/sql-ref-syntax-qry-select-transform.md @@ -43,7 +43,7 @@ SELECT TRANSFORM ( expression [ , ... ] ) * **row_format** - All descriptions about syntax in `row_format` can refer to [HIVE FORMAT](sql-ref-syntax-hive-format.html) + Specifies the row format for input and output. See [HIVE FORMAT](sql-ref-syntax-hive-format.html) for more syntax details. * **RECORDWRITER** From 9bfa1cfe7e865ea4f3fa6c8d75b0ce0b22388bf9 Mon Sep 17 00:00:00 2001 From: Angerszhuuuu Date: Fri, 23 Apr 2021 08:51:36 +0800 Subject: [PATCH 8/8] Update sql-ref-syntax-hive-format.md --- docs/sql-ref-syntax-hive-format.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-ref-syntax-hive-format.md b/docs/sql-ref-syntax-hive-format.md index b6292d5b09f7b..8092e582d97ad 100644 --- a/docs/sql-ref-syntax-hive-format.md +++ b/docs/sql-ref-syntax-hive-format.md @@ -17,11 +17,11 @@ license: | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --- +--- ### Description -Spark supports a Hive row format in `CREATE TABLE` and `TRANSFORM` clause, to specify serde or text delimiter. +Spark supports a Hive row format in `CREATE TABLE` and `TRANSFORM` clause to specify serde or text delimiter. 
There are two ways to define a row format in `row_format` of `CREATE TABLE` and `TRANSFORM` clauses. 1. `SERDE` clause to specify a custom SerDe class. 2. `DELIMITED` clause to specify a delimiter, an escape character, a null character, and so on for the native SerDe.
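
For illustration, the two `row_format` alternatives described above could look like the following in a `CREATE TABLE` statement. This is a minimal sketch, not part of the patch: the table names `student_delimited` and `student_serde` and their columns are hypothetical, the statements assume a Spark session with Hive support enabled, and the SerDe class and its `field.delim` property are used here only as a commonly available example.

```sql
-- Alternative 2: DELIMITED clause, configuring the native SerDe.
-- Table and column names are hypothetical.
CREATE TABLE student_delimited (id INT, name STRING, tags ARRAY<STRING>)
    ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ',' ESCAPED BY '\\'
        COLLECTION ITEMS TERMINATED BY '_'
        LINES TERMINATED BY '\n'
        NULL DEFINED AS 'NULL'
    STORED AS TEXTFILE;

-- Alternative 1: SERDE clause, naming a SerDe class explicitly and
-- passing its settings through SERDEPROPERTIES.
CREATE TABLE student_serde (id INT, name STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES ('field.delim' = '\t')
    STORED AS TEXTFILE;
```

The same `row_format` grammar applies after `ROW FORMAT` in a `SELECT TRANSFORM` query, which is why the patch moves it into one shared page.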