diff --git a/datafusion/sql/src/parser.rs b/datafusion/sql/src/parser.rs
index 27bfa32501026..00386fd06ef00 100644
--- a/datafusion/sql/src/parser.rs
+++ b/datafusion/sql/src/parser.rs
@@ -43,6 +43,30 @@ fn parse_file_type(s: &str) -> Result {
 }
 
 /// DataFusion extension DDL for `CREATE EXTERNAL TABLE`
+///
+/// Syntax:
+///
+/// ```text
+/// CREATE EXTERNAL TABLE
+/// [ IF NOT EXISTS ]
+/// [ () ]
+/// STORED AS
+/// [ WITH HEADER ROW ]
+/// [ DELIMITER ]
+/// [ COMPRESSION TYPE ]
+/// [ PARTITIONED BY () ]
+/// [ WITH ORDER () ]
+/// [ OPTIONS () ]
+/// LOCATION
+///
+/// := ( , ...)
+///
+/// := (, ...)
+///
+/// := ( , ...)
+///
+/// := ( , ...)
+/// ```
 #[derive(Debug, Clone, PartialEq, Eq)]
 pub struct CreateExternalTable {
     /// Table name
diff --git a/docs/source/user-guide/sql/ddl.md b/docs/source/user-guide/sql/ddl.md
index 45d7d81a0aec1..3bad7596459f2 100644
--- a/docs/source/user-guide/sql/ddl.md
+++ b/docs/source/user-guide/sql/ddl.md
@@ -47,8 +47,41 @@ CREATE SCHEMA cat.emu;
 
 ## CREATE EXTERNAL TABLE
 
-Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is not necessary
-to provide schema information for Parquet files.
+The `CREATE EXTERNAL TABLE` SQL statement registers a location on a local
+file system or remote object store as a named table which can be queried.
+
+The supported syntax is:
+
+```
+CREATE EXTERNAL TABLE
+[ IF NOT EXISTS ]
+[ () ]
+STORED AS
+[ WITH HEADER ROW ]
+[ DELIMITER ]
+[ COMPRESSION TYPE ]
+[ PARTITIONED BY () ]
+[ WITH ORDER () ]
+[ OPTIONS () ]
+LOCATION
+
+ := ( , ...)
+
+ := (, ...)
+
+ := ( , ...)
+
+ := ( , ...)
+```
+
+`file_type` is one of `CSV`, `PARQUET`, `AVRO` or `JSON`.
+
+`LOCATION ` specifies the location to find the data. It can be
+a path to a file or directory of partitioned files locally or on an
+object store.
+
+Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement such as the following. It is not necessary to
+provide schema information for Parquet files.
 
 ```sql
 CREATE EXTERNAL TABLE taxi
@@ -56,8 +89,8 @@ STORED AS PARQUET
 LOCATION '/mnt/nyctaxi/tripdata.parquet';
 ```
 
-CSV data sources can also be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. The schema will be
-inferred based on scanning a subset of the file.
+CSV data sources can also be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. The schema will be inferred based on
+scanning a subset of the file.
 
 ```sql
 CREATE EXTERNAL TABLE test
@@ -89,9 +122,20 @@ WITH HEADER ROW
 LOCATION '/path/to/aggregate_test_100.csv';
 ```
 
-When creating an output from a data source that is already ordered by an expression, you can pre-specify the order of
-the data using the `WITH ORDER` clause. This applies even if the expression used for sorting is complex,
-allowing for greater flexibility.
+It is also possible to specify a directory that contains a partitioned
+table (multiple files with the same schema):
+
+```sql
+CREATE EXTERNAL TABLE test
+STORED AS CSV
+WITH HEADER ROW
+LOCATION '/path/to/directory/of/files';
+```
+
+When creating an output from a data source that is already ordered by
+an expression, you can pre-specify the order of the data using the
+`WITH ORDER` clause. This applies even if the expression used for
+sorting is complex, allowing for greater flexibility.
 
 Here's an example of how to use `WITH ORDER` clause.