19 changes: 15 additions & 4 deletions docs/content/spark/auxiliary.md
@@ -61,18 +61,18 @@ SELECT * FROM default.T1 JOIN default.T2 ON xxxx;
```

## Describe table
DESCRIBE TABLE statement returns the basic metadata information of a table. The metadata information includes column name, column type and column comment.
The DESCRIBE TABLE statement returns the basic metadata information of a table or view. The metadata information includes the column name, column type, and column comment.

```sql
-- describe table
-- describe table or view
DESCRIBE TABLE my_table;

-- describe table with additional metadata
-- describe table or view with additional metadata
DESCRIBE TABLE EXTENDED my_table;
```

## Show create table
SHOW CREATE TABLE returns the CREATE TABLE statement that was used to create a given table.
SHOW CREATE TABLE returns the CREATE TABLE statement or CREATE VIEW statement that was used to create a given table or view.

```sql
SHOW CREATE TABLE my_table;
@@ -107,6 +107,17 @@ SHOW TABLE EXTENDED IN db_name LIKE 'test*';
SHOW TABLE EXTENDED IN db_name LIKE 'table_name' PARTITION(pt = '2024');
```

## Show views
The SHOW VIEWS statement returns all the views for an optionally specified database.

```sql
-- Lists all views
SHOW VIEWS;

-- Lists all views that match the regular expression
SHOW VIEWS LIKE 'test*';
```

## Analyze table

The ANALYZE TABLE statement collects statistics about the table, that are to be used by the query optimizer to find a better query execution plan.
48 changes: 39 additions & 9 deletions docs/content/spark/sql-ddl.md
@@ -26,7 +26,9 @@ under the License.

# SQL DDL

## Create Catalog
## Catalog

### Create Catalog

Paimon catalogs currently support three types of metastores:

@@ -36,7 +38,7 @@ Paimon catalogs currently support three types of metastores:

See [CatalogOptions]({{< ref "maintenance/configurations#catalogoptions" >}}) for detailed options when creating a catalog.

### Create Filesystem Catalog
#### Create Filesystem Catalog

The following Spark SQL registers and uses a Paimon catalog named `my_catalog`. Metadata and table files are stored under `hdfs:///path/to/warehouse`.

@@ -56,7 +58,7 @@ After `spark-sql` is started, you can switch to the `default` database of the `p
USE paimon.default;
```

### Creating Hive Catalog
#### Creating Hive Catalog

By using the Paimon Hive catalog, changes to the catalog will directly affect the corresponding Hive metastore. Tables created in such a catalog can also be accessed directly from Hive.

@@ -84,13 +86,13 @@ USE paimon.default;

Also, you can create [SparkGenericCatalog]({{< ref "spark/quick-start" >}}).

#### Synchronizing Partitions into Hive Metastore
**Synchronizing Partitions into Hive Metastore**

By default, Paimon does not synchronize newly created partitions into Hive metastore. Users will see an unpartitioned table in Hive. Partition push-down will be carried out by filter push-down instead.

If you want to see a partitioned table in Hive and also synchronize newly created partitions into Hive metastore, please set the table property `metastore.partitioned-table` to true. Also see [CoreOptions]({{< ref "maintenance/configurations#coreoptions" >}}).
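
For example, here is a minimal sketch showing how the property can be set when creating a table (the table name and columns are hypothetical):

```sql
CREATE TABLE my_partitioned_table (
    id BIGINT,
    name STRING,
    dt STRING
) PARTITIONED BY (dt) TBLPROPERTIES (
    -- synchronize newly created partitions of this table into the Hive metastore
    'metastore.partitioned-table' = 'true'
);
```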

### Creating JDBC Catalog
#### Creating JDBC Catalog

By using the Paimon JDBC catalog, changes to the catalog will be directly stored in relational databases such as SQLite, MySQL, postgres, etc.

@@ -118,7 +120,9 @@ spark-sql ... \
USE paimon.default;
```

## Create Table
## Table

### Create Table

After using a Paimon catalog, you can create and drop tables. Tables created in Paimon catalogs are managed by the catalog.
When a table is dropped from the catalog, its table files will also be deleted.
@@ -152,7 +156,7 @@ CREATE TABLE my_table (
);
```

## Create Table As Select
### Create Table As Select

A table can be created and populated by the results of a query. For example, given a statement such as `CREATE TABLE table_b AS SELECT id, name FROM table_a`,
the resulting table `table_b` will be equivalent to creating the table and inserting the data with the following statement:
@@ -210,8 +214,34 @@ CREATE TABLE my_table_all (
CREATE TABLE my_table_all_as PARTITIONED BY (dt) TBLPROPERTIES ('primary-key' = 'dt,hh') AS SELECT * FROM my_table_all;
```

## Tag DDL
### Create or replace Tag
## View

Views are based on the result set of an SQL query. When using `org.apache.paimon.spark.SparkCatalog`, views are managed by Paimon itself.
In this case, views are supported only when the `metastore` type is `hive`, and temporary views are not supported yet.

### Create Or Replace View

CREATE VIEW constructs a virtual table that has no physical data.

```sql
-- create a view.
CREATE VIEW v1 AS SELECT * FROM t1;

-- create a view; if a view with the same name already exists, it will be replaced.
CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1;
```

### Drop View

DROP VIEW removes the metadata associated with a specified view from the catalog.

```sql
-- drop a view
DROP VIEW v1;
```

## Tag
### Create or Replace Tag
The create or replace tag syntax supports the following options:
- Create a tag with or without a snapshot ID and time retention.
- Creating a tag that already exists does not fail when the `IF NOT EXISTS` syntax is used.
@@ -24,13 +24,13 @@ import org.apache.paimon.spark.catalog.SupportView
import org.apache.paimon.view.View

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.analysis.{GetColumnByOrdinal, UnresolvedRelation}
import org.apache.spark.sql.catalyst.expressions.{Alias, UpCast}
import org.apache.spark.sql.catalyst.analysis.{GetColumnByOrdinal, UnresolvedRelation, UnresolvedTableOrView}
import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, UpCast}
import org.apache.spark.sql.catalyst.parser.ParseException
import org.apache.spark.sql.catalyst.parser.extensions.{CurrentOrigin, Origin}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, SubqueryAlias}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Project, SubqueryAlias}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.connector.catalog.PaimonLookupCatalog
import org.apache.spark.sql.connector.catalog.{Identifier, PaimonLookupCatalog}

case class PaimonViewResolver(spark: SparkSession)
extends Rule[LogicalPlan]
@@ -47,6 +47,15 @@ case class PaimonViewResolver(spark: SparkSession)
case _: ViewNotExistException =>
u
}

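// Resolve a table-or-view reference (e.g. from DESCRIBE or SHOW CREATE TABLE) that points
// at a Paimon view, producing a ResolvedPaimonView node for PaimonStrategy to plan.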
case u @ UnresolvedTableOrView(CatalogAndIdentifier(catalog: SupportView, ident), _, _) =>
try {
catalog.loadView(ident)
ResolvedPaimonView(catalog, ident)
} catch {
case _: ViewNotExistException =>
u
}
}

private def createViewRelation(nameParts: Seq[String], view: View): LogicalPlan = {
@@ -83,3 +92,7 @@ case class PaimonViewResolver(spark: SparkSession)
}
}
}

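/**
 * Resolved leaf node carrying the view catalog and identifier of a Paimon view; matched by
 * PaimonStrategy when planning view-related commands such as SHOW CREATE TABLE and DESCRIBE.
 */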
case class ResolvedPaimonView(catalog: SupportView, identifier: Identifier) extends LeafNode {
override def output: Seq[Attribute] = Nil
}
@@ -20,13 +20,14 @@ package org.apache.paimon.spark.execution

import org.apache.paimon.spark.{SparkCatalog, SparkUtils}
import org.apache.paimon.spark.catalog.SupportView
import org.apache.paimon.spark.catalyst.analysis.ResolvedPaimonView
import org.apache.paimon.spark.catalyst.plans.logical.{CreateOrReplaceTagCommand, CreatePaimonView, DeleteTagCommand, DropPaimonView, PaimonCallCommand, RenameTagCommand, ResolvedIdentifier, ShowPaimonViews, ShowTagsCommand}

import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.analysis.ResolvedNamespace
import org.apache.spark.sql.catalyst.expressions.{Expression, GenericInternalRow, PredicateHelper}
import org.apache.spark.sql.catalyst.plans.logical.{CreateTableAsSelect, LogicalPlan}
import org.apache.spark.sql.catalyst.plans.logical.{CreateTableAsSelect, DescribeRelation, LogicalPlan, ShowCreateTable}
import org.apache.spark.sql.connector.catalog.{Identifier, PaimonLookupCatalog, TableCatalog}
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.shim.PaimonCreateTableAsSelectStrategy
@@ -100,6 +101,12 @@ case class PaimonStrategy(spark: SparkSession)
if r.catalog.isInstanceOf[SupportView] =>
ShowPaimonViewsExec(output, r.catalog.asInstanceOf[SupportView], r.namespace, pattern) :: Nil

case ShowCreateTable(ResolvedPaimonView(viewCatalog, ident), _, output) =>
ShowCreatePaimonViewExec(output, viewCatalog, ident) :: Nil

case DescribeRelation(ResolvedPaimonView(viewCatalog, ident), _, isExtended, output) =>
DescribePaimonViewExec(output, viewCatalog, ident, isExtended) :: Nil

case _ => Nil
}

@@ -20,10 +20,12 @@ package org.apache.paimon.spark.execution

import org.apache.paimon.spark.catalog.SupportView
import org.apache.paimon.spark.leafnode.PaimonLeafV2CommandExec
import org.apache.paimon.view.View

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.catalyst.expressions.{Attribute, GenericInternalRow}
import org.apache.spark.sql.catalyst.util.StringUtils
import org.apache.spark.sql.catalyst.util.{escapeSingleQuotedString, quoteIfNeeded, StringUtils}
import org.apache.spark.sql.connector.catalog.Identifier
import org.apache.spark.sql.types.StructType
import org.apache.spark.unsafe.types.UTF8String
@@ -115,3 +117,116 @@ case class ShowPaimonViewsExec(
s"ShowPaimonViewsExec: $namespace"
}
}

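/** Physical command that reconstructs the CREATE VIEW statement text for a Paimon view. */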
case class ShowCreatePaimonViewExec(output: Seq[Attribute], catalog: SupportView, ident: Identifier)
extends PaimonLeafV2CommandExec {

override protected def run(): Seq[InternalRow] = {
val view = catalog.loadView(ident)

val builder = new StringBuilder
builder ++= s"CREATE VIEW ${view.fullName()} "
showDataColumns(view, builder)
showComment(view, builder)
showProperties(view, builder)
builder ++= s"AS\n${view.query}\n"

Seq(new GenericInternalRow(values = Array(UTF8String.fromString(builder.toString))))
}

private def showDataColumns(view: View, builder: StringBuilder): Unit = {
if (view.rowType().getFields.size() > 0) {
val viewColumns = view.rowType().getFields.asScala.map {
f =>
val comment = if (f.description() != null) s" COMMENT '${f.description()}'" else ""
// view columns shouldn't have data type info
s"${quoteIfNeeded(f.name)}$comment"
}
builder ++= concatByMultiLines(viewColumns)
}
}

private def showComment(view: View, builder: StringBuilder): Unit = {
if (view.comment().isPresent) {
builder ++= s"COMMENT '${view.comment().get()}'\n"
}
}

private def showProperties(view: View, builder: StringBuilder): Unit = {
if (!view.options().isEmpty) {
val props = view.options().asScala.toSeq.sortBy(_._1).map {
case (key, value) =>
s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
}
builder ++= s"TBLPROPERTIES ${concatByMultiLines(props)}"
}
}

private def concatByMultiLines(iter: Iterable[String]): String = {
iter.mkString("(\n ", ",\n ", ")\n")
}

override def simpleString(maxFields: Int): String = {
s"ShowCreatePaimonViewExec: $ident"
}
}

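/** Physical command backing DESCRIBE [EXTENDED] on a Paimon view. */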
case class DescribePaimonViewExec(
output: Seq[Attribute],
catalog: SupportView,
ident: Identifier,
isExtended: Boolean)
extends PaimonLeafV2CommandExec {

override protected def run(): Seq[InternalRow] = {
val rows = new ArrayBuffer[InternalRow]()
val view = catalog.loadView(ident)

describeColumns(view, rows)
if (isExtended) {
describeExtended(view, rows)
}

rows.toSeq
}

private def describeColumns(view: View, rows: ArrayBuffer[InternalRow]) = {
view
.rowType()
.getFields
.asScala
.map(f => rows += row(f.name(), f.`type`().toString, f.description()))
}

private def describeExtended(view: View, rows: ArrayBuffer[InternalRow]) = {
rows += row("", "", "")
rows += row("# Detailed View Information", "", "")
rows += row("Name", view.fullName(), "")
rows += row("Comment", view.comment().orElse(""), "")
rows += row("View Text", view.query, "")
rows += row(
"View Query Output Columns",
view.rowType().getFieldNames.asScala.mkString("[", ", ", "]"),
"")
rows += row(
"View Properties",
view
.options()
.asScala
.toSeq
.sortBy(_._1)
.map { case (k, v) => s"$k=$v" }
.mkString("[", ", ", "]"),
"")
}

private def row(s1: String, s2: String, s3: String): InternalRow = {
new GenericInternalRow(
values =
Array(UTF8String.fromString(s1), UTF8String.fromString(s2), UTF8String.fromString(s3)))
}

override def simpleString(maxFields: Int): String = {
s"DescribePaimonViewExec: $ident"
}
}