From b5734a0af56b094a721713ff52dac82f68a04835 Mon Sep 17 00:00:00 2001
From: Jiashu-Hu
Date: Thu, 20 Mar 2025 00:24:49 -0500
Subject: [PATCH 1/4] added explaination for Schema and DFSchema to documentation

---
 .../library-user-guide/working-with-exprs.md | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/docs/source/library-user-guide/working-with-exprs.md b/docs/source/library-user-guide/working-with-exprs.md
index 1a6e9123086d6..e50ff28373ff7 100644
--- a/docs/source/library-user-guide/working-with-exprs.md
+++ b/docs/source/library-user-guide/working-with-exprs.md
@@ -50,6 +50,33 @@ As another example, the SQL expression `a + b * c` would be represented as an `E
 
 As the writer of a library, you can use `Expr`s to represent computations that you want to perform. This guide will walk you through how to make your own scalar UDF as an `Expr` and how to rewrite `Expr`s to inline the simple UDF.
 
+## Arrow Schema and DataFusion DFSchema
+
+Schema and DFSchema are both exist in datafusion because `Schema` provides a lightweight structure for defining data, and `DFSchema` extends it with extra information. This makes the engine could handle both simple data definitions and complex query scenarios efficiently.
+
+### Difference between Schema and DFSchema
+
+- Schema: A fundamental component of Apache Arrow, `Schema` defines a dataset's structure, specifying column names and their data types.
+
+  > Please see [Struct Schema](https://docs.rs/arrow-schema/54.2.1/arrow_schema/struct.Schema.html) for a detailed document of Arrow Schema.
+
+- DFSchema: Extending `Schema`, `DFSchema` incorporates qualifiers such as table names, enabling it to carry additional context when required. This is particularly valuable for managing queries across multiple tables.
+  > Please see [Struct DFSchema](https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html) for a detailed document of DFSchema.
+
+### How to convert between Schema and DFSchema
+
+From Schema to DFSchema: Use `DFSchema::try_from_qualified_schema` with a table name and original schema. See example below:
+
+```
+let df_schema = DFSchema::try_from_qualified_schema("t1", &arrow_schema).unwrap();
+```
+
+From DFSchema to Schema: Since the `Into` trait has been implemented for DFSchema to convert it into an Arrow Schema, you can simply use this trait to revert:
+
+```
+let schema = Schema::from(df_schema);
+```
+
 ## Creating and Evaluating `Expr`s
 
 Please see [expr_api.rs](https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/expr_api.rs) for well commented code for creating, evaluating, simplifying, and analyzing `Expr`s.

From e92301bcfe1ff7ff044649c4c72fea920933384e Mon Sep 17 00:00:00 2001
From: Jiashu-Hu
Date: Thu, 20 Mar 2025 00:57:33 -0500
Subject: [PATCH 2/4] change code block to quote block since CICL request a full code but we are only introducing syntax

---
 docs/source/library-user-guide/working-with-exprs.md | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/docs/source/library-user-guide/working-with-exprs.md b/docs/source/library-user-guide/working-with-exprs.md
index e50ff28373ff7..13d12deba2ea4 100644
--- a/docs/source/library-user-guide/working-with-exprs.md
+++ b/docs/source/library-user-guide/working-with-exprs.md
@@ -67,15 +67,11 @@ Schema and DFSchema are both exist in datafusion because `Schema` provides a lig
 
 From Schema to DFSchema: Use `DFSchema::try_from_qualified_schema` with a table name and original schema. See example below:
 
-```
-let df_schema = DFSchema::try_from_qualified_schema("t1", &arrow_schema).unwrap();
-```
+> let df_schema = DFSchema::try_from_qualified_schema("t1", &arrow_schema).unwrap();
 
 From DFSchema to Schema: Since the `Into` trait has been implemented for DFSchema to convert it into an Arrow Schema, you can simply use this trait to revert:
 
-```
-let schema = Schema::from(df_schema);
-```
+> let schema = Schema::from(df_schema);`
 
 ## Creating and Evaluating `Expr`s
 

From 9df2a302397df497d028709eb4642454c3b085bd Mon Sep 17 00:00:00 2001
From: Jiashu-Hu
Date: Thu, 20 Mar 2025 13:51:44 -0500
Subject: [PATCH 3/4] improved language

---
 docs/source/library-user-guide/working-with-exprs.md | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/docs/source/library-user-guide/working-with-exprs.md b/docs/source/library-user-guide/working-with-exprs.md
index 13d12deba2ea4..113d3b434c708 100644
--- a/docs/source/library-user-guide/working-with-exprs.md
+++ b/docs/source/library-user-guide/working-with-exprs.md
@@ -65,13 +65,9 @@ Schema and DFSchema are both exist in datafusion because `Schema` provides a lig
 
 ### How to convert between Schema and DFSchema
 
-From Schema to DFSchema: Use `DFSchema::try_from_qualified_schema` with a table name and original schema. See example below:
+From Schema to DFSchema: Use `DFSchema::try_from_qualified_schema` with a table name and original schema, for detailed code example please see [creating-qualified-schemas](https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html#creating-qualified-schemas).
 
-> let df_schema = DFSchema::try_from_qualified_schema("t1", &arrow_schema).unwrap();
-
-From DFSchema to Schema: Since the `Into` trait has been implemented for DFSchema to convert it into an Arrow Schema, you can simply use this trait to revert:
-
-> let schema = Schema::from(df_schema);`
+From DFSchema to Schema: Since the `Into` trait has been implemented for DFSchema to convert it into an Arrow Schema, for detailed code example please see [converting-back-to-arrow-schema](https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html#converting-back-to-arrow-schema).
 
 ## Creating and Evaluating `Expr`s
 

From f54060e6ae00ecf24356bc5dae681bc2fa1c2525 Mon Sep 17 00:00:00 2001
From: Oleks V
Date: Fri, 21 Mar 2025 09:27:49 -0700
Subject: [PATCH 4/4] Update docs/source/library-user-guide/working-with-exprs.md

---
 docs/source/library-user-guide/working-with-exprs.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/library-user-guide/working-with-exprs.md b/docs/source/library-user-guide/working-with-exprs.md
index 113d3b434c708..df4e5e3940aa6 100644
--- a/docs/source/library-user-guide/working-with-exprs.md
+++ b/docs/source/library-user-guide/working-with-exprs.md
@@ -52,7 +52,7 @@ As the writer of a library, you can use `Expr`s to represent computations that y
 
 ## Arrow Schema and DataFusion DFSchema
 
-Schema and DFSchema are both exist in datafusion because `Schema` provides a lightweight structure for defining data, and `DFSchema` extends it with extra information. This makes the engine could handle both simple data definitions and complex query scenarios efficiently.
+Apache Arrow `Schema` provides a lightweight structure for defining data, and Apache Datafusion`DFSchema` extends it with extra information such as column qualifiers and functional dependencies. Column qualifiers are multi part path to the table e.g table, schema, catalog. Functional Dependency is the relationship between attributes(characteristics) of a table related to each other.
 
 ### Difference between Schema and DFSchema
 