From f0929f84a5329aeee0448a0ab2d227d0dbd58044 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Wed, 12 Oct 2022 16:23:20 -0400
Subject: [PATCH 1/5] Add rewrite_expr to example to run

---
 .github/workflows/rust.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/rust.yml b/.github/workflows/rust.yml
index 31014fd306599..a7cb9e51dc518 100644
--- a/.github/workflows/rust.yml
+++ b/.github/workflows/rust.yml
@@ -120,6 +120,7 @@ jobs:
  cargo run --example parquet_sql
  cargo run --example parquet_sql_multiple_files
  cargo run --example memtable
+ cargo run --example rewrite_expr
  cargo run --example simple_udf
  cargo run --example simple_udaf

From 95174fc6eda2f9b8904fa6f98b648053a190f9db Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Wed, 12 Oct 2022 16:23:48 -0400
Subject: [PATCH 2/5] Move datafusion-examples README

---
 datafusion-examples/{examples => }/README.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename datafusion-examples/{examples => }/README.md (100%)

diff --git a/datafusion-examples/examples/README.md b/datafusion-examples/README.md
similarity index 100%
rename from datafusion-examples/examples/README.md
rename to datafusion-examples/README.md

From 651b16d208db587d75276565384697008991eb9a Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Wed, 12 Oct 2022 16:35:00 -0400
Subject: [PATCH 3/5] Update README to include all examples

---
 datafusion-examples/README.md | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/datafusion-examples/README.md b/datafusion-examples/README.md
index 58c47e633a2a0..1512f111b7b3b 100644
--- a/datafusion-examples/README.md
+++ b/datafusion-examples/README.md
@@ -19,14 +19,31 @@

 # DataFusion Examples

+This crate includes several examples of how to use various DataFusion APIs and help you on your way.
+
 Prerequisites:

 Run `git submodule update --init` to init test files.

 ## Single Process

-The examples `csv_sql.rs` and `parquet_sql.rs` demonstrate building a query plan from a SQL statement and then executing the query plan against local CSV and Parquet files, respectively.
+* [`avro_sql.rs`](examples/avro_sql.rs): Build and run a query plan from a SQL statement against a local AVRO file
+* [`csv_sql.rs`](examples/csv_sql.rs): Build and run a query plan from a SQL statement against a local CSV file
+* [`custom_datasource.rs`](examples/custom_datasource.rs): Run queris against a custom datasource (TableProvider)
+* [`dataframe.rs`](examples/dataframe.rs): Run a query using a DataFrame against a local parquet file
+* [`dataframe_in_memory.rs`](examples/dataframe_in_memory.rs): Run a query using a DataFrame against data in memory
+* [`deserialize_to_struct.rs`](examples/deserialize_to_struct.rs): Convert query results into rust structs using serde
+* [`expr_api.rs`](examples/expr_api.rs): Use the `Expr` construction and simplification API
+* [`memtable.rs`](examples/memtable.rs): Create an query data in memory using SQL and `RecordBatch`es
+* [`parquet_sql.rs`](examples/parquet_sql.rs): Build and run a query plan from a SQL statement against a local Parquet file
+* [`parquet_sql_multiple_files.rs`](examples/parquet_sql_multiple_files.rs): Build and run a query plan from a SQL statement against multiple local Parquet files
+* [`query-aws-s3.rs`](examples/query-aws-s3.rs): Confiure `object_store` and run a query against files stored in AWS S3
+* [`rewrite_expr.rs`](examples/rewrite_expr.rs): Define and invoke a custom Query Optimizer pass
+* [`simple_udaf.rs`](examples/simple_udaf.rs): Define and invoke a User Defined Aggregate Function (UDAF)
+* [`simple_udf.rs`](examples/simple_udf.rs): Define and invoke a User Defined (scalar) Function (UDF)
+
+

 ## Distributed

-The `flight-client.rs` and `flight-server.rs` examples demonstrate how to run DataFusion as a standalone process and execute SQL queries from a client using the Flight protocol.
+* [`flight-client.rs`](examples/flight-client.rs) and [`flight-server.rs`](examples/flight-server.rs): Run DataFusion as a standalone process and execute SQL queries from a client using the Flight protocol.

From ec1f92c7331c3cd8e5835615e0a7991c17a3b3a9 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Wed, 12 Oct 2022 16:39:32 -0400
Subject: [PATCH 4/5] add note on the main page

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3057838520c9e..291136782b431 100644
--- a/README.md
+++ b/README.md
@@ -100,7 +100,7 @@ Here are some of the projects known to use DataFusion:

 ## Example Usage

-Please see [example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html) to find how to use DataFusion.
+Please see the [example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html) in the user guide and the [datafusion-examples](https://github.com/apache/arrow-datafusion/tree/master/datafusion-examples) crate for more information on how to use DataFusion.

 ## Roadmap

From a8cfada24dba50d02124f53da55a2c22c18c96e6 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Wed, 12 Oct 2022 16:40:21 -0400
Subject: [PATCH 5/5] prettier

---
 datafusion-examples/README.md | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/datafusion-examples/README.md b/datafusion-examples/README.md
index 1512f111b7b3b..ea65987ad4355 100644
--- a/datafusion-examples/README.md
+++ b/datafusion-examples/README.md
@@ -27,23 +27,21 @@ Run `git submodule update --init` to init test files.

 ## Single Process

-* [`avro_sql.rs`](examples/avro_sql.rs): Build and run a query plan from a SQL statement against a local AVRO file
-* [`csv_sql.rs`](examples/csv_sql.rs): Build and run a query plan from a SQL statement against a local CSV file
-* [`custom_datasource.rs`](examples/custom_datasource.rs): Run queris against a custom datasource (TableProvider)
-* [`dataframe.rs`](examples/dataframe.rs): Run a query using a DataFrame against a local parquet file
-* [`dataframe_in_memory.rs`](examples/dataframe_in_memory.rs): Run a query using a DataFrame against data in memory
-* [`deserialize_to_struct.rs`](examples/deserialize_to_struct.rs): Convert query results into rust structs using serde
-* [`expr_api.rs`](examples/expr_api.rs): Use the `Expr` construction and simplification API
-* [`memtable.rs`](examples/memtable.rs): Create an query data in memory using SQL and `RecordBatch`es
-* [`parquet_sql.rs`](examples/parquet_sql.rs): Build and run a query plan from a SQL statement against a local Parquet file
-* [`parquet_sql_multiple_files.rs`](examples/parquet_sql_multiple_files.rs): Build and run a query plan from a SQL statement against multiple local Parquet files
-* [`query-aws-s3.rs`](examples/query-aws-s3.rs): Confiure `object_store` and run a query against files stored in AWS S3
-* [`rewrite_expr.rs`](examples/rewrite_expr.rs): Define and invoke a custom Query Optimizer pass
-* [`simple_udaf.rs`](examples/simple_udaf.rs): Define and invoke a User Defined Aggregate Function (UDAF)
-* [`simple_udf.rs`](examples/simple_udf.rs): Define and invoke a User Defined (scalar) Function (UDF)
-
-
+- [`avro_sql.rs`](examples/avro_sql.rs): Build and run a query plan from a SQL statement against a local AVRO file
+- [`csv_sql.rs`](examples/csv_sql.rs): Build and run a query plan from a SQL statement against a local CSV file
+- [`custom_datasource.rs`](examples/custom_datasource.rs): Run queris against a custom datasource (TableProvider)
+- [`dataframe.rs`](examples/dataframe.rs): Run a query using a DataFrame against a local parquet file
+- [`dataframe_in_memory.rs`](examples/dataframe_in_memory.rs): Run a query using a DataFrame against data in memory
+- [`deserialize_to_struct.rs`](examples/deserialize_to_struct.rs): Convert query results into rust structs using serde
+- [`expr_api.rs`](examples/expr_api.rs): Use the `Expr` construction and simplification API
+- [`memtable.rs`](examples/memtable.rs): Create an query data in memory using SQL and `RecordBatch`es
+- [`parquet_sql.rs`](examples/parquet_sql.rs): Build and run a query plan from a SQL statement against a local Parquet file
+- [`parquet_sql_multiple_files.rs`](examples/parquet_sql_multiple_files.rs): Build and run a query plan from a SQL statement against multiple local Parquet files
+- [`query-aws-s3.rs`](examples/query-aws-s3.rs): Confiure `object_store` and run a query against files stored in AWS S3
+- [`rewrite_expr.rs`](examples/rewrite_expr.rs): Define and invoke a custom Query Optimizer pass
+- [`simple_udaf.rs`](examples/simple_udaf.rs): Define and invoke a User Defined Aggregate Function (UDAF)
+- [`simple_udf.rs`](examples/simple_udf.rs): Define and invoke a User Defined (scalar) Function (UDF)

 ## Distributed

-* [`flight-client.rs`](examples/flight-client.rs) and [`flight-server.rs`](examples/flight-server.rs): Run DataFusion as a standalone process and execute SQL queries from a client using the Flight protocol.
+- [`flight-client.rs`](examples/flight-client.rs) and [`flight-server.rs`](examples/flight-server.rs): Run DataFusion as a standalone process and execute SQL queries from a client using the Flight protocol.