From e5612544123d53c8a55ca030abc59ec6c9f8e872 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=8E=8B=E7=A3=8A?=
Date: Thu, 12 May 2022 19:10:36 +0800
Subject: [PATCH 1/2] Update README file

---
 docs/en/ecosystem/spark-doris-connector.md    | 15 ++++++++++++++-
 docs/zh-CN/ecosystem/spark-doris-connector.md | 17 ++++++++++++++++-
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/docs/en/ecosystem/spark-doris-connector.md b/docs/en/ecosystem/spark-doris-connector.md
index 8a565f85093827..79f73f9c9fc2ef 100644
--- a/docs/en/ecosystem/spark-doris-connector.md
+++ b/docs/en/ecosystem/spark-doris-connector.md
@@ -96,7 +96,7 @@ sh build.sh 3.1.2 2.12 ## spark 3.1.2 version, and scala 2.12
 ```
 > Note: If you check out the source code from tag, you can just run sh build.sh --tag without specifying the spark and scala versions. This is because the version in the tag source code is fixed.

-After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file to `ClassPath` in `Spark` to use `Spark-Doris-Connector`. For example, `Spark` running in `Local` mode, put this file in the `jars/` folder. `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
+After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file to `ClassPath` in `Spark` to use `Spark-Doris-Connector`. For example, `Spark` running in `Local` mode, put this file in the `jars/` folder. `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package. See [apache/incubator-doris#9486](https://github.com/apache/incubator-doris/discussions/9486).

 ## Using Maven

@@ -159,6 +159,19 @@ val dorisSparkRDD = sc.dorisRDD(
 dorisSparkRDD.collect()
 ```

+#### pySpark
+
+```python
+dorisSparkDF = spark.read.format("doris") \
+    .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME") \
+    .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT") \
+    .option("user", "$YOUR_DORIS_USERNAME") \
+    .option("password", "$YOUR_DORIS_PASSWORD") \
+    .load()
+# show 5 rows of data
+dorisSparkDF.show(5)
+```
+
 ### Write

 #### SQL
diff --git a/docs/zh-CN/ecosystem/spark-doris-connector.md b/docs/zh-CN/ecosystem/spark-doris-connector.md
index 5b8c7e80069a08..8294d43e87b276 100644
--- a/docs/zh-CN/ecosystem/spark-doris-connector.md
+++ b/docs/zh-CN/ecosystem/spark-doris-connector.md
@@ -98,7 +98,7 @@ sh build.sh 3.1.2 2.12 ## spark 3.1.2, scala 2.12
 ```
 > 注:如果你是从 tag 检出的源码,则可以直接执行 `sh build.sh --tag`,而无需指定 spark 和 scala 的版本。因为 tag 源码中的版本是固定的。

-编译成功后,会在 `output/` 目录下生成文件 `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar`。将此文件复制到 `Spark` 的 `ClassPath` 中即可使用 `Spark-Doris-Connector`。例如,`Local` 模式运行的 `Spark`,将此文件放入 `jars/` 文件夹下。`Yarn`集群模式运行的`Spark`,则将此文件放入预部署包中。
+编译成功后,会在 `output/` 目录下生成文件 `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar`。将此文件复制到 `Spark` 的 `ClassPath` 中即可使用 `Spark-Doris-Connector`。例如,`Local` 模式运行的 `Spark`,将此文件放入 `jars/` 文件夹下。`Yarn`集群模式运行的`Spark`,则将此文件放入预部署包中。参考 [apache/incubator-doris#9486](https://github.com/apache/incubator-doris/discussions/9486)。

 ## 使用Maven管理

@@ -162,6 +162,21 @@ val dorisSparkRDD = sc.dorisRDD(
 dorisSparkRDD.collect()
 ```

+#### pySpark
+
+```python
+dorisSparkDF = spark.read.format("doris") \
+    .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME") \
+    .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT") \
+    .option("user", "$YOUR_DORIS_USERNAME") \
+    .option("password", "$YOUR_DORIS_PASSWORD") \
+    .load()
+# show 5 rows of data
+dorisSparkDF.show(5)
+```
+
+
+
 ### 写入

 #### SQL

From 53b960a9eba7f7c60d1e8416cf5f4c73600ef069 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=8E=8B=E7=A3=8A?=
Date: Thu, 12 May 2022 20:05:24 +0800
Subject: [PATCH 2/2] Update README file

---
 docs/en/ecosystem/spark-doris-connector.md    | 18 +++++++++++++++++-
 docs/zh-CN/ecosystem/spark-doris-connector.md | 17 ++++++++++++++++-
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/docs/en/ecosystem/spark-doris-connector.md b/docs/en/ecosystem/spark-doris-connector.md
index 79f73f9c9fc2ef..66b90065f0d760 100644
--- a/docs/en/ecosystem/spark-doris-connector.md
+++ b/docs/en/ecosystem/spark-doris-connector.md
@@ -96,7 +96,23 @@ sh build.sh 3.1.2 2.12 ## spark 3.1.2 version, and scala 2.12
 ```
 > Note: If you check out the source code from tag, you can just run sh build.sh --tag without specifying the spark and scala versions. This is because the version in the tag source code is fixed.

-After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file to `ClassPath` in `Spark` to use `Spark-Doris-Connector`. For example, `Spark` running in `Local` mode, put this file in the `jars/` folder. `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package. See [apache/incubator-doris#9486](https://github.com/apache/incubator-doris/discussions/9486).
+After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file to `ClassPath` in `Spark` to use `Spark-Doris-Connector`. For example, `Spark` running in `Local` mode, put this file in the `jars/` folder. `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package. For example, upload `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` to HDFS and add the HDFS file path to `spark.yarn.jars`.
+
+1. Upload `doris-spark-connector-3.1.2-2.12-1.0.0.jar` to HDFS.
+
+```
+hdfs dfs -mkdir /spark-jars/
+hdfs dfs -put /your_local_path/doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/
+
+```
+
+2. Add the `doris-spark-connector-3.1.2-2.12-1.0.0.jar` dependency in the cluster.
+
+```
+spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
+```
+
+

 ## Using Maven

diff --git a/docs/zh-CN/ecosystem/spark-doris-connector.md b/docs/zh-CN/ecosystem/spark-doris-connector.md
index 8294d43e87b276..4eea79432c5720 100644
--- a/docs/zh-CN/ecosystem/spark-doris-connector.md
+++ b/docs/zh-CN/ecosystem/spark-doris-connector.md
@@ -98,7 +98,22 @@ sh build.sh 3.1.2 2.12 ## spark 3.1.2, scala 2.12
 ```
 > 注:如果你是从 tag 检出的源码,则可以直接执行 `sh build.sh --tag`,而无需指定 spark 和 scala 的版本。因为 tag 源码中的版本是固定的。

-编译成功后,会在 `output/` 目录下生成文件 `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar`。将此文件复制到 `Spark` 的 `ClassPath` 中即可使用 `Spark-Doris-Connector`。例如,`Local` 模式运行的 `Spark`,将此文件放入 `jars/` 文件夹下。`Yarn`集群模式运行的`Spark`,则将此文件放入预部署包中。参考 [apache/incubator-doris#9486](https://github.com/apache/incubator-doris/discussions/9486)。
+编译成功后,会在 `output/` 目录下生成文件 `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar`。将此文件复制到 `Spark` 的 `ClassPath` 中即可使用 `Spark-Doris-Connector`。例如,`Local` 模式运行的 `Spark`,将此文件放入 `jars/` 文件夹下。`Yarn`集群模式运行的`Spark`,则将此文件放入预部署包中。
+
+例如,将 `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` 上传到 HDFS,并在 `spark.yarn.jars` 参数中添加 HDFS 上的 Jar 包路径。
+
+1. 上传 `doris-spark-connector-3.1.2-2.12-1.0.0.jar` 到 HDFS。
+
+```
+hdfs dfs -mkdir /spark-jars/
+hdfs dfs -put /your_local_path/doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/
+```
+
+2. 在集群中添加 `doris-spark-connector-3.1.2-2.12-1.0.0.jar` 依赖。
+
+```
+spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
+```

 ## 使用Maven管理
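Taken together, the two patches document one deployment recipe: upload the connector jar to HDFS, point `spark.yarn.jars` at it, and pass the Doris FE endpoint, table, and credentials as read options. A minimal sketch of assembling those pieces in plain Python (no Spark dependency; the host name, port, database, table, and credentials below are placeholder assumptions, not values from the patches):

```python
# Illustrative sketch only: build the option map and Yarn setting used
# in the examples above. All concrete values are placeholder assumptions.

def doris_read_options(fe_host, fe_http_port, database, table, user, password):
    """Options passed via .option(...) to spark.read.format("doris")."""
    return {
        "doris.table.identifier": f"{database}.{table}",
        "doris.fenodes": f"{fe_host}:{fe_http_port}",
        "user": user,
        "password": password,
    }

def yarn_jars_conf(hdfs_jar_path):
    """(key, value) pair so Yarn executors can see the connector jar."""
    return ("spark.yarn.jars", hdfs_jar_path)

opts = doris_read_options("doris-fe.example.com", 8030,
                          "example_db", "example_table", "root", "")
conf = yarn_jars_conf("hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar")
print(opts["doris.fenodes"])  # doris-fe.example.com:8030
print(conf[0])                # spark.yarn.jars
```

On a real cluster, such an option map would be applied with `spark.read.format("doris").options(**opts).load()`, and the config pair passed to `spark-submit` as `--conf spark.yarn.jars=...`.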