diff --git a/docs/en/ecosystem/spark-doris-connector.md b/docs/en/ecosystem/spark-doris-connector.md
index 8a565f85093827..66b90065f0d760 100644
--- a/docs/en/ecosystem/spark-doris-connector.md
+++ b/docs/en/ecosystem/spark-doris-connector.md
@@ -96,7 +96,23 @@ sh build.sh 3.1.2 2.12 ## spark 3.1.2 version, and scala 2.12
 ```
 > Note: If you check out the source code from tag, you can just run sh build.sh --tag without specifying the spark and scala versions. This is because the version in the tag source code is fixed.
-After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file to `ClassPath` in `Spark` to use `Spark-Doris-Connector`. For example, `Spark` running in `Local` mode, put this file in the `jars/` folder. `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
+After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder. For `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package: for example, upload `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` to HDFS and add the HDFS file path to `spark.yarn.jars`.
+
+1. Upload `doris-spark-connector-3.1.2-2.12-1.0.0.jar` to HDFS.
+
+```
+hdfs dfs -mkdir /spark-jars/
+hdfs dfs -put /your_local_path/doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/
+```
+
+2. Add the `doris-spark-connector-3.1.2-2.12-1.0.0.jar` dependency in the cluster configuration.
+
+```
+spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
+```
+
 ## Using Maven
@@ -159,6 +175,19 @@ val dorisSparkRDD = sc.dorisRDD(
 dorisSparkRDD.collect()
 ```
+#### pySpark
+
+```python
+dorisSparkDF = spark.read.format("doris") \
+    .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME") \
+    .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT") \
+    .option("user", "$YOUR_DORIS_USERNAME") \
+    .option("password", "$YOUR_DORIS_PASSWORD") \
+    .load()
+# show the first 5 rows of data
+dorisSparkDF.show(5)
+```
+
 ### Write
 
 #### SQL
diff --git a/docs/zh-CN/ecosystem/spark-doris-connector.md b/docs/zh-CN/ecosystem/spark-doris-connector.md
index 5b8c7e80069a08..4eea79432c5720 100644
--- a/docs/zh-CN/ecosystem/spark-doris-connector.md
+++ b/docs/zh-CN/ecosystem/spark-doris-connector.md
@@ -100,6 +100,21 @@ sh build.sh 3.1.2 2.12 ## spark 3.1.2, scala 2.12
 After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder. For `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
+For example, upload `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` to HDFS and add the HDFS path of the jar to the `spark.yarn.jars` parameter.
+
+1. Upload `doris-spark-connector-3.1.2-2.12-1.0.0.jar` to HDFS.
+
+```
+hdfs dfs -mkdir /spark-jars/
+hdfs dfs -put /your_local_path/doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/
+```
+
+2. Add the `doris-spark-connector-3.1.2-2.12-1.0.0.jar` dependency in the cluster configuration.
+
+```
+spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
+```
+
 ## Using Maven
 
 ```
@@ -162,6 +177,21 @@ val dorisSparkRDD = sc.dorisRDD(
 dorisSparkRDD.collect()
 ```
+#### pySpark
+
+```python
+dorisSparkDF = spark.read.format("doris") \
+    .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME") \
+    .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT") \
+    .option("user", "$YOUR_DORIS_USERNAME") \
+    .option("password", "$YOUR_DORIS_PASSWORD") \
+    .load()
+# show the first 5 rows of data
+dorisSparkDF.show(5)
+```
+
+
 ### Write
 
 #### SQL
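Taken together, the YARN deployment steps added in this change can be exercised with an ordinary `spark-submit` invocation. This is only a sketch: the script name `read_doris.py` is a hypothetical placeholder (e.g. a file containing the pySpark read example above), and the HDFS jar path is the one used in the docs' own examples.

```shell
# Submit a PySpark job to YARN. With spark.yarn.jars pointing at HDFS,
# YARN containers fetch the jar from HDFS instead of re-uploading it
# from the client on every submission.
# read_doris.py is a hypothetical script holding the pySpark read example.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar \
  read_doris.py
```

Depending on the cluster setup, `spark.yarn.jars` may need to list Spark's own runtime jars in addition to the connector jar, since setting it explicitly overrides the default jar distribution.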