`docs/en/ecosystem/spark-doris-connector.md`

```
sh build.sh 3.1.2 2.12 ## spark 3.1.2 version, and scala 2.12
```
> Note: If you check out the source code from a tag, you can simply run `sh build.sh --tag` without specifying the Spark and Scala versions, because the versions in the tag source code are fixed.

After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file into the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder. For `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package; for example, upload `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` to HDFS and add the HDFS path of the jar to `spark.yarn.jars`.

1. Upload `doris-spark-connector-3.1.2-2.12-1.0.0.jar` to HDFS.

```
hdfs dfs -mkdir /spark-jars/
hdfs dfs -put /your_local_path/doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/
```

2. Add the `doris-spark-connector-3.1.2-2.12-1.0.0.jar` dependency to the cluster configuration.

```
spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
```
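Alternatively, the connector jar can be attached per application at submit time with `--jars` instead of the cluster-wide `spark.yarn.jars` setting. A minimal sketch, assuming the HDFS path from step 1; the application jar name is a placeholder:

```shell
# Attach the connector jar from HDFS to a single YARN application.
# your_spark_app.jar is a placeholder for your own application jar.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar \
  your_spark_app.jar
```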



## Using Maven

val dorisSparkRDD = sc.dorisRDD(

dorisSparkRDD.collect()
```
#### PySpark

```python
dorisSparkDF = spark.read.format("doris") \
    .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME") \
    .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT") \
    .option("user", "$YOUR_DORIS_USERNAME") \
    .option("password", "$YOUR_DORIS_PASSWORD") \
    .load()
# show 5 rows of data
dorisSparkDF.show(5)
```

### Write

#### SQL
`docs/zh-CN/ecosystem/spark-doris-connector.md`

`sh build.sh 3.1.2 2.12 ## spark 3.1.2, scala 2.12`

After successful compilation, the file `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` will be generated in the `output/` directory. Copy this file into the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder. For `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.

For example, upload `doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar` to HDFS and add the HDFS path of the jar to the `spark.yarn.jars` parameter.

1. Upload `doris-spark-connector-3.1.2-2.12-1.0.0.jar` to HDFS.

```
hdfs dfs -mkdir /spark-jars/
hdfs dfs -put /your_local_path/doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/
```

2. Add the `doris-spark-connector-3.1.2-2.12-1.0.0.jar` dependency to the cluster configuration.

```
spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
```

## Using Maven

```
val dorisSparkRDD = sc.dorisRDD(
dorisSparkRDD.collect()
```

#### PySpark

```python
dorisSparkDF = spark.read.format("doris") \
    .option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME") \
    .option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT") \
    .option("user", "$YOUR_DORIS_USERNAME") \
    .option("password", "$YOUR_DORIS_PASSWORD") \
    .load()
# show 5 rows of data
dorisSparkDF.show(5)
```

### Write

#### SQL