@marsishandsome marsishandsome commented Oct 9, 2021

Signed-off-by: marsishandsome <marsishandsome@gmail.com>

close #37

Add SSTDataSource

Subtask of tikv/tikv#11007

To support decoding SST files in parallel, a Spark DataSource library, SSTDataSource, is implemented in the TiKV migration repo as follows:

class SSTDataSource extends FileDataSourceV2

Users can decode SST files with Spark using the following code:

val path = "hdfs:///path/to/sst/files/"
val keyValueDF = spark.read.format("sst").load(path)
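
For readers running this outside `spark-shell`, where `spark` is not predefined, the snippet above can be wrapped in a small application. This is only a sketch under assumptions: the object name `SstReadSketch` and the application name are hypothetical, the SSTDataSource jar from this PR is assumed to be on the classpath so that the `sst` format resolves, and the path is the same placeholder used above. The PR does not state the column names of the resulting DataFrame, so the sketch inspects the schema instead of assuming one.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; not part of this PR.
object SstReadSketch {
  def main(args: Array[String]): Unit = {
    // Assumes the SSTDataSource jar is on the classpath so "sst" can be resolved.
    val spark = SparkSession.builder().appName("sst-read-sketch").getOrCreate()

    // Placeholder path, as in the PR description.
    val path = "hdfs:///path/to/sst/files/"
    val keyValueDF = spark.read.format("sst").load(path)

    // Column names are not stated in the PR, so print the schema rather than assume it.
    keyValueDF.printSchema()
    keyValueDF.show(10, truncate = false)

    spark.stop()
  }
}
```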

Depends on tikv/client-java#284

@marsishandsome marsishandsome force-pushed the feature/add-sst-decoder branch 22 times, most recently from fab4fd7 to 1906517 Compare October 11, 2021 07:44
@marsishandsome marsishandsome changed the title support sst datasource support Spark SST Datasource Oct 11, 2021
@marsishandsome marsishandsome force-pushed the feature/add-sst-decoder branch 6 times, most recently from d6e6803 to 2539c46 Compare October 18, 2021 07:54
@marsishandsome marsishandsome force-pushed the feature/add-sst-decoder branch 8 times, most recently from bf6493a to 591f5e2 Compare October 18, 2021 08:24
Signed-off-by: marsishandsome <marsishandsome@gmail.com>
@marsishandsome marsishandsome force-pushed the feature/add-sst-decoder branch from 591f5e2 to d5c0ecc Compare October 18, 2021 08:28
@Little-Wallace Little-Wallace left a comment


LGTM, but I'm not very familiar with Spark code.

@marsishandsome marsishandsome merged commit 9354431 into tikv:main Oct 20, 2021