GitHub - bukarev/streamsetsdelta

This implementation is based on Microsoft Azure. Register an application in Azure; you will need its tenant ID, client ID, and client secret. Assign that application the Storage Blob Data Contributor role on the storage account you are going to use.
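As a rough illustration of where the three values from the app registration end up, the sketch below builds the Hadoop ABFS OAuth (client credentials) properties for one storage account. It assumes the connection goes through the hadoop-azure ABFS driver listed among the libraries; the class `AbfsOAuthSketch`, the method `oauthProperties`, and all placeholder values are hypothetical and are not taken from delta_write.groovy, which may configure access differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of this repo: maps the Azure app registration
// values to the Hadoop ABFS OAuth (client credentials) property names.
class AbfsOAuthSketch {

    static Map<String, String> oauthProperties(String storageAccount,
                                               String tenantId,
                                               String clientId,
                                               String clientSecret) {
        // ABFS settings can be scoped to one account by suffixing its host name.
        String host = storageAccount + ".dfs.core.windows.net";
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.azure.account.auth.type." + host, "OAuth");
        props.put("fs.azure.account.oauth.provider.type." + host,
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
        props.put("fs.azure.account.oauth2.client.id." + host, clientId);
        props.put("fs.azure.account.oauth2.client.secret." + host, clientSecret);
        props.put("fs.azure.account.oauth2.client.endpoint." + host,
                "https://login.microsoftonline.com/" + tenantId + "/oauth2/token");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values only (assumptions); never hardcode real secrets.
        Map<String, String> props = oauthProperties(
                "mystorageaccount", "my-tenant-id", "my-client-id", "my-client-secret");
        // Print the properties, masking the secret.
        props.forEach((key, value) -> System.out.println(
                key + " = " + (key.contains("secret") ? "****" : value)));
    }
}
```

With hadoop-common on the classpath, each entry would typically be passed to `org.apache.hadoop.conf.Configuration.set(key, value)` before opening the table.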

Upload the following Java libraries to the external resources of your Data Collector. You can collect them from a Maven repository or use the copies provided in the /lib folder of this repo:

Jar file
parquet-hadoop-1.12.3.jar
avro-1.11.1.jar
commons-collections-3.2.2.jar
commons-compress-1.21.jar
commons-configuration2-2.1.1.jar
commons-io-2.11.0.jar
commons-lang3-3.12.0.jar
commons-logging-1.2.jar
delta-standalone_2.13-0.5.0.jar
delta-storage-2.0.0.jar
hadoop-auth-3.3.4.jar
hadoop-azure-3.3.4.jar
hadoop-azure-datalake-3.3.4.jar
hadoop-common-3.3.4.jar
hadoop-mapreduce-client-core-3.3.4.jar
hadoop-shaded-guava-1.1.1.jar
httpclient-4.5.13.jar
jackson-annotations-2.13.3.jar
jackson-core-2.12.7.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.12.7.jar
jackson-mapper-asl-1.9.13.jar
parquet-avro-1.12.3.jar
parquet-column-1.12.3.jar
parquet-common-1.12.3.jar
parquet-encoding-1.12.3.jar
parquet-format-structures-1.12.3.jar
parquet-jackson-1.12.3.jar
scala-library-2.13.8.jar
scala-parallel-collections_2.13-1.0.4.jar
shapeless_2.13-2.3.4.jar
slf4j-api-1.7.36.jar
snappy-java-1.1.8.4.jar
stax2-api-4.2.1.jar
woodstox-core-5.3.0.jar

In your new pipeline, add a Schema Generator stage followed by a Groovy Evaluator stage.

Configure the Schema Generator as shown in the screenshot (schema_gen.png in this repo):

Add the code from the provided file delta_write.groovy to the Groovy Evaluator. In the Groovy code, find the class Сonstants and fill in the values for your registered Azure app, storage account, and Delta table name. The values in this class don't have to be hardcoded; you can:

- use the pipeline parameters functionality for the storage account, container, or Delta table;
- use the ${runtime:conf} functionality of SDC to pick authorisation values from secured config files.
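To show how the storage account, container, and Delta table values fit together, here is a minimal sketch that assembles them into an `abfss://` URI, the standard addressing scheme of the Hadoop ABFS driver. The class `DeltaTablePathSketch`, the method `tablePath`, and the sample values are hypothetical; the way delta_write.groovy actually builds its table location may differ.

```java
// Hypothetical helper, not part of this repo: combines storage account,
// container, and table name into an ABFS (secure) URI.
class DeltaTablePathSketch {

    static String tablePath(String storageAccount, String container, String tableName) {
        // Drop leading slashes so the URI has exactly one separator after the host.
        String trimmed = tableName.replaceAll("^/+", "");
        return "abfss://" + container + "@" + storageAccount
                + ".dfs.core.windows.net/" + trimmed;
    }

    public static void main(String[] args) {
        // Placeholder values (assumptions); in a pipeline these could come from
        // pipeline parameters instead of literals.
        System.out.println(tablePath("mystorageaccount", "mycontainer", "tables/events"));
    }
}
```

Keeping the three parts as separate inputs makes it easy to switch environments by changing only the parameter values.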

Your pipeline should look like this: (Origin) -> (Anything Else) -> (Schema Generator) -> (Groovy Evaluator) -> (Trash)
