Simple standalone Scala Spark application to convert CSV to Parquet files
Install Spark on Mac: https://medium.freecodecamp.org/installing-scala-and-apache-spark-on-mac-os-837ae57d283f
plus sbt:
brew install sbt
# build
sbt package
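A minimal `build.sbt` along these lines would yield the jar name used in the run command below (the exact Spark and Scala patch versions here are assumptions):

```scala
// Minimal build.sbt sketch; dependency versions are assumptions.
name := "csv2parquet"
version := "1.0"
scalaVersion := "2.11.12"

// "provided" because spark-submit supplies Spark at runtime.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided"
```

With this, `sbt package` produces `target/scala-2.11/csv2parquet_2.11-1.0.jar`.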
# run local spark
spark-submit \
--class Converter \
--master local[4] \
target/scala-2.11/csv2parquet_2.11-1.0.jar <in_csvfile> <out_parquetfile> [out_schemafile] [num_partitions] [sep]
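The actual source is not shown here; as a rough sketch, the `Converter` entry point might look like the following (the argument defaults and the schema-dump format are assumptions, not the real implementation):

```scala
import java.io.PrintWriter
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of the Converter entry point; argument handling
// and defaults are assumptions, not the actual source.
object Converter {
  def main(args: Array[String]): Unit = {
    val inCsv      = args(0)                                      // <in_csvfile>
    val outParquet = args(1)                                      // <out_parquetfile>
    val outSchema  = if (args.length > 2) Some(args(2)) else None // [out_schemafile]
    val numParts   = if (args.length > 3) args(3).toInt else 1    // [num_partitions]
    val sep        = if (args.length > 4) args(4) else ","        // [sep]

    val spark = SparkSession.builder().appName("csv2parquet").getOrCreate()

    // Read the CSV with a header row and let Spark infer column types.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("sep", sep)
      .csv(inCsv)

    // Optionally dump the inferred schema for inspection.
    outSchema.foreach { path =>
      val pw = new PrintWriter(path)
      try pw.write(df.schema.treeString) finally pw.close()
    }

    // Write as Parquet, repartitioned to the requested partition count.
    df.repartition(numParts).write.parquet(outParquet)

    spark.stop()
  }
}
```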
Notes on tuning, from [cldr]:
- `--master local[K]` - run locally with K worker threads (see [submit])
- `spark.executor.memory` and `spark.executor.cores` - memory and cores available to each executor (automated since Spark 1.3)
- `spark.executor.instances` - the number of executors for an application; executors run on workers (sketched below)
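As a sketch, these properties can also be set programmatically when building the session (the values below are placeholders, and they only take effect on a real cluster manager, not with `--master local[K]`):

```scala
import org.apache.spark.sql.SparkSession

// Placeholder values; tune per cluster.
val spark = SparkSession.builder()
  .appName("csv2parquet")
  .config("spark.executor.memory", "4g")
  .config("spark.executor.cores", "2")
  .config("spark.executor.instances", "3")
  .getOrCreate()
```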
- https://spark.apache.org/docs/latest/quick-start.html#self-contained-applications
- https://docs.databricks.com/spark/latest/data-sources/read-csv.html#manipulating-data, followed by
- https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
- [cldr] https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
- [submit] https://spark.apache.org/docs/latest/submitting-applications.html