smram/csv2parq

Simple standalone Scala Spark application to convert CSV to Parquet files

Installation

Install Spark on Mac: https://medium.freecodecamp.org/installing-scala-and-apache-spark-on-mac-os-837ae57d283f

Then install sbt with Homebrew:

brew install sbt

Run

# build the jar (written under target/scala-2.11/)
sbt package

# run local spark
spark-submit \
    --class Converter \
    --master "local[4]" \
    target/scala-2.11/csv2parquet_2.11-1.0.jar <in_csvfile> <out_parquetfile> [out_schemafile] [num_partitions] [sep]
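
The repository's actual `Converter` source is not reproduced in this README. As a rough illustration of what a converter taking these positional arguments might look like, here is a minimal sketch. The object name `ConverterSketch`, the `Options` case class, the `,` default separator, and the `header` and `inferSchema` reader options are all assumptions made for the example, not the project's code. It assumes Spark 2.x, where `SparkSession` and the built-in CSV reader are available.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch; NOT the repository's real Converter class.
object ConverterSketch {

  // Positional arguments, mirroring the usage line above.
  // The "," default for sep is an assumption.
  final case class Options(
      inCsv: String,
      outParquet: String,
      outSchema: Option[String],
      numPartitions: Option[Int],
      sep: String)

  def parseArgs(args: Array[String]): Options = {
    require(
      args.length >= 2,
      "usage: <in_csvfile> <out_parquetfile> [out_schemafile] [num_partitions] [sep]")
    Options(
      inCsv = args(0),
      outParquet = args(1),
      outSchema = args.lift(2),
      numPartitions = args.lift(3).map(_.toInt),
      sep = args.lift(4).getOrElse(","))
  }

  def main(args: Array[String]): Unit = {
    val opts = parseArgs(args)
    // The master URL comes from spark-submit (--master), so it is not set here.
    val spark = SparkSession.builder().appName("csv2parquet-sketch").getOrCreate()
    try {
      val df = spark.read
        .option("header", "true")      // assumes the first row holds column names
        .option("inferSchema", "true") // costs an extra pass over the input
        .option("sep", opts.sep)
        .csv(opts.inCsv)

      // Optionally write the inferred schema as JSON to a local file.
      opts.outSchema.foreach { path =>
        java.nio.file.Files.write(
          java.nio.file.Paths.get(path),
          df.schema.prettyJson.getBytes("UTF-8"))
      }

      // Repartition only when a partition count was given.
      val out = opts.numPartitions.fold(df)(n => df.repartition(n))
      // Default save mode: the write fails if the output path already exists.
      out.write.parquet(opts.outParquet)
    } finally spark.stop()
  }
}
```

Keeping the argument parsing in its own function means it can be checked without starting Spark. The number of Parquet part files written normally matches the number of partitions, which is why a `num_partitions` argument is useful for controlling output file count.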

References

Spark resource tuning

From [cldr]

  • master[workers] - the number of worker threads started, e.g. local[4] (see [submit])
  • spark.executor.memory and spark.executor.cores - memory and cores available to each executor
  • spark.executor.instances - number of executors for an application (can be set automatically in Spark versions after 1.3). Executors run on workers
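
To make the properties above concrete, the sketch below sets them programmatically through `SparkConf`. The object name `TuningSketch` and every value (`2g`, `2`, `3`) are placeholders chosen for illustration, not recommendations. In `local[N]` mode everything runs inside the driver JVM, so the executor settings only matter on a real cluster.

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch; values are examples only.
object TuningSketch {

  def exampleConf(): SparkConf =
    new SparkConf()
      .setAppName("csv2parquet-sketch")
      .setMaster("local[4]")                // 4 worker threads in local mode
      .set("spark.executor.memory", "2g")   // memory per executor
      .set("spark.executor.cores", "2")     // cores per executor
      .set("spark.executor.instances", "3") // number of executors

  def main(args: Array[String]): Unit =
    // Print the resulting key=value pairs, one per line.
    println(exampleConf().toDebugString)
}
```

Properties set directly on a `SparkConf` take precedence over flags passed to `spark-submit`, so for an application meant to run in several environments it is usually more flexible to leave them out of the code and pass `--master` and `--conf key=value` on the command line instead.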

Spark refs

Scala beginner refs
