Batch Spark program that processes JSON logs generated by an e-commerce application:
- Read raw files.
- Clean up formats and values.
- Write the output as Parquet and add it to a Hive table.

======================
To start the Spark job, run the script nx_live.sh with parameters:
./nx_live.sh [dev|prod] [hive_table]
- dev or prod is the environment for which the script is executed.
- hive_table is the Hive table to which the migrated files are added. Currently it is nxmetrics_prod.clickstream.
Keep in mind that the script refers to the location of the JAR file containing the application code (it currently points to /home/mapr/).
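
As an illustration of the invocation described above, here is a minimal sketch of how a launcher such as nx_live.sh might validate its two parameters and assemble a spark-submit command. This is not the actual content of nx_live.sh. The function name build_submit_cmd, the JAR file name app.jar, the main class com.example.LogBatchJob, and the JAR_DIR override are hypothetical placeholders. Passing the environment and the table name through as application arguments is also an assumption. Only the /home/mapr location and the parameter order come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; the real nx_live.sh may differ.

# Placeholder names, NOT taken from the real project.
JAR_NAME="app.jar"
MAIN_CLASS="com.example.LogBatchJob"

# Prints the spark-submit command for the given environment and Hive table.
# Returns 1 (and prints usage to stderr) if the arguments are invalid.
build_submit_cmd() {
  local env="$1"
  local hive_table="$2"
  # JAR directory defaults to the location mentioned in this README;
  # the JAR_DIR override is an assumption added for illustration.
  local jar_dir="${JAR_DIR:-/home/mapr}"

  # Only "dev" and "prod" are accepted as environments.
  case "$env" in
    dev|prod) ;;
    *)
      echo "usage: ./nx_live.sh [dev|prod] [hive_table]" >&2
      return 1
      ;;
  esac

  # The Hive table name must be present.
  if [ -z "$hive_table" ]; then
    echo "usage: ./nx_live.sh [dev|prod] [hive_table]" >&2
    return 1
  fi

  # Arguments after the JAR path are passed to the application itself.
  echo "spark-submit --class ${MAIN_CLASS} ${jar_dir}/${JAR_NAME} ${env} ${hive_table}"
}
```

Calling build_submit_cmd prod nxmetrics_prod.clickstream prints the command instead of running it, so the JAR path can be checked before the job is submitted. If the JAR is moved away from /home/mapr/, the path in the script needs to be updated accordingly.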