
bukarev/logs_process


A batch Spark program that processes JSON logs generated by an e-commerce application:

  1. Read the raw files.
  2. Clean up formats and values.
  3. Write the results to Parquet and add them to a Hive table.
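The README does not include the job's code or the log schema. As a rough illustration of step 2 only, the following stdlib-only Python sketch parses JSON log lines, drops malformed ones, trims string fields, and converts an epoch-millisecond timestamp to an ISO-8601 UTC string. The field names (user_id, page, event_type, event_time) and the cleanup rules are hypothetical assumptions, not the project's actual logic. In a Spark job, steps 1 and 3 would typically use Spark's DataFrame reader and writer (for example spark.read.json and DataFrameWriter.parquet), although the README does not confirm how this project does it.

```python
import json
from datetime import datetime, timezone

# Hypothetical schema: the real log fields are not documented in this README.
TIMESTAMP_FIELD = "event_time"  # assumed to be epoch milliseconds when numeric
STRING_FIELDS = ("user_id", "page", "event_type")


def clean_record(raw: dict) -> dict:
    """Normalize one parsed log record (illustrative cleanup rules only)."""
    record = dict(raw)
    # Trim whitespace from selected string fields; empty strings become None.
    for field in STRING_FIELDS:
        value = record.get(field)
        if isinstance(value, str):
            record[field] = value.strip() or None
    # Convert a numeric epoch-millisecond timestamp to ISO-8601 UTC.
    ts = record.get(TIMESTAMP_FIELD)
    if isinstance(ts, (int, float)) and not isinstance(ts, bool):
        record[TIMESTAMP_FIELD] = datetime.fromtimestamp(
            ts / 1000, tz=timezone.utc
        ).isoformat()
    return record


def clean_lines(lines):
    """Parse JSON lines, skip blank or malformed ones, and clean each record."""
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            raw = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real job might route bad lines to an error location instead
        if isinstance(raw, dict):
            cleaned.append(clean_record(raw))
    return cleaned
```

For example, clean_lines(['{"user_id": " u42 ", "page": "", "event_time": 1700000000000}', 'not json']) returns a single record with user_id "u42", page None, and event_time "2023-11-14T22:13:20+00:00"; the malformed line is dropped.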

To start the Spark job, run the script nx_live.sh with the following parameters:

  ./nx_live.sh [dev|prod] [hive_table]

  1. dev or prod: the environment in which the script is executed.
  2. hive_table: the Hive table to which the migrated files are added. It is currently nxmetrics_prod.clickstream.

Keep in mind that the script refers to the location of a JAR file containing the application code (currently /home/mapr/).
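The following Python sketch mirrors the argument contract described above: it checks that the first argument is dev or prod and that the second looks like a database.table Hive name, then assembles a spark-submit command. The JAR file name, the main class, and the assumption that nx_live.sh forwards both arguments to the application are hypothetical; the README only states that the JAR lives under /home/mapr/.

```python
import re

VALID_ENVS = ("dev", "prod")
# Assumed Hive naming form: database.table, e.g. nxmetrics_prod.clickstream.
TABLE_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*")

# Hypothetical values: the README only gives the /home/mapr/ directory.
JAR_PATH = "/home/mapr/logs_process.jar"
MAIN_CLASS = "com.example.LogsProcess"


def build_command(argv):
    """Validate [dev|prod] [hive_table] and build a spark-submit argument list."""
    if len(argv) != 2:
        raise ValueError("usage: ./nx_live.sh [dev|prod] [hive_table]")
    env, hive_table = argv
    if env not in VALID_ENVS:
        raise ValueError(f"environment must be one of {VALID_ENVS}, got {env!r}")
    if not TABLE_PATTERN.fullmatch(hive_table):
        raise ValueError(f"expected database.table, got {hive_table!r}")
    # Application arguments follow the JAR path on the spark-submit command line.
    return ["spark-submit", "--class", MAIN_CLASS, JAR_PATH, env, hive_table]
```

For example, build_command(["prod", "nxmetrics_prod.clickstream"]) returns ["spark-submit", "--class", "com.example.LogsProcess", "/home/mapr/logs_process.jar", "prod", "nxmetrics_prod.clickstream"], while an environment such as "staging" raises ValueError.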
