MapReduce (MR) jobs that compress binary files to gzip or bzip2 and decompress them again.
Usage: FileCompressorJob < inputPath > < outputPath > < # mappers > < compressionCodec > < GZIPBufferSize (default 8192; optional) >
Example: hadoop jar ArcmainMapper-0.0.1-SNAPSHOT.jar com.cloudera.sa.ArcmainMapper.FileCompressorJob input_files output_files 10 gzip
Usage: FileUnCompressorJob < inputPath > < outputPath > < # mappers >
Example: hadoop jar ArcmainMapper-0.0.1-SNAPSHOT.jar com.cloudera.sa.ArcmainMapper.FileUnCompressorJob input_files output_files 10
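The optional GZIPBufferSize argument in the FileCompressorJob usage line can be pictured with plain java.util.zip streams. The sketch below is an illustration only: it assumes the value is passed through as the gzip stream buffer size, which this README does not confirm, and the class and method names (GzipBufferSketch, compress, decompress) are hypothetical and not part of the job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration; not the job's actual mapper code.
class GzipBufferSketch {
    // Default taken from the usage line; how the job applies it is an assumption.
    static final int DEFAULT_BUFFER_SIZE = 8192;

    // Compress raw bytes with gzip, using bufferSize as the stream's internal buffer.
    static byte[] compress(byte[] raw, int bufferSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink, bufferSize)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return sink.toByteArray();
    }

    // Decompress gzip bytes, reading in chunks of bufferSize.
    static byte[] decompress(byte[] packed, int bufferSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(packed), bufferSize)) {
            byte[] chunk = new byte[bufferSize];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                sink.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "binary payload, repeated: aaaaaaaaaaaaaaaa"
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original, DEFAULT_BUFFER_SIZE);
        byte[] restored = decompress(packed, DEFAULT_BUFFER_SIZE);
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

The buffer size affects only how much data is staged per write or read, not the compressed format, so a file compressed with one buffer size can be decompressed with another.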
Description
- Reads files from a directory and compresses them using the specified codec (gzip | bzip2).
- Map-only job that reads from and writes directly to HDFS.
- Uses a custom input format to read the files in the directory and distribute them to the mappers.
- Supports decompressing gzip or bzip2 files.
- Input file names must end in .gz or .bz2 for FileUnCompressorJob to decompress them properly.
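The file-extension requirement suggests that the decompress job picks its codec from the file name. A minimal sketch of that kind of suffix check follows, using only the JDK. The class and method names (CodecBySuffixSketch, codecFor, outputNameFor) are hypothetical, and the job's real lookup may differ. Hadoop's CompressionCodecFactory resolves codecs by suffix in a similar way, but whether this job uses it is not stated here.

```java
import java.util.Optional;

// Hypothetical illustration of suffix-based codec selection; not the job's code.
class CodecBySuffixSketch {
    // Return the codec name implied by the file suffix, or empty if unsupported.
    static Optional<String> codecFor(String fileName) {
        if (fileName.endsWith(".gz")) {
            return Optional.of("gzip");
        }
        if (fileName.endsWith(".bz2")) {
            return Optional.of("bzip2");
        }
        return Optional.empty();
    }

    // Strip a recognized suffix to get a name for the decompressed output.
    // Names without a recognized suffix are returned unchanged.
    static String outputNameFor(String fileName) {
        if (fileName.endsWith(".gz")) {
            return fileName.substring(0, fileName.length() - ".gz".length());
        }
        if (fileName.endsWith(".bz2")) {
            return fileName.substring(0, fileName.length() - ".bz2".length());
        }
        return fileName;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"dump01.gz", "dump02.bz2", "dump03.bin"}) {
            System.out.println(name + " -> codec=" + codecFor(name).orElse("none")
                    + ", output=" + outputNameFor(name));
        }
    }
}
```

Under this assumption, a file such as dump03.bin matches no codec, which is consistent with the note that decompression needs a .gz or .bz2 suffix.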