Merged
Changes from all commits · 137 commits
ad44447
Initial commit
Jul 11, 2014
845a817
Dumbest proof of concept possible
Jul 11, 2014
cb7c866
First bit of work to get this running against the new Dataflow API
Nov 4, 2014
dce03e4
Update version of dataflow to get new API method access
Dec 2, 2014
08e94b2
Add support for getters and a Flatten impl
Dec 2, 2014
9fdac6c
Such code. Much features.
Dec 2, 2014
deca2c0
Adding some more operators: toiterable, seqdo
Dec 2, 2014
6ee38b2
Support for ParDo.BoundMulti
Dec 3, 2014
64c6d8d
Fix bug in deserializing side inputs
Dec 3, 2014
40adbec
Add SparkRuntimeContext for handling shared runtime objects
Dec 3, 2014
565509d
First cut at aggregators
Dec 3, 2014
bb219d4
First minimally working aggregators
Dec 3, 2014
9152769
Updates for 141206 SDK release
Dec 9, 2014
3bd04ae
Dummy impls of windowing-related ProcContext functions
Dec 9, 2014
ba74f19
Simplify pom.xml
Dec 9, 2014
6aa08e0
Add proper coder handling to RDD retrieval
Dec 9, 2014
45be508
Refactor aggregation related classes.
Dec 10, 2014
67cf364
Add README.md and update project version in pom.xml.
Dec 11, 2014
137d54a
Adds Javadoc and Tests to project.
Dec 10, 2014
b954589
Add apache2 license and cloudera copyright.
Dec 11, 2014
1523ffd
Adds custom checkstyle.
Dec 12, 2014
2992838
Factor out tranform translation logic in to its own class.
Dec 12, 2014
2e3fe1a
Specify and rationalize generic types in State, CoderHelpers to start
srowen Dec 13, 2014
7489263
Add simple word count test.
Dec 16, 2014
f9e8fab
Factor out spark pipeline options.
Dec 17, 2014
ec172ba
Miscellaneous inspection changes from IntelliJ
srowen Dec 19, 2014
ed1e2f7
Issue #13 : attempt to remove all generics warnings, or handle them e…
srowen Dec 20, 2014
225f6c0
Update and specify POM plugin config; Update Spark to 1.1.1, JUnit to…
srowen Dec 21, 2014
1f9cd04
Improve readme to explain current state of the repo, and to encourage…
Dec 24, 2014
ba4b326
Update version of dataflow we depend on.
Dec 23, 2014
d6523b7
Fix side input loading and re-enable approxuniq test
jwills Jan 6, 2015
e798262
Add a close() method to EvaluationResult/EvaluationContext which stop…
dennishuo Jan 16, 2015
4cd1a1c
Add tests for CombinePerKey transform and update other tests to use E…
Jan 17, 2015
8a4daa2
Rename artifact to spark-dataflow; add release plugin config; add RAT…
srowen Jan 26, 2015
54a547a
Release preparation: needs to begin with a -SNAPSHOT version; Javadoc…
srowen Jan 26, 2015
15a622f
[maven-release-plugin] prepare release spark-dataflow-0.0.1
srowen Jan 26, 2015
6ea0b52
[maven-release-plugin] prepare for next development iteration
srowen Jan 26, 2015
3cc19b5
Add Cloudera repo info
srowen Jan 26, 2015
58228c2
Fix formatting of cloudera repo
srowen Jan 26, 2015
eee09b2
Update README w/some notes on motivation. Fixes #20.
Jan 27, 2015
68b72e7
Added new test for TransformTranslator and tested TextIO.Read and Tex…
Feb 5, 2015
b77179a
Update to latest version of Dataflow APIs and mimic
Apr 10, 2015
d5aefef
Add a test for Avro.
tomwhite Apr 28, 2015
60a4c25
Add test that uses DeDupExample from the SDK.
tomwhite May 11, 2015
ba882f2
Add test that uses TfIdf from the SDK and switch to use KryoSerializer.
tomwhite May 11, 2015
e1e9ba7
Make it possible to run the SDK word count example with the SparkPipe…
tomwhite May 12, 2015
e5deb38
Change to use SLF4J rather than java.util.logging.
tomwhite May 13, 2015
ab1503b
Exclude jul-to-slf4j to avoid loops.
tomwhite May 13, 2015
4fa925c
Exclude old Jetty version of servlet API.
tomwhite May 13, 2015
24bad5e
Update to dataflow 0.4.150414.
tomwhite May 13, 2015
ddaa532
Use DataflowAssert in tests.
tomwhite May 13, 2015
aed5e27
Add Travis, Jacoco / Codecov integration. Fix Spark errors due to reu…
srowen May 14, 2015
3962b82
Enable checkstyle, remove some rules, comply with checkstyle rules, f…
srowen May 14, 2015
06e611a
Update to Spark 1.3. Closes issue #38
srowen May 14, 2015
fe5a2bc
Add CONTRIBUTING.md to clarify contribution license; add build flair …
srowen May 14, 2015
2afc03b
Enable RAT during verify phase and add some missing copyright headers
srowen May 14, 2015
64ab065
Remove unnecessary build properties for mvn exec.
tomwhite May 15, 2015
cdb9665
Document how to run on a cluster with spark-submit.
tomwhite May 15, 2015
b8190ea
Support withNumShards() and withoutSharding() for TextIO output
tomwhite May 12, 2015
d82355c
Implement dataflow sharding for text output. With this change the dir…
tomwhite May 15, 2015
1df47a1
Fix README to reflect changes from b6e4787. Also fixes a corner case
tomwhite May 19, 2015
ea2c4ad
Shard number replacement should only be for 'S' pattern in the templa…
tomwhite May 19, 2015
5fb3712
Use internal coder for side inputs, rather than trying to infer our own
tomwhite May 19, 2015
94a8f29
Address review feedback.
tomwhite May 20, 2015
916e039
Use Coders to convert from object-based RDDs to byte-array RDDS for all
tomwhite May 21, 2015
ec2f9e7
Switch to use Kryo serializer.
tomwhite May 21, 2015
a3259c1
Add a comment explaining use of Coders for serialization.
tomwhite May 22, 2015
7d8f818
Add HadoopIO for reading from Hadoop InputFormat classes.
tomwhite May 28, 2015
6d37b90
Address review feedback.
tomwhite May 28, 2015
70b41e3
Remove user-dependent configuration from maven-gpg-plugin.
tomwhite Jun 5, 2015
c04a5a2
[maven-release-plugin] prepare release spark-dataflow-0.1.0
tomwhite Jun 5, 2015
0a599c4
[maven-release-plugin] prepare for next development iteration
tomwhite Jun 5, 2015
5f8fc59
Update README with latest release. Add brief release instructions.
tomwhite Jun 5, 2015
a35ea6a
Compute PCollections that are created only for the side effects in th…
tomwhite Jun 17, 2015
2bb6c11
Wrap failures in pipeline.run() in a RuntimeException.
tomwhite Jun 17, 2015
45d3e61
Implement DoFn.Context.getPipelineOptions().
tomwhite Jun 17, 2015
48347ca
[maven-release-plugin] prepare release spark-dataflow-0.1.1
tomwhite Jun 18, 2015
ced9b30
[maven-release-plugin] prepare for next development iteration
tomwhite Jun 18, 2015
f85beb3
Update README with 0.1.1 minor release.
tomwhite Jun 18, 2015
b0312a1
Allow tests to share a single Spark context.
tomwhite Jun 19, 2015
148979f
Update to dataflow 0.4.150602.
tomwhite Jun 11, 2015
62c0337
[maven-release-plugin] prepare release spark-dataflow-0.2.0
tomwhite Jun 20, 2015
b6c71c2
[maven-release-plugin] prepare for next development iteration
tomwhite Jun 20, 2015
32be82e
Update README with 0.2.0 release.
tomwhite Jun 20, 2015
9e6daf2
Unwrap cause from SparkException if set.
tomwhite Jun 25, 2015
d08675c
Specialize CombineGlobally
tomwhite Jun 26, 2015
80be89e
Fix bug in combinePerKey where accumulator in return value is ignored.
tomwhite Jun 26, 2015
b404b9c
[maven-release-plugin] prepare release spark-dataflow-0.2.1
tomwhite Jun 26, 2015
290193e
[maven-release-plugin] prepare for next development iteration
tomwhite Jun 26, 2015
bacaf8c
Factor out common code from DoFnFunction and MultiDoFnFunction.
tomwhite Jun 29, 2015
13edbec
Implement Aggregator#getCombineFn.
tomwhite Jun 29, 2015
89e2bb5
Implement getAggregatorValues.
tomwhite Jun 29, 2015
78d6614
More cleanup. View.AsSingleton is already exercised by the TfIdf test.
tomwhite Jun 30, 2015
5069eed
Set the RDD's name from the PValue's name, to help diagnosis.
tomwhite Jun 30, 2015
2820534
Resolve some generics warnings with some fancier footwork
srowen Jul 3, 2015
b2f495e
Remove some HadoopIO.Read.Bound factory methods and fluent setters; a…
srowen Jul 6, 2015
b47a8d0
Remove some HadoopIO.Read.Bound factory methods and fluent setters; a…
srowen Jul 6, 2015
c51bc32
Fix checkstyle error
srowen Jul 7, 2015
7cff304
Delay converting PCollection values to bytes in case they are only us…
tomwhite Jul 8, 2015
2d00b3b
Add a system property, dataflow.spark.directBroadcast, to allow pipel…
tomwhite Jul 9, 2015
79b08ad
Make access of boolean system property clearer. (From Sean Owen.)
tomwhite Jul 9, 2015
3cae69b
Only accumulate outputs from one call to processContext, rather than
tomwhite Jul 10, 2015
c01421c
Update to dataflow 0.4.150710.
tomwhite Jul 13, 2015
4415862
Prevent possible NPE.
tomwhite Jul 14, 2015
ebf7053
[maven-release-plugin] prepare release spark-dataflow-0.2.2
tomwhite Jul 14, 2015
72167a2
[maven-release-plugin] prepare for next development iteration
tomwhite Jul 14, 2015
27349ad
Update README to latest version (0.2.2).
tomwhite Jul 14, 2015
7681558
Fix bug where values written to the output in DoFn#startBundle and Do…
tomwhite Jul 14, 2015
fe0b8e9
[maven-release-plugin] prepare release spark-dataflow-0.2.3
tomwhite Jul 16, 2015
4ec8c60
[maven-release-plugin] prepare for next development iteration
tomwhite Jul 16, 2015
3b1441f
Avoid warning email by not running codecov unless it was configured; …
srowen Jul 21, 2015
d7a35bd
[maven-release-plugin] prepare release spark-dataflow-0.3.0
tomwhite Jul 28, 2015
383eeeb
[maven-release-plugin] prepare for next development iteration
tomwhite Jul 28, 2015
b83d666
Update README to latest version (0.3.0).
tomwhite Jul 28, 2015
89945bf
Update to dataflow 0.4.150727.
tomwhite Aug 5, 2015
5ec8d59
[maven-release-plugin] prepare release spark-dataflow-0.4.0
tomwhite Aug 6, 2015
1fdf602
[maven-release-plugin] prepare for next development iteration
tomwhite Aug 6, 2015
922508c
Update README to latest version (0.4.0).
tomwhite Aug 6, 2015
27fd290
Dataflow goes GA! Update to version 1.0.0.
tomwhite Aug 13, 2015
3e767f5
[maven-release-plugin] prepare release spark-dataflow-0.4.1
tomwhite Aug 13, 2015
4536853
[maven-release-plugin] prepare for next development iteration
tomwhite Aug 13, 2015
4b98c16
Update README to latest version (0.4.1).
tomwhite Aug 13, 2015
0c84c9d
Correct input parameter is --inputFile
amitsela Aug 17, 2015
7838865
Add support for writes with HadoopIO. This allows Hadoop
tomwhite Jun 5, 2015
8762b26
Add NullWritableCoder and test.
tomwhite Aug 20, 2015
b8949b8
[maven-release-plugin] prepare release spark-dataflow-0.4.2
tomwhite Aug 20, 2015
ecc33d8
[maven-release-plugin] prepare for next development iteration
tomwhite Aug 20, 2015
90c49b4
Update README to latest version (0.4.2).
tomwhite Aug 20, 2015
8779701
Add tests for Spark 1.4 / 1.5 in Travis
srowen Oct 8, 2015
1c603d1
Fix a few Coverity inspection results plus more IntelliJ results
srowen Oct 8, 2015
22331d1
Propagate user exceptions thrown in DoFns.
tomwhite Aug 13, 2015
f930380
The example needs --inputFile, not --input, to designate the input file
Nov 23, 2015
7a2e9a7
Add spark-streaming support to spark-dataflow
amitsela Oct 22, 2015
3478730
Add support for Flattenning (union) PCollections and test
Jan 16, 2016
a9168bf
Upgrade to latest SDK version 1.3.0
tomwhite Jan 21, 2016
89a21ca
Try to clean up some build warnings, related to generics, and try to …
srowen Jan 21, 2016
1229b00
First wave of changes from feedback
srowen Jan 22, 2016
10 changes: 10 additions & 0 deletions runners/spark/.gitignore
@@ -0,0 +1,10 @@
.classpath
.project
.settings
.cache
target
*.iml
.idea
gen
.DS_Store
dependency-reduced-pom.xml
22 changes: 22 additions & 0 deletions runners/spark/.travis.yml
@@ -0,0 +1,22 @@
language: java
sudo: false
install: mvn ${JAVA} ${SPARK} -DskipTests=true -Dmaven.javadoc.skip=true -B -V install
script: mvn ${JAVA} ${SPARK} ${JACOCO} -Dmaven.javadoc.skip=true -B verify
matrix:
include:
# Covers Java 7, Open JDK, Spark 1.3.x, and code coverage
- jdk: openjdk7
env: JACOCO=-Pjacoco
# Covers Spark 1.4.x
- jdk: openjdk7
env: SPARK=-Dspark.version=1.4.1
# Covers Spark 1.5.x
- jdk: openjdk7
env: SPARK=-Dspark.version=1.5.1
# Covers Java 8, Oracle JDK
- jdk: oraclejdk8
env: JAVA=-Djava.version=1.8
cache:
directories:
- $HOME/.m2
after_success: if [ -n "$JACOCO" ]; then bash <(curl -s https://codecov.io/bash); fi
8 changes: 8 additions & 0 deletions runners/spark/CONTRIBUTING.md
@@ -0,0 +1,8 @@
## Licensing

Contributions via GitHub pull requests are gladly accepted from their original author.
Along with any pull requests, please state that the contribution is your original work and
that you license the work to the project under the project's open source license.
Whether or not you state this explicitly, by submitting any copyrighted material via
pull request, email, or other means you agree to license the material under the project's
open source license and warrant that you have the legal authority to do so.
161 changes: 161 additions & 0 deletions runners/spark/LICENSE

Large diffs are not rendered by default.

113 changes: 113 additions & 0 deletions runners/spark/README.md
@@ -0,0 +1,113 @@
spark-dataflow
==============

## Intro

Spark-dataflow allows users to execute data pipelines written against the Google Cloud Dataflow API
with Apache Spark. Spark-dataflow is an early prototype, and we'll be working on it continuously.
If this project interests you, we welcome issues, comments, and (especially!) pull requests.
To get an idea of the areas we have already identified as needing improvement,
check out the issues listed in the GitHub repo.

## Motivation

We had two primary goals when we started working on Spark-dataflow:

1. *Provide portability for data pipelines written for Google Cloud Dataflow.* Google makes
it really easy to get started writing pipelines against the Dataflow API, but they wanted
to be sure that creating a pipeline using their tools would not lock developers into their
platform. A Spark-based implementation of Dataflow means that you can take your pipeline
logic with you wherever you go. This also means that any new machine learning and anomaly
detection algorithms that are developed against the Dataflow API are available to everyone,
regardless of their underlying execution platform.

2. *Experiment with new data pipeline design patterns.* The Dataflow API has a number of
interesting ideas, especially with respect to the unification of batch and stream data
processing into a single API that maps into two separate engines. The Dataflow streaming
engine, based on Google's [MillWheel](http://research.google.com/pubs/pub41378.html), does
not have a direct open source analogue, and we wanted to understand how to replicate its
functionality using frameworks like Spark Streaming.

## Getting Started

The Maven coordinates of the current version of this project are:

<groupId>com.cloudera.dataflow.spark</groupId>
<artifactId>spark-dataflow</artifactId>
<version>0.4.2</version>

and are hosted in Cloudera's repository at:

<repository>
<id>cloudera.repo</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
</repository>
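
Taken together, the declarations in a consuming project's `pom.xml` would look something like this (a sketch; only the `spark-dataflow` coordinates and repository above are from this README — surrounding project configuration is omitted):

```xml
<repositories>
  <repository>
    <id>cloudera.repo</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>com.cloudera.dataflow.spark</groupId>
    <artifactId>spark-dataflow</artifactId>
    <version>0.4.2</version>
  </dependency>
</dependencies>
```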

To run a Dataflow pipeline with the default options (a single-threaded Spark
instance in local mode), we would do the following:

Pipeline p = <logic for pipeline creation>
EvaluationResult result = SparkPipelineRunner.create().run(p);

To create a pipeline runner that runs against a different Spark cluster, with a custom master URL,
we would do the following:

Pipeline p = <logic for pipeline creation>
SparkPipelineOptions options = SparkPipelineOptionsFactory.create();
options.setSparkMaster("spark://host:port");
EvaluationResult result = SparkPipelineRunner.create(options).run(p);

## Word Count Example

First download a text document to use as input:

curl http://www.gutenberg.org/cache/epub/1128/pg1128.txt > /tmp/kinglear.txt

Then run the [word count example][wc] from the SDK using a single-threaded Spark instance
in local mode:

mvn exec:exec -DmainClass=com.google.cloud.dataflow.examples.WordCount \
-Dinput=/tmp/kinglear.txt -Doutput=/tmp/out -Drunner=SparkPipelineRunner \
-DsparkMaster=local

Check the output by running:

head /tmp/out-00000-of-00001

__Note: running examples using `mvn exec:exec` only works for Spark local mode at the
moment. See the next section for how to run on a cluster.__
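
The output file name `out-00000-of-00001` follows a Dataflow-style shard name template, in which a run of `S` characters is replaced by the zero-padded shard number and a run of `N` characters by the shard count (the commits above on sharded text output implement this replacement). A rough, hypothetical sketch of the naming scheme — not the actual implementation, and the default template string is an assumption:

```java
// Sketch of Dataflow-style sharded output file naming. In a template
// such as "-SSSSS-of-NNNNN", each run of 'S' is replaced by the shard
// number and each run of 'N' by the shard count, zero-padded to the
// run's width; all other characters are copied through literally.
public class ShardNames {
  static String shardName(String prefix, String template, int shard, int numShards) {
    StringBuilder sb = new StringBuilder(prefix);
    int i = 0;
    while (i < template.length()) {
      char c = template.charAt(i);
      if (c == 'S' || c == 'N') {
        int start = i;
        while (i < template.length() && template.charAt(i) == c) {
          i++;
        }
        int width = i - start;
        int value = (c == 'S') ? shard : numShards;
        sb.append(String.format("%0" + width + "d", value));
      } else {
        sb.append(c);
        i++;
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Single shard with the assumed default template
    System.out.println(shardName("/tmp/out", "-SSSSS-of-NNNNN", 0, 1));
    // -> /tmp/out-00000-of-00001
  }
}
```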

[wc]: https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/examples/src/main/java/com/google/cloud/dataflow/examples/WordCount.java

## Running on a Cluster

Spark Dataflow pipelines can be run on a cluster using the `spark-submit` command.

First copy a text document to HDFS:

curl http://www.gutenberg.org/cache/epub/1128/pg1128.txt | hadoop fs -put - kinglear.txt

Then run the word count example using `spark-submit` with the `yarn-client` master
(`yarn-cluster` works just as well):

spark-submit \
--class com.google.cloud.dataflow.examples.WordCount \
--master yarn-client \
target/spark-dataflow-*-spark-app.jar \
--inputFile=kinglear.txt --output=out --runner=SparkPipelineRunner --sparkMaster=yarn-client

Check the output by running:

hadoop fs -tail out-00000-of-00002

## How to Release

Committers can release the project using the standard [Maven Release Plugin](http://maven.apache.org/maven-release/maven-release-plugin/) commands:

mvn release:prepare
mvn release:perform -Darguments="-Dgpg.passphrase=XXX"

Note that you will need a [public GPG key](http://www.apache.org/dev/openpgp.html).

[![Build Status](https://travis-ci.org/cloudera/spark-dataflow.png?branch=master)](https://travis-ci.org/cloudera/spark-dataflow)
[![codecov.io](https://codecov.io/github/cloudera/spark-dataflow/coverage.svg?branch=master)](https://codecov.io/github/cloudera/spark-dataflow?branch=master)
222 changes: 222 additions & 0 deletions runners/spark/build-resources/checkstyle.xml
@@ -0,0 +1,222 @@
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
"-//Puppy Crawl//DTD Check Configuration 1.2//EN"
"http://www.puppycrawl.com/dtds/configuration_1_2.dtd">
<!--
Copyright (c) 2014, Cloudera, Inc. All Rights Reserved.

Cloudera, Inc. licenses this file to you under the Apache License,
Version 2.0 (the "License"). You may not use this file except in
compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

This software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for
the specific language governing permissions and limitations under the
License.
-->
<!--

Checkstyle configuration for spark-dataflow that is based on the
sun_checks.xml file that is bundled with Checkstyle and includes
checks for:

- the Java Language Specification at
http://java.sun.com/docs/books/jls/second_edition/html/index.html

- the Sun Code Conventions at http://java.sun.com/docs/codeconv/

- the Javadoc guidelines at
http://java.sun.com/j2se/javadoc/writingdoccomments/index.html

- the JDK Api documentation http://java.sun.com/j2se/docs/api/index.html

- some best practices

Checkstyle is very configurable. Be sure to read the documentation at
http://checkstyle.sf.net (or in your downloaded distribution).

Most checks are configurable; be sure to consult the documentation.

To completely disable a check, comment it out or delete it from the file.

-->

<module name="Checker">
<!-- Checks for the file header. -->
<!-- See http://checkstyle.sf.net/config_header.html -->
<module name="Header">
<property name="headerFile" value="${checkstyle.header.file}"/>
<property name="ignoreLines" value="2"/>
<property name="fileExtensions" value="java, scala"/>
</module>

<!-- Checks whether files end with a new line. -->
<!-- See http://checkstyle.sf.net/config_misc.html#NewlineAtEndOfFile -->
<module name="NewlineAtEndOfFile"/>

<module name="FileLength"/>
<module name="FileTabCharacter"/>

<!-- <module name="JavadocPackage"/> -->

<module name="TreeWalker">
<!-- Checks for Javadoc comments. -->
<!-- See http://checkstyle.sf.net/config_javadoc.html -->
<!-- <module name="JavadocType"/> -->
<module name="JavadocMethod">
<property name="scope" value="package"/>
<property name="allowUndeclaredRTE" value="true"/>
<property name="allowThrowsTagsForSubclasses" value="true"/>
<property name="validateThrows" value="true"/>
<property name="allowMissingJavadoc" value="true"/>
</module>
<module name="JavadocStyle"/>

<module name="SuperClone"/>
<module name="SuperFinalize"/>

<!-- Checks for Naming Conventions. -->
<!-- See http://checkstyle.sf.net/config_naming.html -->
<module name="ConstantName"/>
<module name="ClassTypeParameterName">
<property name="format" value="^[A-Z]+$"/>
</module>
<module name="LocalFinalVariableName"/>
<module name="LocalVariableName"/>
<module name="MethodName"/>
<module name="MethodTypeParameterName">
<property name="format" value="^[A-Z]+$"/>
</module>
<module name="PackageName"/>
<module name="ParameterName"/>
<!-- <module name="StaticVariableName"/> -->
<module name="TypeName"/>

<!-- Checks for imports -->
<!-- See http://checkstyle.sf.net/config_import.html -->
<module name="IllegalImport"/>
<!-- defaults to sun.* packages -->
<module name="RedundantImport"/>
<module name="UnusedImports"/>
<module name="ImportOrder">
<property name="groups" value="/^(java)|(javax)/,*,/^(com\.cloudera)/"/>
<property name="ordered" value="true"/>
<property name="separated" value="true"/>
<property name="option" value="top"/>
</module>

<!-- Checks for Size Violations. -->
<!-- See http://checkstyle.sf.net/config_sizes.html -->
<module name="LineLength">
<property name="max" value="100"/>
</module>
<module name="MethodLength"/>
<module name="ParameterNumber">
<property name="max" value="8"/>
</module>
<module name="OuterTypeNumber"/>

<!-- Checks for whitespace -->
<!-- See http://checkstyle.sf.net/config_whitespace.html -->
<module name="GenericWhitespace"/>
<module name="EmptyForIteratorPad"/>
<module name="MethodParamPad"/>
<module name="NoWhitespaceAfter">
<property name="tokens"
value="BNOT, DEC, DOT, INC, LNOT, UNARY_MINUS, UNARY_PLUS"/>
</module>
<module name="NoWhitespaceBefore"/>
<!-- <module name="OperatorWrap"/> -->
<module name="ParenPad"/>
<module name="TypecastParenPad"/>
<module name="WhitespaceAfter">
<property name="tokens" value="COMMA, SEMI"/>
</module>
<module name="WhitespaceAround">
<property name="allowEmptyConstructors" value="true"/>
<property name="allowEmptyMethods" value="true"/>
<property name="tokens"
value="BAND, BAND_ASSIGN, BOR, BOR_ASSIGN, BSR, BSR_ASSIGN, BXOR, BXOR_ASSIGN, DIV, DIV_ASSIGN, EQUAL, GE, GT, LAND, LE, LITERAL_CATCH, LITERAL_DO, LITERAL_ELSE, LITERAL_FINALLY, LITERAL_FOR, LITERAL_IF, LITERAL_RETURN, LITERAL_SYNCHRONIZED, LITERAL_TRY, LITERAL_WHILE, LOR, LT, MINUS, MINUS_ASSIGN, MOD, MOD_ASSIGN, NOT_EQUAL, PLUS, PLUS_ASSIGN, QUESTION, SL, SLIST, SL_ASSIGN, SR, SR_ASSIGN, STAR, STAR_ASSIGN, TYPE_EXTENSION_AND"/>
</module>

<!-- Modifier Checks -->
<!-- See http://checkstyle.sf.net/config_modifiers.html -->
<module name="ModifierOrder"/>
<module name="RedundantModifier"/>


<!-- Checks for blocks. You know, those {}'s -->
<!-- See http://checkstyle.sf.net/config_blocks.html -->
<module name="AvoidNestedBlocks">
<property name="allowInSwitchCase" value="true"/>
</module>
<module name="EmptyBlock">
<!-- catch blocks need a statement or a comment. -->
<property name="option" value="text"/>
<property name="tokens" value="LITERAL_CATCH"/>
</module>
<module name="EmptyBlock">
<!-- all other blocks need a real statement. -->
<property name="option" value="stmt"/>
<property name="tokens" value="LITERAL_DO, LITERAL_ELSE, LITERAL_FINALLY,
LITERAL_IF, LITERAL_FOR, LITERAL_TRY, LITERAL_WHILE, INSTANCE_INIT,
STATIC_INIT"/>
</module>
<module name="LeftCurly"/>
<module name="NeedBraces"/>
<module name="RightCurly"/>

<!-- Checks for common coding problems -->
<!-- See http://checkstyle.sf.net/config_coding.html -->
<!-- module name="AvoidInlineConditionals"/-->
<module name="EmptyStatement"/>
<module name="EqualsHashCode"/>
<module name="StringLiteralEquality"/>
<module name="HiddenField">
<property name="ignoreConstructorParameter" value="true"/>
</module>
<module name="IllegalInstantiation"/>
<module name="InnerAssignment"/>
<module name="MissingSwitchDefault"/>
<!--<module name="RedundantThrows"/>-->
<module name="SimplifyBooleanExpression"/>
<module name="SimplifyBooleanReturn"/>
<module name="DefaultComesLast"/>

<!-- Checks for class design -->
<!-- See http://checkstyle.sf.net/config_design.html -->
<module name="FinalClass"/>
<module name="HideUtilityClassConstructor"/>
<module name="InterfaceIsType"/>
<module name="VisibilityModifier">
<property name="protectedAllowed" value="true"/>
</module>
<module name="MissingOverride"/>

<!-- Miscellaneous other checks. -->
<!-- See http://checkstyle.sf.net/config_misc.html -->
<module name="ArrayTypeStyle"/>
<module name="ArrayTrailingComma"/>
<module name="UpperEll"/>
<module name="Regexp">
<property name="format" value="[ \t]+$"/>
<property name="illegalPattern" value="true"/>
<property name="message" value="Trailing whitespace"/>
</module>

<module name="FileContentsHolder"/>
</module>

<!-- allow warnings to be suppressed -->
<module name="SuppressionCommentFilter">
<property name="offCommentFormat" value="CSOFF\: ([\w\|]+)"/>
<property name="onCommentFormat" value="CSON\: ([\w\|]+)"/>
<property name="checkFormat" value="$1"/>
</module>

<module name="SuppressionFilter"/>
</module>
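
The `SuppressionCommentFilter` above lets developers switch checks off and on inline with `CSOFF:`/`CSON:` comments naming the checks to suppress. A hypothetical Java usage (the check name and URL here are illustrative, not taken from this repository):

```java
// Hypothetical example of the CSOFF/CSON comment format recognized by
// the SuppressionCommentFilter configured above: the named check is
// disabled between the two comments.
public class SuppressionExample {
  // CSOFF: LineLength
  static final String LONG_URL =
      "https://example.invalid/a/deliberately/long/path/segment/that/would/otherwise/trip/the/100-character/line-length/check";
  // CSON: LineLength

  public static void main(String[] args) {
    // The literal above is long enough to violate the 100-column limit
    System.out.println(LONG_URL.length());
  }
}
```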
14 changes: 14 additions & 0 deletions runners/spark/build-resources/header-file.txt
@@ -0,0 +1,14 @@
/*
* Copyright (c) 2015, Cloudera, Inc. All Rights Reserved.
*
* Cloudera, Inc. licenses this file to you under the Apache License,
* Version 2.0 (the "License"). You may not use this file except in
* compliance with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* This software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
* CONDITIONS OF ANY KIND, either express or implied. See the License for
* the specific language governing permissions and limitations under the
* License.
*/