Closed

29 commits
21d355a
beam-wide: blacklist Throwables.propagate and remove uses
dhalperi May 2, 2016
003a300
Fix the java doc for Combine.perKey and ApproximateQuantiles
peihe May 2, 2016
0e9041a
Use CommittedResult in InMemoryWatermarkManager
tgroh Apr 28, 2016
8797b14
[BEAM-48] Upgrade bigquery library to v2-rev292-1.21.0
peihe May 3, 2016
b587417
[BEAM-255] Write: add limited logging
dhalperi May 3, 2016
cb71b56
[BEAM-154] Use dependencyManagement and pluginManagement to keep all …
jbonofre Apr 29, 2016
376f88e
[BEAM-168] IntervalBEB: remove deprecated function
dhalperi May 3, 2016
e6d48c0
Replace dataflow stagingLocation with tempLocation in example module.
peihe Apr 27, 2016
86a68c2
Add Matcher serializer in TestPipeline.
Apr 30, 2016
9ad04ab
[BEAM-256] Address wrong import order and add millis to output path f…
lukecwik May 4, 2016
677c412
[BEAM-53] Add PubsubApiaryClient, PubsubTestClient
Apr 27, 2016
7ecef4e
Create runners/core module for artifact org.apache.beam:runners-core
kennknowles May 3, 2016
658e609
Fix direct runner pom & deps
kennknowles May 4, 2016
dc0f6bb
Move ReadyCheckingSideInputReader to util
tgroh May 2, 2016
2366fa5
Add PushbackSideInputDoFnRunner
tgroh May 2, 2016
8529c69
[BEAM-48] Refactor BigQueryServices to support extract and query jobs
peihe May 4, 2016
7a8b1cc
Speed up non-release builds
kennknowles May 4, 2016
50012a4
Refactor CompletionCallbacks
tgroh May 3, 2016
eba8a49
Add TestFlinkPipelineRunner to FlinkRunnerRegistrar
kennknowles May 2, 2016
16a2c08
Configure RunnableOnService tests for Flink in batch mode
kennknowles May 2, 2016
c231418
Remove unused threadCount from integration tests
kennknowles May 6, 2016
4d9e400
Disable Flink streaming integration tests for now
kennknowles May 6, 2016
e58c215
Special casing job exec AssertionError in TestFlinkPipelineRunner
kennknowles May 6, 2016
97a388a
Add hamcrest dependency to Flink Runner
aljoscha May 6, 2016
de71ecf
Fix Dangling Flink DataSets
aljoscha May 6, 2016
d16522b
Fix faulty Flink Flatten when PCollectionList is empty
aljoscha May 13, 2016
7972dcd
[BEAM-270] Support Timestamps/Windows in Flink Batch
aljoscha May 10, 2016
072343d
Remove superfluous Flink Tests, Fix those that stay in
aljoscha May 13, 2016
4292f04
Fix last outstanding test
aljoscha May 14, 2016
4 changes: 2 additions & 2 deletions examples/java/README.md
@@ -44,7 +44,7 @@ the same pipeline on fully managed resources in Google Cloud Platform:
mvn compile exec:java -pl examples \
-Dexec.mainClass=com.google.cloud.dataflow.examples.WordCount \
-Dexec.args="--project=<YOUR CLOUD PLATFORM PROJECT ID> \
--stagingLocation=<YOUR CLOUD STORAGE LOCATION> \
--tempLocation=<YOUR CLOUD STORAGE LOCATION> \
--runner=BlockingDataflowPipelineRunner"

Make sure to use your project id, not the project number or the descriptive name.
@@ -66,7 +66,7 @@ Platform:
java -cp examples/target/google-cloud-dataflow-java-examples-all-bundled-<VERSION>.jar \
com.google.cloud.dataflow.examples.WordCount \
--project=<YOUR CLOUD PLATFORM PROJECT ID> \
--stagingLocation=<YOUR CLOUD STORAGE LOCATION> \
--tempLocation=<YOUR CLOUD STORAGE LOCATION> \
--runner=BlockingDataflowPipelineRunner

Other examples can be run similarly by replacing the `WordCount` class path with the example classpath, e.g.
96 changes: 0 additions & 96 deletions examples/java/pom.xml
@@ -60,23 +60,6 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.4</version>
<executions>
<execution>
<id>attach-sources</id>
<phase>compile</phase>
<goals>
<goal>jar</goal>
</goals>
</execution>
<execution>
<id>attach-test-sources</id>
<phase>test-compile</phase>
<goals>
<goal>test-jar</goal>
</goals>
</execution>
</executions>
</plugin>

<plugin>
@@ -150,7 +133,6 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.1</version>
<executions>
<execution>
<phase>package</phase>
@@ -182,20 +164,6 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<executions>
<execution>
<id>default-jar</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
<execution>
<id>default-test-jar</id>
<goals>
<goal>test-jar</goal>
</goals>
</execution>
</executions>
</plugin>

<!-- Coverage analysis for unit tests. -->
@@ -232,7 +200,6 @@
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>java-sdk-all</artifactId>
<version>${project.version}</version>
</dependency>

<dependency>
@@ -244,121 +211,61 @@
<dependency>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client</artifactId>
<version>${google-clients.version}</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-dataflow</artifactId>
<version>${dataflow.version}</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-bigquery</artifactId>
<version>${bigquery.version}</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>com.google.http-client</groupId>
<artifactId>google-http-client</artifactId>
<version>${google-clients.version}</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>${avro.version}</version>
</dependency>

<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-datastore-protobuf</artifactId>
<version>${datastore.version}</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-pubsub</artifactId>
<version>${pubsub.version}</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>${guava.version}</version>
</dependency>

<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
<version>${jsr305.version}</version>
</dependency>

<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>${joda.version}</version>
</dependency>

<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${slf4j.version}</version>
</dependency>

<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-jdk14</artifactId>
<version>${slf4j.version}</version>
<scope>runtime</scope>
</dependency>

@@ -374,19 +281,16 @@
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-all</artifactId>
<version>${hamcrest.version}</version>
</dependency>

<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>${junit.version}</version>
</dependency>

<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-all</artifactId>
<version>1.10.19</version>
<scope>test</scope>
</dependency>
</dependencies>
@@ -68,7 +68,7 @@
* below, specify pipeline configuration:
* <pre>{@code
* --project=YOUR_PROJECT_ID
* --stagingLocation=gs://YOUR_STAGING_DIRECTORY
* --tempLocation=gs://YOUR_TEMP_DIRECTORY
* --runner=BlockingDataflowPipelineRunner
* --workerLogLevelOverrides={"org.apache.beam.examples":"DEBUG"}
* }
@@ -50,26 +50,26 @@
* 4. Writing data to Cloud Storage as text files
* </pre>
*
* <p>To execute this pipeline, first edit the code to set your project ID, the staging
* <p>To execute this pipeline, first edit the code to set your project ID, the temp
* location, and the output location. The specified GCS bucket(s) must already exist.
*
* <p>Then, run the pipeline as described in the README. It will be deployed and run using the
* Dataflow service. No args are required to run the pipeline. You can see the results in your
* <p>Then, run the pipeline as described in the README. It will be deployed and run with the
* selected runner. No args are required to run the pipeline. You can see the results in your
* output bucket in the GCS browser.
*/
public class MinimalWordCount {

public static void main(String[] args) {
// Create a DataflowPipelineOptions object. This object lets us set various execution
// Create a PipelineOptions object. This object lets us set various execution
// options for our pipeline, such as the associated Cloud Platform project and the location
// in Google Cloud Storage to stage files.
DataflowPipelineOptions options = PipelineOptionsFactory.create()
.as(DataflowPipelineOptions.class);
.as(DataflowPipelineOptions.class);
options.setRunner(BlockingDataflowPipelineRunner.class);
// CHANGE 1/3: Your project ID is required in order to run your pipeline on the Google Cloud.
options.setProject("SET_YOUR_PROJECT_ID_HERE");
// CHANGE 2/3: Your Google Cloud Storage path is required for staging local files.
options.setStagingLocation("gs://SET_YOUR_BUCKET_NAME_HERE/AND_STAGING_DIRECTORY");
options.setTempLocation("gs://SET_YOUR_BUCKET_NAME_HERE/AND_TEMP_DIRECTORY");

// Create the Pipeline object with the options we defined above.
Pipeline p = Pipeline.create(options);
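Note: assembled from the hunk above, the options setup now reads as follows as a standalone sketch. The import paths are assumptions inferred from the package layout visible elsewhere in this diff (org.apache.beam.runners.dataflow.*):

import org.apache.beam.runners.dataflow.BlockingDataflowPipelineRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class MinimalWordCountSetup {
  public static void main(String[] args) {
    // Per the hunk above, tempLocation replaces stagingLocation as the
    // required Cloud Storage path.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
    options.setRunner(BlockingDataflowPipelineRunner.class);
    options.setProject("SET_YOUR_PROJECT_ID_HERE");
    options.setTempLocation("gs://SET_YOUR_BUCKET_NAME_HERE/AND_TEMP_DIRECTORY");

    // Create the Pipeline with the options defined above; transforms are
    // applied to it exactly as in the rest of the file.
    Pipeline p = Pipeline.create(options);
    p.run();
  }
}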
@@ -83,7 +83,7 @@
* <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
* <pre>{@code
* --project=YOUR_PROJECT_ID
* --stagingLocation=gs://YOUR_STAGING_DIRECTORY
* --tempLocation=gs://YOUR_TEMP_DIRECTORY
* --runner=BlockingDataflowPipelineRunner
* }
* </pre>
@@ -17,7 +17,6 @@
*/
package org.apache.beam.examples;

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.Default;
@@ -37,9 +36,8 @@
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;


/**
* An example that counts words in Shakespeare and includes Dataflow best practices.
* An example that counts words in Shakespeare and includes Beam best practices.
*
* <p>This class, {@link WordCount}, is the second in a series of four successively more detailed
* 'word count' examples. You may first want to take a look at {@link MinimalWordCount}.
@@ -56,13 +54,13 @@
*
* <p>New Concepts:
* <pre>
* 1. Executing a Pipeline both locally and using the Dataflow service
* 1. Executing a Pipeline both locally and using the selected runner
* 2. Using ParDo with static DoFns defined out-of-line
* 3. Building a composite transform
* 4. Defining your own pipeline options
* </pre>
*
* <p>Concept #1: you can execute this pipeline either locally or using the Dataflow service.
* <p>Concept #1: you can execute this pipeline either locally or using the selected runner.
* These are now command-line options and not hard-coded as they were in the MinimalWordCount
* example.
* To execute this pipeline locally, specify general pipeline configuration:
@@ -78,7 +76,7 @@
* <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
* <pre>{@code
* --project=YOUR_PROJECT_ID
* --stagingLocation=gs://YOUR_STAGING_DIRECTORY
* --tempLocation=gs://YOUR_TEMP_DIRECTORY
* --runner=BlockingDataflowPipelineRunner
* }
* </pre>
@@ -173,17 +171,16 @@ public static interface WordCountOptions extends PipelineOptions {
void setOutput(String value);

/**
* Returns "gs://${YOUR_STAGING_DIRECTORY}/counts.txt" as the default destination.
* Returns "gs://${YOUR_TEMP_DIRECTORY}/counts.txt" as the default destination.
*/
public static class OutputFactory implements DefaultValueFactory<String> {
@Override
public String create(PipelineOptions options) {
DataflowPipelineOptions dataflowOptions = options.as(DataflowPipelineOptions.class);
if (dataflowOptions.getStagingLocation() != null) {
return GcsPath.fromUri(dataflowOptions.getStagingLocation())
if (options.getTempLocation() != null) {
return GcsPath.fromUri(options.getTempLocation())
.resolve("counts.txt").toString();
} else {
throw new IllegalArgumentException("Must specify --output or --stagingLocation");
throw new IllegalArgumentException("Must specify --output or --tempLocation");
}
}
}
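A hypothetical check of the new default, assuming getOutput() keeps its @Default.InstanceFactory(OutputFactory.class) annotation; the class name and bucket here are illustrative:

import org.apache.beam.examples.WordCount;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class OutputDefaultCheck {
  public static void main(String[] args) {
    // With --tempLocation set and --output left unset, the DefaultValueFactory
    // resolves the output to <tempLocation>/counts.txt.
    WordCount.WordCountOptions options =
        PipelineOptionsFactory.fromArgs(new String[] {"--tempLocation=gs://my-bucket/tmp"})
            .as(WordCount.WordCountOptions.class);
    System.out.println(options.getOutput()); // expect gs://my-bucket/tmp/counts.txt
  }
}

If neither flag is set, the factory throws the rewritten IllegalArgumentException ("Must specify --output or --tempLocation") when getOutput() is first read.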
@@ -51,7 +51,6 @@
import com.google.api.services.pubsub.model.Subscription;
import com.google.api.services.pubsub.model.Topic;
import com.google.common.base.Strings;
import com.google.common.base.Throwables;
import com.google.common.collect.Lists;
import com.google.common.collect.Sets;
import com.google.common.util.concurrent.Uninterruptibles;
@@ -116,7 +115,7 @@ public void setup() throws IOException {
Thread.currentThread().interrupt();
// Ignore InterruptedException
}
Throwables.propagate(lastException);
throw new RuntimeException(lastException);
}

/**
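The first commit in this PR blacklists Guava's Throwables.propagate, and this hunk shows the replacement idiom. A minimal self-contained sketch, with a hypothetical connect() standing in for the real client call:

import java.io.IOException;

public class RetryingSetup {

  /** Hypothetical transient operation; a stand-in for the call retried above. */
  private void connect() throws IOException {
    throw new IOException("transient failure");
  }

  public void setup() {
    IOException lastException = null;
    for (int attempt = 0; attempt < 3; attempt++) {
      try {
        connect();
        return;
      } catch (IOException e) {
        lastException = e;
      }
    }
    // Before: Throwables.propagate(lastException) -- it wraps and rethrows, but
    // the compiler cannot tell that the statement always throws, so callers
    // often need an unreachable return or throw after it.
    // After: throw explicitly, keeping the control flow visible.
    throw new RuntimeException(lastException);
  }
}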
@@ -81,7 +81,7 @@
* specify pipeline configuration:
* <pre>{@code
* --project=YOUR_PROJECT_ID
* --stagingLocation=gs://YOUR_STAGING_DIRECTORY
* --tempLocation=gs://YOUR_TEMP_DIRECTORY
* --runner=DataflowPipelineRunner
* --inputFile=gs://path/to/input*.txt
* }</pre>
@@ -90,7 +90,7 @@
* specify pipeline configuration:
* <pre>{@code
* --project=YOUR_PROJECT_ID
* --stagingLocation=gs://YOUR_STAGING_DIRECTORY
* --tempLocation=gs://YOUR_TEMP_DIRECTORY
* --runner=DataflowPipelineRunner
* --inputFile=gs://YOUR_INPUT_DIRECTORY/*.txt
* --streaming
@@ -79,7 +79,7 @@
* <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
* <pre>{@code
* --project=YOUR_PROJECT_ID
* --stagingLocation=gs://YOUR_STAGING_DIRECTORY
* --tempLocation=gs://YOUR_TEMP_DIRECTORY
* --runner=BlockingDataflowPipelineRunner
* and an output prefix on GCS:
* --output=gs://YOUR_OUTPUT_PREFIX
@@ -59,7 +59,7 @@
* <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
* <pre>{@code
* --project=YOUR_PROJECT_ID
* --stagingLocation=gs://YOUR_STAGING_DIRECTORY
* --tempLocation=gs://YOUR_TEMP_DIRECTORY
* --runner=BlockingDataflowPipelineRunner
* }
* </pre>
@@ -60,7 +60,7 @@
* <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
* <pre>{@code
* --project=YOUR_PROJECT_ID
* --stagingLocation=gs://YOUR_STAGING_DIRECTORY
* --tempLocation=gs://YOUR_TEMP_DIRECTORY
* --runner=BlockingDataflowPipelineRunner
* }
* </pre>