From 6a581d0a6421a90c36c0c25b79770edb676a9e33 Mon Sep 17 00:00:00 2001 From: oborysevych Date: Thu, 8 Sep 2022 14:56:58 +0300 Subject: [PATCH 01/31] learning content for introduction module --- .../learning-content/go/content-info.yaml | 22 +++ .../basic-concepts/description.md | 106 ++++++++++ .../basic-concepts/example/main.go | 48 +++++ .../basic-concepts/unit-info.yaml | 22 +++ .../from-memory/description.md | 54 ++++++ .../from-memory/example/pardoExample.go | 49 +++++ .../from-memory/unit-info.yaml | 22 +++ .../creating-collections/group-info.yaml | 27 +++ .../reading-from-csv/description.md | 21 ++ .../reading-from-csv/example/csvExample.go | 84 ++++++++ .../reading-from-csv/unit-info.yaml | 22 +++ .../reading-from-text/description.md | 181 ++++++++++++++++++ .../reading-from-text/example/textIo.go | 92 +++++++++ .../reading-from-text/unit-info.yaml | 22 +++ .../introduction-concepts/group-info.yaml | 26 +++ .../introduction-guide/description.md | 23 +++ .../introduction-guide/unit-info.yaml | 21 ++ .../introduction-terms/description.md | 38 ++++ .../introduction-terms/unit-info.yaml | 21 ++ .../go/introduction/module-info.yaml | 27 +++ .../learning-content/java/content-info.yaml | 22 +++ .../basic-concepts/description.md | 134 +++++++++++++ .../basic-concepts/example/Task.java | 69 +++++++ .../basic-concepts/unit-info.yaml | 22 +++ .../from-memory/description.md | 59 ++++++ .../from-memory/example/ParDoExample.java | 82 ++++++++ .../from-memory/unit-info.yaml | 22 +++ .../creating-collections/group-info.yaml | 27 +++ .../reading-from-csv/description.md | 21 ++ .../reading-from-csv/example/CSVExample.java | 104 ++++++++++ .../reading-from-csv/unit-info.yaml | 22 +++ .../reading-from-text/description.md | 38 ++++ .../example/TextIOExample.java | 96 ++++++++++ .../reading-from-text/unit-info.yaml | 22 +++ .../introduction-concepts/group-info.yaml | 26 +++ .../introduction-guide/description.md | 23 +++ .../introduction-guide/unit-info.yaml | 22 +++ 
.../introduction-terms/description.md | 38 ++++ .../introduction-terms/unit-info.yaml | 22 +++ .../java/introduction/module-info.yaml | 26 +++ .../learning-content/python/content-info.yaml | 22 +++ .../basic-concepts/description.md | 117 +++++++++++ .../basic-concepts/example/task.py | 38 ++++ .../basic-concepts/unit-info.yaml | 22 +++ .../from-memory/description.md | 51 +++++ .../from-memory/example/pardoExample.py | 47 +++++ .../from-memory/unit-info.yaml | 22 +++ .../creating-collections/group-info.yaml | 27 +++ .../reading-from-csv/description.md | 20 ++ .../reading-from-csv/example/csvExample.py | 63 ++++++ .../reading-from-csv/unit-info.yaml | 22 +++ .../reading-from-text/description.md | 33 ++++ .../reading-from-text/example/textIo.py | 58 ++++++ .../reading-from-text/unit-info.yaml | 22 +++ .../introduction-concepts/group-info.yaml | 26 +++ .../introduction-guide/description.md | 23 +++ .../introduction-guide/unit-info.yaml | 22 +++ .../introduction-terms/description.md | 38 ++++ .../introduction-terms/unit-info.yaml | 22 +++ .../python/introduction/module-info.yaml | 27 +++ 60 files changed, 2547 insertions(+) create mode 100644 learning/tour-of-beam/learning-content/go/content-info.yaml create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/basic-concepts/description.md create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/basic-concepts/example/main.go create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/basic-concepts/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/description.md create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/example/pardoExample.go create mode 100644 
learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/from-memory/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/group-info.yaml create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/example/csvExample.go create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-csv/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/description.md create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/example/textIo.go create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/creating-collections/reading-from-text/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/group-info.yaml create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-guide/description.md create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-guide/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-terms/description.md create mode 100644 learning/tour-of-beam/learning-content/go/introduction/introduction-terms/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/go/introduction/module-info.yaml create mode 100644 learning/tour-of-beam/learning-content/java/content-info.yaml create mode 100644 
learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/basic-concepts/description.md create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/basic-concepts/example/Task.java create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/basic-concepts/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/from-memory/description.md create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/from-memory/example/ParDoExample.java create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/from-memory/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/group-info.yaml create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/reading-from-csv/example/CSVExample.java create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/reading-from-csv/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/reading-from-text/description.md create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/reading-from-text/example/TextIOExample.java create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/creating-collections/reading-from-text/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-concepts/group-info.yaml 
create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-guide/description.md create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-guide/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-terms/description.md create mode 100644 learning/tour-of-beam/learning-content/java/introduction/introduction-terms/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/java/introduction/module-info.yaml create mode 100644 learning/tour-of-beam/learning-content/python/content-info.yaml create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/basic-concepts/description.md create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/basic-concepts/example/task.py create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/basic-concepts/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/creating-collections/from-memory/description.md create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/creating-collections/from-memory/example/pardoExample.py create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/creating-collections/from-memory/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/creating-collections/group-info.yaml create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/creating-collections/reading-from-csv/description.md create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/creating-collections/reading-from-csv/example/csvExample.py create mode 100644 
learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/creating-collections/reading-from-csv/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/creating-collections/reading-from-text/description.md create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/creating-collections/reading-from-text/example/textIo.py create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/creating-collections/reading-from-text/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-concepts/group-info.yaml create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-guide/description.md create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-guide/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-terms/description.md create mode 100644 learning/tour-of-beam/learning-content/python/introduction/introduction-terms/unit-info.yaml create mode 100644 learning/tour-of-beam/learning-content/python/introduction/module-info.yaml diff --git a/learning/tour-of-beam/learning-content/go/content-info.yaml b/learning/tour-of-beam/learning-content/go/content-info.yaml new file mode 100644 index 000000000000..8b75fa8771d2 --- /dev/null +++ b/learning/tour-of-beam/learning-content/go/content-info.yaml @@ -0,0 +1,22 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +sdk: Go +content: + - introduction \ No newline at end of file diff --git a/learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/basic-concepts/description.md b/learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/basic-concepts/description.md new file mode 100644 index 000000000000..19c852591246 --- /dev/null +++ b/learning/tour-of-beam/learning-content/go/introduction/introduction-concepts/basic-concepts/description.md @@ -0,0 +1,106 @@ + +# Tour of Beam Programming Guide + +The Beam Programming Guide is intended for Beam users who want to use the Beam SDKs to create data processing pipelines. This guide provides guidance for using the Beam SDK classes to build and test pipelines. The programming guide is not intended to be an exhaustive reference, but rather a language-agnostic, high-level guide to programmatically building your Beam pipeline. As the programming guide is filled out, the text will include code samples in multiple languages to help illustrate how to implement Beam concepts in your pipelines. + +For a brief introduction to Beam’s basic concepts, take a look at the Basics of the Beam model page before reading the programming guide. + +### Overview + +To use Beam, you first need to create a driver program using the classes in one of the Beam SDKs. Your driver program defines your pipeline, including all of the inputs, transforms, and outputs. It also sets execution options for your pipeline (typically passed by using command-line options). 
These include the Pipeline Runner, which, in turn, determines what back-end your pipeline will run on. + +The Beam SDKs provide several abstractions that simplify the mechanics of large-scale distributed data processing. The same Beam abstractions work with both batch and streaming data sources. When you create your Beam pipeline, you can think about your data processing task in terms of these abstractions. They include: + +→ `Pipeline`: A Pipeline encapsulates your entire data processing task, from start to finish. This includes reading input data, transforming that data, and writing output data. All Beam driver programs must create a Pipeline. When you create the Pipeline, you must also specify the execution options that tell the Pipeline where and how to run. + +→ `PCollection`: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline. + +→ `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects. + +→ `Scope`: The Go SDK has an explicit scope variable used to build a `Pipeline`. A Pipeline can return its root scope with the `Root()` method. The scope variable is then passed to `PTransform` functions that place them in the `Pipeline` that owns the `Scope`. 
+ +→ `I/O transforms`: Beam comes with a number of “IOs” - library PTransforms that read or write data to various external storage systems. + +A typical Beam driver program works as follows: + +→ Create a Pipeline object and set the pipeline execution options, including the Pipeline Runner. + +→ Create an initial `PCollection` for pipeline data, either using the IOs to read data from an external storage system, or using a Create transform to build a `PCollection` from in-memory data. + +→ Apply `PTransforms` to each `PCollection`. Transforms can change, filter, group, analyze, or otherwise process the elements in a PCollection. A transform creates a new output PCollection without modifying the input collection. A typical pipeline applies subsequent transforms to each new output PCollection in turn until the processing is complete. However, note that a pipeline does not have to be a single straight line of transforms applied one after another: think of PCollections as variables and PTransforms as functions applied to these variables: the shape of the pipeline can be an arbitrarily complex processing graph. + +→ Use IOs to write the final, transformed PCollection(s) to an external source. + +→ Run the pipeline using the designated Pipeline Runner. + +When you run your Beam driver program, the Pipeline Runner that you designate constructs a workflow graph of your pipeline based on the PCollection objects you’ve created and the transforms that you’ve applied. That graph is then executed using the appropriate distributed processing back-end, becoming an asynchronous “job” (or equivalent) on that back-end. + +### Creating a pipeline + +The `Pipeline` abstraction encapsulates all the data and steps in your data processing task. Your Beam driver program typically starts by constructing a Pipeline object, and then using that object as the basis for creating the pipeline’s data sets as PCollections and its operations as `Transforms`. 
+ +To use Beam, your driver program must first create an instance of the Beam SDK class Pipeline (typically in the main() function). When you create your `Pipeline`, you’ll also need to set some configuration options. You can set your pipeline’s configuration options programmatically, but it’s often easier to set the options ahead of time (or read them from the command line) and pass them to the Pipeline object when you create the object. + +``` +// beam.Init() is an initialization hook that must be called +// near the beginning of main(), before creating a pipeline. +beam.Init() + +// Create the Pipeline object and root scope. +pipeline, scope := beam.NewPipelineWithRoot() +``` + +### Configuring pipeline options + +Use the pipeline options to configure different aspects of your pipeline, such as the pipeline runner that will execute your pipeline and any runner-specific configuration required by the chosen runner. Your pipeline options will potentially include information such as your project ID or a location for storing files. + +### Setting PipelineOptions from command-line arguments + +Use Go flags to parse command line arguments to configure your pipeline. Flags must be parsed before `beam.Init()` is called. + +``` +// If beamx or Go flags are used, flags must be parsed first, +// before beam.Init() is called. +flag.Parse() +``` + +This interprets command-line arguments that follow the format: + +``` +--