[SPARK-14555] First cut of Python API for Structured Streaming #12320

brkyvz · 2016-04-12T03:23:23Z

What changes were proposed in this pull request?

This patch provides a first cut of the Python APIs for Structured Streaming. It adds the following new classes to PySpark under pyspark.sql.streaming:

ContinuousQuery
Trigger
ProcessingTime

In addition, it adds new methods to the following classes:

DataFrameWriter
a) startStream
b) trigger
c) queryName
DataFrameReader
a) stream
DataFrame
a) isStreaming

This PR doesn't include every method exposed by ContinuousQuery. For example, the following are missing:

exception
sourceStatuses
sinkStatus

They may be added in a follow-up.

This PR also contains some very minor doc fixes on the Scala side.

How was this patch tested?

Python doc tests

TODO:

verify Python docs look good

brkyvz · 2016-04-12T03:25:56Z

cc @marmbrus @rxin @zsxwing

SparkQA · 2016-04-12T03:29:18Z

Test build #55579 has finished for PR 12320 at commit 043ab9d.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class ContinuousQuery(object):
- class ProcessingTime(Trigger):
- case class ProcessingTime(interval: Long) extends Trigger

SparkQA · 2016-04-12T03:34:13Z

Test build #55580 has finished for PR 12320 at commit ce4171b.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-12T03:42:48Z

Test build #55581 has finished for PR 12320 at commit 6ae7fd1.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-12T07:45:46Z

Test build #55592 has finished for PR 12320 at commit da63975.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class ProcessingTime(intervalMs: Long) extends Trigger

zsxwing · 2016-04-12T17:34:21Z

The Python APIs look great.

SparkQA · 2016-04-12T18:50:35Z

Test build #55619 has finished for PR 12320 at commit 96ac9f9.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-12T21:38:20Z

Test build #55640 has finished for PR 12320 at commit 1fe20ed.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class Trigger(object):

davies · 2016-04-14T07:19:50Z

python/pyspark/sql/readwriter.py

+        return self
+
+    @since(2.0)
+    def trigger(self, trigger):


davies: Put a default value here?

brkyvz: It's already the default on the Scala side. I don't want people calling write.trigger() unnecessarily.

davies: I see. The doc could say: "Without calling this, the query will run as fast as possible."
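
The trade-off discussed in this thread can be sketched in plain Python. This is only an illustration: SketchWriter and describe_trigger are hypothetical names and are not part of PySpark or of this patch. The sketch assumes that an unset trigger means the engine-side default applies, and it shows that leaving the argument without a default makes a bare trigger() call fail.

```python
class SketchWriter(object):
    """Hypothetical stand-in for a streaming writer; not the PySpark class."""

    def __init__(self):
        # None means trigger() was never called, so the default behaviour
        # (run the query as fast as possible) is assumed to apply.
        self._trigger = None

    def trigger(self, trigger):
        # No default value on purpose: writer.trigger() with no argument
        # raises TypeError, which discourages unnecessary calls.
        if trigger is None:
            raise ValueError("trigger must not be None")
        self._trigger = trigger
        return self  # returning self keeps the builder-style chaining

    def describe_trigger(self):
        # Illustrative helper only, used to show which mode is in effect.
        if self._trigger is None:
            return "default (as fast as possible)"
        return "custom: %r" % (self._trigger,)
```

With this shape, callers who never touch trigger() get the default, and the docstring is the natural place to say so.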

SparkQA · 2016-04-14T23:45:29Z

Test build #55860 has finished for PR 12320 at commit 6dde6b8.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-14T23:52:41Z

Test build #55863 has finished for PR 12320 at commit 2e0a527.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-15T01:12:17Z

Test build #55870 has finished for PR 12320 at commit c55e605.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-15T02:40:57Z

Test build #55885 has finished for PR 12320 at commit 588ce1f.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-15T17:47:21Z

Test build #55936 has finished for PR 12320 at commit 147e9f9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2016-04-15T17:52:51Z

LGTM. Do we need someone else to look at these APIs?

brkyvz · 2016-04-15T17:55:50Z

Maybe @rxin or @marmbrus would want to take a look?

marmbrus · 2016-04-15T18:15:37Z

python/pyspark/sql/readwriter.py

+
+        :param path: the path in a Hadoop supported file system
+        :param format: the format used to save
+        :param mode: specifies the behavior of the save operation when data already exists.


What does mode mean for a stream? I don't think we support that.

SparkQA · 2016-04-18T18:24:47Z

Test build #56112 has finished for PR 12320 at commit 1b92f98.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-18T18:54:49Z

Test build #56119 has finished for PR 12320 at commit 7c61467.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-18T20:28:00Z

Test build #56121 has finished for PR 12320 at commit 0e0b10b.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-18T21:09:56Z

Test build #56116 has finished for PR 12320 at commit 538f410.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

brkyvz · 2016-04-18T21:16:01Z

retest this please

SparkQA · 2016-04-18T21:50:46Z

Test build #56131 has finished for PR 12320 at commit fbe93c9.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-18T22:36:19Z

Test build #56137 has finished for PR 12320 at commit fbe93c9.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-18T23:30:16Z

Test build #56147 has finished for PR 12320 at commit 302da9b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tdas · 2016-04-18T23:47:36Z

python/pyspark/sql/streaming.py

+        return self._jcq.isActive()
+
+    @since(2.0)
+    def awaitTermination(self, timeoutMs=None):


No test for verifying that the params are correctly passed.
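
One way such a test could look is sketched below, using unittest.mock in place of the JVM object. QuerySketch and check_params_forwarded are hypothetical names, and the body of awaitTermination here is an assumption about how the wrapper might delegate to self._jcq; it is not copied from the patch.

```python
from unittest import mock


class QuerySketch(object):
    """Hypothetical wrapper that delegates to a JVM-side query object."""

    def __init__(self, jcq):
        self._jcq = jcq

    def awaitTermination(self, timeoutMs=None):
        # Assumed delegation: pass the timeout through only when given.
        if timeoutMs is not None:
            return self._jcq.awaitTermination(timeoutMs)
        return self._jcq.awaitTermination()


def check_params_forwarded():
    # A Mock records every call, so the forwarded arguments can be asserted.
    jcq = mock.Mock()
    jcq.awaitTermination.return_value = False
    query = QuerySketch(jcq)

    # With a timeout: the value must reach the underlying object unchanged.
    result = query.awaitTermination(50)
    jcq.awaitTermination.assert_called_once_with(50)
    assert result is False

    # Without a timeout: the underlying method is called with no arguments.
    jcq.reset_mock()
    query.awaitTermination()
    jcq.awaitTermination.assert_called_once_with()
    return True
```

The same pattern would apply to any other method that simply forwards its arguments to the wrapped object.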

tdas · 2016-04-19T01:12:53Z

Other than the test issues, overall looks good.

SparkQA · 2016-04-19T18:39:44Z

Test build #56253 has finished for PR 12320 at commit ed2cd50.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-19T21:57:48Z

Test build #56264 has finished for PR 12320 at commit c07d795.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-19T23:18:21Z

Test build #56280 has finished for PR 12320 at commit 981f8e1.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-20T01:03:33Z

Test build #56293 has finished for PR 12320 at commit 3d36543.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

marmbrus · 2016-04-20T17:31:27Z

Thanks, merging to master.

added python API for streaming dataframes

043ab9d

minor

ce4171b

fix pystyle

6ae7fd1

fix test

da63975

fix py tests

96ac9f9

fix object

1fe20ed

davies reviewed Apr 14, 2016

brkyvz added 3 commits April 14, 2016 15:26

address comments

6dde6b8

minor

b95d6ed

more

2e0a527

register subclass

c55e605

try this

588ce1f

fix check

147e9f9

marmbrus reviewed Apr 15, 2016

Merge branch 'stream-python' of github.com:brkyvz/spark into stream-python

1b92f98

brkyvz added 2 commits April 18, 2016 11:45

move keyword_args

538f410

fix mcs

7c61467

import wraps

0e0b10b

Update tests.py

fbe93c9

Update readwriter.py

302da9b

tdas reviewed Apr 18, 2016

brkyvz added 2 commits April 19, 2016 11:32

address comments

b784114

Merge branch 'stream-python' of github.com:brkyvz/spark into stream-python

ed2cd50

add process all available

c07d795

fix test

981f8e1

Update tests.py

3d36543

asfgit closed this in 80bf48f Apr 20, 2016

brkyvz deleted the stream-python branch February 3, 2019 20:54
