
@brkyvz
Contributor

@brkyvz brkyvz commented Apr 12, 2016

What changes were proposed in this pull request?

This patch provides a first cut of the Python APIs for Structured Streaming. It adds the following new classes to PySpark, under pyspark.sql.streaming:

  • ContinuousQuery
  • Trigger
  • ProcessingTime
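To illustrate how a Trigger / ProcessingTime pair might look on the Python side, here is a minimal pure-Python sketch. It assumes, hypothetically, that ProcessingTime accepts a human-readable interval string such as '10 seconds' and converts it to milliseconds, mirroring the Scala ProcessingTime(intervalMs: Long) case class reported by the build bot. The parsing rules, the supported units and the interval_ms attribute are assumptions for illustration, not the code in this PR.

```python
import re


class Trigger(object):
    """Hypothetical base class: decides when a streaming query starts its next batch."""


class ProcessingTime(Trigger):
    """Hypothetical trigger that fires every `interval`, e.g. '10 seconds'.

    The string is converted to milliseconds, mirroring the Scala
    ProcessingTime(intervalMs: Long) case class; the accepted units are assumed.
    """

    # Milliseconds per supported unit (an assumed, deliberately small set).
    _UNIT_MS = {
        "millisecond": 1,
        "second": 1000,
        "minute": 60 * 1000,
        "hour": 60 * 60 * 1000,
    }

    def __init__(self, interval):
        if not isinstance(interval, str) or not interval.strip():
            raise ValueError("interval must be a non-empty string, e.g. '10 seconds'")
        # Accept "<non-negative integer> <unit>", with an optional plural "s".
        match = re.fullmatch(r"\s*(\d+)\s*(millisecond|second|minute|hour)s?\s*",
                             interval.lower())
        if match is None:
            raise ValueError("cannot parse interval: %r" % interval)
        self.interval = interval
        self.interval_ms = int(match.group(1)) * self._UNIT_MS[match.group(2)]

    def __repr__(self):
        return "ProcessingTime(%r)" % self.interval


print(ProcessingTime("10 seconds").interval_ms)  # 10000
print(ProcessingTime("1 minute"))                # ProcessingTime('1 minute')
```

Parsing eagerly in the constructor means a malformed interval fails where the user wrote it, rather than later when the query starts.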

In addition, it adds the following new methods:

  • DataFrameWriter
    a) startStream
    b) trigger
    c) queryName
  • DataFrameReader
    a) stream
  • DataFrame
    a) isStreaming

This PR does not yet expose every ContinuousQuery method in Python; for example, the following are still missing:

  • exception
  • sourceStatuses
  • sinkStatus

They may be added in a follow-up.

This PR also contains some very minor doc fixes on the Scala side.

How was this patch tested?

Python doc tests

TODO:

  • verify Python docs look good

@brkyvz
Contributor Author

brkyvz commented Apr 12, 2016

cc @marmbrus @rxin @zsxwing

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55579 has finished for PR 12320 at commit 043ab9d.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ContinuousQuery(object):
    • class ProcessingTime(Trigger):
    • case class ProcessingTime(interval: Long) extends Trigger

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55580 has finished for PR 12320 at commit ce4171b.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55581 has finished for PR 12320 at commit 6ae7fd1.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55592 has finished for PR 12320 at commit da63975.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ProcessingTime(intervalMs: Long) extends Trigger

@zsxwing
Member

zsxwing commented Apr 12, 2016

The Python APIs look great.

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55619 has finished for PR 12320 at commit 96ac9f9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55640 has finished for PR 12320 at commit 1fe20ed.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Trigger(object):

return self

@since(2.0)
def trigger(self, trigger):
Contributor

Put a default value here?

Contributor Author

It already has a default on the Scala side. I don't want people calling write.trigger() unnecessarily.

Contributor

I see. Then the docstring could say: "Without calling this, the query will run as fast as possible."
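The trade-off in this thread can be sketched in plain Python: trigger() takes a required argument (so a bare writer.trigger() is a TypeError rather than a silent no-op), and a writer on which trigger() was never called falls back to a default. The sketch assumes, as the comment above implies, that the default behaves like a zero-interval processing-time trigger ("as fast as possible"); WriterSketch, effective_trigger and the simplified ProcessingTime that takes milliseconds directly are all hypothetical.

```python
class Trigger(object):
    """Hypothetical base class for triggers."""


class ProcessingTime(Trigger):
    """Hypothetical, simplified trigger: start a batch every interval_ms milliseconds."""

    def __init__(self, interval_ms):
        if interval_ms < 0:
            raise ValueError("interval_ms must be non-negative")
        self.interval_ms = interval_ms


class WriterSketch(object):
    """Hypothetical writer whose trigger() setter has no Python-side default value."""

    # Assumed default: a zero interval means "start the next batch as fast as possible".
    _DEFAULT_TRIGGER = ProcessingTime(0)

    def __init__(self):
        self._trigger = None  # None means trigger() was never called

    def trigger(self, trigger):
        """Set the trigger for the stream query.

        Without calling this, the query will run as fast as possible.
        """
        if not isinstance(trigger, Trigger):
            raise TypeError("trigger must be a Trigger, got %s" % type(trigger).__name__)
        self._trigger = trigger
        return self

    def effective_trigger(self):
        # Fall back to the default only when the user did not choose a trigger.
        return self._trigger if self._trigger is not None else self._DEFAULT_TRIGGER


print(WriterSketch().effective_trigger().interval_ms)                                # 0
print(WriterSketch().trigger(ProcessingTime(5000)).effective_trigger().interval_ms)  # 5000
```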

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55860 has finished for PR 12320 at commit 6dde6b8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55863 has finished for PR 12320 at commit 2e0a527.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 15, 2016

Test build #55870 has finished for PR 12320 at commit c55e605.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 15, 2016

Test build #55885 has finished for PR 12320 at commit 588ce1f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 15, 2016

Test build #55936 has finished for PR 12320 at commit 147e9f9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor

davies commented Apr 15, 2016

LGTM. Do we need anyone else to look at these APIs?

@brkyvz
Contributor Author

brkyvz commented Apr 15, 2016

Maybe @rxin or @marmbrus would want to take a look?


:param path: the path in a Hadoop supported file system
:param format: the format used to save
:param mode: specifies the behavior of the save operation when data already exists.
Contributor

What does mode mean for a stream? I don't think we support that.

@SparkQA

SparkQA commented Apr 18, 2016

Test build #56112 has finished for PR 12320 at commit 1b92f98.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 18, 2016

Test build #56119 has finished for PR 12320 at commit 7c61467.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 18, 2016

Test build #56121 has finished for PR 12320 at commit 0e0b10b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 18, 2016

Test build #56116 has finished for PR 12320 at commit 538f410.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@brkyvz
Contributor Author

brkyvz commented Apr 18, 2016

retest this please

@SparkQA

SparkQA commented Apr 18, 2016

Test build #56131 has finished for PR 12320 at commit fbe93c9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 18, 2016

Test build #56137 has finished for PR 12320 at commit fbe93c9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 18, 2016

Test build #56147 has finished for PR 12320 at commit 302da9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

return self._jcq.isActive()

@since(2.0)
def awaitTermination(self, timeoutMs=None):
Contributor

There is no test verifying that the params are passed correctly.

Contributor Author

done

@tdas
Contributor

tdas commented Apr 19, 2016

Other than the test issues, overall looks good.

@SparkQA

SparkQA commented Apr 19, 2016

Test build #56253 has finished for PR 12320 at commit ed2cd50.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 19, 2016

Test build #56264 has finished for PR 12320 at commit c07d795.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 19, 2016

Test build #56280 has finished for PR 12320 at commit 981f8e1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 20, 2016

Test build #56293 has finished for PR 12320 at commit 3d36543.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Thanks, merging to master.

@asfgit asfgit closed this in 80bf48f Apr 20, 2016
@brkyvz brkyvz deleted the stream-python branch February 3, 2019 20:54