[SPARK-14555] First cut of Python API for Structured Streaming #12320
Conversation
Test build #55579 has finished for PR 12320 at commit
Test build #55580 has finished for PR 12320 at commit
Test build #55581 has finished for PR 12320 at commit
Test build #55592 has finished for PR 12320 at commit
The Python APIs look great.
Test build #55619 has finished for PR 12320 at commit
Test build #55640 has finished for PR 12320 at commit
python/pyspark/sql/readwriter.py (outdated)

        return self

    @since(2.0)
    def trigger(self, trigger):
Put a default value here?
It's already default on the Scala side. I don't want people calling write.trigger() unnecessarily.
I see. The docstring could say: "Without calling this, the query will run as fast as possible."
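The design point above (no Python-side default for trigger; when trigger() is never called, the engine's default applies and the query runs as fast as possible) can be sketched with a toy builder. This is an illustrative model, not Spark's actual implementation; all names here are hypothetical:

```python
class ToyStreamWriter:
    """Toy model (not Spark) of the chainable writer builder under discussion."""

    def __init__(self):
        # None means "not set": the engine-side default applies,
        # i.e. run the query as fast as possible.
        self._trigger = None

    def trigger(self, processingTime):
        self._trigger = processingTime
        return self  # chainable, like DataFrameWriter

    def describe_trigger(self):
        if self._trigger is None:
            return "run the query as fast as possible"
        return "run the query every %s" % self._trigger


# Without calling trigger(), the engine default applies.
print(ToyStreamWriter().describe_trigger())
print(ToyStreamWriter().trigger("5 seconds").describe_trigger())
```

This keeps the Python wrapper thin: it only forwards an interval when the user explicitly sets one, rather than duplicating the engine's default.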
Test build #55860 has finished for PR 12320 at commit
Test build #55863 has finished for PR 12320 at commit
Test build #55870 has finished for PR 12320 at commit
Test build #55885 has finished for PR 12320 at commit
Test build #55936 has finished for PR 12320 at commit
LGTM. Do we need other people to look at these APIs?
python/pyspark/sql/readwriter.py (outdated)

        :param path: the path in a Hadoop supported file system
        :param format: the format used to save
        :param mode: specifies the behavior of the save operation when data already exists.
What does mode mean for a stream? I don't think we support that.
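The concern above (a batch save mode such as "append" or "overwrite" has no meaning once the sink is fed continuously) can be illustrated with a small validation sketch. This is an illustration of the review point, not Spark's actual behavior; the function and option names are hypothetical:

```python
# Hypothetical check: a streaming start that rejects batch-only options.
BATCH_ONLY_OPTIONS = {"mode"}

def start_toy_stream(path, **options):
    """Reject options that only make sense for a one-shot batch save."""
    bad = BATCH_ONLY_OPTIONS & set(options)
    if bad:
        raise ValueError(
            "options not supported for streaming: %s" % ", ".join(sorted(bad)))
    return {"path": path, "options": options}
```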
Test build #56112 has finished for PR 12320 at commit
Test build #56119 has finished for PR 12320 at commit
Test build #56121 has finished for PR 12320 at commit
Test build #56116 has finished for PR 12320 at commit
retest this please
Test build #56131 has finished for PR 12320 at commit
Test build #56137 has finished for PR 12320 at commit
Test build #56147 has finished for PR 12320 at commit
        return self._jcq.isActive()

    @since(2.0)
    def awaitTermination(self, timeoutMs=None):
No test for verifying that the params are correctly passed.
done
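The reviewer's request above (verify the parameters are correctly passed through) can be sketched with a fake JVM-side query object that records its calls. This is a toy model of the optional-timeout dispatch, not Spark's real classes; the class names are hypothetical:

```python
class FakeJavaQuery:
    """Stand-in for the JVM-side query object; records every call's args."""

    def __init__(self):
        self.calls = []

    def awaitTermination(self, *args):
        self.calls.append(args)
        return True


class ToyContinuousQuery:
    """Sketch of awaitTermination's optional-timeout dispatch to the JVM side."""

    def __init__(self, jcq):
        self._jcq = jcq

    def awaitTermination(self, timeoutMs=None):
        if timeoutMs is not None:
            if not isinstance(timeoutMs, int) or timeoutMs < 0:
                raise ValueError("timeoutMs must be a non-negative integer")
            # Explicit timeout: forward it to the JVM call.
            return self._jcq.awaitTermination(timeoutMs)
        # No timeout: call the no-argument JVM overload.
        return self._jcq.awaitTermination()
```

Recording the argument tuples lets a test assert exactly which overload was invoked, which is the kind of param-passing check asked for above.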
Other than the test issues, overall looks good.
Test build #56253 has finished for PR 12320 at commit
Test build #56264 has finished for PR 12320 at commit
Test build #56280 has finished for PR 12320 at commit
Test build #56293 has finished for PR 12320 at commit
Thanks, merging to master.
What changes were proposed in this pull request?

This patch provides a first cut of Python APIs for structured streaming. This PR provides new classes in pyspark under pyspark.sql.streaming.

In addition, it contains new methods added under:

DataFrameWriter
a) startStream
b) trigger
c) queryName

DataFrameReader
a) stream

DataFrame
a) isStreaming

This PR doesn't contain all methods exposed for ContinuousQuery, for example: exception, sourceStatuses, sinkStatuses. They may be added in a follow up.

This PR also contains some very minor doc fixes on the Scala side.

How was this patch tested?

Python doc tests

TODO: