-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-18459][SPARK-18460][StructuredStreaming] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus (for branch-2.0) #15908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
LGTM pending tests. |
marmbrus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this! A few other comments after looking at these metrics more closely.
| """ | ||
| |{ | ||
| | "name" : "query", | ||
| | "id" : 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should make this a String, both here and in the type-safe API. The reasoning is, its already auto generated and I think its likely we will want to turn this into a UUID (maybe we should even just do that). Thoughts?
| | }, | ||
| | "sourceStatuses" : [ { | ||
| | "description" : "MySource1", | ||
| | "offsetDesc" : "#0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these not JSON now?
| | "isTriggerActive" : "true", | ||
| | "batchId" : "5", | ||
| | "latency.getOffset.total" : "10", | ||
| | "isDataPresentInTrigger" : "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this redundant with numRows.input.total?
| |"latency.getOffset.total":"10","isDataPresentInTrigger":"true"}, | ||
| |"sourceStatuses":[{"description":"MySource1","offsetDesc":"#0","inputRate":15.5, | ||
| |"processingRate":23.5,"triggerDetails":{"numRows.input.source":"100", | ||
| |"latency.getOffset.source":"10","latency.getBatch.source":"20"}}], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider removing source. It is already nested inside of the list of sources.
| | "latency" : 345.0, | ||
| | "triggerDetails" : { | ||
| | "latency.getBatch.total" : "20", | ||
| | "numRows.input.total" : "100", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to removing source I'd consider removing total since this all of these metrics relate to the entire query.
| | "numRows.input.total" : "100", | ||
| | "isTriggerActive" : "true", | ||
| | "batchId" : "5", | ||
| | "latency.getOffset.total" : "10", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are the timestamps? I only see latency here.
|
Oh sorry... I'm realizing I'm commenting on a PR for branch-2.0. We don't have to address these comments here, but we should make sure we are happy with all the naming before 2.1 is released. |
|
@marmbrus I will make another PR to address those. Hopefully there wont be any more conflicts between master/2.1 and 2.0 |
|
Test build #68729 has finished for PR 15908 at commit
|
|
Sounds good. Merging to branch 2.0. |
…atchId and add triggerDetails to json in StreamingQueryStatus (for branch-2.0) This is a fix for branch-2.0 for the earlier PR #15895 ## What changes were proposed in this pull request? SPARK-18459: triggerId seems like a number that should be increasing with each trigger, whether or not there is data in it. However, actually, triggerId increases only where there is a batch of data in a trigger. So its better to rename it to batchId. SPARK-18460: triggerDetails was missing from json representation. Fixed it. ## How was this patch tested? Updated tests Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #15908 from tdas/SPARK-18459-2.0.
|
@tdas Merged. Could you close this one, please? |
This is a fix for branch-2.0 for the earlier PR #15895
What changes were proposed in this pull request?
SPARK-18459: triggerId seems like a number that should be increasing with each trigger, whether or not there is data in it. However, actually, triggerId increases only where there is a batch of data in a trigger. So its better to rename it to batchId.
SPARK-18460: triggerDetails was missing from json representation. Fixed it.
How was this patch tested?
Updated tests