[SPARK-17426][SQL] Refactor TreeNode.toJSON to avoid OOM when converting unknown fields to JSON
#14990
Test build #65020 has finished for PR 14990 at commit
Test build #65022 has finished for PR 14990 at commit
@cloud-fan Can you take a look?
name -> JInt(children.indexOf(value))
case (name, value: Seq[BaseType]) if value.toSet.subsetOf(containsChild) =>
// Check the value (Seq[BaseType]) element type first before converting it to a Set.
// Otherwise, it may take a lot of memory to convert a super big Seq to Set.
How about using `forall` here? `if value.forall(containsChild)`
Good idea.
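For readers following the thread, here is a minimal sketch of the two guards being compared. `Node` and the object name are hypothetical stand-ins for illustration, not Spark's `TreeNode`; the only point is that a `Set` can be passed to `forall` as a predicate, so no intermediate `Set` has to be built from a possibly huge `Seq`.

```scala
object ChildSeqCheck {
  // Hypothetical stand-in for a tree node; this is not Spark's TreeNode.
  final case class Node(id: Int)

  // Original guard: copies the whole Seq into a new Set before comparing.
  def viaToSet(value: Seq[Node], containsChild: Set[Node]): Boolean =
    value.toSet.subsetOf(containsChild)

  // Suggested guard: a Set[Node] is also a Node => Boolean, so it can be
  // passed to forall directly. No intermediate Set is allocated, and the
  // scan stops at the first element that is not a child.
  def viaForall(value: Seq[Node], containsChild: Set[Node]): Boolean =
    value.forall(containsChild)
}
```

Both guards return the same answer for any input; they differ only in how much memory and work a very large `Seq` costs.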
("product-class" -> JString(p.getClass.getName)) :: fieldNames.zip(fieldValues).map {
case (name, value) => name -> parseToJson(value)
("product-class" -> JString(p.getClass.getName)) :: fieldNames.zip(fieldValues).collect {
// Only converts String fields in Product to JSON
Should we handle primitive types?
Other types are not handled, as I think they are not very useful for documentation purposes.
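The effect of switching from `map` to `collect` can be sketched in plain Scala without json4s. The `Info` case class, the object name, and the explicit `fieldNames` argument are assumptions made for this illustration and are not taken from the patch. Only `String` fields survive; everything else, including a potentially large `Seq`, is skipped rather than converted.

```scala
object ProductStringFields {
  // Hypothetical example type used only for this illustration.
  final case class Info(name: String, payload: Seq[Int], count: Int)

  // Keeps the class name plus only those fields whose runtime value is a String.
  // collect drops every pair that the partial function does not match, so
  // non-String values are never traversed or converted.
  def describe(fieldNames: Seq[String], p: Product): List[(String, String)] = {
    val fieldValues = p.productIterator.toList
    val stringFields = fieldNames.toList.zip(fieldValues).collect {
      case (name, value: String) => name -> value
    }
    ("product-class" -> p.getClass.getName) :: stringFields
  }
}
```

With `Info("a", Seq(1, 2, 3), 3)` and field names `name`, `payload`, `count`, only the `name` entry follows the `product-class` entry.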
Test build #65084 has finished for PR 14990 at commit
Test build #65091 has finished for PR 14990 at commit
Test build #65094 has finished for PR 14990 at commit
Test build #65467 has finished for PR 14990 at commit
Test build #65468 has finished for PR 14990 at commit
LGTM, pending jenkins
Test build #65473 has finished for PR 14990 at commit
Test build #65471 has finished for PR 14990 at commit
retest this please
Test build #65480 has finished for PR 14990 at commit
thanks, merging to master!
…rting unknown fields to JSON

## What changes were proposed in this pull request?

This PR is a follow-up of SPARK-17356. The current implementation of `TreeNode.toJSON` recursively converts all fields of a TreeNode to JSON, even if the field is of type `Seq` or type `Map`. This may trigger an out-of-memory error in cases like:

1. The `Seq` or `Map` can be very big. Converting it to JSON may take a huge amount of memory, which may trigger an out-of-memory error.
2. Some user-space input may also be propagated to the Plan. The user-space input can be of arbitrary type, and may also be self-referencing. Trying to print user-space input as JSON may trigger an out-of-memory error or a stack overflow error.

For a code example, please check the Jira description of SPARK-17426.

In this PR, we refactor `TreeNode.toJSON` so that we only convert a field to a JSON string if the field is of a safe type.

## How was this patch tested?

Unit test.

Author: Sean Zhong <seanzhong@databricks.com>

Closes apache#14990 from clockfly/json_oom2.
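The "safe type" idea described above can be sketched as a whitelist-style pattern match. This is only an illustration under stated assumptions: the object name, the `render` helper, and the particular set of types treated as safe are hypothetical and do not reproduce the list used in the actual patch.

```scala
import scala.collection.mutable.ArrayBuffer

object SafeFieldRendering {
  // Hypothetical whitelist: only a few known-small types are rendered.
  // Anything else (large collections, arbitrary user objects, values that
  // reference themselves) yields None and is never traversed.
  def render(value: Any): Option[String] = value match {
    case s: String => Some(s)
    case b: Boolean => Some(b.toString)
    case n @ (_: Byte | _: Short | _: Int | _: Long) => Some(n.toString)
    case Some(inner) => render(inner)
    case _ => None
  }

  // A self-referencing value: a buffer that contains itself. A converter that
  // recursed into every element would never terminate on this input.
  def selfReferencing(): ArrayBuffer[Any] = {
    val buf = ArrayBuffer.empty[Any]
    buf += buf
    buf
  }
}
```

Because the match inspects only the runtime type of the top-level value, a very large `Seq` and a self-referencing buffer are both rejected in constant time instead of being walked.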