Skip to content

Conversation

@clockfly
Copy link
Contributor

@clockfly clockfly commented Sep 7, 2016

What changes were proposed in this pull request?

This PR is a follow up of SPARK-17356. Current implementation of TreeNode.toJSON recursively converts all fields of TreeNode to JSON, even if the field is of type Seq or type Map. This may trigger out of memory exception in cases like:

  1. the Seq or Map can be very big. Converting them to JSON may take huge memory, which may trigger out of memory error.
  2. Some user space input may also be propagated to the Plan. The user space input can be of arbitrary type, and may also be self-referencing. Trying to print user space input to JSON may trigger out of memory error or stack overflow error.

For a code example, please check the Jira description of SPARK-17426.

In this PR, we refactor the TreeNode.toJSON so that we only convert a field to JSON string if the field is a safe type.

How was this patch tested?

Unit test.

@SparkQA
Copy link

SparkQA commented Sep 7, 2016

Test build #65020 has finished for PR 14990 at commit 284d780.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 7, 2016

Test build #65022 has finished for PR 14990 at commit 33983e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@clockfly
Copy link
Contributor Author

clockfly commented Sep 8, 2016

@cloud-fan Can you take a look?

name -> JInt(children.indexOf(value))
case (name, value: Seq[BaseType]) if value.toSet.subsetOf(containsChild) =>
// Check the value (Seq[BaseType]) element type first before converting it to a Set.
// Otherwise, it may take a lot of memory to convert a super big Seq to Set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about use forall here? if values.forall(containsChild)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

("product-class" -> JString(p.getClass.getName)) :: fieldNames.zip(fieldValues).map {
case (name, value) => name -> parseToJson(value)
("product-class" -> JString(p.getClass.getName)) :: fieldNames.zip(fieldValues).collect {
// Only converts String fields in Product to JSON
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we handle primitive types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other types are not handled, as I think there are not very useful for documentation purpose.

@SparkQA
Copy link

SparkQA commented Sep 8, 2016

Test build #65084 has finished for PR 14990 at commit 16e7896.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 8, 2016

Test build #65091 has finished for PR 14990 at commit 22b8804.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 8, 2016

Test build #65094 has finished for PR 14990 at commit a4b04b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 16, 2016

Test build #65467 has finished for PR 14990 at commit 996e392.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 16, 2016

Test build #65468 has finished for PR 14990 at commit ccdda37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, pending jenkins

@SparkQA
Copy link

SparkQA commented Sep 16, 2016

Test build #65473 has finished for PR 14990 at commit 1451753.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 16, 2016

Test build #65471 has finished for PR 14990 at commit d6838d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@clockfly
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Sep 16, 2016

Test build #65480 has finished for PR 14990 at commit 1451753.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in a425a37 Sep 16, 2016
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
…rting unknown fields to JSON

## What changes were proposed in this pull request?

This PR is a follow up of SPARK-17356. Current implementation of `TreeNode.toJSON` recursively converts all fields of TreeNode to JSON, even if the field is of type `Seq` or type Map. This may trigger out of memory exception in cases like:

1. the Seq or Map can be very big. Converting them to JSON may take huge memory, which may trigger out of memory error.
2. Some user space input may also be propagated to the Plan. The user space input can be of arbitrary type, and may also be self-referencing. Trying to print user space input to JSON may trigger out of memory error or stack overflow error.

For a code example, please check the Jira description of SPARK-17426.

In this PR, we refactor the `TreeNode.toJSON` so that we only convert a field to JSON string if the field is a safe type.

## How was this patch tested?

Unit test.

Author: Sean Zhong <seanzhong@databricks.com>

Closes apache#14990 from clockfly/json_oom2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants