
[SPARK-27256][CORE][SQL] If the configuration is used to set the number of bytes, we'd better use bytesConf. #24187

Closed
10110346 wants to merge 1 commit into apache:master from 10110346:bytesConf

Conversation

@10110346
Contributor

@10110346 10110346 commented Mar 23, 2019

What changes were proposed in this pull request?

Currently, if we want to configure spark.sql.files.maxPartitionBytes to 256 megabytes, we must set spark.sql.files.maxPartitionBytes=268435456, which is very unfriendly to users.

And if we set it like this: spark.sql.files.maxPartitionBytes=256M, we encounter this exception:

Exception in thread "main" java.lang.IllegalArgumentException:
 spark.sql.files.maxPartitionBytes should be long, but was 256M
        at org.apache.spark.internal.config.ConfigHelpers$.toNumber(ConfigBuilder.scala)
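To see why the suffixed value is rejected, here is a minimal standalone sketch of the two parsing behaviors (the object and method names are illustrative, not Spark's actual internals): a long-typed config runs the raw string through a strict numeric parse, while a byte-typed config strips a unit suffix first.

```scala
// Illustrative stand-ins for Spark's config converters; not the real code.
object ConfSketch {
  // Strict numeric parse, as used by a long-typed config: "256M" fails here.
  def toLongStrict(s: String, key: String): Long =
    try s.trim.toLong
    catch {
      case _: NumberFormatException =>
        throw new IllegalArgumentException(s"$key should be long, but was $s")
    }

  // Unit-aware parse, as used by a byte-typed config: accepts "256m", "1g", etc.
  def toBytes(s: String): Long = {
    val t = s.trim.toLowerCase
    val (num, mult) = t.last match {
      case 'k' => (t.dropRight(1), 1L << 10)
      case 'm' => (t.dropRight(1), 1L << 20)
      case 'g' => (t.dropRight(1), 1L << 30)
      case _   => (t, 1L)
    }
    num.toLong * mult
  }

  def main(args: Array[String]): Unit = {
    println(toBytes("256M"))       // 268435456
    println(toBytes("268435456"))  // 268435456
  }
}
```

With the strict parser, toLongStrict("256M", key) throws the IllegalArgumentException quoted above; the unit-aware parser maps both spellings to the same byte count.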

This PR uses bytesConf to replace longConf or intConf wherever a configuration sets a number of bytes.
Configuration change list:
spark.files.maxPartitionBytes
spark.files.openCostInBytes
spark.shuffle.sort.initialBufferSize
spark.shuffle.spill.initialMemoryThreshold
spark.sql.autoBroadcastJoinThreshold
spark.sql.files.maxPartitionBytes
spark.sql.files.openCostInBytes
spark.sql.defaultSizeInBytes
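For context, the shape of the change for one of the listed configs looks roughly like this. This is a before/after comparison sketched against SQLConf's ConfigBuilder style (the two definitions would not compile side by side, and the doc string and default shown are illustrative):

```scala
// Before: longConf accepts only a raw number of bytes.
val FILES_MAX_PARTITION_BYTES = buildConf("spark.sql.files.maxPartitionBytes")
  .doc("The maximum number of bytes to pack into a single partition when reading files.")
  .longConf
  .createWithDefault(128 * 1024 * 1024)

// After: bytesConf(ByteUnit.BYTE) also accepts unit suffixes such as "256m" or "1g".
val FILES_MAX_PARTITION_BYTES = buildConf("spark.sql.files.maxPartitionBytes")
  .doc("The maximum number of bytes to pack into a single partition when reading files.")
  .bytesConf(ByteUnit.BYTE)
  .createWithDefault(128 * 1024 * 1024)
```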

How was this patch tested?

1. Existing unit tests
2. Manual testing

@SparkQA

SparkQA commented Mar 23, 2019

Test build #103839 has finished for PR 24187 at commit 761e2b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member


Do we need to check that the input value is within the integer range?
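One way to enforce such a bound is ConfigBuilder's checkValue; the sketch below assumes the reviewer means guarding a bytes-typed config (which yields a Long) that is later consumed as an Int, and the config name and default are taken from the PR's change list for illustration:

```scala
// Sketch: reject byte values that would overflow an Int downstream.
val SHUFFLE_SORT_INIT_BUFFER_SIZE = ConfigBuilder("spark.shuffle.sort.initialBufferSize")
  .bytesConf(ByteUnit.BYTE)
  .checkValue(v => v > 0 && v <= Int.MaxValue,
    s"The buffer size must be greater than 0 and less than or equal to ${Int.MaxValue}.")
  .createWithDefault(4096)
```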

Contributor Author


Yeah, thanks.

Member


I'm not sure this is a net helpful change, as the parameter is maxPartitionBytes. I agree it would have been better to call it maxPartitionSize and accept values like "10m". I'm not strongly against it, as existing values would still work.

For other property values without "Bytes", I agree.

Contributor Author


Thanks.
Yeah, the parameter name is a bit confusing, but I don't think it matters much whether the name contains "Bytes"; I'd still prefer to make the change.

Member


I like this change since both styles (i.e. 1024 and 1k) can be accepted.
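Concretely, both spellings would configure the same limit after this change (hypothetical usage against a SparkSession named `spark`; shown for illustration, not part of the patch):

```scala
// Both forms set the same 256 MiB limit once the config is byte-typed.
spark.conf.set("spark.sql.files.maxPartitionBytes", 268435456L)
spark.conf.set("spark.sql.files.maxPartitionBytes", "256m")
```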

Member


OK, I'm fine with it.

@SparkQA

SparkQA commented Mar 25, 2019

Test build #103881 has finished for PR 24187 at commit ef67451.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@10110346
Contributor Author

retest this please

@maropu
Member

maropu commented Mar 25, 2019

We could have the same fix below, too?

Anyway, have you checked all the related places for the same fix?

@SparkQA

SparkQA commented Mar 25, 2019

Test build #103894 has finished for PR 24187 at commit ef67451.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@10110346
Contributor Author

10110346 commented Mar 25, 2019

We could have the same fix below, too?

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala (line 61 at commit 8ec6cb6):

val FILESOURCE_TABLE_RELATION_CACHE_SIZE =

Anyway, have you checked all the related places for the same fix?

Thanks.
FILESOURCE_TABLE_RELATION_CACHE_SIZE is used to configure the number of entries, not the number of bytes.
I've checked where buildConf and ConfigBuilder are called.

@10110346
Contributor Author

retest this please

@maropu
Member

maropu commented Mar 25, 2019

OK, to make it easier for other reviewers to understand, could you list all the configs this PR changes in the PR description?

@10110346
Contributor Author

OK, to make it easier for other reviewers to understand, could you list all the configs this PR changes in the PR description?

Ok, thanks.

Member


What if we don't have this long cast?

Contributor Author


If we don't convert to long first, it throws an exception like this:
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
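A minimal standalone reproduction of this cast problem (illustrative only; this is not Spark's actual code): the config system hands back a boxed java.lang.Long, which cannot be cast directly to java.lang.Integer, so the value has to be unboxed as a Long and then narrowed.

```scala
object CastSketch {
  // A bytesConf yields a boxed java.lang.Long; casting it straight to
  // java.lang.Integer throws ClassCastException, so unbox to Long first
  // and then narrow to Int.
  def toIntViaLong(value: Any): Int = value.asInstanceOf[Long].toInt

  def main(args: Array[String]): Unit = {
    val fromConf: Any = java.lang.Long.valueOf(4096L) // what the config system returns
    // fromConf.asInstanceOf[java.lang.Integer]       // would throw ClassCastException
    println(toIntViaLong(fromConf))
  }
}
```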

Member


Please update the message, too.

Contributor Author


ok, thanks

Member

@maropu maropu left a comment


LGTM and I leave it to other reviewers. cc: @cloud-fan @srowen @dongjoon-hyun

@SparkQA

SparkQA commented Mar 25, 2019

Test build #103900 has finished for PR 24187 at commit ef67451.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 25, 2019

Test build #103909 has finished for PR 24187 at commit 206857d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

