From 23428a74b4fbe7809c9bf266ebfa64e31c299ae6 Mon Sep 17 00:00:00 2001
From: Brett Stime
Date: Wed, 5 Apr 2017 12:57:26 -0500
Subject: [PATCH] Corrects interval notation in doc comment

The random number generated by XORShiftRandom.nextDouble() is a value between zero and one, including zero but not including one, i.e., 0 <= x < 1. I've denoted this by changing the closing square bracket to a closing parenthesis.

You can also think of trying to uniformly randomly assign items in a list to three classes 'A', 'B' and 'C'. For each item, if {randomDouble * 3.0} is between 0.000 and 0.999, it gets assigned to A. If between 1.000 and 1.999, it goes to B. If between 2.000 and 2.999, it goes to C. All three classes have the same probability of receiving the item. If it were possible for the raw random number to be exactly 1.000, then after scaling the range by multiplying by 3.0, class C would be slightly more likely to receive the item than A or B (assuming simple logic instead of more extensive/expensive logic to break ties).

Also, see the existing comment in SamplingUtils, which uses the same function:
https://github.com/apache/spark/blob/79f5f281bb69cb2de9f64006180abd753e8ae427/core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala#L62

https://en.wikipedia.org/wiki/Interval_(mathematics)
---
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index f07e04368389f..8d30afa6cfeb7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -1142,7 +1142,7 @@ object functions {
 
   /**
    * Generate a random column with independent and identically distributed (i.i.d.) samples
-   * from U[0.0, 1.0].
+   * from U[0.0, 1.0).
    *
    * @note This is indeterministic when data partitions are not fixed.
    *
@@ -1153,7 +1153,7 @@ object functions {
 
   /**
    * Generate a random column with independent and identically distributed (i.i.d.) samples
-   * from U[0.0, 1.0].
+   * from U[0.0, 1.0).
    *
    * @group normal_funcs
    * @since 1.4.0
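
The three-class argument in the commit message can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: `HalfOpenIntervalDemo` and `bucket` are hypothetical names that are not part of Spark, and `scala.util.Random` stands in for `XORShiftRandom` (its `nextDouble()` also returns values in [0.0, 1.0)).

```scala
import scala.util.Random

object HalfOpenIntervalDemo {
  // Hypothetical helper (not part of Spark): scales x by the number of
  // classes and truncates, so x in [0.0, 1.0) maps to 0 until `classes`.
  def bucket(x: Double, classes: Int): Int = (x * classes).toInt

  def main(args: Array[String]): Unit = {
    // Stand-in generator; nextDouble() returns values in [0.0, 1.0).
    val rng = new Random(42L)
    val counts = Array.fill(3)(0)

    // Assign 300000 items to classes A (0), B (1) and C (2).
    (1 to 300000).foreach { _ =>
      counts(bucket(rng.nextDouble(), 3)) += 1
    }
    println(s"A, B, C counts: ${counts.mkString(", ")}")

    // If the interval were closed, x == 1.0 would map to index 3, which is
    // outside the three classes and would need extra tie-breaking logic.
    println(s"bucket(1.0, 3) = ${bucket(1.0, 3)}")
  }
}
```

With the half-open interval, every draw lands in index 0, 1 or 2 without any special case for the upper bound.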