5 changes: 4 additions & 1 deletion docs/sql-programming-guide.md
@@ -1371,7 +1371,10 @@ the Data Sources API. The following options are supported:
<td>
These options must all be specified if any of them is specified. They describe how to
partition the table when reading in parallel from multiple workers.
<code>partitionColumn</code> must be a numeric column from the table in question.
<code>partitionColumn</code> must be a numeric column from the table in question. Notice
that <code>lowerBound</code> and <code>upperBound</code> are just used to decide the
partition stride, not for filtering the rows in the table. So all rows in the table will
be partitioned and returned.
</td>
</tr>
</table>
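As a sketch of how these options fit together (the URL and table name below are placeholders, and the `read.format("jdbc")` reader API is assumed available), a parallel JDBC read might be configured like this:

```scala
// Hypothetical connection details; only the four partitioning options
// are the point of this sketch.
val jdbcOptions = Map(
  "url"             -> "jdbc:postgresql://localhost/test",
  "dbtable"         -> "people",
  "partitionColumn" -> "id",     // must be a numeric column
  "lowerBound"      -> "0",      // decides stride only, does not filter
  "upperBound"      -> "10000",  // decides stride only, does not filter
  "numPartitions"   -> "4"
)
// val df = sqlContext.read.format("jdbc").options(jdbcOptions).load()
```

Rows with `id` below 0 or above 10000 would still be returned; they simply land in the first and last partitions.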
4 changes: 2 additions & 2 deletions sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -873,8 +873,8 @@ class SQLContext(@transient val sparkContext: SparkContext)
* passed to this function.
*
* @param columnName the name of a column of integral type that will be used for partitioning.
* @param lowerBound the minimum value of `columnName` to retrieve
* @param upperBound the maximum value of `columnName` to retrieve
* @param lowerBound the minimum value of `columnName` used to decide partition stride
* @param upperBound the maximum value of `columnName` used to decide partition stride
* @param numPartitions the number of partitions. The range `minValue`-`maxValue` will be split
*                      evenly into this many partitions
*
@@ -50,9 +50,11 @@ private[sql] object JDBCRelation {
* Given a partitioning schematic (a column of integral type, a number of
* partitions, and upper and lower bounds on the column's value), generate
* WHERE clauses for each partition so that each row in the table appears
* exactly once. The parameters minValue and maxValue are advisory in that
* exactly once. The parameters minValue and maxValue are advisory in that
* incorrect values may cause the partitioning to be poor, but no data
* will fail to be represented.
   * will fail to be represented. Note: the upper and lower bounds are just
   * used to decide partition stride, not for filtering. So all the rows in
   * the table will be partitioned.
Contributor:

> The parameters minValue and maxValue are advisory in that incorrect values may cause the partitioning to be poor, but no data will fail to be represented.

The sentence above already explains that the filters are only used for partitioning and that all data will always be returned. I think the best place to update would be in the SQL programming guide, in the table under the section "JDBC To Other Databases".

Member Author:

Updated.
*/
def columnPartition(partitioning: JDBCPartitioningInfo): Array[Partition] = {
if (partitioning == null) return Array[Partition](JDBCPartition(null, 0))
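The stride behavior described in this comment can be illustrated with a small standalone sketch. This is a simplification, not Spark's exact `columnPartition` implementation (which also handles NULL values and the `partitioning == null` case shown above):

```scala
// Simplified illustration of stride-based partitioning: the bounds only
// determine the stride, and the first/last clauses are left open-ended,
// so every row falls into exactly one partition even if it lies outside
// [lower, upper).
def whereClauses(col: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
  val stride = (upper - lower) / numPartitions
  (0 until numPartitions).map { i =>
    val lo = if (i == 0) None else Some(s"$col >= ${lower + i * stride}")
    val hi = if (i == numPartitions - 1) None else Some(s"$col < ${lower + (i + 1) * stride}")
    (lo.toSeq ++ hi.toSeq).mkString(" AND ")
  }
}
```

For example, `whereClauses("id", 0, 100, 4)` yields `Seq("id < 25", "id >= 25 AND id < 50", "id >= 50 AND id < 75", "id >= 75")`: rows with `id < 0` or `id >= 100` are still covered, by the open-ended first and last clauses.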