
Conversation

@JoshRosen
Contributor

This patch modifies JDBCWrapper.schemaString to wrap column names in quotes, which is necessary to allow creating tables with columns whose names are reserved words or contain spaces. This fixes #80.

Note that, by itself, this patch does not enable full support for creating Redshift tables with column names that contain spaces; we are currently constrained by Avro's schema validation rules (see #84).
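To make the change concrete, here is a minimal standalone sketch of the idea (not the actual JDBCWrapper code; names and types are illustrative): each column name in the generated schema string is wrapped in double quotes, with embedded quotes doubled, so reserved words and names with spaces survive the CREATE TABLE statement.

```scala
// Minimal sketch of quoting column names when building a CREATE TABLE
// schema string. Not the actual JDBCWrapper implementation; the helper
// names and field types below are illustrative.
object SchemaStringSketch {
  // Redshift delimits identifiers with double quotes; an embedded
  // double quote inside the name must be doubled.
  def quoteIdentifier(name: String): String =
    "\"" + name.replace("\"", "\"\"") + "\""

  // Render (name, sqlType) pairs as a comma-separated column list.
  def schemaString(fields: Seq[(String, String)]): String =
    fields
      .map { case (name, sqlType) => s"${quoteIdentifier(name)} $sqlType" }
      .mkString(", ")

  def main(args: Array[String]): Unit = {
    // "order" is a reserved word; "first name" contains a space.
    println(schemaString(Seq("order" -> "INTEGER", "first name" -> "TEXT")))
  }
}
```

Without the quoting step, Redshift would reject both example columns at CREATE TABLE time.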

@JoshRosen JoshRosen added the bug label Sep 12, 2015
@JoshRosen JoshRosen added this to the 0.5.1 milestone Sep 12, 2015
@JoshRosen
Contributor Author

@marmbrus, do you think that we should perform similar quoting in Spark SQL's built-in JDBC datasource? Is this type of quoting a dialect-specific thing? These questions aren't blockers to making this change here in spark-redshift, but I just wanted to briefly consider those questions to make sure that we're not overlooking potential bugs in Spark.

@codecov-io

Current coverage is 94.59%

Merging #85 into master will not affect coverage as of 9f763ea

@@            master     #85   diff @@
======================================
  Files           11      11       
  Stmts          444     444       
  Branches       105     105       
  Methods          0       0       
======================================
  Hit            420     420       
  Partial          0       0       
  Missed          24      24       

Review entire Coverage Diff as of 9f763ea


@marmbrus
Contributor

Yeah, this is a known issue in Spark (SPARK-9505) as well. However, there I think we will have to work it into the dialects, since different systems use different quoting mechanisms (at least Spark SQL's differs from MySQL's).

Do we need to do the same thing when querying such columns? Or do we already escape there?
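For context on why this can't be a single global rule, here is an illustrative sketch (the system names and mappings are just common conventions, not code from Spark or this project) of how identifier quoting varies across databases, which is the reason it belongs in per-database dialects:

```scala
// Illustrative only: identifier quoting differs by system, so a generic
// JDBC layer cannot hard-code one style. Mappings reflect common
// conventions, not any particular library's implementation.
object QuotingSketch {
  def quote(system: String, name: String): String = system match {
    case "mysql"                 => s"`$name`"      // backticks
    case "redshift" | "postgres" => "\"" + name + "\"" // SQL-standard double quotes
    case "sqlserver"             => s"[$name]"      // T-SQL brackets
    case _                       => name            // unknown: leave unquoted
  }
}
```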

@rxin
Contributor

rxin commented Sep 13, 2015

The JDBC dialect dev API already has a quoting mechanism defined.
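The shape of that mechanism can be sketched as follows. This is a standalone imitation of Spark's JdbcDialect developer API (the real class lives in org.apache.spark.sql.jdbc; the class and object names here are hypothetical stand-ins): a base dialect supplies a default quoting rule, and database-specific dialects override it.

```scala
// Standalone sketch mirroring the shape of Spark's JdbcDialect developer
// API; not the actual Spark classes. A base dialect provides a default
// quoting rule that specific dialects override.
abstract class JdbcDialectSketch {
  // Default: SQL-standard double quoting.
  def quoteIdentifier(colName: String): String = s""""$colName""""
}

object DefaultDialectSketch extends JdbcDialectSketch

object MySQLDialectSketch extends JdbcDialectSketch {
  // MySQL quotes identifiers with backticks instead.
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}
```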


@JoshRosen
Contributor Author

@marmbrus, we already wrap column names in quotes when querying; it looks like we were just missing support for this when creating tables.

@JoshRosen
Contributor Author

Going to merge this now.

@JoshRosen
Contributor Author

Added an unload to this test, just to make it clear that the read path is also covered.

@JoshRosen JoshRosen closed this in 4dcf6e9 Sep 14, 2015
@JoshRosen JoshRosen deleted the column-name-escaping branch September 14, 2015 19:00

Successfully merging this pull request may close these issues.

Reserved words cannot be used as column names when writing back to Redshift
