-
Notifications
You must be signed in to change notification settings - Fork 31
Prevent insert queries from failing to parse when using "vector" as a column name #495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Possibly useful for testing purposes: |
| | K_TIMEUUID | ||
| | K_DATE | ||
| | K_TIME | ||
| | K_VECTOR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this a bit surprising. I haven't followed all the fuss around the introduction of this new type, but isn't it a complex type? As in: it can have parameters, e.g. vector<float, 3>. I don't think the current grammar would parse that.
Granted, type declarations are not very useful for DSBulk because it doesn't understand DDL statements. But they can still appear in SELECT or INSERT statements, notably when casting some value. Not sure if SELECT (vector<float, 3>) col1 FROM... makes sense, but someone might have a good usage for that.
Wouldn't it be better to make a little effort and properly create a rule for vectorType?
As for the problem of having the token VECTOR be considered an unreserved keyword: I see that in Cassandra's grammar it's listed under the basic_unreserved_keyword rule:
Maybe we can add it under basicUnreservedKeyword in DSBulk's grammar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't unreasonable points @adutra. I think it's probably worth a brief explanation as to how I got here but I strongly suspect it won't change the fundamental analysis very much.
In both the Astra and OSS Cassandra case "vector" is being implemented as a custom type (largely in order to avoid introducing a new protocol version). Think very much along the lines of what was done for duration in protocol v4. vector is slightly different because it's qualified by a subtype and a size which is something we haven't had to support for custom types to date.... but otherwise it's very similar to duration.
All of that said I don't disagree with your fundamental point; I think we can do a better job of matching up to what Cassandra does. There is an additional complication in that Astra only supports vectors of floats while OSS Cassandra supports vectors of arbitrary subtypes... but again I don't think that changes our process too much. The grammar used by dsbulk should aim to support the broadest possible case; if we wind up sending vectors of strings to Astra it'll fail for other reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So after looking at this a bit I offer the following to reinforce @adutra 's point above:
cqlsh> select (vector<float,1>)[0.123] from system.local;
(org.apache.cassandra.cql3.CQL3Type$Raw$RawVector@2c7ea098)[0.123]
--------------------------------------------------------------------
b'=\xfb\xe7m'
(1 rows)
Pay no attention to the byte display problem there; that's a known issue with the version of cqlsh + Python driver I'm using.
Worth noting that CAST operations are not a concern here.
cqlsh> select CAST (broadcast_address as vector<float,1>) from system.local;
SyntaxException: line 1:34 no viable alternative at input 'vector' (select CAST (broadcast_address as [vector]...)
Rationale is that cast only works with native types... and vector is very explicitly not a native type.
|
Thanks for the review @adutra; your points are well-taken. I think I have a good working impl incorporating your suggestions... please take another look when you have a sec! |
adutra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough explanations @absurdfarce. And yes, the grammar is looking much nicer 👍
|
|
||
| @ParameterizedTest | ||
| @MethodSource | ||
| void should_detect_unsupported_vector_selector(String query, boolean expected) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: do we really need a parameterized test here? Are you anticipating more test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bah, good catch. I think this is a holdover from when I was originally planning on some other tests which would involve various permutations of quoted values. Obviously that doesn't apply at all here... I'll change it.
| import org.junit.jupiter.params.provider.Arguments; | ||
| import org.junit.jupiter.params.provider.MethodSource; | ||
|
|
||
| import javax.management.Query; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mistakenly added during a refactoring; removing.
Problem appeared to be that "vector" was added as a keyword but wasn't specified as an unreserved keyword. Added it to the parsers collection of native type names which (a) also makes it an unreserved keyword and (b) seems more correct on it's face anyway.