Skip to content

Array overlap to allow numeric operand#15964

Closed
sreemanamala wants to merge 10 commits intoapache:masterfrom
sreemanamala:array-contains-overlap
Closed

Array overlap to allow numeric operand#15964
sreemanamala wants to merge 10 commits intoapache:masterfrom
sreemanamala:array-contains-overlap

Conversation

@sreemanamala
Copy link
Copy Markdown
Contributor

@sreemanamala sreemanamala commented Feb 26, 2024

Description

Allow ARRAY_OVERLAP operator to take numeric value as input
Allow null & double values as elements of array passed as sql parameter

Release note


Key changed/added classes in this PR
  • ArrayOverlapOperatorConversion
  • SqlParameterizerShuttle
  • CalciteArraysQueryTest

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Comment on lines +62 to +63
OperandTypes.family(SqlTypeFamily.STRING),
OperandTypes.family(SqlTypeFamily.NUMERIC)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like its possible to possible to specify operands with different types
could you add some tests like:

SELECT dim3 FROM druid.numfoo WHERE ARRAY_OVERLAP(l1||'.1', ARRAY[2.1, 7.1]) LIMIT 5
SELECT dim3 FROM druid.numfoo WHERE ARRAY_OVERLAP(l1||'.1', ARRAY[2.1, 7.1]) LIMIT 5
SELECT dim3 FROM druid.numfoo WHERE ARRAY_OVERLAP(ARRAY[l2], ARRAY[2, 7]) LIMIT 5
SELECT dim3 FROM druid.numfoo WHERE ARRAY_OVERLAP(ARRAY['x',l2||''], ARRAY['2', '7']) LIMIT 5
SELECT dim3 FROM druid.numfoo WHERE ARRAY_OVERLAP(ARRAY[2,7], ARRAY['2', '7']) LIMIT 5
SELECT dim3 FROM druid.numfoo WHERE ARRAY_OVERLAP(ARRAY[2,7], ARRAY[2.0,7.0]) LIMIT 5

it seems like type mismatch was already there in some cases (array<int>,array<string>) - would it be possible to reject invalid cases?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are being taken care of in this PR
But rather than rejecting them, we would cast them properly and return correct response

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't explicitly see these cases being covered in that PR (or I missed it).
Can we cover them somehow?

@sreemanamala sreemanamala changed the title Array contains overlap Array overlap to allow numeric operand Feb 26, 2024
} else if (element instanceof String) {
node = SqlLiteral.createCharString((String) element, SqlParserPos.ZERO);
} else if (element instanceof Integer || element instanceof Long) {
} else if (element instanceof Integer || element instanceof Long || element instanceof Double) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, createExactNumeric doesn't really seem appropriate for Double, shouldn't that be handled with a separate else case, something like

...
      } else if (element instanceof Float || element instanceof Double) {
        node = SqlLiteral.createApproxNumeric(element.toString(), SqlParserPos.ZERO);
      }
...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even I thought so. But little worried about in filter converting them to Strings. Havent completely gone through it. Let me check and change it to approx numeric.

sreemanamala and others added 5 commits March 7, 2024 08:30
…5920)

* MSQ: Validate that strings and string arrays are not mixed.

When multi-value strings and string arrays coexist in the same column,
it causes problems with "classic MVD" style queries such as:

  select * from wikipedia -- fails at runtime
  select count(*) from wikipedia where flags = 'B' -- fails at planning time
  select flags, count(*) from wikipedia group by 1 -- fails at runtime

To avoid these problems, this patch adds type verification for INSERT
and REPLACE. It is targeted: the only type changes that are blocked are
string-to-array and array-to-string. There is also a way to exclude
certain columns from the type checks, if the user really knows what
they're doing.

* Fixes.

* Tests and docs and error messages.

* More docs.

* Adjustments.

* Adjust message.

* Fix tests.

* Fix test in DV mode.
@clintropolis
Copy link
Copy Markdown
Member

i think the only hesitation i have here is it feels odd that this is the only array function that allows numeric inputs, i wonder if we should just allow them all to accept any input since at the native layer we allow the array functions to work with anything, if it isn't already and array we make it a single element array. This is largely done to make stuff work with schema evolution, and also mvds fit this pattern, but SQL is currently more restrictive.. and probably shouldn't have allowed VARCHAR either if we werent going to allow ANY to be symmetrical with the native layer.

Need to think about it a bit.

Copy link
Copy Markdown
Member

@kgyrtkirk kgyrtkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please merge master - to get a fresh ci run?

Comment on lines +62 to +63
OperandTypes.family(SqlTypeFamily.STRING),
OperandTypes.family(SqlTypeFamily.NUMERIC)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't explicitly see these cases being covered in that PR (or I missed it).
Can we cover them somehow?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Apr 18, 2024

IMO, rather than expanding the ARRAY functions to accept various kinds of scalars in array context, it's better to introduce a new function that explicitly accepts a scalar + an array. You're doing that already in #16306, so perhaps we can close this one?

@sreemanamala sreemanamala deleted the array-contains-overlap branch September 18, 2024 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants