Add lambda function and array related functions #3584
Conversation
arguments,
node.getFuncName(),
lambdaNode.getType());
lambdaNode = analyze(arg, lambdaContext);
why analyze reduce twice?
Reduce is a very special case, since it can change the type of the accumulator. For example, in reduce([1.0, 2.0], 0, (acc, x) -> acc + x, acc -> acc * 10), the lambda (acc, x) -> acc + x first sees (INTEGER, DOUBLE), and then, during iteration, (DOUBLE, DOUBLE). The current solution is to analyze once to find that the return type is DOUBLE, then use DOUBLE as the expected input type and cast the initial value of acc to it.
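A minimal sketch of that two-pass flow (the context API and helper names here are illustrative stand-ins, not the PR's actual code):

```java
// Stand-ins: LambdaContext, analyze(), typeOf(), and rebindParam() are
// hypothetical names for the PR's internal machinery.
RexNode analyzeReduceLambda(Node arg, LambdaContext ctx) {
  RexNode lambda = analyze(arg, ctx);        // pass 1: acc bound to the seed type (INTEGER)
  RelDataType inferred = lambda.getType();   // e.g. DOUBLE for (acc, x) -> acc + x
  if (!inferred.equals(ctx.typeOf("acc"))) {
    ctx.rebindParam("acc", inferred);        // widen acc to the inferred type
    lambda = analyze(arg, ctx);              // pass 2: acc bound to DOUBLE
    // the initial value of acc is then cast to `inferred` (0 -> 0.0)
  }
  return lambda;
}
```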
Is it necessary to detect a non-ANY type in the analyzing phase? What if the input list has a type of ARRAY?
And does it make sense to infer the return type by using leastRestrictive(arg0.getComponentType(), arg1.getType()) instead of analyzing twice?
Is it necessary to detect a non-ANY type in the analyzing phase? What if the input list has a type of ARRAY?
ANY would block the implementation in two ways: 1. the UDF part sometimes needs the type to choose an implementation; 2. ANY would also be a blocker for the type checker. For example, we use Calcite's multiply, which only supports numeric/interval * numeric/interval; ANY would throw an exception during the type check.
And does it make sense to infer the return type by using leastRestrictive(arg0.getComponentType(), arg1.getType()) instead of analyzing twice?
leastRestrictive(arg0.getComponentType(), arg1.getType()) doesn't work here. For example, with acc = 0, (acc, x) -> acc + length(x) * 1.0 returns DOUBLE, which means we need to cast the initial value of acc to DOUBLE. But leastRestrictive(INTEGER, STRING) wouldn't be DOUBLE.
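For reference, a self-contained check of that claim against Calcite's type factory (assuming Calcite on the classpath; the exact result can vary by type system, but it is not DOUBLE):

```java
import java.util.List;
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.sql.type.SqlTypeName;

public class LeastRestrictiveCheck {
  public static void main(String[] args) {
    RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
    RelDataType intType = typeFactory.createSqlType(SqlTypeName.INTEGER);
    RelDataType strType = typeFactory.createSqlType(SqlTypeName.VARCHAR);
    // leastRestrictive over mixed families does not model what the lambda body
    // computes: acc + length(x) * 1.0 evaluates to DOUBLE, but
    // leastRestrictive(INTEGER, VARCHAR) is null or a string type, never DOUBLE.
    RelDataType result = typeFactory.leastRestrictive(List.of(intType, strType));
    System.out.println(result);
  }
}
```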
* @return We wrap it here to accept null since the original return type inference will generate
* non-nullable type
Does the Spark array accept null too? Why do we do this wrap?
Yes. Spark arrays accept null.
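For context, the wrap described in the snippet above can be as small as forcing nullability on whatever the inner inference returns. A minimal sketch, assuming Calcite's SqlReturnTypeInference interface (not the PR's exact code):

```java
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.sql.type.SqlReturnTypeInference;

public final class NullableReturnType {
  // Wraps an existing return type inference so the resulting type accepts null.
  public static SqlReturnTypeInference wrap(SqlReturnTypeInference inner) {
    return opBinding -> {
      RelDataType inferred = inner.inferReturnType(opBinding);
      return opBinding.getTypeFactory().createTypeWithNullability(inferred, true);
    };
  }
}
```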
Version: 3.1.0
Usage: ``array(value1, value2, value3...)`` creates an array from the input values. Currently we don't allow mixed types; we infer a least restrictive type instead, for example ``array(1, "demo")`` -> ["1", "demo"]
Question: what is the reason to support inferring a least restrictive type instead of throwing an exception?
Please add a Limitation: note after each Usage: to explain that these functions only work with plugins.calcite.enabled=true.
Question: what is the reason to support inferring a least restrictive type instead of throwing an exception?
This is aligned with Spark.
// case DATETIME_INTERVAL ->
//     SqlTypeName.INTERVAL_TYPES.stream().map(OpenSearchTypeFactory.TYPE_FACTORY::createSqlIntervalType).toList();
How does DATETIME_INTERVAL impact reduce?
It's an unnecessary change; I will revert it.
import org.opensearch.sql.expression.function.UDFOperandMetadata;

// TODO: Support array of mixture types.
public class ArrayFunctionImpl extends ImplementorUDF {
It would be great to add a description and a simple example for each function. It should only show the functionality of the function and be as simple as possible. For example, array(1, 2, 3) -> [1, 2, 3] would be enough for the array function.
It will improve code readability for developers, as distinct from the docs for customers. You can do it later in another PR.
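For instance, the requested note could be as small as one Javadoc line per class (an illustrative sketch, not the PR's actual comment):

```java
/**
 * Implementation of the array function.
 *
 * <p>Example: array(1, 2, 3) -> [1, 2, 3]. Mixed inputs are widened to a least
 * restrictive type: array(1, "demo") -> ["1", "demo"].
 */
public class ArrayFunctionImpl extends ImplementorUDF { ... }
```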
Already added for each function. Please check.
Squashed commits:
* add forall
* add filter/exists
* add reduce
* add return type inference
* fix exists
* add map for lambda
* add infer for reduce
* add java doc
* revert useless change
* rename
* fix g4
* fix g4
* fix g4 file
* apply spotless
* test
* use builtin operator
* add array_length with test
* optimize reduce
* add UT
* fix reduce and add doc
* revert useless change
* add doc
* add type checker
* fix ARRAY
* optimize reduce logic
* revert useless change
* revert useless change
* add description for each function
* fix redundancy error
* fix redundancy name
Description
This PR adds lambda functions and array-related functions. Calcite doesn't have these array-related functions, so we need to implement them ourselves.
The logic for lambda is now as follows:
We treat a lambda function as a new PPL expression and parse it as usual to construct a RexNode. To get the return type of a lambda expression, we first need to map the argument types in the Calcite context. For example, in forall(array(1, 2, 3), x -> x > 0), x maps to INTEGER.
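A hedged sketch of that binding step (the context registration API here is a hypothetical name; getComponentType() is real Calcite API):

```java
// For forall(array(1, 2, 3), x -> x > 0): bind the lambda parameter to the
// array's component type before analyzing the body.
RelDataType arrayType = arguments.get(0).getType();        // ARRAY<INTEGER>
RelDataType componentType = arrayType.getComponentType();  // INTEGER
lambdaContext.bindParam("x", componentType);               // hypothetical binding API
RexNode body = analyze(lambdaBody, lambdaContext);         // x > 0 resolves to BOOLEAN
```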
We also make an exception for reduce, because the accumulator acc has a dynamic type.
Calcite/linq4j generates code according to the input types. For example, take reduce(array(1.0, 2.0, 3.0), 0, (acc, x) -> acc + x). Ideally, we would map acc -> INTEGER, x -> DOUBLE. But if we mapped it that way, the generated code for + would be plus(INTEGER acc, DOUBLE x); after the first application acc becomes a DOUBLE, and the next application would throw an exception. Thus, we assign ANY to acc and infer the return type in getReturnTypeInference. The behavior is aligned with https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/functions/ppl-collection.md
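A plain-Java analogue of the failure mode that motivates the ANY typing (illustrative only, not generated code):

```java
public class ReduceTypeDemo {
  public static void main(String[] args) {
    // Hand-written analogue of reduce(array(1.0, 2.0, 3.0), 0, (acc, x) -> acc + x)
    // with the accumulator statically typed from the seed (INTEGER):
    Object acc = 0; // seed is an Integer
    for (double x : new double[] {1.0, 2.0, 3.0}) {
      // Code shaped like plus(Integer acc, double x) keeps casting acc to
      // Integer; after the first iteration acc holds a Double, so the second
      // iteration throws ClassCastException.
      acc = (Integer) acc + x;
    }
    System.out.println(acc); // never reached
  }
}
```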
TODO: nested objects are not supported in lambdas currently, e.g. x -> x.a > 0. This will work automatically once we support nested objects.
For detailed implementation and description:
SqlLibraryOperators.ARRAY
SqlLibraryOperators.ARRAY_LENGTH
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
#3575
Check List
Commits are signed per the DCO, using --signoff. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.