Conversation

@songkant-aws
Contributor

@songkant-aws songkant-aws commented Jan 22, 2026

Description

Push down any UDAF as a scripted metric aggregation so that partial aggregation results are evaluated in parallel on each shard and then reduced into a final aggregation result. We expect this to speed up complex commands such as patterns, as well as future UDAFs. Benchmark results are pending.

Related Issues

Resolves #4354

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Contributor

coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Optional pushdown of pattern aggregation to data nodes for distributed processing and improved performance
    • Option to show numbered tokens in pattern aggregation results
    • New cluster setting to enable/disable pattern aggregation pushdown
  • Documentation

    • Expanded docs explaining pushdown behavior, how to enable it, and memory/circuit-breaker considerations
  • Tests

    • Added integration and explain-plan tests covering pushdown and numbered-token scenarios

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds scripted-metric UDAF pushdown for pattern aggregation: introduces PatternAggregationHelpers, scripted-metric UDAF framework (UDAF interface, registry, per-phase factories, ScriptedMetricDataContext), Calcite/UDF integration, OpenSearch scripted-metric wiring, a runtime setting to toggle pushdown, tests, and docs/plan updates.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Pattern Aggregation Helpers | common/src/main/java/org/opensearch/sql/common/patterns/PatternAggregationHelpers.java, core/src/main/java/org/opensearch/sql/calcite/udf/udaf/LogPatternAggFunction.java | New shared Map-based accumulator helpers for init/add/combine/result; LogPatternAggFunction refactored to delegate state updates and result production to helpers. |
| Settings & Runtime Flag | common/src/main/java/org/opensearch/sql/common/setting/Settings.java, opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java | Added CALCITE_UDAF_PUSHDOWN_ENABLED key and a dynamic node-scoped OpenSearch setting to enable/disable UDAF pushdown. |
| Calcite UDFs & Function Wiring | core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java, core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java, core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java, core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java | New builtin function names for pattern phases and adapters wrapping PatternAggregationHelpers static methods into Calcite UDF operators; helper to adapt static methods as UDFs. |
| Pattern Parsing & Projection | core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java, core/src/main/java/org/opensearch/sql/expression/function/PatternParserFunctionImpl.java | Added flattenParsedPattern and buildEvalAggSamplesCall to project parsed pattern/tokens/sample_logs; new evalAggSamples path for aggregation-mode parsing including optional numbered tokens. |
| Scripted-metric UDAF Framework | opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/ScriptedMetricUDAF.java, .../ScriptedMetricUDAFRegistry.java, .../ScriptedMetricDataContext.java, .../udaf/PatternScriptedMetricUDAF.java | New public interface for scripted-metric UDAFs (lifecycle methods), registry for UDAFs, DataContext implementations for phases, and PatternScriptedMetricUDAF that builds RexNode-based init/map/combine/reduce scripts. |
| Script Factories (per-phase) | opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricInitScriptFactory.java, .../CalciteScriptedMetricMapScriptFactory.java, .../CalciteScriptedMetricCombineScriptFactory.java, .../CalciteScriptedMetricReduceScriptFactory.java | New factories that wrap compiled RexNode functions into ScriptedMetricAggContexts for init, map, combine, and reduce phases, binding DataContext/params/state/states. |
| Pushdown Integration & Request Analysis | opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java | AggregateAnalyzer gains udafPushdownEnabled guard and delegates INTERNAL_PATTERN pushdown to ScriptedMetricUDAFRegistry; CalciteLogicalIndexScan propagates the pushdown flag to the analyzer helper. |
| Script Engine & Parameter Handling | opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ScriptParameterHelper.java | Added scripted-metric script contexts to CalciteScriptEngine; ScriptParameterHelper gains addSpecialVariable to register special-variable params for scripted-metric bindings. |
| Scripted-metric Result Parsing | opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/ScriptedMetricParser.java | New MetricParser implementation to parse scripted-metric aggregation results into List/Map structure expected by the engine. |
| OpenSearch value parsing | opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java | Enhanced array parsing flow with unified parseArray pre-check and improved handling of empty/mixed arrays and single-element arrays. |
| Tests & Expectations | integ-test/src/test/java/.../CalcitePPLPatternsIT.java, integ-test/src/test/java/.../ExplainIT.java, ppl/src/test/java/.../CalcitePPLPatternsTest.java, test resources under integ-test/src/test/resources/expectedOutput/..., opensearch/src/test/java/.../AggregateAnalyzerTest.java | Added integration tests toggling pushdown, updated explain-plan expected outputs for pushdown/no-pushdown plans, and adjusted unit tests for constructor signature changes. |
| Documentation | docs/user/ppl/cmd/patterns.md | Documented UDAF pushdown option, how to enable it, and operational cautions (memory/circuit-breaker). |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Coord as Coordinator
    participant DN1 as DataNode1
    participant DN2 as DataNode2
    participant DN3 as DataNode3
    participant Reducer as Reducer

    Client->>Coord: Submit pattern aggregation query (udaf pushdown enabled)
    Coord->>DN1: InitScript (state init)
    DN1->>DN1: MapScript (per-doc add -> buffer/partial merge)
    DN1->>DN1: CombineScript (emit shard state)
    Coord->>DN2: InitScript
    DN2->>DN2: MapScript
    DN2->>DN2: CombineScript
    Coord->>DN3: InitScript
    DN3->>DN3: MapScript
    DN3->>DN3: CombineScript
    DN1-->>Reducer: Shard state 1 (map)
    DN2-->>Reducer: Shard state 2
    DN3-->>Reducer: Shard state 3
    Reducer->>Reducer: ReduceScript (merge shard states -> PatternAggregationHelpers.produce)
    Reducer-->>Client: Final pattern results (List<Map<String,Object>>)
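
The four phases in the diagram above can be illustrated with a small plain-Java simulation. This is only a sketch under stated assumptions: it uses no OpenSearch or Calcite classes, the class and method names (`ScriptedMetricLifecycleSketch`, `init`, `map`, `combine`, `reduce`, `run`) are hypothetical, and the "aggregation" simply counts log lines by their first token rather than running the real pattern logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation of init -> map -> combine -> reduce; not the PR's implementation. */
final class ScriptedMetricLifecycleSketch {

  /** init: create an empty per-shard state. */
  static Map<String, Object> init() {
    Map<String, Object> state = new HashMap<>();
    state.put("counts", new HashMap<String, Long>());
    return state;
  }

  /** map: fold one document into the shard state (here: count the first token of a log line). */
  @SuppressWarnings("unchecked")
  static void map(Map<String, Object> state, String logLine) {
    Map<String, Long> counts = (Map<String, Long>) state.get("counts");
    String key = logLine.split(" ", 2)[0];
    counts.merge(key, 1L, Long::sum);
  }

  /** combine: emit the value that the shard hands to the reduce phase. */
  @SuppressWarnings("unchecked")
  static Map<String, Long> combine(Map<String, Object> state) {
    return (Map<String, Long>) state.get("counts");
  }

  /** reduce: merge all shard states into one final result. */
  static Map<String, Long> reduce(List<Map<String, Long>> shardStates) {
    Map<String, Long> merged = new HashMap<>();
    for (Map<String, Long> shardState : shardStates) {
      shardState.forEach((k, v) -> merged.merge(k, v, Long::sum));
    }
    return merged;
  }

  /** Runs all four phases; each inner list stands for the documents of one shard. */
  static Map<String, Long> run(List<List<String>> shards) {
    List<Map<String, Long>> shardStates = new ArrayList<>();
    for (List<String> docs : shards) {
      Map<String, Object> state = init();
      for (String doc : docs) {
        map(state, doc);
      }
      shardStates.add(combine(state));
    }
    return reduce(shardStates);
  }

  public static void main(String[] args) {
    System.out.println(run(List.of(
        List.of("ERROR disk full", "INFO started"),
        List.of("ERROR timeout"))));
  }
}
```

The point of the split is that only `reduce` sees data from more than one shard, so `init`, `map`, and `combine` can run independently on each shard.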

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

  • #4868: Fixes LogPatternAggFunction buffer and return behavior — closely related to the refactor delegating accumulation/result to PatternAggregationHelpers.
  • #4914: Modifies ScriptParameterHelper and script pushdown infra — overlaps with added special-variable handling and scripted-metric integration.

Suggested labels

PPL, calcite

Suggested reviewers

  • LantaoJin
  • penghuo
  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • anirudha
  • GumpacG
  • Swiddis
  • mengweieric
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.30% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'General UDAF pushdown as scripts' clearly and concisely describes the main objective of this pull request: implementing pushdown of UDAFs as scripted metric aggregations. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the motivation for UDAF pushdown, linking to the corresponding issue, and documenting the testing and documentation status. |
| Linked Issues check | ✅ Passed | The PR implementation meets the core requirements from issue #4354: enabling UDAF pushdown via scripted metric aggregations (init/map/combine/reduce pipeline), with pattern aggregation as a primary use case and supporting infrastructure. |
| Out of Scope Changes check | ✅ Passed | All code changes are in scope with the linked issue's objectives: new UDAF scripting infrastructure, pattern aggregation helpers, registry/factory implementations, and integration tests directly support the scripted metric pushdown feature. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Songkan Tang <songkant@amazon.com>
@yuancu added the enhancement (New feature or request) and pushdown (pushdown related issues) labels on Jan 23, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

🤖 Fix all issues with AI agents
In
`@common/src/main/java/org/opensearch/sql/common/patterns/PatternAggregationHelpers.java`:
- Around line 259-275: combinePatternAccumulators currently drops buffered
"logMessages" from acc1 and acc2; modify it to retrieve the "logMessages" lists
from both accumulators (cast to List<Map<String,Object>> or List<Object> as
appropriate), create a new ArrayList, addAll from acc1 then acc2 to preserve
order, and put that merged list into the result instead of a fresh empty list;
keep existing merging of "patternGroupMap" via PatternUtils.mergePatternGroups
and return the combined result so producePatternResult sees all buffered
messages.
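
A minimal sketch of the merge this comment asks for, assuming the accumulator is a `Map<String, Object>` with the two keys named above. `mergeGroups` is a hypothetical stand-in for `PatternUtils.mergePatternGroups` (whose real signature and value types are not shown in this conversation), and the `Map<String, Long>` group type is an assumption made only so the example runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CombineAccumulatorsSketch {

  /** Hypothetical stand-in for PatternUtils.mergePatternGroups: sums counts per pattern. */
  static Map<String, Long> mergeGroups(Map<String, Long> left, Map<String, Long> right) {
    Map<String, Long> merged = new HashMap<>(left);
    right.forEach((k, v) -> merged.merge(k, v, Long::sum));
    return merged;
  }

  /** Combines two accumulators without dropping either side's buffered messages. */
  @SuppressWarnings("unchecked")
  static Map<String, Object> combine(Map<String, Object> acc1, Map<String, Object> acc2) {
    // Preserve order: buffered messages of acc1 first, then those of acc2.
    List<Object> messages = new ArrayList<>();
    messages.addAll((List<Object>) acc1.getOrDefault("logMessages", List.of()));
    messages.addAll((List<Object>) acc2.getOrDefault("logMessages", List.of()));

    Map<String, Long> groups = mergeGroups(
        (Map<String, Long>) acc1.getOrDefault("patternGroupMap", Map.of()),
        (Map<String, Long>) acc2.getOrDefault("patternGroupMap", Map.of()));

    Map<String, Object> result = new HashMap<>();
    result.put("logMessages", messages); // merged list, not a fresh empty one
    result.put("patternGroupMap", groups);
    return result;
  }
}
```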

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 3393-3417: The new private helper buildEvalAggSamplesCall (and the
INTERNAL_PATTERN_PARSER / evalAggSamples flow it invokes) lacks test coverage;
add unit or integration tests that exercise the end-to-end pattern aggregation
path: call the public API or caller that triggers buildEvalAggSamplesCall, feed
patterns with wildcards and sample_logs, and assert that
evalAggSamples/INTERNAL_PATTERN_PARSER produces numbered tokens (verify
showNumberedToken output or resulting transformed pattern and token list);
include tests for edge cases (no samples, multiple samples, complex wildcards)
and for the alternate branch where sample_logs is varchar to ensure
explicitMapType handling is covered.

In
`@core/src/main/java/org/opensearch/sql/calcite/udf/udaf/LogPatternAggFunction.java`:
- Around line 170-175: The Accumulator contract is violated:
LogPatternAggFunction.value(...) currently throws UnsupportedOperationException;
instead implement value(...) on LogParserAccumulator to return the
aggregated/pattern result (or wrapper state) so callers can obtain the final
value via value(...) just like FirstAggFunction/LastAggFunction/etc.; then
refactor LogPatternAggFunction.result() to call LogParserAccumulator.value(...)
(or delegate to PatternAggregationHelpers.producePatternResult(...) from within
the accumulator's value method) and remove the direct throw in value() so the
interface contract is honored (references: LogPatternAggFunction,
LogParserAccumulator, value(), result(),
PatternAggregationHelpers.producePatternResult()).

In
`@core/src/main/java/org/opensearch/sql/expression/function/PatternParserFunctionImpl.java`:
- Around line 138-151: The code in PatternParserFunctionImpl uses the boxed
Boolean showNumberedToken in a direct if(check) which can NPE when null; update
the conditional to a null-safe boolean test (e.g.
Boolean.TRUE.equals(showNumberedToken) as used in evalAggSamples) so the
parse/transform/extract block (ParseResult parseResult =
PatternUtils.parsePattern(...), outputPattern =
parseResult.toTokenOrderString(...), PatternUtils.extractVariables(...)) only
runs when showNumberedToken is explicitly true.
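
The failure mode is ordinary Java auto-unboxing: `if (flag)` on a `Boolean` calls `flag.booleanValue()`, which throws when `flag` is null. A small self-contained illustration follows; the class and method names are hypothetical and unrelated to `PatternParserFunctionImpl`.

```java
final class NullSafeFlagSketch {

  /** Unboxes the flag directly; throws NullPointerException when flag is null. */
  static boolean unsafe(Boolean flag) {
    if (flag) {
      return true;
    }
    return false;
  }

  /** Null-safe variant: only an explicit TRUE counts as enabled. */
  static boolean safe(Boolean flag) {
    return Boolean.TRUE.equals(flag);
  }
}
```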

In
`@core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java`:
- Around line 490-531: The PATTERN_* UDFs are registered with null operand type
information and lack tests; replace the null operand-type arguments in the
UserDefinedFunctionUtils.adaptStaticMethodToUDF calls for PATTERN_INIT_UDF,
PATTERN_ADD_UDF, PATTERN_COMBINE_UDF and PATTERN_RESULT_UDF with explicit
operand type signatures matching the actual method parameter types in
PatternAggregationHelpers (e.g., types corresponding to initPatternState,
addLogToPattern's 6 parameters, combinePatternAccumulators,
producePatternResultFromStates), using the appropriate SqlTypeName entries and
SqlTypeUtil helpers to build array/map/any types where needed, and add unit
tests that exercise initPatternState, addLogToPattern (processing a log),
combinePatternAccumulators, and producePatternResultFromStates to validate
operand validation, runtime behavior, and result shapes.

In `@docs/user/ppl/cmd/patterns.md`:
- Around line 87-105: Update the earlier note that currently states aggregation
is "not executed on data nodes" to make it conditional on the UDAF pushdown
setting: explicitly say that when plugins.calcite.udaf_pushdown.enabled is true
(used with the patterns command when mode=aggregation and method=brain) the
aggregation may be pushed down and executed on data nodes as a scripted metric
aggregation, otherwise it runs locally on the coordinator; include a
cross-reference or short parenthetical pointing to the new "Enabling UDAF
pushdown" section for more details and the exact setting name.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java`:
- Around line 198-209: The array-handling path in OpenSearchExprValueFactory
routes unmapped arrays to parseArray even when supportArrays is false, and
parseArray currently calls content.array().next() which throws on empty arrays;
modify parseArray (or add a pre-check before calling it) to detect empty arrays
(e.g., !content.array().hasNext() or size==0) and short-circuit by returning an
empty array value (or ExprNullValue if project semantics prefer) instead of
attempting to read the first element; update parseArray and any callers
(referencing OpenSearchExprValueFactory and parseArray) to use this guard so
empty JSON arrays no longer cause NoSuchElementException.
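
The guard itself is a standard `Iterator` idiom. The sketch below shows it in isolation with a hypothetical helper, `firstOrEmpty`; it does not use the project's content or `ExprValue` types, and whether the real code should return an empty array value or a null value is left open, as in the comment above.

```java
import java.util.Iterator;
import java.util.Optional;

final class EmptyArrayGuardSketch {

  /**
   * Returns the first element, or Optional.empty() when there is none.
   * Calling next() without this hasNext() check throws NoSuchElementException on an empty iterator.
   */
  static <T> Optional<T> firstOrEmpty(Iterator<T> elements) {
    if (!elements.hasNext()) {
      return Optional.empty();
    }
    return Optional.ofNullable(elements.next());
  }
}
```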

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java`:
- Around line 618-635: The INTERNAL_PATTERN branch in AggregateAnalyzer
currently throws UnsupportedOperationException when helper.udafPushdownEnabled
is false, which causes an uncaught exception to propagate; change the logic in
AggregateAnalyzer (the branch handling INTERNAL_PATTERN inside the
analyze/aggregation builder) so that when helper.udafPushdownEnabled is false it
does NOT throw but instead falls back to the non-pushdown path (i.e., skip or
bypass ScriptedMetricUDAFRegistry.INSTANCE.lookup and let normal aggregation
handling continue), so pattern aggregation degrades gracefully; update the
INTERNAL_PATTERN handling to only use
ScriptedMetricUDAFRegistry.lookup(functionName).map(...).orElseThrow(...) when
helper.udafPushdownEnabled is true.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricCombineScriptFactory.java`:
- Around line 26-30: Add JavaDoc to the public factory method newInstance in
CalciteScriptedMetricCombineScriptFactory: document the parameters
(Map<String,Object> params, Map<String,Object> state) with `@param` tags and
describe what each represents, and add an `@return` tag describing that it returns
a ScriptedMetricAggContexts.CombineScript (specifically a new
CalciteScriptedMetricCombineScript configured with function and outputType).
Keep the JavaDoc concise and place it immediately above the newInstance method
declaration.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricMapScriptFactory.java`:
- Around line 28-32: Add JavaDoc to the public method newFactory in
CalciteScriptedMetricMapScriptFactory: document each parameter with `@param` tags
for params, state, and lookup (describe their roles/types) and add an `@return`
tag that explains the returned ScriptedMetricAggContexts.MapScript.LeafFactory
(e.g., a CalciteMapScriptLeafFactory instance used to create leaf-level map
scripts). Keep the description concise and reference that the implementation
returns a new CalciteMapScriptLeafFactory constructed with function and
outputType.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/ScriptedMetricDataContext.java`:
- Around line 69-80: The method parseDynamicParamIndex currently calls
Integer.parseInt(name.substring(1)) without handling NumberFormatException;
update parseDynamicParamIndex to catch NumberFormatException around the parse
call (inside parseDynamicParamIndex) and rethrow an IllegalArgumentException
that includes the original parameter name (name) and a clear message like
"Malformed parameter name, expected '?N' pattern" so callers get an informative
error; keep the existing checks (startsWith("?") and the sources.size() bounds
check) and reference the methods/variables parseDynamicParamIndex and sources
when making the change.
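
A sketch of what the suggested handling could look like. The real method's signature is not shown in this conversation, so the `sourceCount` parameter (standing in for `sources.size()`) and the exact messages are assumptions. Note that `NumberFormatException` is already a subclass of `IllegalArgumentException`, so the gain here is the clearer message and the retained cause, not a different exception type.

```java
final class DynamicParamIndexSketch {

  /** Parses names of the form "?N" and validates N against the number of sources. */
  static int parseDynamicParamIndex(String name, int sourceCount) {
    if (name == null || !name.startsWith("?")) {
      throw new IllegalArgumentException("Unexpected parameter name: " + name);
    }
    int index;
    try {
      index = Integer.parseInt(name.substring(1));
    } catch (NumberFormatException e) {
      // Rethrow with the offending name and keep the original exception as the cause.
      throw new IllegalArgumentException(
          "Malformed parameter name, expected '?N' pattern: " + name, e);
    }
    if (index < 0 || index >= sourceCount) {
      throw new IllegalArgumentException(
          "Parameter index " + index + " out of range for " + sourceCount + " sources");
    }
    return index;
  }
}
```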

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/ScriptedMetricUDAF.java`:
- Around line 112-130: Add JavaDoc comments for the public accessors and the
class constructor in ScriptedMetricUDAF: document getRexBuilder(),
getParamHelper(), getCluster(), getRowType(), and getFieldTypes() with brief
descriptions and `@return` tags describing the returned type/meaning, and add a
short JavaDoc to the constructor describing its purpose and main parameters;
ensure each JavaDoc follows project style (one-line summary plus `@return` and
`@param` where applicable) and place them immediately above the corresponding
method/constructor declarations.
🧹 Nitpick comments (10)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (1)

192-241: Consider splitting parse to reduce complexity.

parse(...) now exceeds 50 lines; extracting the array pre-check and type-dispatch branches into helpers would improve readability and maintainability. As per coding guidelines, consider refactoring.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ScriptParameterHelper.java (1)

41-48: Document the new SPECIAL_VARIABLE source type.

The source types 0, 1, and 2 are documented in the comment block at lines 42-48, but the new SPECIAL_VARIABLE = 3 added at line 107 is not included in that documentation. Consider updating the comment block for consistency and maintainability.

📝 Suggested documentation update
   /**
    * Records the source of each parameter, it decides which kind of source to retrieve value.
    *
    * <p>0 stands for DOC_VALUE
    *
    * <p>1 stand for SOURCE
    *
    * <p>2 stands for LITERAL
+   *
+   * <p>3 stands for SPECIAL_VARIABLE (e.g., state, states in scripted metric aggregations)
    */
   List<Integer> sources;

Also applies to: 105-110

opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/ScriptedMetricParser.java (1)

19-28: Consider using @Getter to eliminate boilerplate.

The explicit getName() method can be replaced with Lombok's @Getter annotation on the class or field, reducing boilerplate.

♻️ Suggested simplification
 `@EqualsAndHashCode`
 `@RequiredArgsConstructor`
+@Getter
 public class ScriptedMetricParser implements MetricParser {

   private final String name;
-
-  `@Override`
-  public String getName() {
-    return name;
-  }
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/ScriptedMetricDataContext.java (1)

138-158: Consider logging or documenting silent null returns for debugging.

The get() method returns null silently when sourceLookup is null (line 145) or when doc values are missing/empty (lines 149-157). While this may be intentional for optional fields, it could mask configuration issues during debugging. Consider adding debug-level logging or documenting this behavior explicitly.

opensearch/src/test/java/org/opensearch/sql/opensearch/request/AggregateAnalyzerTest.java (1)

706-707: Consider adding test coverage for udafPushdownEnabled=true scenario.

The existing tests all use udafPushdownEnabled=false. To ensure comprehensive coverage of the new UDAF pushdown feature, consider adding a test case that exercises the udafPushdownEnabled=true path with a pattern aggregation call.

Would you like me to help generate a test case for the udafPushdownEnabled=true scenario, or open an issue to track this as a follow-up task?

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLPatternsIT.java (1)

550-550: Remove debug print statements before merging.

System.out.println(result.toString()) appears to be debug code left over from development. These should be removed or replaced with proper logging if needed for debugging purposes.

-      System.out.println(result.toString());

Also applies to: 616-616

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricReduceScriptFactory.java (1)

37-38: Unused outputType field in inner class.

The outputType field is stored but never used in CalciteScriptedMetricReduceScript. This is consistent with the sibling factory classes (Combine, Map, Init), so it may be intentional for future extensibility or API consistency. Consider adding a brief comment explaining its purpose, or remove it if truly unused.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricInitScriptFactory.java (1)

49-68: Consider extracting result-to-state logic to reduce duplication.

The pattern of checking result != null && result.length > 0 and then either calling putAll for Map results or putting under "accumulator" key is duplicated across CalciteScriptedMetricInitScriptFactory and CalciteScriptedMetricMapScriptFactory. Consider extracting this into a shared utility method in a helper class.

♻️ Optional refactor example
// In a shared helper class like ScriptedMetricUtils:
public static void storeResultInState(Object[] result, Map<String, Object> state) {
    if (result != null && result.length > 0) {
        if (result[0] instanceof Map) {
            state.putAll((Map<String, Object>) result[0]);
        } else {
            state.put("accumulator", result[0]);
        }
    }
}
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/ScriptedMetricUDAFRegistry.java (1)

8-37: Consider a thread-safe registry map.
If register() can be called after startup, HashMap is not safe for concurrent access. Consider ConcurrentHashMap (or make the registry immutable after init).

♻️ Possible adjustment
-import java.util.HashMap;
+import java.util.concurrent.ConcurrentHashMap;
@@
-    this.udafMap = new HashMap<>();
+    this.udafMap = new ConcurrentHashMap<>();
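
For context, here is a self-contained sketch of a registry built on `ConcurrentHashMap` with an `Optional`-returning lookup. It is not the PR's `ScriptedMetricUDAFRegistry`: the class name, the generic parameter, and the `pushdownEnabled` argument are hypothetical, the last one added only to illustrate the graceful-fallback idea raised for `AggregateAnalyzer` earlier in this review.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class UdafRegistrySketch<T> {

  // ConcurrentHashMap tolerates register() calls that race with lookups.
  // It rejects null keys and values, so callers must pass non-null names.
  private final Map<String, T> udafs = new ConcurrentHashMap<>();

  void register(String functionName, T udaf) {
    udafs.put(functionName, udaf);
  }

  /** Returns the UDAF only when pushdown is enabled and the function is registered. */
  Optional<T> lookup(String functionName, boolean pushdownEnabled) {
    if (!pushdownEnabled) {
      return Optional.empty(); // caller falls back to the non-pushdown path
    }
    return Optional.ofNullable(udafs.get(functionName));
  }
}
```

Returning `Optional.empty()` instead of throwing lets the caller choose between falling back and failing.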
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

3282-3383: Consider splitting flattenParsedPattern into smaller helpers.

This method mixes transformation logic and four projection variants in one block (~100+ LOC). Extracting helpers (pattern, count, tokens, sample_logs) would keep it under the 50-line guideline and make the aggregation vs. label paths easier to follow. As per coding guidelines, please keep methods under 50 lines.

Signed-off-by: Songkan Tang <songkant@amazon.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java (1)

234-242: Preserve the specific SPECIAL_VARIABLE error detail.
The current throw is swallowed by the generic catch, losing the intended message.

🛠️ Suggested fix
-      } catch (Exception e) {
-        throw new IllegalStateException("Failed to get value for parameter " + name);
-      }
+      } catch (IllegalStateException e) {
+        throw e;
+      } catch (Exception e) {
+        throw new IllegalStateException("Failed to get value for parameter " + name, e);
+      }
🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/PatternParserFunctionImpl.java`:
- Around line 196-235: The new public method
PatternParserFunctionImpl.evalAggSamples lacks unit tests; add JUnit tests for
it: (1) when showNumberedToken=true supply a wildcard pattern and sampleLogs and
assert the returned Map has PatternUtils.PATTERN transformed to numbered tokens
(use PatternUtils.TOKEN_PREFIX) and PatternUtils.TOKENS contains expected token
values extracted from sampleLogs; (2) when showNumberedToken=false assert the
returned pattern equals the original wildcard pattern and tokens map is empty;
and (3) when pattern is null/blank assert the method returns EMPTY_RESULT; call
PatternParserFunctionImpl.evalAggSamples directly and use assertions on the
returned Map/objects to validate behavior.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java`:
- Around line 78-88: The catch in CalciteScriptEngine currently catches the
IllegalStateException raised for SPECIAL_VARIABLE and re-throws a new generic
IllegalStateException, losing the original variable name/type context; fix by
either (A) handling the SPECIAL_VARIABLE case before entering the try block in
the method that builds/creates scripts in CalciteScriptEngine (so it never gets
wrapped), or (B) preserve the original exception when propagating by re-throwing
the caught exception (throw e;) or wrapping it while including the original
message and cause (new IllegalStateException("SPECIAL_VARIABLE: " +
e.getMessage(), e)), so the SPECIAL_VARIABLE name/type details are preserved.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricInitScriptFactory.java`:
- Around line 24-27: Add a JavaDoc comment to the public factory method
CalciteScriptedMetricInitScriptFactory.newInstance that documents the method
purpose, the parameters (Map<String, Object> params and Map<String, Object>
state) and what the method returns (a ScriptedMetricAggContexts.InitScript,
specifically a new CalciteScriptedMetricInitScript created with the factory's
function, params, and state); place the JavaDoc immediately above the
newInstance method and ensure it follows project style (brief description,
`@param` tags for params and state, and an `@return` tag describing the InitScript
instance).

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricReduceScriptFactory.java`:
- Around line 25-28: Add a JavaDoc comment for the public factory method
newInstance in CalciteScriptedMetricReduceScriptFactory: document the params map
and states list parameters and describe the returned
ScriptedMetricAggContexts.ReduceScript instance; reference that the method
constructs and returns a new CalciteScriptedMetricReduceScript using the
factory's function field, mention that params is a Map<String,Object> of script
parameters and states is a List<Object> of intermediate aggregation states, and
state that the method returns the constructed ReduceScript.
♻️ Duplicate comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricCombineScriptFactory.java (1)

24-27: Public newInstance JavaDoc still missing (already flagged).

As per coding guidelines, please add JavaDoc for params/return.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricMapScriptFactory.java (1)

26-29: Public newFactory JavaDoc still missing (already flagged).

As per coding guidelines, please add JavaDoc for params/return.

Comment on lines +78 to +88
import org.opensearch.script.ScriptedMetricAggContexts;
import org.opensearch.script.StringSortScript;
import org.opensearch.search.lookup.SourceLookup;
import org.opensearch.sql.data.model.ExprTimestampValue;
import org.opensearch.sql.opensearch.storage.script.aggregation.CalciteAggregationScriptFactory;
import org.opensearch.sql.opensearch.storage.script.field.CalciteFieldScriptFactory;
import org.opensearch.sql.opensearch.storage.script.filter.CalciteFilterScriptFactory;
import org.opensearch.sql.opensearch.storage.script.scriptedmetric.CalciteScriptedMetricCombineScriptFactory;
import org.opensearch.sql.opensearch.storage.script.scriptedmetric.CalciteScriptedMetricInitScriptFactory;
import org.opensearch.sql.opensearch.storage.script.scriptedmetric.CalciteScriptedMetricMapScriptFactory;
import org.opensearch.sql.opensearch.storage.script.scriptedmetric.CalciteScriptedMetricReduceScriptFactory;
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's determine the minimum supported OpenSearch version from the project
fd "pom.xml" | head -1 | xargs grep -A 5 "opensearch.version" 2>/dev/null || grep -r "opensearch.version" . --include="*.xml" --include="*.gradle" | head -5

Repository: opensearch-project/sql

Length of output: 561


🏁 Script executed:

# Find the CalciteScriptEngine.java file and examine the relevant lines
fd "CalciteScriptEngine.java" --type f

Repository: opensearch-project/sql

Length of output: 160


🌐 Web query:

ScriptedMetricAggContexts OpenSearch version introduced when first available

💡 Result:

ScriptedMetricAggContexts is available in OpenSearch starting with OpenSearch 1.0.0 (GA release on July 12, 2021), since OpenSearch 1.0 was derived from Elasticsearch 7.10.2, which already includes that scripted-metric aggregation script context. [1][2][3]

Sources:
[1] OpenSearch docs – version history (shows 1.0.0 release date) (docs.opensearch.org)
[2] OpenSearch blog – “1.0 is released!” (GA announcement; derived from Elasticsearch 7.10.2) (opensearch.org)
[3] OpenSearch blog – 1.0.0 RC1 announcement (explicitly “derived from Elasticsearch 7.10.2”) (opensearch.org)

Citations:


🏁 Script executed:

cat -n opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java | sed -n '70,145p'

Repository: opensearch-project/sql

Length of output: 4512


🏁 Script executed:

cat -n opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java | sed -n '230,275p'

Repository: opensearch-project/sql

Length of output: 2105


Exception handling masks the specific SPECIAL_VARIABLE error context.

The scripted-metric contexts (ScriptedMetricAggContexts) have been available since OpenSearch 1.0.0 and are compatible with this project (targeting 3.5.0).

However, at lines 234–242, the specific IllegalStateException thrown for SPECIAL_VARIABLE cases (line 237–238) is caught and re-thrown by the outer catch block (line 240–242) with a generic message that omits the variable name and type information. This masks the original error context and makes debugging harder.

Consider either handling the SPECIAL_VARIABLE case separately outside the try block, or re-throwing the original exception rather than wrapping it with a less informative message.
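
The second option can be shown in isolation. In the sketch below, `Lookup` and `resolve` are hypothetical names rather than members of `CalciteScriptEngine`; the example only demonstrates the catch ordering that lets a specific `IllegalStateException` pass through unchanged while other failures are wrapped with their cause attached.

```java
final class PreserveCauseSketch {

  /** Hypothetical value source that may fail with any exception. */
  interface Lookup {
    Object get(String name) throws Exception;
  }

  static Object resolve(String name, Lookup lookup) {
    try {
      return lookup.get(name);
    } catch (IllegalStateException e) {
      // Already specific (e.g. the SPECIAL_VARIABLE message): rethrow untouched.
      throw e;
    } catch (Exception e) {
      // Anything else: wrap, but keep the original exception as the cause.
      throw new IllegalStateException("Failed to get value for parameter " + name, e);
    }
  }
}
```

The more specific `catch` clause has to come first; otherwise the generic `catch (Exception e)` would intercept the `IllegalStateException` as well.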

🤖 Prompt for AI Agents
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java`
around lines 78 - 88, The catch in CalciteScriptEngine currently catches the
IllegalStateException raised for SPECIAL_VARIABLE and re-throws a new generic
IllegalStateException, losing the original variable name/type context; fix by
either (A) handling the SPECIAL_VARIABLE case before entering the try block in
the method that builds/creates scripts in CalciteScriptEngine (so it never gets
wrapped), or (B) preserve the original exception when propagating by re-throwing
the caught exception (throw e;) or wrapping it while including the original
message and cause (new IllegalStateException("SPECIAL_VARIABLE: " +
e.getMessage(), e)), so the SPECIAL_VARIABLE name/type details are preserved.

Signed-off-by: Songkan Tang <songkant@amazon.com>
Collaborator

@yuancu yuancu left a comment

LGTM

Labels

enhancement New feature or request pushdown pushdown related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Pushdown any UDAF by scripted metric aggregations

2 participants