Skip to content

Conversation

@LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Sep 25, 2025

Description

Change the default search sort tiebreaker of PIT search requests to _shard_doc after opensearch-project/OpenSearch#18924 being merged.

Main changes:

  1. Remove sort field _doc in pushDownFilter(). (_doc is a default hits ordering if no specific orders are set)
  2. Explicitly add _shard_doc as sort tiebreaker of PIT search (not 100% sure the current implementation of _shard_doc is an implicit sort in PIT search, see question).
  3. Keep the legacy engine unchanged.

Related Issues

Resolves #3064

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin added v3.3.0 enhancement New feature or request labels Sep 25, 2025
@LantaoJin
Copy link
Member Author

Is it a must have for 3.3.0?

Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm: Do all PIT search requests add _shard_doc as tiebreaker field implicitly if no sort provided?

@LantaoJin
Copy link
Member Author

LantaoJin commented Sep 28, 2025

Just to confirm: Do all PIT search requests add _shard_doc as tiebreaker field implicitly if no sort provided?

I think all PIT search requests should add _shard_doc as tiebreaker, or the _score will used as implicit sort field in response. But this PR doesn't check all code base, it just changes the all places which use _doc + _id as tiebreaker to _shard_doc.

I checked PR of _shard_doc, seems only PIT search requests can sort by _shard_doc. Non-PIT search requests sort by _shard_doc throws exception. I will refactor this PR to force v2 and v3 only. The v1 (legacy) code is out of my knowledge, I won't touch them in this PR.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin changed the title Change the default search sorting to _shard_doc Change the default search sort tiebreaker to _shard_doc for PIT search Sep 28, 2025
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
// https://github.com/opensearch-project/OpenSearch/pull/18924#issuecomment-3342365950
if (this.sourceBuilder.sorts() == null
|| this.sourceBuilder.sorts().stream().noneMatch(ShardDocSortBuilder.class::isInstance)) {
this.sourceBuilder.sort(SortBuilders.shardDocSort());
Copy link
Member Author

@LantaoJin LantaoJin Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is the primary change of this PR.
Currently, the sort tiebreaker _shard_doc is not implicit added in PIT search request. So we explicit sort with SortBuilders.shardDocSort().

Swiddis
Swiddis previously approved these changes Sep 30, 2025
@Swiddis
Copy link
Collaborator

Swiddis commented Sep 30, 2025

#4420 still fails tests after merge

@LantaoJin
Copy link
Member Author

#4420 still fails tests after merge

Seems #4420 contains some changes that are not in this PR. Let me update from upstream and run CI again.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@RyanL1997
Copy link
Collaborator

https://github.com/opensearch-project/sql/actions/runs/18148240816/job/51654012596?pr=4378#step:6:481

CalciteNoPushdownIT > org.opensearch.sql.calcite.remote.CalciteSearchCommandIT.testSearchWithComplexChainedExpressions FAILED
REPRODUCE WITH: ./gradlew ':integ-test:integTest' --tests 'org.opensearch.sql.calcite.remote.CalciteSearchCommandIT.testSearchWithComplexChainedExpressions' -Dtests.seed=FD9C48B5A496F6A0 -Dtests.security.manager=false -Dtests.locale=ff-SN -Dtests.timezone=America/Santo_Domingo -Druntime.java=21
    java.lang.AssertionError: expected:<0> but was:<1>
        at __randomizedtesting.SeedInfo.seed([FD9C48B5A496F6A0:5AB6F276A0DA7BAD]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.opensearch.sql.util.MatcherUtils.verifyNumOfRows(MatcherUtils.java:188)
        at org.opensearch.sql.ppl.SearchCommandIT.testSearchWithComplexChainedExpressions(SearchCommandIT.java:1316)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at java.base/java.lang.Thread.run(Thread.java:1583)

@RyanL1997
Copy link
Collaborator

this is the same failure on #4344

Swiddis
Swiddis previously approved these changes Oct 1, 2025
RyanL1997
RyanL1997 previously approved these changes Oct 1, 2025
Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin dismissed stale reviews from RyanL1997 and Swiddis via 0d93c14 October 1, 2025 01:59
@LantaoJin
Copy link
Member Author

https://github.com/opensearch-project/sql/actions/runs/18148240816/job/51654012596?pr=4378#step:6:481

CalciteNoPushdownIT > org.opensearch.sql.calcite.remote.CalciteSearchCommandIT.testSearchWithComplexChainedExpressions FAILED
REPRODUCE WITH: ./gradlew ':integ-test:integTest' --tests 'org.opensearch.sql.calcite.remote.CalciteSearchCommandIT.testSearchWithComplexChainedExpressions' -Dtests.seed=FD9C48B5A496F6A0 -Dtests.security.manager=false -Dtests.locale=ff-SN -Dtests.timezone=America/Santo_Domingo -Druntime.java=21
    java.lang.AssertionError: expected:<0> but was:<1>
        at __randomizedtesting.SeedInfo.seed([FD9C48B5A496F6A0:5AB6F276A0DA7BAD]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.opensearch.sql.util.MatcherUtils.verifyNumOfRows(MatcherUtils.java:188)
        at org.opensearch.sql.ppl.SearchCommandIT.testSearchWithComplexChainedExpressions(SearchCommandIT.java:1316)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at java.base/java.lang.Thread.run(Thread.java:1583)

@RyanL1997 @Swiddis , confirmed with @yuancu , seems a test bug triggered at Oct 1st. To unblock our release, I have ignored it in the last commit and @yuancu will look into it offline.

@LantaoJin
Copy link
Member Author

LantaoJin commented Oct 1, 2025

Reminder: Do not backport it to 2.19-dev since it calls an API only provided in OS core 3.3.0.

@LantaoJin LantaoJin enabled auto-merge (squash) October 1, 2025 02:28
@LantaoJin LantaoJin merged commit 3e95147 into opensearch-project:main Oct 1, 2025
33 checks passed
@penghuo
Copy link
Collaborator

penghuo commented Oct 1, 2025

Just to confirm: Do all PIT search requests add _shard_doc as tiebreaker field implicitly if no sort provided?

I think all PIT search requests should add _shard_doc as tiebreaker, or the _score will used as implicit sort field in response. But this PR doesn't check all code base, it just changes the all places which use _doc + _id as tiebreaker to _shard_doc.

I checked PR of _shard_doc, seems only PIT search requests can sort by _shard_doc. Non-PIT search requests sort by _shard_doc throws exception. I will refactor this PR to force v2 and v3 only. The v1 (legacy) code is out of my knowledge, I won't touch them in this PR.

For search=index query, What is performance difference sort[_doc, _id], vs sort[_shard_doc] ?
Should we use _shard_docs as tiebreaker only?

@ahkcs ahkcs mentioned this pull request Oct 3, 2025
asifabashar added a commit to asifabashar/sql that referenced this pull request Oct 10, 2025
* main-apple: (218 commits)
  Add ignorePrometheus Flag for integTest and docTest (opensearch-project#4442)
  Create fab-radar.yml
  PPL `fillnull` command enhancement (opensearch-project#4421)
  reverting to _doc + _id (opensearch-project#4435)
  Support `multisearch` command in calcite (opensearch-project#4332)
  Add 3.3 release notes (opensearch-project#4422) (opensearch-project#4423)
  [SQL/PPL] Fix the `count(*)` and `dc(field)` to be capped at MAX_INTEGER opensearch-project#4416 (opensearch-project#4418)
  Change the default search sort tiebreaker to `_shard_doc` for PIT search (opensearch-project#4378)
  [Enhancement] Add error handling for known limitation of sql `JOIN` (opensearch-project#4344)
  Bugfix: SQL type mapping for legacy JDBC output (opensearch-project#3613)
  Version bump: 3.3 (opensearch-project#4417)
  Add max/min eval functions (opensearch-project#4333)
  Support time modifiers in search command  (opensearch-project#4224)
  Fix numbered token bug and make it optional output in patterns command (opensearch-project#4402)
  refactor span (opensearch-project#4334)
  Move release notes categories (opensearch-project#3818)
  [Doc] Enable doctest with Calcite (opensearch-project#4379)
  Mod function should return decimal instead of float when handle the operands are decimal literal (opensearch-project#4407)
  Scale of decimal literal should always be positive in Calcite (opensearch-project#4401)
  Enable Calcite by default and implicit fallback the unsupported commands (opensearch-project#4372)
  ...
LantaoJin added a commit to LantaoJin/search-plugins-sql that referenced this pull request Oct 15, 2025
* Partially backport opensearch-project#4378

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
(cherry picked from commit 3e95147)
yuancu pushed a commit that referenced this pull request Oct 16, 2025
…4573)

* Partially backport #4378

* Partially backport #4378

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
(cherry picked from commit 3e95147)

* fix doctest

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Oct 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-manually Filed a PR to backport manually. enhancement New feature or request v3.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Stabilize preferred SQL implicit sorting method

6 participants