
@airborne12 (Member) commented Jan 12, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #59394

Problem Summary:
The search DSL should recognize only uppercase AND, OR, and NOT as boolean operators in Lucene boolean mode. Previously, lowercase and, or, and not were also treated as operators, which does not conform to the specification.

This PR makes the boolean operators case-sensitive:

  • Only uppercase AND, OR, NOT are recognized as operators
  • Lowercase and, or, not are now treated as regular search terms
  • Using lowercase and, or, not as operators in the DSL now results in a parse error
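A minimal Java sketch of the case-sensitive classification described above, assuming a naive whitespace tokenizer. The `Kind` enum and the `classify` and `tokenize` helpers are hypothetical illustrations; they are not the actual `SearchLexer.g4` rules or the ANTLR-generated lexer. The sketch shows only the matching rule: exact uppercase spellings become operator tokens, and every other spelling stays a term.

```java
import java.util.ArrayList;
import java.util.List;

public class CaseSensitiveOperators {
    // Hypothetical token kinds; the real lexer's token names may differ.
    enum Kind { AND, OR, NOT, TERM }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // Case-sensitive match: only the exact uppercase spellings are operators.
    static Kind classify(String word) {
        switch (word) {
            case "AND": return Kind.AND;
            case "OR":  return Kind.OR;
            case "NOT": return Kind.NOT;
            default:    return Kind.TERM; // "and", "or", "not", "And", ... stay terms
        }
    }

    // Naive whitespace split; ignores quoting, parentheses and field:value syntax.
    static List<Token> tokenize(String dsl) {
        List<Token> tokens = new ArrayList<>();
        for (String word : dsl.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(new Token(classify(word), word));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("title:foo AND title:bar"));
        System.out.println(tokenize("title:foo and title:bar"));
    }
}
```

In the second call, `and` comes back as a plain `TERM`, so no operator token joins the two clauses. Per the PR description, a query that relies on the lowercase spelling as an operator now fails to parse instead of being silently accepted.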

Release note

Make search DSL boolean operators (AND/OR/NOT) case-sensitive in lucene boolean mode.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • Yes. Lowercase and, or, not are no longer recognized as operators in the search DSL. Users must use uppercase AND, OR, NOT.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas (Contributor) commented Jan 12, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12 (Member, Author)

run buildall

Per specification requirement apache#5, only uppercase AND/OR/NOT
should be recognized as operators in search DSL. Lowercase and/or/not
should be treated as regular terms, causing parse errors when used
as operators.

Changes:
- Update SearchLexer.g4 to only match uppercase keywords
- Update unit tests to expect parse errors for lowercase operators
- Update regression tests accordingly
- Add comprehensive DSL operator test cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@airborne12 (Member, Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 31207 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4abe46d4c9305e95d94e6a7775dc86244f3b10a4, data reload: false

------ Round 1 ----------------------------------
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	17839	4291	4102	4102
q2	2282	362	242	242
q3	10292	1261	701	701
q4	11057	1008	330	330
q5	7610	2005	1902	1902
q6	197	176	143	143
q7	919	791	656	656
q8	9310	1329	1198	1198
q9	4765	4550	4572	4550
q10	6854	1787	1427	1427
q11	1111	297	283	283
q12	1303	747	589	589
q13	18302	3795	3088	3088
q14	287	293	271	271
q15	691	507	495	495
q16	728	671	630	630
q17	687	805	464	464
q18	7738	6481	6237	6237
q19	2230	973	601	601
q20	393	367	247	247
q21	3165	2511	2091	2091
q22	1059	992	960	960
Total cold run time: 108819 ms
Total hot run time: 31207 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	4155	4096	4049	4049
q2	326	383	314	314
q3	2072	2540	2179	2179
q4	1326	1750	1321	1321
q5	4061	3990	4294	3990
q6	234	191	145	145
q7	2130	1972	1851	1851
q8	2591	2422	2421	2421
q9	7291	7212	7007	7007
q10	2479	2708	2264	2264
q11	552	468	447	447
q12	723	739	642	642
q13	3561	4060	3250	3250
q14	277	307	270	270
q15	546	503	562	503
q16	724	730	624	624
q17	1153	1292	1382	1292
q18	7955	8086	7901	7901
q19	849	852	854	852
q20	2039	2084	1894	1894
q21	4787	4551	4266	4266
q22	1132	1032	1027	1027
Total cold run time: 50963 ms
Total hot run time: 48509 ms

@hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage `` 🎉
Increment coverage report
Complete coverage report
