Copilot AI commented Jan 22, 2026

Implements hybrid search combining vector similarity and full-text search via the DBMS_HYBRID_SEARCH API, supported on OceanBase 4.4.1+ and SeekDB.

Changes

  • Tool implementation (tools/hybrid_search.py)

    • Extracts vector/full-text columns from table schema
    • Embeds query text using Dify's text-embedding models
    • Builds Elasticsearch-compatible query body for DBMS_HYBRID_SEARCH.SEARCH()
    • Optional reranking (required for multi-table queries)
    • Returns JSON or Markdown formatted results
    • Supports Elasticsearch-compatible filter parameter for additional filtering conditions
  • Tool definition (tools/hybrid_search.yaml)

    • Parameters: table_names (required), query (required), top_k (default 10), embedding_model (required), rerank_model (conditional), filter (optional)
    • Validates tables have either vector or full-text indexes (or both)
  • Dependencies

    • Upgraded pyobvector from 0.2.16 to 0.2.22 for HybridSearch client
  • Documentation

    • Updated README with hybrid search capabilities and requirements
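The index check listed under the tool definition could look roughly like the sketch below. `TableIndexInfo` and `validate_indexes` are hypothetical names chosen for illustration, not identifiers from tools/hybrid_search.py, and the sketch assumes the column lists have already been extracted from the table schema.

```python
from dataclasses import dataclass, field


@dataclass
class TableIndexInfo:
    """Hypothetical summary of one table's searchable columns."""
    table: str
    vector_columns: list = field(default_factory=list)
    fulltext_columns: list = field(default_factory=list)


def validate_indexes(info: TableIndexInfo) -> None:
    """Raise if the table has neither a vector nor a full-text index."""
    if not info.vector_columns and not info.fulltext_columns:
        raise ValueError(
            f"Table '{info.table}' has neither a vector index nor a full-text index"
        )
```

A table with only one of the two index types passes the check; only a table with neither is rejected.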

Usage

# Single table search (rerank optional)
table_names: "documents"
query: "machine learning algorithms"
embedding_model: text-embedding-3-small

# Multi-table search (rerank required)
table_names: "docs,articles,papers"
query: "neural networks"
embedding_model: text-embedding-3-small
rerank_model: cohere-rerank-v3

# With filter conditions
table_names: "documents"
query: "machine learning"
embedding_model: text-embedding-3-small
filter: '{"range": {"price": {"gte": 10, "lte": 100}}}'
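
The rule that separates the first two examples (rerank optional for one table, required for several) can be sketched as a small pre-flight check. The helper names `parse_table_names` and `check_rerank_requirement` are assumptions for illustration and may not match the actual tool code.

```python
from typing import List, Optional


def parse_table_names(table_names: str) -> List[str]:
    """Split the comma-separated table_names parameter, dropping empty entries."""
    names = [name.strip() for name in table_names.split(",") if name.strip()]
    if not names:
        raise ValueError("table_names must list at least one table")
    return names


def check_rerank_requirement(tables: List[str], rerank_model: Optional[str]) -> None:
    """Enforce that a rerank model is set when more than one table is queried."""
    if len(tables) > 1 and not rerank_model:
        raise ValueError("rerank_model is required when multiple tables are specified")
```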

Query body structure sent to DBMS_HYBRID_SEARCH:

{
  "query": {
    "hybrid": {
      "queries": [
        {"knn": {"field": "embedding", "query_vector": [...], "k": 10}},
        {"match": {"content": "query text"}}
      ]
    }
  },
  "size": 10
}
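
For illustration, a body of this shape could be assembled in Python as follows. This is a minimal sketch that assumes the table has both a vector column and a full-text column; `build_hybrid_body` is a hypothetical helper name, not necessarily what the tool uses.

```python
from typing import Any, Dict, List


def build_hybrid_body(
    vector_field: str,
    text_field: str,
    query_vector: List[float],
    query_text: str,
    top_k: int = 10,
) -> Dict[str, Any]:
    """Build a hybrid query body with one knn clause and one match clause."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {"knn": {"field": vector_field, "query_vector": query_vector, "k": top_k}},
                    {"match": {text_field: query_text}},
                ]
            }
        },
        "size": top_k,
    }
```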

With filter:

{
  "query": {
    "bool": {
      "must": [
        {
          "hybrid": {
            "queries": [
              {"knn": {"field": "embedding", "query_vector": [...], "k": 10}},
              {"match": {"content": "query text"}}
            ]
          }
        }
      ],
      "filter": {"range": {"price": {"gte": 10, "lte": 100}}}
    }
  },
  "size": 10
}
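
The wrapping shown above (the original query moved into `must`, the filter placed beside it) can be expressed as a small transformation. The sketch below uses a hypothetical `apply_filter` helper and assumes the filter has already been parsed into a dict.

```python
import copy
from typing import Any, Dict


def apply_filter(body: Dict[str, Any], filter_clause: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of body whose query is wrapped in a bool query with a filter."""
    wrapped = copy.deepcopy(body)  # leave the caller's body untouched
    wrapped["query"] = {
        "bool": {
            "must": [wrapped["query"]],
            "filter": filter_clause,
        }
    }
    return wrapped
```
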
Original prompt

This section details the original issue you should resolve

<issue_title>[Feature]: Add hybrid search tool to execute hybrid search on OceanBase or SeekDB</issue_title>
<issue_description>### Describe your use case

The hybrid search feature is supported on OceanBase 4.4.1 and SeekDB; the plugin should add functional support for it.

Describe the solution you'd like

Add a hybrid_search tool to call the DBMS_HYBRID_SEARCH API.

Describe alternatives you've considered

No response

Additional context

No response</issue_description>

<agent_instructions>Add a hybrid_search utility for performing hybrid searches on specified tables. Defined as follows:

Parameters:

  • table_names: The names of the tables to query. Required. Separate multiple tables with commas. An error will occur if the specified tables do not have a vector index or full-text index.

  • query: The text to search for.

  • top_k: The number of results to return. Defaults to 10. Note that when multiple table names are specified, a rerank model needs to be set.

  • embedding_model: The embedding model to use. Required.

  • rerank_model: The rerank model to use. Required when multiple table names are specified; optional when a single table is specified.

Execution Flow:

  1. Obtain the schema of each specified table and extract its vector columns and full-text index columns.

  2. Use the embedding model to embed the query text.

  3. Perform a hybrid search using pyobvector's hybrid search function.

  4. If a rerank model is set, re-rank the results using the rerank model.

  5. Return the search results using a method similar to the execute_sql tool.

For tool development, you can refer to the documentation and other tools in the repository. Note that pyobvector needs to be upgraded to the latest version, and the project's README and yml need to be updated simultaneously.</agent_instructions>
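
Steps 2 to 5 of the execution flow above can be sketched as a small pipeline. The `embed`, `search`, and `rerank` callables are hypothetical stand-ins for the Dify embedding model, the pyobvector hybrid search call, and the rerank model; their signatures are assumptions made for illustration, as is truncating the final list to `top_k`.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def run_hybrid_search(
    query: str,
    top_k: int,
    embed: Callable[[str], List[float]],
    search: Callable[[List[float], str, int], List[Row]],
    rerank: Optional[Callable[[str, List[Row]], List[Row]]] = None,
) -> List[Row]:
    """Embed the query, run the search, optionally rerank, and return top_k rows."""
    vector = embed(query)                 # step 2: embed the query text
    rows = search(vector, query, top_k)   # step 3: hybrid search
    if rerank is not None:                # step 4: rerank only when a model is set
        rows = rerank(query, rows)
    return list(rows)[:top_k]             # step 5: return the results
```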

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 2 commits January 22, 2026 03:09
Co-authored-by: whhe <27404407+whhe@users.noreply.github.com>
Co-authored-by: whhe <27404407+whhe@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add hybrid search tool for OceanBase and SeekDB" to "Add hybrid_search tool for OceanBase/SeekDB" Jan 22, 2026
Copilot AI requested a review from whhe January 22, 2026 03:14
@whhe whhe requested a review from xxsc0529 January 22, 2026 03:47
@whhe whhe marked this pull request as ready for review January 22, 2026 03:47
whhe commented Jan 22, 2026

@copilot I would like to change the following definitions:

  • The table must contain either a full-text index or a vector index. If neither index is present, an error should be thrown.

  • Add a new input parameter filter for other filtering conditions. You need to refer to Elasticsearch's filter parameter to explain its usage to users.

Copilot AI and others added 2 commits January 22, 2026 06:19
Co-authored-by: whhe <27404407+whhe@users.noreply.github.com>
Co-authored-by: whhe <27404407+whhe@users.noreply.github.com>

Copilot AI commented Jan 22, 2026

@copilot I would like to change the following definitions:

  • The table must contain either a full-text index or a vector index. If neither index is present, an error should be thrown.
    ...

Done! Changes implemented in commit 07cd644:

  1. Index Validation: The validation logic already requires tables to have EITHER a vector index OR a full-text index. If neither is present, an error is thrown (line 158 in hybrid_search.py).

  2. Filter Parameter: Added new filter parameter that accepts Elasticsearch-compatible JSON format for additional filtering conditions. Supports:

    • Range filters: {"range": {"price": {"gte": 10, "lte": 100}}}
    • Term filters: {"term": {"category": "electronics"}}
    • Complex boolean queries: {"bool": {"must": [...], "should": [...]}}

The filter is properly integrated into the search body as a bool query with the hybrid search in the must clause.
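
Because the filter arrives as a JSON string, it has to be parsed before it can be placed in the search body. A minimal sketch of that step, assuming an empty value means "no filter" and using a hypothetical `parse_filter` helper:

```python
import json
from typing import Any, Dict, Optional


def parse_filter(raw: Optional[str]) -> Optional[Dict[str, Any]]:
    """Parse the filter parameter into a dict, or return None when it is unset."""
    if raw is None or not raw.strip():
        return None
    try:
        clause = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"filter is not valid JSON: {exc}") from exc
    if not isinstance(clause, dict) or not clause:
        raise ValueError("filter must be a non-empty JSON object")
    return clause
```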

Development

Successfully merging this pull request may close these issues.

[Feature]: Add hybrid search tool to execute hybrid search on OceanBase or SeekDB
