MB-27666: Hierarchy Search #2224

CascadingRadium · 2025-09-06T23:06:09Z

Add support for nested fields in indexing and querying

Parse and index nested JSON objects
Enable queries on nested fields
Preserve hierarchical relationships in index

Requires:

Resolves:

Copilot

Pull Request Overview

Add support for nested fields in indexing and querying to enable hierarchical document structures and queries on nested objects. This enhances Bleve's search capabilities by supporting complex nested JSON document structures.

Parse and index nested JSON objects with preserved hierarchical relationships
Enable conjunction queries on nested fields with proper document matching across hierarchical levels
Implement nested-aware collectors, searchers, and mapping functionality
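To illustrate why conjunction queries need the hierarchy preserved, here is a minimal, self-contained Go sketch. It does not use Bleve's API: the `Employee` type and both matching functions are hypothetical and exist only to contrast flattened matching with per-object matching.

```go
package main

import "fmt"

// Employee is a hypothetical nested object inside a parent document.
type Employee struct {
	Name string
	Role string
}

// flattenedMatch mimics an index that flattens nested arrays: each field is
// checked independently, so the two terms may come from different objects.
func flattenedMatch(emps []Employee, name, role string) bool {
	nameHit, roleHit := false, false
	for _, e := range emps {
		if e.Name == name {
			nameHit = true
		}
		if e.Role == role {
			roleHit = true
		}
	}
	return nameHit && roleHit
}

// nestedMatch requires both terms to match inside the same nested object.
func nestedMatch(emps []Employee, name, role string) bool {
	for _, e := range emps {
		if e.Name == name && e.Role == role {
			return true
		}
	}
	return false
}

func main() {
	emps := []Employee{
		{Name: "Frank", Role: "Engineer"},
		{Name: "Alice", Role: "Manager"},
	}
	// Frank is not a manager, yet the flattened check still matches.
	fmt.Println(flattenedMatch(emps, "Frank", "Manager")) // true
	fmt.Println(nestedMatch(emps, "Frank", "Manager"))    // false
}
```

The flattened check reports a false positive because "Frank" and "Manager" come from different array elements; the per-object check is the behavior a nested-aware conjunction is meant to provide.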

Reviewed Changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
search_test.go	Comprehensive test suite for nested field querying scenarios
search_knn.go	Added nested-aware KNN and eligible collectors
search/util.go	Introduced FieldSet utility for field prefix matching
search/searcher/*.go	Updated searchers to use new ID comparison methods
search/searcher/search_conjunction_nested.go	New nested conjunction searcher implementation
search/search.go	Extended DocumentMatch with descendant tracking
search/scorer/scorer_conjunction_nested.go	New scorer for nested conjunction queries
search/query/*.go	Updated query types and field extraction for nested support
search/highlight/highlighter/simple/highlighter_simple.go	Modified to use new fragment addition method
search/explanation.go	Added merge utilities for explanations and score breakdowns
search/collector/*.go	Enhanced collectors with nested document support
mapping/*.go	Added nested mapping interfaces and field tracking
index_impl.go	Integrated nested-aware collectors and field loading
index/scorch/*.go	Updated index readers to support nested document hierarchy
document/document.go	Extended Document with nested document support

Comments suppressed due to low confidence (1)

search/searcher/search_conjunction_nested.go:1

Parameter name 'somOfK' appears to be a typo. It should likely be 'sumOfK' to match the usage pattern seen elsewhere in the codebase.

//  Copyright (c) 2025 Couchbase, Inc.

search/searcher/search_numeric_range.go

search/scorer/scorer_conjunction_nested.go

document/document.go

index/scorch/snapshot_index.go

mapping/document.go

search/collector/topn.go

search/explanation.go

Copilot

Pull Request Overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.

search_test.go

search_knn.go

search_test.go

search/search.go

ajroetker · 2025-11-25T19:27:28Z

It'd be awesome to be able to say which of the nested documents matched as well ("highlighting", if you will, for the nested docs). This would be useful if you wanted to chunk a large field into a set of child documents and then ask which chunks specifically matched.

I think you'd need a new field on DocumentMatch that explicitly reports which nested documents matched:

type DocumentMatch struct {
    // ... existing fields ...

    // Reports which nested documents matched
    NestedMatches   NestedDocumentMatches  `json:"nested_matches,omitempty"` // NEW
}

// Maps field path -> array of matched nested document indices
type NestedDocumentMatches map[string][]NestedDocumentMatch

type NestedDocumentMatch struct {
    ArrayPositions []uint64               `json:"array_positions"`  // e.g., [0, 2] for chunks[0].sections[2]
    MatchedTerms   []string               `json:"matched_terms"`    // Terms that matched in this nested doc
    Score          float64                `json:"score,omitempty"`  // Optional: score contribution
    Fields         map[string]interface{} `json:"fields,omitempty"` // Optional: field values from this nested doc
}

And to add a flag to SearchRequest to request nested document match information:

type SearchRequest struct {
    // ... existing fields ...
    NestedMatches *NestedMatchRequest // NEW
}

type NestedMatchRequest struct {
    Fields         []string  // Which fields to report nested matches for
    MaxPerField    int       // Max nested docs to return per field
    IncludeFields  []string  // Optional: field values to include from matched nested docs
}

CascadingRadium · 2025-11-26T06:03:33Z

Hi @ajroetker, this support is already included. The nested document matches are present in the main document match: the Fields and Highlight elements appear inside a NestedDocumentMatch object under a special _$nested key in the main document match's fields. A sample hit from a nested query follows.

The main request's fields can include any field from the nested document; they do not have to go in a separate struct.

{
  "index": "/var/folders/4r/qwn5bjb10mj_mv4znz0ypblw0000gp/T/bleve-testidx3986533520",
  "id": "2",
  "score": 1.8790196309110014,
  "locations": {
    "company.departments.employees.name": {
      "frank": [
        {
          "pos": 1,
          "start": 0,
          "end": 5,
          "array_positions": [
            1,
            0
          ]
        }
      ]
    },
    "company.departments.employees.role": {
      "manager": [
        {
          "pos": 1,
          "start": 0,
          "end": 7,
          "array_positions": [
            1,
            0
          ]
        }
      ]
    },
    "company.departments.name": {
      "engineering": [
        {
          "pos": 1,
          "start": 0,
          "end": 11,
          "array_positions": [
            1
          ]
        }
      ]
    },
    "company.locations.city": {
      "london": [
        {
          "pos": 1,
          "start": 0,
          "end": 6,
          "array_positions": [
            1
          ]
        }
      ]
    },
    "company.locations.country": {
      "uk": [
        {
          "pos": 1,
          "start": 0,
          "end": 2,
          "array_positions": [
            1
          ]
        }
      ]
    }
  },
  "sort": [
    "2"
  ],
  "decoded_sort": [
    "2"
  ],
  "fields": {
    "_$nested": [
      {
        "fields": {
          "company.departments.employees.name": "Frank",
          "company.departments.employees.role": "Manager"
        },
        "fragments": {
          "company.departments.employees.name": [
            "<mark>Frank</mark>"
          ],
          "company.departments.employees.role": [
            "<mark>Manager</mark>"
          ]
        }
      },
      {
        "fields": {
          "company.departments.budget": 800000,
          "company.departments.name": "Engineering"
        },
        "fragments": {
          "company.departments.name": [
            "<mark>Engineering</mark>"
          ]
        }
      },
      {
        "fields": {
          "company.locations.city": "London",
          "company.locations.country": "UK"
        },
        "fragments": {
          "company.locations.city": [
            "<mark>London</mark>"
          ],
          "company.locations.country": [
            "<mark>UK</mark>"
          ]
        }
      }
    ],
    "company.id": "c2",
    "company.name": "BizInc"
  }
}
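As a rough sketch of how a client might read these entries, the Go snippet below decodes a JSON-serialized hit and pulls out the `_$nested` array using only `encoding/json`. The `nestedHit` struct and the `extractNested` helper are hypothetical names, and their shape is inferred solely from the sample hit above (trimmed here for brevity), not from Bleve's Go types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nestedHit mirrors one entry under the "_$nested" key in the sample hit.
// The name and shape are assumptions derived from that JSON only.
type nestedHit struct {
	Fields    map[string]interface{} `json:"fields"`
	Fragments map[string][]string    `json:"fragments"`
}

// extractNested pulls the "_$nested" entries out of a JSON-encoded hit.
// It returns nil when the hit carries no nested matches.
func extractNested(hitJSON []byte) ([]nestedHit, error) {
	var hit struct {
		Fields map[string]json.RawMessage `json:"fields"`
	}
	if err := json.Unmarshal(hitJSON, &hit); err != nil {
		return nil, err
	}
	raw, ok := hit.Fields["_$nested"]
	if !ok {
		return nil, nil
	}
	var nested []nestedHit
	if err := json.Unmarshal(raw, &nested); err != nil {
		return nil, err
	}
	return nested, nil
}

// sampleHit is a shortened copy of the hit shown above.
const sampleHit = `{
  "id": "2",
  "fields": {
    "_$nested": [
      {
        "fields": {"company.departments.employees.name": "Frank"},
        "fragments": {"company.departments.employees.name": ["<mark>Frank</mark>"]}
      },
      {
        "fields": {"company.locations.city": "London"},
        "fragments": {"company.locations.city": ["<mark>London</mark>"]}
      }
    ],
    "company.id": "c2"
  }
}`

func main() {
	nested, err := extractNested([]byte(sampleHit))
	if err != nil {
		panic(err)
	}
	for i, n := range nested {
		for field, value := range n.Fields {
			fmt.Printf("nested[%d] %s = %v %v\n", i, field, value, n.Fragments[field])
		}
	}
}
```

Top-level fields such as `company.id` stay alongside `_$nested` in the same map, so parent metadata does not need to be duplicated per nested entry.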

ajroetker · 2025-11-26T08:37:14Z

That's awesome! Way cleaner too. Looking forward to this one, especially useful for chunking smaller documents into embeddable chunks without having to duplicate metadata for every one.

…2260) - When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with `cosine` similarity, normalization was incorrectly applied to the entire flattened array instead of each sub-vector independently, resulting in degraded similarity scores. - Added `NormalizeMultiVector(vec, dims)` that normalizes each sub-vector separately, fixing scores for multi-vector documents (e.g., score now correctly returns 1.0 instead of 0.6 for exact matches).

CascadingRadium force-pushed the nested branch from 2e20e37 to 4c3fdfa Compare September 6, 2025 23:09

CascadingRadium changed the title ~~Nested Fields [WIP]~~ MB-27666: Nested Fields Sep 13, 2025

CascadingRadium requested review from Likith101, Thejas-bhat, abhinavdangeti, capemox, Copilot, maneuvertomars and steveyen and removed request for steveyen September 13, 2025 16:01

Copilot AI reviewed Sep 13, 2025

abhinavdangeti added this to the v2.6.0 milestone Sep 16, 2025

CascadingRadium force-pushed the nested branch from 82de66e to 8172f3b Compare October 14, 2025 09:27

Likith101 reviewed Oct 23, 2025

CascadingRadium force-pushed the nested branch from 12aa9e3 to 0182ae8 Compare October 23, 2025 12:07

CascadingRadium force-pushed the nested branch 2 times, most recently from 0d1f64e to 3aa46b4 Compare November 17, 2025 09:38

CascadingRadium requested review from Likith101 and Copilot November 17, 2025 09:39

Copilot started reviewing on behalf of CascadingRadium November 17, 2025 09:39

Copilot finished reviewing on behalf of CascadingRadium November 17, 2025 09:41

Copilot AI reviewed Nov 17, 2025

CascadingRadium force-pushed the nested branch from 5c4961b to a30a7a2 Compare November 20, 2025 12:53

CascadingRadium changed the title ~~MB-27666: Nested Fields~~ MB-27666: Hierarchy Search Nov 22, 2025

CascadingRadium added 3 commits November 27, 2025 22:09

rebase

16093d1

minor UT change

abaddc7

revert gomod change

f8f4061

CascadingRadium added 5 commits December 4, 2025 23:48

fix test

6b153a0

Merge branch 'knnDup' into cosineFix

9ac8392

use normalizeVector for base64

68760c2

fix merge conflict

99e2120

Merge branch 'cosineFix' into nested

b1f596c

CascadingRadium changed the base branch from master to cosineFix December 5, 2025 09:44

CascadingRadium added 4 commits December 5, 2025 17:19

Fix interface

8721d16

Merge branch 'knnDup' into nested

70798cc

fix KNN case

f3ed293

Base automatically changed from cosineFix to knnDup December 8, 2025 06:18

CascadingRadium and others added 8 commits December 8, 2025 13:37

revert

dd2422d

fix

f3540a6

fix

fcb0d76

Revert "MB-69655: Fix vector normalization to handle multi-vectors co…

fc32bb8

…rrectly (#2260)" This reverts commit a233b67.

revert again

0250c8f

fix test

d2faeb6

remove newline

dd7d1b2

CascadingRadium marked this pull request as draft December 11, 2025 12:01

CascadingRadium changed the base branch from knnDup to master December 11, 2025 12:02

CascadingRadium added 4 commits December 11, 2025 17:39

Merge branch 'master' into nested

2d00d0b

Merge remote-tracking branch 'origin/cosineFix' into nested

210416b

Merge branch 'knnDup' into nested

281e784

finally

d8aafea

CascadingRadium marked this pull request as ready for review December 11, 2025 14:53

fix test

835b142

CascadingRadium added this to Hierarchy Search Dec 26, 2025

github-project-automation bot moved this to Todo in Hierarchy Search Dec 26, 2025

CascadingRadium moved this from Todo to Done in Hierarchy Search Dec 28, 2025

5 participants
