@CascadingRadium
Member

@CascadingRadium CascadingRadium commented Sep 6, 2025

Add support for nested fields in indexing and querying

  • Parse and index nested JSON objects
  • Enable queries on nested fields
  • Preserve hierarchical relationships in index

Requires:

Resolves:

@CascadingRadium CascadingRadium changed the title Nested Fields [WIP] MB-27666: Nested Fields Sep 13, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

Add support for nested fields in indexing and querying to enable hierarchical document structures and queries on nested objects. This enhances Bleve's search capabilities by supporting complex nested JSON document structures.

  • Parse and index nested JSON objects with preserved hierarchical relationships
  • Enable conjunction queries on nested fields with proper document matching across hierarchical levels
  • Implement nested-aware collectors, searchers, and mapping functionality
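To illustrate why preserving hierarchical relationships matters for conjunction queries, here is a minimal standalone sketch. It does not use Bleve's API; the `Employee` type and both match functions are hypothetical names invented for this example. It assumes the usual failure mode of flattened arrays, where conditions on two sub-fields can be satisfied by two different array elements.

```go
package main

import "fmt"

// Employee is a hypothetical nested object used only for illustration.
type Employee struct {
	Name string
	Role string
}

// matchFlattened mimics array elements flattened into per-field value lists:
// the name and role conditions are checked independently of each other.
func matchFlattened(emps []Employee, name, role string) bool {
	nameHit, roleHit := false, false
	for _, e := range emps {
		if e.Name == name {
			nameHit = true
		}
		if e.Role == role {
			roleHit = true
		}
	}
	return nameHit && roleHit
}

// matchNested requires both conditions to hold within the same nested object.
func matchNested(emps []Employee, name, role string) bool {
	for _, e := range emps {
		if e.Name == name && e.Role == role {
			return true
		}
	}
	return false
}

func main() {
	emps := []Employee{{"Frank", "Engineer"}, {"Grace", "Manager"}}
	fmt.Println(matchFlattened(emps, "Frank", "Manager")) // true: conditions met by different elements
	fmt.Println(matchNested(emps, "Frank", "Manager"))    // false: no single element satisfies both
}
```

A nested-aware conjunction is expected to behave like `matchNested`: a document matches only when a single nested object satisfies every clause.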

Reviewed Changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| search_test.go | Comprehensive test suite for nested field querying scenarios |
| search_knn.go | Added nested-aware KNN and eligible collectors |
| search/util.go | Introduced FieldSet utility for field prefix matching |
| search/searcher/*.go | Updated searchers to use new ID comparison methods |
| search/searcher/search_conjunction_nested.go | New nested conjunction searcher implementation |
| search/search.go | Extended DocumentMatch with descendant tracking |
| search/scorer/scorer_conjunction_nested.go | New scorer for nested conjunction queries |
| search/query/*.go | Updated query types and field extraction for nested support |
| search/highlight/highlighter/simple/highlighter_simple.go | Modified to use new fragment addition method |
| search/explanation.go | Added merge utilities for explanations and score breakdowns |
| search/collector/*.go | Enhanced collectors with nested document support |
| mapping/*.go | Added nested mapping interfaces and field tracking |
| index_impl.go | Integrated nested-aware collectors and field loading |
| index/scorch/*.go | Updated index readers to support nested document hierarchy |
| document/document.go | Extended Document with nested document support |

Comments suppressed due to low confidence (1)

search/searcher/search_conjunction_nested.go:1

  • Parameter name 'somOfK' appears to be a typo. It should likely be 'sumOfK' to match the usage pattern seen elsewhere in the codebase.
//  Copyright (c) 2025 Couchbase, Inc.

Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.


@CascadingRadium CascadingRadium changed the title MB-27666: Nested Fields MB-27666: Hierarchy Search Nov 22, 2025
@ajroetker
Contributor

It'd be awesome to be able to say which of the nested documents matched as well ("highlighting", if you will, for the nested docs). This would be useful if you wanted to chunk a large field into a set of child documents and then ask which chunks specifically matched.

I think you'd need a new field to DocumentMatch that explicitly reports which nested documents matched:

type DocumentMatch struct {
    // ... existing fields ...

    // Reports which nested documents matched
    NestedMatches   NestedDocumentMatches  `json:"nested_matches,omitempty"` // NEW
}

// Maps field path -> array of matched nested document indices
type NestedDocumentMatches map[string][]NestedDocumentMatch

type NestedDocumentMatch struct {
    ArrayPositions []uint64              `json:"array_positions"`  // e.g., [0, 2] for chunks[0].sections[2]
    MatchedTerms   []string              `json:"matched_terms"`    // Terms that matched in this nested doc
    Score          float64               `json:"score,omitempty"`  // Optional: score contribution
    Fields         map[string]interface{} `json:"fields,omitempty"` // Optional: field values from this nested doc
}

And to add a flag to SearchRequest to request nested document match information:

type SearchRequest struct {
    // ... existing fields ...
    NestedMatches  *NestedMatchRequest  // NEW
}

type NestedMatchRequest struct {
    Fields         []string  // Which fields to report nested matches for
    MaxPerField    int       // Max nested docs to return per field
    IncludeFields  []string  // Optional: field values to include from matched nested docs
}

@CascadingRadium
Member Author

CascadingRadium commented Nov 26, 2025

Hi @ajroetker, this support is already added. The nested document matches are present in the main document match: the Fields and Highlight elements appear in a NestedDocumentMatch object under a special _$nested key in the main document match's fields. Below is a sample hit from a nested query.

The main request's fields list can include any field from the nested document; it doesn't have to be in a separate struct.

{
  "index": "/var/folders/4r/qwn5bjb10mj_mv4znz0ypblw0000gp/T/bleve-testidx3986533520",
  "id": "2",
  "score": 1.8790196309110014,
  "locations": {
    "company.departments.employees.name": {
      "frank": [
        {
          "pos": 1,
          "start": 0,
          "end": 5,
          "array_positions": [
            1,
            0
          ]
        }
      ]
    },
    "company.departments.employees.role": {
      "manager": [
        {
          "pos": 1,
          "start": 0,
          "end": 7,
          "array_positions": [
            1,
            0
          ]
        }
      ]
    },
    "company.departments.name": {
      "engineering": [
        {
          "pos": 1,
          "start": 0,
          "end": 11,
          "array_positions": [
            1
          ]
        }
      ]
    },
    "company.locations.city": {
      "london": [
        {
          "pos": 1,
          "start": 0,
          "end": 6,
          "array_positions": [
            1
          ]
        }
      ]
    },
    "company.locations.country": {
      "uk": [
        {
          "pos": 1,
          "start": 0,
          "end": 2,
          "array_positions": [
            1
          ]
        }
      ]
    }
  },
  "sort": [
    "2"
  ],
  "decoded_sort": [
    "2"
  ],
  "fields": {
    "_$nested": [
      {
        "fields": {
          "company.departments.employees.name": "Frank",
          "company.departments.employees.role": "Manager"
        },
        "fragments": {
          "company.departments.employees.name": [
            "<mark>Frank</mark>"
          ],
          "company.departments.employees.role": [
            "<mark>Manager</mark>"
          ]
        }
      },
      {
        "fields": {
          "company.departments.budget": 800000,
          "company.departments.name": "Engineering"
        },
        "fragments": {
          "company.departments.name": [
            "<mark>Engineering</mark>"
          ]
        }
      },
      {
        "fields": {
          "company.locations.city": "London",
          "company.locations.country": "UK"
        },
        "fragments": {
          "company.locations.city": [
            "<mark>London</mark>"
          ],
          "company.locations.country": [
            "<mark>UK</mark>"
          ]
        }
      }
    ],
    "company.id": "c2",
    "company.name": "BizInc"
  }
}
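As a rough sketch of how a client might read the nested matches out of such a hit, the following Go program decodes a trimmed copy of the sample JSON using only `encoding/json`. The `nestedMatch` type and `nestedMatches` helper are hypothetical names for this example, not part of Bleve; the only things taken from the sample are the `_$nested` key and the `fields` / `fragments` entries inside it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nestedMatch mirrors one entry of the "_$nested" array in the sample hit.
// The type name is an assumption made for this sketch.
type nestedMatch struct {
	Fields    map[string]interface{} `json:"fields"`
	Fragments map[string][]string    `json:"fragments"`
}

// nestedMatches extracts the "_$nested" entries from a hit's JSON encoding.
// It returns nil when the hit carries no nested matches.
func nestedMatches(hitJSON []byte) ([]nestedMatch, error) {
	var hit struct {
		Fields map[string]json.RawMessage `json:"fields"`
	}
	if err := json.Unmarshal(hitJSON, &hit); err != nil {
		return nil, err
	}
	raw, ok := hit.Fields["_$nested"]
	if !ok {
		return nil, nil
	}
	var out []nestedMatch
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// sampleHit is a trimmed copy of the hit shown above.
const sampleHit = `{
  "id": "2",
  "fields": {
    "_$nested": [
      {
        "fields": {
          "company.departments.employees.name": "Frank",
          "company.departments.employees.role": "Manager"
        },
        "fragments": {
          "company.departments.employees.name": ["<mark>Frank</mark>"],
          "company.departments.employees.role": ["<mark>Manager</mark>"]
        }
      },
      {
        "fields": {"company.departments.name": "Engineering"},
        "fragments": {"company.departments.name": ["<mark>Engineering</mark>"]}
      }
    ],
    "company.id": "c2",
    "company.name": "BizInc"
  }
}`

func main() {
	matches, err := nestedMatches([]byte(sampleHit))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for i, m := range matches {
		fmt.Printf("nested match %d: fields=%v fragments=%v\n", i, m.Fields, m.Fragments)
	}
}
```

Decoding `fields` into `json.RawMessage` first keeps the top-level stored fields (such as `company.id`) untouched while only the `_$nested` entry is parsed into the typed slice.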

@ajroetker
Contributor

That's awesome! Way cleaner too. Looking forward to this one, especially useful for chunking smaller documents into embeddable chunks without having to duplicate metadata for every one.

@CascadingRadium CascadingRadium changed the base branch from master to cosineFix December 5, 2025 09:44
…2260)

- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with
`cosine` similarity, normalization was incorrectly applied to the entire
flattened array instead of each sub-vector independently, resulting in
degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each
sub-vector separately, fixing scores for multi-vector documents (e.g.,
score now correctly returns 1.0 instead of 0.6 for exact matches).
Base automatically changed from cosineFix to knnDup December 8, 2025 06:18
CascadingRadium and others added 8 commits December 8, 2025 13:37
@CascadingRadium CascadingRadium marked this pull request as draft December 11, 2025 12:01
@CascadingRadium CascadingRadium changed the base branch from knnDup to master December 11, 2025 12:02
@CascadingRadium CascadingRadium marked this pull request as ready for review December 11, 2025 14:53
