MB-27666: Hierarchy Search #2224
base: master
Branch updated from 2e20e37 to 4c3fdfa
Pull Request Overview
Adds support for nested fields in indexing and querying, so that hierarchical documents can be indexed and queried at the level of their nested objects. This lets Bleve search complex nested JSON documents.
- Parse and index nested JSON objects with preserved hierarchical relationships
- Enable conjunction queries on nested fields with proper document matching across hierarchical levels
- Implement nested-aware collectors, searchers, and mapping functionality
Reviewed Changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| search_test.go | Comprehensive test suite for nested field querying scenarios |
| search_knn.go | Added nested-aware KNN and eligible collectors |
| search/util.go | Introduced FieldSet utility for field prefix matching |
| search/searcher/*.go | Updated searchers to use new ID comparison methods |
| search/searcher/search_conjunction_nested.go | New nested conjunction searcher implementation |
| search/search.go | Extended DocumentMatch with descendant tracking |
| search/scorer/scorer_conjunction_nested.go | New scorer for nested conjunction queries |
| search/query/*.go | Updated query types and field extraction for nested support |
| search/highlight/highlighter/simple/highlighter_simple.go | Modified to use new fragment addition method |
| search/explanation.go | Added merge utilities for explanations and score breakdowns |
| search/collector/*.go | Enhanced collectors with nested document support |
| mapping/*.go | Added nested mapping interfaces and field tracking |
| index_impl.go | Integrated nested-aware collectors and field loading |
| index/scorch/*.go | Updated index readers to support nested document hierarchy |
| document/document.go | Extended Document with nested document support |
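The core idea behind a nested conjunction can be sketched as follows, assuming each term match carries `array_positions` like those in the `locations` output further down. A conjunction over fields of one nested object should only match when every term hits the *same* object, i.e. when the matches share the same array-position prefix at that nesting depth. This is a hypothetical illustration; `nestedConjunction` and `prefixKey` are not functions from `search_conjunction_nested.go`.

```go
package main

import (
	"fmt"
	"sort"
)

// prefixKey identifies the nested object a match belongs to by its first
// depth array positions; ok is false if the match is not that deep.
func prefixKey(pos []uint64, depth int) (key string, ok bool) {
	if len(pos) < depth {
		return "", false
	}
	return fmt.Sprint(pos[:depth]), true
}

// nestedConjunction returns, sorted, the keys of the nested objects in which
// every term has at least one match. Each argument lists the
// array_positions of one term's matches within a single document.
func nestedConjunction(depth int, termMatches ...[][]uint64) []string {
	if len(termMatches) == 0 {
		return nil
	}
	counts := make(map[string]int)
	for _, matches := range termMatches {
		seen := make(map[string]bool) // count each object once per term
		for _, pos := range matches {
			if k, ok := prefixKey(pos, depth); ok && !seen[k] {
				seen[k] = true
				counts[k]++
			}
		}
	}
	var out []string
	for k, c := range counts {
		if c == len(termMatches) {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// departments[i].employees[j] => nesting depth 2
	frank := [][]uint64{{1, 0}}
	manager := [][]uint64{{1, 0}}
	rep := [][]uint64{{0, 0}} // hypothetical second employee
	fmt.Println(nestedConjunction(2, frank, manager)) // [[1 0]]
	fmt.Println(nestedConjunction(2, frank, rep))     // []
}
```

A flat conjunction would accept `frank AND rep` because both terms occur somewhere in the document; the nested check rejects it because the two matches belong to different employees (`[1 0]` vs `[0 0]`).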
Comments suppressed due to low confidence (1)
search/searcher/search_conjunction_nested.go:1
- Parameter name 'somOfK' appears to be a typo. It should likely be 'sumOfK' to match the usage pattern seen elsewhere in the codebase.
// Copyright (c) 2025 Couchbase, Inc.
Branch updated from 82de66e to 8172f3b
Branch updated from 12aa9e3 to 0182ae8
Branch updated from 0d1f64e to 3aa46b4
Pull Request Overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.
Branch updated from 5c4961b to a30a7a2
It'd be awesome to be able to say which of the nested documents matched as well ("highlighting", if you will, for the nested docs). This would be useful if you wanted to chunk large fields into a set of child documents and then ask which chunks specifically matched. I think you'd need a new field on DocumentMatch that explicitly reports which nested documents matched:

```go
type DocumentMatch struct {
	// ... existing fields ...

	// Reports which nested documents matched
	NestedMatches NestedDocumentMatches `json:"nested_matches,omitempty"` // NEW
}

// Maps field path -> array of matched nested document indices
type NestedDocumentMatches map[string][]NestedDocumentMatch

type NestedDocumentMatch struct {
	ArrayPositions []uint64               `json:"array_positions"`  // e.g., [0, 2] for chunks[0].sections[2]
	MatchedTerms   []string               `json:"matched_terms"`    // Terms that matched in this nested doc
	Score          float64                `json:"score,omitempty"`  // Optional: score contribution
	Fields         map[string]interface{} `json:"fields,omitempty"` // Optional: field values from this nested doc
}
```

And a flag on SearchRequest to request nested document match information:

```go
type SearchRequest struct {
	// ... existing fields ...
	NestedMatches *NestedMatchRequest // NEW
}

type NestedMatchRequest struct {
	Fields        []string // Which fields to report nested matches for
	MaxPerField   int      // Max nested docs to return per field
	IncludeFields []string // Optional: field values to include from matched nested docs
}
```
hi @ajroetker, this support is already added. The nested document matches are returned inside the main document match, and the main request's `fields` can name any field from the nested documents; they don't have to be requested through a separate struct. For example:

```json
{
  "index": "/var/folders/4r/qwn5bjb10mj_mv4znz0ypblw0000gp/T/bleve-testidx3986533520",
  "id": "2",
  "score": 1.8790196309110014,
  "locations": {
    "company.departments.employees.name": {
      "frank": [
        {"pos": 1, "start": 0, "end": 5, "array_positions": [1, 0]}
      ]
    },
    "company.departments.employees.role": {
      "manager": [
        {"pos": 1, "start": 0, "end": 7, "array_positions": [1, 0]}
      ]
    },
    "company.departments.name": {
      "engineering": [
        {"pos": 1, "start": 0, "end": 11, "array_positions": [1]}
      ]
    },
    "company.locations.city": {
      "london": [
        {"pos": 1, "start": 0, "end": 6, "array_positions": [1]}
      ]
    },
    "company.locations.country": {
      "uk": [
        {"pos": 1, "start": 0, "end": 2, "array_positions": [1]}
      ]
    }
  },
  "sort": ["2"],
  "decoded_sort": ["2"],
  "fields": {
    "_$nested": [
      {
        "fields": {
          "company.departments.employees.name": "Frank",
          "company.departments.employees.role": "Manager"
        },
        "fragments": {
          "company.departments.employees.name": ["<mark>Frank</mark>"],
          "company.departments.employees.role": ["<mark>Manager</mark>"]
        }
      },
      {
        "fields": {
          "company.departments.budget": 800000,
          "company.departments.name": "Engineering"
        },
        "fragments": {
          "company.departments.name": ["<mark>Engineering</mark>"]
        }
      },
      {
        "fields": {
          "company.locations.city": "London",
          "company.locations.country": "UK"
        },
        "fragments": {
          "company.locations.city": ["<mark>London</mark>"],
          "company.locations.country": ["<mark>UK</mark>"]
        }
      }
    ],
    "company.id": "c2",
    "company.name": "BizInc"
  }
}
```
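A client could pull the per-nested-document fields and highlight fragments out of a hit like the one above roughly as follows. This is a minimal client-side sketch that assumes the response shape shown, in particular that nested results arrive under the `_$nested` key of `fields`; `nestedEntry`, `hitFields` and `nestedEntries` are hypothetical names, and the sample is a trimmed copy of that hit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// nestedEntry mirrors one element of the "_$nested" array in the sample hit.
type nestedEntry struct {
	Fields    map[string]interface{} `json:"fields"`
	Fragments map[string][]string    `json:"fragments"`
}

// hitFields decodes only the parts of a hit this sketch needs.
type hitFields struct {
	ID     string                     `json:"id"`
	Fields map[string]json.RawMessage `json:"fields"`
}

// nestedEntries returns the nested matches of one hit, or nil if there are none.
func nestedEntries(hitJSON []byte) ([]nestedEntry, error) {
	var h hitFields
	if err := json.Unmarshal(hitJSON, &h); err != nil {
		return nil, err
	}
	raw, ok := h.Fields["_$nested"]
	if !ok {
		return nil, nil
	}
	var out []nestedEntry
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// Trimmed copy of the sample hit.
	sample := []byte(`{"id": "2", "fields": {
		"_$nested": [
			{"fields": {"company.departments.employees.name": "Frank",
			            "company.departments.employees.role": "Manager"},
			 "fragments": {"company.departments.employees.name": ["<mark>Frank</mark>"]}},
			{"fields": {"company.departments.budget": 800000,
			            "company.departments.name": "Engineering"},
			 "fragments": {"company.departments.name": ["<mark>Engineering</mark>"]}}],
		"company.id": "c2", "company.name": "BizInc"}}`)
	entries, err := nestedEntries(sample)
	if err != nil {
		panic(err)
	}
	for i, e := range entries {
		keys := make([]string, 0, len(e.Fragments))
		for k := range e.Fragments {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic output
		for _, k := range keys {
			fmt.Printf("nested[%d] %s: %v\n", i, k, e.Fragments[k])
		}
	}
}
```

Each `_$nested` element corresponds to one matched nested object, so for the chunking use case each element identifies one matched chunk along with its highlighted text.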
That's awesome! Way cleaner too. Looking forward to this one; it's especially useful for chunking documents into smaller, embeddable chunks without having to duplicate metadata for each one.
…2260)
- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with `cosine` similarity, normalization was incorrectly applied to the entire flattened array instead of to each sub-vector independently, which degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)`, which normalizes each sub-vector separately and fixes scores for multi-vector documents (e.g., an exact match now correctly scores 1.0 instead of 0.6).
Add support for nested fields in indexing and querying
Requires:
Resolves: