@sam22ridhi sam22ridhi commented Dec 24, 2025

This PR enhances the KnowledgeSpace search pipeline by improving both performance and query understanding.

The existing string-matching logic has been replaced with RapidFuzz for faster and more accurate fuzzy matching. In addition, an LLM-based query parsing step has been introduced to extract structured metadata filters from natural language queries. This allows the search system to move beyond raw keyword matching and return more relevant results from sources such as EBRAINS and OpenNeuro.

Key Changes

Switched to RapidFuzz for fuzzy matching

  • Replaced difflib from Python's standard library with RapidFuzz.
  • RapidFuzz provides faster, C++-optimized fuzzy matching with improved Levenshtein-based ranking.
  • This change improves scalability and prepares the system for larger dataset indexes.
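The kind of ranking call this switch enables can be sketched as follows. This is a minimal illustration, not the code from this PR: `rank_titles`, the sample titles, the `WRatio` scorer, and the cutoff of 50 are all assumptions chosen for the example. The difflib branch is included only so the sketch still runs where RapidFuzz is not installed.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

try:
    from rapidfuzz import fuzz, process, utils
    HAVE_RAPIDFUZZ = True
except ImportError:  # keep the sketch runnable without the dependency
    HAVE_RAPIDFUZZ = False


def rank_titles(query: str, titles: List[str], limit: int = 3,
                cutoff: float = 50.0) -> List[Tuple[str, float]]:
    """Return up to `limit` (title, score) pairs with score >= cutoff, best first.

    Hypothetical helper: scores are on a 0-100 scale in both branches.
    """
    if HAVE_RAPIDFUZZ:
        # process.extract scores every choice and returns the best matches,
        # already sorted by descending score, as (choice, score, index).
        hits = process.extract(query, titles, scorer=fuzz.WRatio,
                               processor=utils.default_process,
                               limit=limit, score_cutoff=cutoff)
        return [(title, score) for title, score, _index in hits]

    # Fallback: the difflib-style scoring that RapidFuzz replaces.
    scored = [(title, 100.0 * SequenceMatcher(None, query.lower(), title.lower()).ratio())
              for title in titles]
    kept = [pair for pair in scored if pair[1] >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    sample_titles = [
        "Resting-state EEG recordings",
        "Mouse brain MRI atlas",
        "Single-cell RNA-seq of cortex",
    ]
    for title, score in rank_titles("EEG recordings", sample_titles):
        print(f"{score:6.1f}  {title}")
```

Passing `score_cutoff` lets RapidFuzz discard weak candidates early, which matters more as the dataset index grows.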

LLM-based filter extraction

  • Added a new preprocessing step that uses the LLM to extract structured filters from natural language queries.
  • This enables intent-aware search rather than full-string matching.

Examples:

  • "EEG data" → { "technique": "EEG" }
  • "MRI data for Alzheimer's" → { "modality": "MRI", "disease": "Alzheimer's" }
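A rough sketch of how such an extraction step can be structured is shown below. It is an assumption-laden illustration rather than the implementation in this PR: `extract_filters`, `PROMPT_TEMPLATE`, and `ALLOWED_KEYS` are hypothetical names, the key set is taken only from the two examples above, and the LLM is passed in as a plain callable so that a stub can stand in for the real model.

```python
import json
from typing import Callable, Dict

# Keys seen in the examples above; the real filter schema may differ.
ALLOWED_KEYS = {"technique", "modality", "disease"}

# Hypothetical prompt; the wording used in the PR is not shown in the description.
PROMPT_TEMPLATE = (
    "Extract search filters from the query below. "
    "Respond with a JSON object using only these keys: {keys}. "
    "Omit keys that do not apply.\n\nQuery: {query}"
)


def extract_filters(query: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Ask the LLM for structured filters and keep only well-formed entries."""
    prompt = PROMPT_TEMPLATE.format(keys=", ".join(sorted(ALLOWED_KEYS)), query=query)
    raw = llm(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Unparseable output: fall back to no filters (plain fuzzy search).
        return {}
    if not isinstance(parsed, dict):
        return {}
    # Drop unknown keys and non-string or empty values the model may have added.
    return {
        key: value.strip()
        for key, value in parsed.items()
        if key in ALLOWED_KEYS and isinstance(value, str) and value.strip()
    }


def fake_llm(prompt: str) -> str:
    """Stub standing in for the real model call, for local experiments."""
    return '{"modality": "MRI", "disease": "Alzheimer\'s", "year": 2020}'


if __name__ == "__main__":
    print(extract_filters("MRI data for Alzheimer's", fake_llm))
```

Validating the model output against a fixed key set keeps malformed or unexpected responses from reaching the search backends; in that case the query simply proceeds without filters.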

Bug fix in KSSearchAgent

  • Fixed an edge case in KSSearchAgent that could result in incomplete results under certain input conditions.

Testing & Verification

  • Tested locally against KnowledgeSpace API endpoints.
  • Verified correct filter extraction and ranking behavior.

Test query: "EEG data"

  • Filters correctly extracted (technique = EEG).
  • Relevant datasets from EBRAINS and OpenNeuro ranked at the top.
  • End-to-end response time remained under 4 seconds with the added LLM step.

For testing purposes

Tested locally with queries such as “EEG data”. Filters were extracted correctly, and relevant EEG datasets from EBRAINS and OpenNeuro ranked near the top, with responses returned in under 4 seconds.
