Description
Problem
As currently implemented, the embeddings are effectively useless, and the retrieved context is certainly not "relevant".
When using review_file, metis uses only the file name to find relevant embeddings (i.e. to gather context):
retrieve_context: |-
You are a senior software engineer and your task is to explain what the following FILE does and what its purpose is.
Include any data flows, function calls, or dependencies that could impact security.
Make sure you flag any function input that is externally controlled.
FILE: {file_path}
When using ask, metis passes the unchanged search query to the embedding search. When the query is general (e.g. "How does this codebase work?"), the results are not "relevant".
Proposed Solution
I propose a new way of providing "Relevant Context": give the LLM the ability to call a tool repeatedly until it has found enough of it.
This is RAG (Retrieval-Augmented Generation), with the LLM driving the retrieval.
Instead of being handed irrelevant, generic context, the LLM would use the tool to search the embeddings with natural-language queries.
Once implemented, metis would use the LLM to generate the search queries that retrieve the relevant context.
When asked "How does this codebase work?", the LLM would be prompted with:
You are a security engineer tasked with doing security code review for a project.
You have access to an embedding search tool.
The user asks: "How does this codebase work?"
Which embedding search queries would you use to gather relevant context to best answer the user's question?
It would then call the embedding search tool with natural-language search queries:
- Architecture documentation
- Main entrypoint file
- Authorization endpoint
- env configuration api base url
...
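The query-generation step above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how metis works today: `plan_queries`, `embedding_search`, `fake_llm`, and `CHUNKS` are hypothetical names, the LLM is replaced by a stub that returns a fixed reply, and the "embedding" search is a toy bag-of-words cosine ranking standing in for a real vector store.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; a toy stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def embedding_search(query: str, chunks: dict[str, str], top_k: int = 1) -> list[str]:
    """Hypothetical search tool: return the names of the top_k most similar chunks."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda name: cosine(q, tokenize(chunks[name])), reverse=True)
    return ranked[:top_k]


def plan_queries(question: str, llm: Callable[[str], str]) -> list[str]:
    """Ask the LLM for search queries, one per line, and parse its reply."""
    prompt = (
        "You are a security engineer tasked with doing security code review for a project.\n"
        "You have access to an embedding search tool.\n"
        f'The user asks: "{question}"\n'
        "Which embedding search queries would you use? Reply with one query per line."
    )
    reply = llm(prompt)
    return [line.lstrip("- ").strip() for line in reply.splitlines() if line.strip()]


def fake_llm(prompt: str) -> str:
    # Stub: a real implementation would call the configured model here.
    return "- Architecture documentation\n- Main entrypoint file\n- Authorization endpoint"


# Assumed, made-up index contents for the example.
CHUNKS = {
    "README.md": "architecture documentation overview of the project",
    "main.py": "main entrypoint file parses arguments and starts the server",
    "auth.py": "authorization endpoint validates tokens",
}

queries = plan_queries("How does this codebase work?", fake_llm)
context = [hit for q in queries for hit in embedding_search(q, CHUNKS)]
```

The point of the sketch is that a general question is never sent to the index verbatim: each generated query targets one specific kind of information.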
Gathering the different pieces of information could be implemented with tasks and subagents, with the results from all subagents combined into a single Relevant Context.
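One possible shape for the subagent fan-out, as a minimal sketch: each query is handled by a separate worker, and the results are merged with duplicates removed. `run_subagents`, `fake_search`, and `INDEX` are hypothetical names, and a thread pool stands in for whatever task/subagent mechanism metis would actually use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_subagents(
    queries: list[str],
    search: Callable[[str], list[str]],
    max_workers: int = 4,
) -> list[str]:
    """Run one search per query in parallel and merge the hits, dropping duplicates."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `queries`, not completion order.
        results = list(pool.map(search, queries))

    merged: list[str] = []
    seen: set[str] = set()
    for hits in results:
        for hit in hits:
            if hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged


# Assumed, made-up search results for the example.
INDEX = {
    "Main entrypoint file": ["main.py", "config.py"],
    "env configuration api base url": ["config.py", ".env.example"],
}


def fake_search(query: str) -> list[str]:
    # Stub: a real subagent would query the embeddings (or another tool).
    return INDEX.get(query, [])


relevant_context = run_subagents(list(INDEX), fake_search)
```

Deduplicating while merging matters here because different queries will often surface the same file.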
The LLM would then decide: "Do I have enough Relevant Context to answer the question?" or "Do I have all the context required to do a security code review of this file?"
Only once it has enough context would metis proceed with the security code review.
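The "search until there is enough context" loop could look something like the sketch below. All names (`gather_context`, `next_query`, `is_sufficient`, `SCRIPT`, `INDEX`) are hypothetical, and both LLM decisions are replaced by stubs. The `max_rounds` cap is my own assumption, added so the loop cannot run forever if the model never declares the context sufficient.

```python
from typing import Callable


def gather_context(
    question: str,
    next_query: Callable[[str, list[str], list[str]], str],
    search: Callable[[str], list[str]],
    is_sufficient: Callable[[str, list[str]], bool],
    max_rounds: int = 5,
) -> list[str]:
    """Keep searching until the LLM judges the context sufficient or rounds run out."""
    snippets: list[str] = []
    asked: list[str] = []
    for _ in range(max_rounds):
        if is_sufficient(question, snippets):
            break
        query = next_query(question, snippets, asked)
        asked.append(query)
        for hit in search(query):
            if hit not in snippets:
                snippets.append(hit)
    return snippets


# Stubs standing in for LLM calls and the embedding index (assumed data).
SCRIPT = ["Main entrypoint file", "Authorization endpoint", "env configuration api base url"]
INDEX = {
    "Main entrypoint file": ["main.py"],
    "Authorization endpoint": ["auth.py"],
    "env configuration api base url": ["config.py"],
}


def next_query(question: str, snippets: list[str], asked: list[str]) -> str:
    # A real implementation would ask the LLM what to look for next.
    return SCRIPT[len(asked) % len(SCRIPT)]


def search(query: str) -> list[str]:
    return INDEX.get(query, [])


def is_sufficient(question: str, snippets: list[str]) -> bool:
    # A real implementation would ask the LLM "Do I have enough Relevant Context?".
    return len(snippets) >= 2


found = gather_context("How does this codebase work?", next_query, search, is_sufficient)
```

Only after `gather_context` returns would the actual review or answer prompt be built from the collected snippets.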
Multiple tools could also be provided for the LLM to use:
- embeddings search tool
- grep tool
- web search tool
...
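Offering several tools could be as simple as a name-to-function registry that the LLM's tool calls are dispatched through. The sketch below is an assumption about one possible design, not an existing metis API: `TOOLS`, `dispatch`, `grep_tool`, and the in-memory `FILES` are hypothetical, and the embedding and web search tools are left as empty placeholders.

```python
import re
from typing import Callable

# Assumed, made-up file contents so the example runs without touching disk.
FILES = {
    "app.py": "API_BASE_URL = os.environ['API_BASE_URL']\nprint('ready')",
    "auth.py": "def check_token(token):\n    return token == SECRET",
}


def grep_tool(pattern: str) -> list[str]:
    """Return 'path:line_no: line' for every line matching the regex."""
    rx = re.compile(pattern)
    return [
        f"{path}:{n}: {line}"
        for path, text in FILES.items()
        for n, line in enumerate(text.splitlines(), 1)
        if rx.search(line)
    ]


def embedding_search_tool(query: str) -> list[str]:
    return []  # placeholder: would query the embeddings index


def web_search_tool(query: str) -> list[str]:
    return []  # placeholder: would call a web search backend


TOOLS: dict[str, Callable[[str], list[str]]] = {
    "embedding_search": embedding_search_tool,
    "grep": grep_tool,
    "web_search": web_search_tool,
}


def dispatch(call: dict[str, str]) -> list[str]:
    """Execute a tool call of the form {'tool': name, 'input': text}."""
    tool = TOOLS.get(call.get("tool", ""))
    if tool is None:
        # Return the error as a result so the LLM can correct itself.
        return [f"error: unknown tool {call.get('tool')!r}"]
    return tool(call.get("input", ""))
```

For example, `dispatch({"tool": "grep", "input": "check_token"})` would return the matching line from `auth.py`, while an unregistered tool name yields an error string rather than an exception.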