Configurable prompts + reveal chat context #35
Conversation
Walkthrough
The changes involve modifications to several classes within the
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant KnowledgeGraph
participant ChatSession
participant GraphQueryGenerationStep
participant QAStep
User->>KnowledgeGraph: chat_session(cypher_system_instruction, qa_system_instruction, cypher_gen_prompt, qa_prompt)
KnowledgeGraph->>ChatSession: Create with provided parameters
ChatSession->>GraphQueryGenerationStep: Run with cypher_prompt
ChatSession->>QAStep: Run with qa_prompt
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (6)
graphrag_sdk/steps/graph_query_step.py (1)
54-62: Fix indentation consistency.
The indentation in the ternary expressions is inconsistent, which affects readability. Ensure consistent indentation throughout the nested expressions.
- cypher_prompt = (
-     (CYPHER_GEN_PROMPT.format(question=question)
-     if self.last_answer is None
-     else CYPHER_GEN_PROMPT_WITH_HISTORY.format(question=question, last_answer=self.last_answer))
-     if error is False
-     else CYPHER_GEN_PROMPT_WITH_ERROR.format(
-         question=question, error=error
-     )
- )
+ cypher_prompt = (
+     (CYPHER_GEN_PROMPT.format(question=question)
+      if self.last_answer is None
+      else CYPHER_GEN_PROMPT_WITH_HISTORY.format(
+          question=question,
+          last_answer=self.last_answer
+      ))
+     if error is False
+     else CYPHER_GEN_PROMPT_WITH_ERROR.format(
+         question=question,
+         error=error
+     )
+ )
graphrag_sdk/chat_session.py (3)
54-55: Add type hints and documentation for new attributes.
The new prompt attributes would benefit from type hints and docstring documentation to improve code maintainability.
Add type hints and update the class docstring:
  class ChatSession:
+     cypher_prompt: str | None
+     qa_prompt: str | None
      """
      Represents a chat session with a Knowledge Graph.

      Args:
          model_config (KnowledgeGraphModelConfig): The model configuration to use.
          ontology (Ontology): The ontology to use.
          graph (Graph): The graph to query.
+         cypher_system_instruction (str, optional): Custom system instructions for Cypher generation.
+         qa_system_instruction (str, optional): Custom system instructions for QA.
+         cypher_gen_prompt (str, optional): Custom prompt template for Cypher generation.
+         qa_prompt (str, optional): Custom prompt template for QA.
63-63: Consider consistent instruction handling approaches.
The QA system instruction handling uses a different approach from the Cypher system instruction handling. Consider using the same pattern for consistency.
- qa_system_instruction or GRAPH_QA_SYSTEM
+ if qa_system_instruction is None:
+     qa_system_instruction = GRAPH_QA_SYSTEM
+ self.qa_chat_session = model_config.qa.with_system_instruction(qa_system_instruction).start_chat()
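The two fallback styles in this suggestion are not equivalent for falsy values such as an empty string. A minimal sketch of the difference, using a stand-in constant rather than the SDK's actual `GRAPH_QA_SYSTEM` text (both helper names are hypothetical):

```python
# Stand-in for the SDK's default QA system instruction (value is illustrative).
DEFAULT_QA_SYSTEM = "default QA system instruction"


def fallback_with_or(instruction):
    # Any falsy value (None or "") is replaced by the default.
    return instruction or DEFAULT_QA_SYSTEM


def fallback_with_none_check(instruction):
    # Only None is replaced; an empty string is passed through unchanged.
    if instruction is None:
        return DEFAULT_QA_SYSTEM
    return instruction


print(repr(fallback_with_or("")))
print(repr(fallback_with_none_check("")))
```

Whichever pattern is chosen, using the same one for both the Cypher and QA instructions keeps the handling of an empty string predictable.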
82-82: Consider caching step instances.
The steps are recreated for each message, which could be inefficient for long chat sessions since most parameters remain constant.
Consider creating the steps once in the constructor:
  def __init__(self, ...):
      # ... existing code ...
+     self.cypher_step = GraphQueryGenerationStep(
+         graph=self.graph,
+         chat_session=self.cypher_chat_session,
+         ontology=self.ontology,
+         cypher_prompt=self.cypher_prompt,
+     )
+     self.qa_step = QAStep(
+         chat_session=self.qa_chat_session,
+         qa_prompt=self.qa_prompt,
+     )

  def send_message(self, message: str):
-     cypher_step = GraphQueryGenerationStep(...)
+     self.cypher_step.last_answer = self.last_answer
+     (context, cypher) = self.cypher_step.run(message)
      # ... rest of the method ...
-     qa_step = QAStep(...)
+     answer = self.qa_step.run(message, cypher, context)
Also applies to: 92-92
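The caching idea can be illustrated without the SDK. The sketch below uses a hypothetical `FakeStep` in place of `GraphQueryGenerationStep` and `QAStep`, and a hypothetical `CachedSession` in place of `ChatSession`; it only shows that the step is built once and that per-message state is refreshed before each run.

```python
class FakeStep:
    """Hypothetical stand-in for a step class; counts how often it is constructed."""

    instances = 0

    def __init__(self, prompt=None):
        FakeStep.instances += 1
        self.prompt = prompt
        self.last_answer = None

    def run(self, message):
        return f"{self.prompt}:{message}:{self.last_answer}"


class CachedSession:
    """Builds its step once and only refreshes per-message state."""

    def __init__(self, prompt=None):
        self.last_answer = None
        self.step = FakeStep(prompt)  # created once, reused for every message

    def send_message(self, message):
        self.step.last_answer = self.last_answer  # state that changes per message
        self.last_answer = self.step.run(message)
        return self.last_answer
```

The trade-off is that any attribute the step reads at construction time must now be updated explicitly, as `last_answer` is here.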
graphrag_sdk/kg.py (2)
137-141: Update method docstring to document new parameters
The method signature has been updated with new parameters, but the docstring hasn't been updated to reflect these changes. Please add parameter descriptions to help users understand the purpose of each new parameter.
Apply this diff:
  def chat_session(self, cypher_system_instruction: str = None, qa_system_instruction: str = None, cypher_gen_prompt: str = None, qa_prompt: str = None) -> ChatSession:
+     """
+     Create a new chat session with optional custom instructions and prompts.
+
+     Parameters:
+         cypher_system_instruction (str, optional): Custom system instruction for Cypher query generation
+         qa_system_instruction (str, optional): Custom system instruction for question answering
+         cypher_gen_prompt (str, optional): Custom prompt template for Cypher query generation
+         qa_prompt (str, optional): Custom prompt template for question answering
+
+     Returns:
+         ChatSession: A new chat session instance
+     """
      chat_session = ChatSession(self._model_config, self.ontology, self.graph, cypher_system_instruction, qa_system_instruction, cypher_gen_prompt, qa_prompt)
      return chat_session
137-141: Consider adding parameter validation and default values
The method accepts optional string parameters but doesn't validate them or provide default values. Consider:
- Validating that provided strings are not empty
- Using the imported GRAPH_QA_SYSTEM and CYPHER_GEN_SYSTEM as default values
Here's a suggested implementation:
  def chat_session(self, cypher_system_instruction: str = None, qa_system_instruction: str = None, cypher_gen_prompt: str = None, qa_prompt: str = None) -> ChatSession:
+     # Use default system instructions if not provided
+     cypher_system_instruction = cypher_system_instruction or CYPHER_GEN_SYSTEM
+     qa_system_instruction = qa_system_instruction or GRAPH_QA_SYSTEM
+
+     # Validate non-empty strings if provided
+     if cypher_gen_prompt is not None and not cypher_gen_prompt.strip():
+         raise ValueError("cypher_gen_prompt cannot be empty")
+     if qa_prompt is not None and not qa_prompt.strip():
+         raise ValueError("qa_prompt cannot be empty")
+
      chat_session = ChatSession(self._model_config, self.ontology, self.graph, cypher_system_instruction, qa_system_instruction, cypher_gen_prompt, qa_prompt)
      return chat_session
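The repeated checks in this suggestion could be folded into one helper. This is a sketch under the assumption that `None` means "use the built-in default"; the name `validate_optional_prompt` is hypothetical and not part of the SDK.

```python
from typing import Optional


def validate_optional_prompt(value: Optional[str], name: str) -> Optional[str]:
    """Return the value unchanged; reject strings that are empty or only whitespace."""
    if value is None:
        return None  # the caller falls back to its built-in default
    if not isinstance(value, str):
        raise TypeError(f"{name} must be a string or None")
    if not value.strip():
        raise ValueError(f"{name} cannot be empty")
    return value
```

A call site would then read `cypher_gen_prompt = validate_optional_prompt(cypher_gen_prompt, "cypher_gen_prompt")`, which keeps the optional parameters optional while still rejecting blank strings.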
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
graphrag_sdk/chat_session.py (4 hunks)
graphrag_sdk/kg.py (1 hunks)
graphrag_sdk/steps/graph_query_step.py (1 hunks)
graphrag_sdk/steps/qa_step.py (1 hunks)
🔇 Additional comments (5)
graphrag_sdk/steps/qa_step.py (2)
20-20: LGTM! Clean implementation of configurable prompts
The addition of the optional qa_prompt parameter with proper type hinting and None default maintains backward compatibility while enabling prompt customization.
Also applies to: 24-24
27-32: Verify format string compatibility with custom prompts
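The code assumes that any custom qa_prompt will contain the format placeholders {context}, {cypher}, and {question}. If a custom prompt omits one of them, str.format does not fail; it silently drops the corresponding value, so the model never sees it. A KeyError is raised at runtime only when the prompt contains a named field that is not supplied, for example a misspelled placeholder.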
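A quick check of how `str.format` treats the two cases, in plain Python and independent of the SDK: a template that lacks a placeholder renders without error, while a template that names a field the caller does not supply raises `KeyError`. The template strings are made up for illustration.

```python
template_without_placeholders = "Answer the question."
# Surplus keyword arguments are ignored: no exception, the values are simply dropped.
rendered = template_without_placeholders.format(
    context="ctx", cypher="MATCH (n) RETURN n", question="q"
)

template_with_unknown_field = "Answer {question} using {history}."
try:
    template_with_unknown_field.format(question="q")
    raised = False
except KeyError:
    # Raised because {history} has no matching keyword argument.
    raised = True

print(rendered, raised)
```

This is why an explicit placeholder check is useful: the silent case produces a prompt that looks valid but carries no context.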
Let's verify the format string requirements are documented:
Consider adding validation for required placeholders:
def run(self, question: str, cypher: str, context: str):
graph_qa_prompt = self.qa_prompt or GRAPH_QA_SYSTEM
+ required_placeholders = ["{context}", "{cypher}", "{question}"]
+ if self.qa_prompt and not all(p in self.qa_prompt for p in required_placeholders):
+ raise ValueError(
+ f"Custom qa_prompt must contain all placeholders: {required_placeholders}"
+ )
qa_prompt = graph_qa_prompt.format(
context=context, cypher=cypher, question=question
)
graphrag_sdk/steps/graph_query_step.py (1)
36-36: LGTM! Clean implementation of the configurable prompt.
The addition of the optional cypher_prompt parameter and its corresponding instance variable is well-implemented, following Python's best practices.
Also applies to: 43-43
graphrag_sdk/chat_session.py (2)
28-30: LGTM! Well-structured parameter additions.
The new optional parameters are well-named and properly typed, maintaining backward compatibility while adding configurability.
28-30: Verify consistent parameter usage across the codebase.
The new parameters should be consistently handled in related classes and methods.
✅ Verification successful
Let me gather more specific information about the parameter usage.
Parameters are correctly propagated and used throughout the codebase
The verification shows that:
- The new parameters are properly handled in the KnowledgeGraph.chat_session() method, which correctly forwards them to ChatSession
- ChatSession has proper fallback logic for cypher_system_instruction and qa_system_instruction
- The parameters are used in the expected way with the GraphQueryGenerationStep and QAStep classes
- No inconsistencies found in parameter usage across different model implementations (Gemini, OpenAI, Azure, Ollama)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for related class instantiations and verify parameter handling
echo "Searching for ChatSession instantiations..."
rg -l "ChatSession\("
echo "\nSearching for related step classes..."
rg -l "class.*Step"
echo "\nChecking GraphQueryGenerationStep and QAStep parameter handling..."
ast-grep --pattern 'class GraphQueryGenerationStep {
$$$
def __init__($$$) {
$$$
}
$$$
}'
ast-grep --pattern 'class QAStep {
$$$
def __init__($$$) {
$$$
}
$$$
}'
Length of output: 1564
Script:
#!/bin/bash
# Check ChatSession instantiations and their parameters
echo "Checking ChatSession instantiations..."
rg "ChatSession\(" -A 5
echo "\nChecking GraphQueryGenerationStep implementation..."
rg "class GraphQueryGenerationStep" -A 10 graphrag_sdk/steps/graph_query_step.py
echo "\nChecking QAStep implementation..."
rg "class QAStep" -A 10 graphrag_sdk/steps/qa_step.py
echo "\nChecking where cypher_system_instruction and qa_system_instruction are used..."
rg "cypher_system_instruction|qa_system_instruction"
Length of output: 6099
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
graphrag_sdk/steps/graph_query_step.py (1)
45-45: Replace unused loop variable.
The loop control variable i is not used within the loop body.
- for i in range(retries):
+ for _ in range(retries):
🧰 Tools
🪛 Ruff
45-45: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
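For context, a retry loop of this shape never reads its counter, which is what the lint rule flags. The helper below is a hypothetical sketch, not the SDK's retry logic; `run_with_retries` and its error handling are assumptions.

```python
def run_with_retries(action, retries=3):
    """Call action up to `retries` times and re-raise the last error if all attempts fail."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _ in range(retries):  # the counter itself is never read
        try:
            return action()
        except Exception as error:  # deliberately broad for this sketch
            last_error = error
    raise last_error
```

Both `_` and `_i` satisfy the rule; `_` is the more common spelling when the value is never needed.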
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
graphrag_sdk/steps/graph_query_step.py (1 hunks)
🧰 Additional context used
🪛 Ruff
graphrag_sdk/steps/graph_query_step.py
45-45: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
🔇 Additional comments (1)
graphrag_sdk/steps/graph_query_step.py (1)
34-34: LGTM! Clean addition of configurable prompt parameter.
The new optional parameter and its initialization follow Python best practices.
Also applies to: 41-41
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
graphrag_sdk/chat_session.py (3)
28-30: Update docstring to include new parameters
The constructor's docstring needs to be updated to document the new parameters:
- cypher_system_instruction
- qa_system_instruction
- cypher_gen_prompt
- qa_prompt
54-55: Add input validation for prompt parameters
Consider adding validation for the prompt parameters to ensure they are not empty strings when provided. Empty prompts could lead to unexpected behavior.
+ if cypher_gen_prompt is not None and not cypher_gen_prompt.strip():
+     raise ValueError("cypher_gen_prompt cannot be empty")
+ if qa_prompt is not None and not qa_prompt.strip():
+     raise ValueError("qa_prompt cannot be empty")
  self.cypher_prompt = cypher_gen_prompt
  self.qa_prompt = qa_prompt
59-59: Standardize system instruction handling
The code handles cypher and qa system instructions differently:
- Cypher instruction is modified with ontology information
- QA instruction uses simple fallback to GRAPH_QA_SYSTEM
Consider standardizing this approach for better maintainability.
Also applies to: 63-64
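One way to standardize the two paths is a single resolver that picks the custom or default text and then fills any marker tokens. This is a sketch only: `resolve_instruction` is a hypothetical helper, and the `#ONTOLOGY` marker and the instruction strings are illustrative assumptions rather than the SDK's actual constants.

```python
from typing import Mapping, Optional


def resolve_instruction(
    custom: Optional[str],
    default: str,
    replacements: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the custom instruction if given, then fill any marker tokens."""
    text = default if custom is None else custom
    for marker, value in (replacements or {}).items():
        text = text.replace(marker, value)
    return text


# Cypher path: default text with the ontology injected.
cypher_instruction = resolve_instruction(
    None, "Ontology: #ONTOLOGY", {"#ONTOLOGY": "(:Person)-[:KNOWS]->(:Person)"}
)
# QA path: same helper, no replacements needed.
qa_instruction = resolve_instruction("Be brief.", "default QA instruction")
```

With one helper, the QA path becomes the degenerate case of the Cypher path instead of a separate code shape.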
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
graphrag_sdk/chat_session.py (4 hunks)
🔇 Additional comments (2)
graphrag_sdk/chat_session.py (2)
49-52: Previous review comment about standardizing ontology injection is still applicable
82-82: Verify step handling of None prompts
Please ensure that both GraphQueryGenerationStep and QAStep handle None prompts gracefully. Consider documenting the expected behavior when prompts are not provided.
Also applies to: 92-92
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (11)
graphrag_sdk/steps/graph_query_step.py (2)
30-31: Add input validation for prompt parameters.
While the new prompt parameters are well-structured, consider adding validation to ensure they contain the required format placeholders ({question} and {last_answer}).
  def __init__(
      self,
      graph: Graph,
      ontology: Ontology,
      chat_session: GenerativeModelChatSession,
      config: dict = None,
      last_answer: str = None,
      cypher_prompt: str = None,
      cypher_prompt_with_history: str = None,
  ) -> None:
      self.ontology = ontology
      self.config = config or {}
      self.graph = graph
      self.chat_session = chat_session
      self.last_answer = last_answer
+     if cypher_prompt and '{question}' not in cypher_prompt:
+         raise ValueError("cypher_prompt must contain {question} placeholder")
+     if cypher_prompt_with_history and ('{question}' not in cypher_prompt_with_history or
+                                        '{last_answer}' not in cypher_prompt_with_history):
+         raise ValueError("cypher_prompt_with_history must contain both {question} and {last_answer} placeholders")
      self.cypher_prompt = cypher_prompt
      self.cypher_prompt_with_history = cypher_prompt_with_history
Also applies to: 38-39
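A substring test such as `'{question}' in prompt` also matches an escaped `{{question}}`, which `str.format` renders as literal text. A slightly stricter sketch uses the standard library's `string.Formatter().parse` to list the fields that would actually be substituted; the helper names `placeholder_names` and `check_placeholders` are hypothetical.

```python
from string import Formatter


def placeholder_names(template: str) -> set:
    """Collect the named replacement fields that str.format would look up."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def check_placeholders(template: str, required: set, name: str) -> None:
    """Raise ValueError if required fields are missing or unknown fields appear."""
    found = placeholder_names(template)
    missing = required - found
    unknown = found - required
    if missing or unknown:
        raise ValueError(
            f"{name}: missing {sorted(missing)}, unknown {sorted(unknown)}"
        )
```

For example, `check_placeholders(prompt, {"question", "last_answer"}, "cypher_prompt_with_history")` would reject both a missing `{last_answer}` and a misspelled `{lastanswer}`.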
43-43: Rename unused loop variable.
The loop variable i is not used within the loop body. Following Python conventions, rename it to _i to indicate it's intentionally unused.
- for i in range(retries):
+ for _i in range(retries):
🧰 Tools
🪛 Ruff (0.7.0)
43-43: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
graphrag_sdk/chat_session.py (1)
50-53: Add type hints and validate prompt strings
The new prompt attributes lack type hints and input validation. Also, there are unnecessary empty lines.
Consider adding type hints and validation:
-
- self.cypher_prompt = cypher_gen_prompt
- self.qa_prompt = qa_prompt
- self.cypher_prompt_with_history = cypher_gen_prompt_history
-
+ # Add type hints
+ self.cypher_prompt: str = self._validate_prompt(cypher_gen_prompt, "cypher_gen_prompt")
+ self.qa_prompt: str = self._validate_prompt(qa_prompt, "qa_prompt")
+ self.cypher_prompt_with_history: str = self._validate_prompt(
+     cypher_gen_prompt_history, "cypher_gen_prompt_history"
+ )
+
+ @staticmethod
+ def _validate_prompt(prompt: str, name: str) -> str:
+     """Validates that the prompt is a non-empty string."""
+     if not isinstance(prompt, str) or not prompt.strip():
+         raise ValueError(f"{name} must be a non-empty string")
+     return prompt.strip()
README.md (2)
194-217: Enhance prompt documentation with examples and validation details
The documentation of configurable prompts could be improved by:
- Adding example values for each prompt type
- Highlighting required placeholders more prominently
- Including validation warnings for missing placeholders
Consider adding a code block with actual prompt examples:
# Example prompt values
cypher_system_instruction = """
Given the following ontology: #ONTOLOGY
Generate Cypher queries to answer questions about the data.
"""

cypher_gen_prompt = """
Based on the question: {question}
Generate a Cypher query to retrieve relevant information.
"""

# Add similar examples for other prompt types...
🧰 Tools
🪛 Markdownlint (0.35.0)
200-200: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
201-201: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
204-204: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
207-207: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
208-208: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
211-211: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
212-212: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
215-215: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
216-216: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
199-216: Maintain consistent markdown list style
The unordered lists in the documentation use dash (-) markers inconsistently. Consider using asterisks (*) throughout for better consistency.
Apply this change to the list items:
- - System instructions for the Cypher generation step.
+ * System instructions for the Cypher generation step.
graphrag_sdk/fixtures/prompts.py (6)
Line range hint 1-2: Fix syntax error due to extra quotation mark in CREATE_ONTOLOGY_SYSTEM
There is an extra double-quote character at the end of the line defining ## 1. Overview, which may cause a syntax error.
Apply this diff to fix the syntax error:
- CREATE_ONTOLOGY_SYSTEM = """
- ## 1. Overview\n"
+ CREATE_ONTOLOGY_SYSTEM = """
+ ## 1. Overview\n
Line range hint 58-58: Correct unbalanced quotation marks in entity labeling instructions
In the instructions under Consistency, the quotation marks are unbalanced around the example terms.
Apply this diff to correct the quotation marks:
- Avoid using more specific terms "like 'mathematician' or 'scientist'"
+ Avoid using more specific terms like 'mathematician' or 'scientist'.
Line range hint 47-47: Remove duplicate 'unique' key in the example JSON
In the example JSON, the unique key is duplicated for the age attribute of the Person entity.
Apply this diff to fix the duplication:
- {"name":"age","type":"number","unique":false,"unique":false}
+ {"name":"age","type":"number","unique":false,"required":false}
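Duplicate keys like this are easy to miss because Python's `json.loads` accepts them and keeps only the last value. A small sketch of catching them with the standard `object_pairs_hook` parameter; the hook name is hypothetical, and the second `unique` value is changed to `true` here purely to make the override visible.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails on a repeated key instead of keeping the last one."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


raw = '{"name":"age","type":"number","unique":false,"unique":true}'
lenient = json.loads(raw)  # the last "unique" silently wins
try:
    json.loads(raw, object_pairs_hook=reject_duplicate_keys)
    duplicate_found = False
except ValueError:
    duplicate_found = True
```

A check of this kind could run over the example JSON embedded in the prompt fixtures, assuming those examples are extracted as standalone strings first.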
Line range hint 221-221: Correct singular 'relation' to plural 'relations' for consistency
In EXTRACT_DATA_SYSTEM, change 'relation' to 'relations' for grammatical consistency.
Apply this diff:
- Use only the provided entities, relation, and attributes in the ontology.
+ Use only the provided entities, relations, and attributes in the ontology.
Line range hint 163-166: Clarify instructions regarding attributes conversion in CREATE_ONTOLOGY_PROMPT
The instruction "Attributes should be extracted as entities or relations whenever possible" may be confusing. Attributes are typically properties of entities or relationships, and converting them into entities or relations might not always be appropriate.
Consider rephrasing for clarity, such as:
- Attributes should be extracted as entities or relations whenever possible.
+ Consider representing attributes as entities or relations if they have significant importance or require further detail.
Line range hint 373-374: Update example in ORCHESTRATOR_SYSTEM to match available agents
In the example provided within ORCHESTRATOR_SYSTEM, agents like BaggageAgent and SpecialItemsAgent are mentioned, which may not be relevant to the context of this SDK.
Consider updating the example to match agents relevant to the knowledge graph domain for better clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
README.md (2 hunks)
graphrag_sdk/chat_session.py (5 hunks)
graphrag_sdk/fixtures/prompts.py (2 hunks)
graphrag_sdk/kg.py (5 hunks)
graphrag_sdk/steps/graph_query_step.py (2 hunks)
graphrag_sdk/steps/qa_step.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- graphrag_sdk/kg.py
- graphrag_sdk/steps/qa_step.py
🧰 Additional context used
🪛 Markdownlint (0.35.0)
README.md
200-200: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
201-201: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
204-204: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
207-207: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
208-208: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
211-211: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
212-212: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
215-215: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
216-216: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
🪛 Ruff (0.7.0)
graphrag_sdk/steps/graph_query_step.py
43-43: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
🔇 Additional comments (5)
graphrag_sdk/chat_session.py (1)
80-81: Verify step parameter names
Please ensure that the parameter names match those expected by the GraphQueryGenerationStep and QAStep classes.
Also applies to: 91-91
README.md (1)
173-175: Verify consistent terminology across documentation
The rename from "Knowledge Graph" to "Knowledge Graph Agent" better reflects the agent-based architecture. However, we should ensure this terminology is consistent across all documentation and code comments.
✅ Verification successful
Terminology is consistently used across the codebase
The term "Knowledge Graph" and its variations (including "Knowledge Graph Agent") are used consistently throughout the codebase. The README change aligns with the existing implementation where:
- The core class is named KnowledgeGraph
- The agent implementation exists in agents/kg_agent.py as KGAgent
- Documentation consistently uses "Knowledge Graph (KG)" terminology
- The new "Knowledge Graph Agent" section accurately reflects the agent-based architecture while maintaining backward compatibility
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for inconsistent terminology in documentation and code comments
rg -i "knowledge\s+graph(?!\s+agent)" --type md --type py
Length of output: 309
Script:
#!/bin/bash
# Search for "Knowledge Graph" terminology variations in documentation and code comments
rg -i "knowledge.*graph" --type md --type py
Length of output: 9943
graphrag_sdk/fixtures/prompts.py (3)
Line range hint 373-374: Ensure placeholders like #BACKSTORY and #AGENTS are appropriately handled
In ORCHESTRATOR_SYSTEM, placeholders such as #BACKSTORY and #AGENTS are present. Ensure these placeholders are properly replaced or managed when the prompts are used to prevent confusion.
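One way to make sure such markers never reach the model unreplaced is to fill them through a helper that fails on leftovers. This is a sketch under assumptions: `fill_markers` is hypothetical, and it treats any `#` followed by capital letters or underscores as a marker, which would also flag legitimate text of that shape.

```python
import re


def fill_markers(template: str, values: dict) -> str:
    """Replace #MARKER tokens and fail if any all-caps marker is left over."""
    for marker, value in values.items():
        template = template.replace(marker, value)
    leftover = re.findall(r"#[A-Z_]+\b", template)
    if leftover:
        raise ValueError(f"unreplaced markers: {leftover}")
    return template
```

Called as `fill_markers(system_text, {"#BACKSTORY": backstory, "#AGENTS": agents})`, it returns the filled text or raises if one of the markers was forgotten.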
445-445: Remove unintended line number annotation in CYPHER_GEN_PROMPT
There is an unintended line number annotation 445~ within the CYPHER_GEN_PROMPT constant.
Apply this diff to remove the annotation:
- 445~
  If you cannot generate a OpenCypher statement for any reason, return an empty response.
Likely invalid or redundant comment.
486-486: Remove unintended line number annotation in GRAPH_QA_SYSTEM
There is an unintended line number annotation 486~ within the GRAPH_QA_SYSTEM constant.
Apply this diff to remove the annotation:
- 486~
  Here is an example:
Likely invalid or redundant comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (6)
tests/test_multi_agent.py (6)
1-16: Consider adjusting logging level and adding module documentation.
A few suggestions for improvement:
- Consider using a less verbose logging level for tests (e.g., INFO instead of DEBUG)
- Add a module docstring explaining the purpose of these tests
Add this documentation at the beginning of the file:
+ """
+ Integration tests for the multi-agent system, validating the interaction between
+ KnowledgeGraph agents and the Orchestrator for complex query scenarios.
+ """
Line range hint 18-21: Add class documentation.
The test class should have a docstring explaining its purpose and test coverage.
Add this documentation:
  class TestMultiAgent(unittest.TestCase):
+     """
+     Test suite for validating multi-agent interactions in complex scenarios.
+
+     This class tests the collaboration between restaurant and attraction agents
+     orchestrated to generate comprehensive travel itineraries.
+     """
Line range hint 23-156: Refactor ontology setup to reduce code duplication.
The ontology setup contains significant duplication between restaurants and attractions. Consider extracting common entities and attributes into helper methods.
Here's a suggested refactoring approach:
def create_base_ontology():
    """Create a base ontology with common entities."""
    ontology = Ontology()
    ontology.add_entity(
        Entity(
            label="Country",
            attributes=[
                Attribute(
                    name="name",
                    attr_type=AttributeType.STRING,
                    required=True,
                    unique=True,
                ),
            ],
        )
    )
    ontology.add_entity(
        Entity(
            label="City",
            attributes=[
                Attribute(name="name", attr_type=AttributeType.STRING, required=True, unique=True),
                Attribute(name="weather", attr_type=AttributeType.STRING),
                Attribute(name="population", attr_type=AttributeType.NUMBER),
            ],
        )
    )
    ontology.add_relation(
        Relation(
            label="IN_COUNTRY",
            source="City",
            target="Country",
        )
    )
    return ontology

@classmethod
def setUpClass(cls):
    # Create base ontologies
    cls.restaurants_ontology = create_base_ontology()
    cls.attractions_ontology = create_base_ontology()

    # Add specific entities
    cls.restaurants_ontology.add_entity(
        Entity(
            label="Restaurant",
            attributes=[
                Attribute(name="name", attr_type=AttributeType.STRING, required=True, unique=True),
                Attribute(name="description", attr_type=AttributeType.STRING),
                Attribute(name="rating", attr_type=AttributeType.NUMBER),
                Attribute(name="food_type", attr_type=AttributeType.STRING),
            ],
        )
    )
    cls.restaurants_ontology.add_relation(
        Relation(
            label="IN_CITY",
            source="Restaurant",
            target="City",
        )
    )
    cls.attractions_ontology.add_entity(
        Entity(
            label="Attraction",
            attributes=[
                Attribute(name="name", attr_type=AttributeType.STRING, required=True, unique=True),
                Attribute(name="description", attr_type=AttributeType.STRING),
                Attribute(name="type", attr_type=AttributeType.STRING),
            ],
        )
    )
    cls.attractions_ontology.add_relation(
        Relation(
            label="IN_CITY",
            source="Attraction",
            target="City",
        )
    )
Line range hint 158-208: Enhance agent configuration and add error handling.
A few suggestions for improvement:
- The agent introductions could be more descriptive and specific
- Add error handling for model initialization
Consider these improvements:
- cls.model = GeminiGenerativeModel("gemini-1.5-flash-001")
+ try:
+     cls.model = GeminiGenerativeModel("gemini-1.5-flash-001")
+ except Exception as e:
+     logger.error(f"Failed to initialize Gemini model: {e}")
+     raise

  cls.restaurants_agent = KGAgent(
      agent_id="restaurants_agent",
      kg=cls.restaurants_kg,
      introduction="""
- I'm a restaurant agent, specialized in finding the best restaurants for you.
+ I'm a restaurant recommendation specialist with detailed knowledge of dining
+ establishments, their cuisines, ratings, and locations. I can help you find
+ the perfect dining spots based on your preferences and location.
      """,
  )

  cls.attractions_agent = KGAgent(
      agent_id="attractions_agent",
      kg=cls.attractions_kg,
      introduction="""
- I'm an attractions agent, specialized in finding the best attractions for you.
+ I'm a tourist attractions expert with comprehensive knowledge of landmarks,
+ museums, historical sites, and entertainment venues. I can help you discover
+ the most interesting places to visit based on your interests and location.
      """,
  )
Line range hint 210-301: Add error handling and data validation for data import.
The data import method lacks error handling for file operations and data validation.
Consider these improvements:
+ def validate_city_data(self, city: dict) -> bool:
+     """Validate city data structure."""
+     required_fields = {"name", "weather", "population", "country"}
+     return all(field in city for field in required_fields)

  def import_data(
      self,
      restaurants_kg: KnowledgeGraph,
      attractions_kg: KnowledgeGraph,
  ):
+     """
+     Import test data from JSON files into knowledge graphs.
+
+     Args:
+         restaurants_kg: Knowledge graph for restaurants
+         attractions_kg: Knowledge graph for attractions
+
+     Raises:
+         FileNotFoundError: If data files are missing
+         ValueError: If data validation fails
+     """
-     with open("tests/data/cities.json") as f:
-         cities = loads(f.read())
+     try:
+         with open("tests/data/cities.json") as f:
+             cities = loads(f.read())
+     except FileNotFoundError:
+         logger.error("Cities data file not found")
+         raise
+     except json.JSONDecodeError:
+         logger.error("Invalid JSON in cities data file")
+         raise

+     # Validate data structure
+     for city in cities:
+         if not self.validate_city_data(city):
+             raise ValueError(f"Invalid city data structure: {city}")
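The parse-then-validate flow suggested here can be sketched as a standalone function. The field names come from the suggestion above; `parse_cities` itself is hypothetical. Note also that the suggested diff catches `json.JSONDecodeError` while calling `loads` directly, so if the test module only does `from json import loads` (an assumption, since its imports are not shown here), the `json` name would need importing too.

```python
import json

REQUIRED_CITY_FIELDS = {"name", "weather", "population", "country"}


def parse_cities(text: str) -> list:
    """Parse a JSON list of city records and check each has the required fields."""
    cities = json.loads(text)  # raises json.JSONDecodeError on malformed input
    for city in cities:
        missing = REQUIRED_CITY_FIELDS - city.keys()
        if missing:
            raise ValueError(f"city record missing {sorted(missing)}: {city}")
    return cities
```

Reporting which fields are missing, rather than only that validation failed, makes a broken fixture file quicker to repair.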
Line range hint 303-316: Enhance test coverage and assertions.
The current test method could be more comprehensive:
- Add more specific assertions for response content
- Include negative test cases
- Test different scenarios
Consider adding these test methods:
def test_multi_agent_invalid_city(self):
    """Test behavior with invalid city input."""
    response = self.orchestrator.ask(
        "Write me a two-day itinerary for a trip to NonExistentCity."
    )
    assert "no information available" in response.output.lower()

def test_multi_agent_specific_requirements(self):
    """Test with specific dining and attraction requirements."""
    response = self.orchestrator.ask(
        "Write me a two-day itinerary for Rome focusing on historical sites and Italian restaurants."
    )
    assert "historical" in response.output.lower()
    assert "italian" in response.output.lower()

def test_multi_agent_validation(self):
    """Test response structure and content validation."""
    response = self.orchestrator.ask(
        "Write me a two-day itinerary for Rome."
    )
    # Verify response structure
    assert "day 1" in response.output.lower()
    assert "day 2" in response.output.lower()
    # Verify it includes both restaurants and attractions
    assert any(r["name"] in response.output for r in self.restaurants)
    assert any(a["name"] in response.output for a in self.attractions)
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
tests/test_kg_ollama.py (1)
Line range hint 93-99: Consider making the test more robust and deterministic.
The current test has several areas that could be improved:
- The assertion threshold (>5) seems arbitrary and might not reflect the actual data
- The regex pattern assumes the answer contains digits, but doesn't validate the response format
- There's no verification that the count matches the actual data in the source file
Consider this improved implementation:
- answer = chat.send_message("Count the number of actors acted in a movie?")
-
- logger.info(f"Answer: {answer}")
-
- actors_count = re.findall(r'\d+', answer)
- num_actors = 0 if len(actors_count) == 0 else int(actors_count[0])
-
- assert num_actors > 5, "The number of actors found should be greater than 5"
+ # First, get the ground truth
+ count_query = """
+ MATCH (a:Actor)-[:ACTED_IN]->(:Movie)
+ RETURN COUNT(DISTINCT a) as actor_count
+ """
+ expected_count = self.kg.query(count_query)[0]["actor_count"]
+
+ # Then test the chat response
+ answer = chat.send_message("Count the number of actors acted in a movie?")
+ logger.info(f"Answer: {answer}")
+
+ # Extract number from response
+ actors_count = re.findall(r'\b\d+\b', answer)
+ self.assertTrue(actors_count, f"Expected numeric count in answer: {answer}")
+ actual_count = int(actors_count[0])
+
+ # Verify against ground truth
+ self.assertEqual(
+     actual_count,
+     expected_count,
+     f"Actor count {actual_count} doesn't match actual count {expected_count}"
+ )
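The number-extraction step can be isolated and tested on its own, without a model or a graph. This is a minimal sketch with a hypothetical helper name; it returns only the first standalone integer, so an answer formatted as "1,234" would yield 1.

```python
import re
from typing import Optional


def extract_first_int(answer: str) -> Optional[int]:
    """Return the first standalone integer in a free-text answer, or None."""
    match = re.search(r"\b\d+\b", answer)
    return int(match.group()) if match else None
```

The word boundaries keep digits embedded in identifiers such as `Movie2` from being picked up as the count.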
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
tests/test_kg_ollama.py(1 hunks)
🔇 Additional comments (2)
tests/test_kg_ollama.py (2)
93-93: LGTM! The rephrased question is clearer and more directive.
The change from "How many" to "Count the number of" makes the question more explicit and imperative, which could potentially lead to more consistent responses from the model.
93-93: Consider testing the new configurable prompts feature.
Given that the ChatSession now supports configurable prompts (as mentioned in the AI summary), this test could be extended to verify this new functionality.
Let's verify the new prompt parameters:
Consider adding test cases for:
- Custom system instructions
- Different prompt templates
- Edge cases with empty/invalid prompts
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
graphrag_sdk/chat_session.py (2)
51-53: Add type hints for prompt attributes
The new prompt attributes should have type hints for better code maintainability and IDE support.
-        self.cypher_prompt = cypher_gen_prompt
-        self.qa_prompt = qa_prompt
-        self.cypher_prompt_with_history = cypher_gen_prompt_history
+        self.cypher_prompt: str = cypher_gen_prompt
+        self.qa_prompt: str = qa_prompt
+        self.cypher_prompt_with_history: str = cypher_gen_prompt_history
64-64: Implement context management to prevent unbounded growth
The context list grows without bounds as new messages are processed. This could lead to memory issues in long-running chat sessions.
Consider:
- Implementing a maximum size limit for the context list
- Using a circular buffer or removing older entries when the limit is reached
- Adding configuration parameter for context size limit
Example implementation:
+    MAX_CONTEXT_SIZE = 100  # or make this configurable via constructor

     def __init__(...):
-        self.context = []
+        self.context: list[dict] = []

     def send_message(self, message: str):
+        if len(self.context) >= self.MAX_CONTEXT_SIZE:
+            self.context.pop(0)  # Remove oldest entry
         self.context.append({"message": message, "cypher": cypher, "context": context})
Also applies to: 89-90
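The circular-buffer option mentioned above could be sketched with `collections.deque`, which drops the oldest entry on its own once `maxlen` is reached. This is only an illustration: `BoundedContext`, `add`, and `as_list` are hypothetical names and are not part of `graphrag_sdk`.

```python
from collections import deque


class BoundedContext:
    """Hypothetical helper (not part of graphrag_sdk): keeps only the newest entries."""

    def __init__(self, max_size=100):
        # A deque with maxlen discards the oldest item when a new one is appended.
        self._entries = deque(maxlen=max_size)

    def add(self, message, cypher, context):
        # Same entry shape as the dictionary appended in the suggestion above.
        self._entries.append({"message": message, "cypher": cypher, "context": context})

    def as_list(self):
        # Return a plain list copy, oldest entry first.
        return list(self._entries)


ctx = BoundedContext(max_size=2)
ctx.add("q1", "MATCH (a) RETURN a", "ctx1")
ctx.add("q2", "MATCH (b) RETURN b", "ctx2")
ctx.add("q3", "MATCH (c) RETURN c", "ctx3")
print([entry["message"] for entry in ctx.as_list()])
```

Compared with `list.pop(0)`, which shifts every remaining element, appending to a bounded deque is constant time, which may matter for long-running sessions.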
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
graphrag_sdk/chat_session.py(4 hunks)graphrag_sdk/kg.py(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- graphrag_sdk/kg.py
🔇 Additional comments (1)
graphrag_sdk/chat_session.py (1)
27-29: Verify that all callers are updated with the new parameters
The constructor signature change requires updates to all code that creates ChatSession instances.
✅ Verification successful
Based on the search results, I can see that ChatSession is only instantiated in one place - in the chat_session() method of the kg.py file, and this caller is already updated with all the required parameters:
chat_session = ChatSession(self._model_config, self.ontology, self.graph, self.cypher_system_instruction,
self.qa_system_instruction, self.cypher_gen_prompt, self.qa_prompt, self.cypher_gen_prompt_history)
All other occurrences of "ChatSession" in the codebase refer to the abstract base class GenerativeModelChatSession from the models package, which is a different class.
Changes are properly integrated
The only instantiation point of ChatSession has been updated with all the required parameters.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Find all potential instantiations of ChatSession
rg -l "ChatSession" | while read -r file; do
if [[ "$file" != "graphrag_sdk/chat_session.py" ]]; then
echo "=== $file ==="
rg -A 5 "ChatSession" "$file"
fi
done
Length of output: 12159
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (5)
graphrag_sdk/chat_session.py (1)
81-82: Document prompt requirements and add validation
The prompts are passed to steps without documentation about their expected format or structure.
- Add docstring updates to document prompt requirements:
def send_message(self, message: str):
    """
    Sends a message to the chat session.

    Args:
        message (str): The message to send.

    Returns:
        str: The response to the message.

    Note:
        The cypher_prompt should follow the format: ...
        The qa_prompt should follow the format: ...
        The cypher_prompt_with_history should include: ...
    """
- Add runtime validation in the steps to ensure prompts meet the requirements.
Also applies to: 94-94
README.md (2)
194-232: Consider adding information about default prompts
The documentation for configurable prompts is clear and well-structured. However, it would be helpful to:
- Indicate whether these configurations are optional
- Document the default prompts that are used if custom ones aren't provided
- Provide example prompt templates for each type
Would you like me to help draft this additional documentation?
🧰 Tools
🪛 Markdownlint (0.35.0)
200-200: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
201-201: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
204-204: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
207-207: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
208-208: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
211-211: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
212-212: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
215-215: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
216-216: Expected: asterisk; Actual: dash
Unordered list style(MD004, ul-style)
200-216: Maintain consistent list style
For better markdown consistency, use asterisks (*) instead of dashes (-) for unordered lists throughout the document.
Apply this change to the list items in the Configurable Prompts section:
- - System instructions for the Cypher generation step.
+ * System instructions for the Cypher generation step.
- - **Note:** Ensure your prompt includes `{ontology}`.
+ * **Note:** Ensure your prompt includes `{ontology}`.
(Apply similar changes to other list items)
graphrag_sdk/fixtures/prompts.py (2)
Line range hint 1-1: Fix syntax error due to mismatched quotation mark
There is an extra quotation mark at the end of the first line in the CREATE_ONTOLOGY_SYSTEM string assignment, which may cause a syntax error.
Apply this diff to fix the syntax error:
 CREATE_ONTOLOGY_SYSTEM = """
-## 1. Overview\n"
+## 1. Overview\n
 You are a top-tier algorithm designed for extracting ontologies in structured formats to build a knowledge graph from raw texts.
Line range hint 24-24: Fix unbalanced quotation marks in the CREATE_ONTOLOGY_SYSTEM prompt
In the "Consistency" section, there is an unbalanced quotation mark before "like 'mathematician' or 'scientist'". This could lead to issues when processing the string.
Apply this diff to correct the quotation marks:
-Avoid using more specific terms "like 'mathematician' or 'scientist'"
+Avoid using more specific terms like 'mathematician' or 'scientist'
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
README.md(2 hunks)graphrag_sdk/chat_session.py(4 hunks)graphrag_sdk/fixtures/prompts.py(3 hunks)graphrag_sdk/kg.py(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- graphrag_sdk/kg.py
🔇 Additional comments (2)
graphrag_sdk/chat_session.py (1)
27-29:
Breaking change: Update constructor signature and docstring
The constructor changes introduce several issues:
- Making new parameters required is a breaking change that could affect existing clients
- The docstring is incomplete, missing documentation for the new parameters
- Parameters should be optional with sensible defaults for backward compatibility
Apply this diff to address the issues:
def __init__(self, model_config: KnowledgeGraphModelConfig, ontology: Ontology, graph: Graph,
- cypher_system_instruction: str, qa_system_instruction: str,
- cypher_gen_prompt: str, qa_prompt: str, cypher_gen_prompt_history: str):
+ cypher_system_instruction: str | None = None,
+ qa_system_instruction: str | None = None,
+ cypher_gen_prompt: str | None = None,
+ qa_prompt: str | None = None,
+ cypher_gen_prompt_history: str | None = None):
"""
Initializes a new ChatSession object.
Args:
model_config (KnowledgeGraphModelConfig): The model configuration.
ontology (Ontology): The ontology object.
graph (Graph): The graph object.
+ cypher_system_instruction (str, optional): Custom system instruction for Cypher generation.
+ qa_system_instruction (str, optional): Custom system instruction for QA.
+ cypher_gen_prompt (str, optional): Custom prompt for Cypher generation.
+ qa_prompt (str, optional): Custom prompt for QA.
+ cypher_gen_prompt_history (str, optional): Custom prompt history for Cypher generation.
"""
Likely invalid or redundant comment.
README.md (1)
173-175: LGTM! Clear section renaming
The rename from "Knowledge Graph" to "Knowledge Graph Agent" better reflects the agent-based nature of the system and its Q&A capabilities.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (8)
tests/test_kg_openai.py (3)
88-88: Consider testing the full response structure
The change suggests that send_message now returns a richer response object with additional context. While extracting the 'response' is correct, we should also verify the structure and content of the full response object to ensure all expected context is present.
Consider adding assertions to verify the response structure:
 answer = chat.send_message("How many actors acted in a movie?")
+assert isinstance(answer, dict), "Response should be a dictionary"
+assert "response" in answer, "Response should contain 'response' key"
+# Add assertions for other expected keys in the response
 answer = answer['response']
Line range hint 93-97: Consider strengthening the assertion logic
The current regex pattern \d+ might match any number in the response text, not necessarily the actor count. This could lead to false positives if the response contains other numbers.
Consider using a more specific regex or response parsing:
-actors_count = re.findall(r'\d+', answer)
-num_actors = 0 if len(actors_count) == 0 else int(actors_count[0])
+# Look for specific patterns like "X actors" or "found X actors"
+actors_count = re.findall(r'(\d+)\s+actors?', answer)
+num_actors = 0 if len(actors_count) == 0 else int(actors_count[0])
+assert num_actors > 0, "No actor count found in response"
 assert num_actors > 10, "The number of actors found should be greater than 10"
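The difference between the two patterns can be seen on a made-up answer string. The sketch below is illustrative only: `extract_actor_count` is a hypothetical helper, and the sample sentences are invented rather than taken from real model output.

```python
import re


def extract_actor_count(answer):
    """Hypothetical helper: return the number that directly precedes 'actor(s)', or None."""
    match = re.search(r"(\d+)\s+actors?\b", answer, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


sample = "In 2012, there were 15 actors who acted in a movie."

# The broad pattern picks up the year first, which is the false positive described above.
print(re.findall(r"\d+", sample)[0])

# The narrower pattern only accepts a number followed by the word "actor" or "actors".
print(extract_actor_count(sample))
print(extract_actor_count("I could not find the answer."))
```

The narrower pattern still misses phrasings such as "15 distinct actors", so a test built on it would probably want a clear failure message when no count is found.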
Missing error handling for chat response structure
Based on the code analysis:
- The test directly accesses answer['response'] without any error handling
- The underlying implementations (send_message and parse_generate_content_response) don't have explicit error handling for malformed responses
- The GenerationResponse class expects specific fields but there's no validation to ensure they exist
The test should be updated to include:
- Error handling for missing 'response' key
- Validation of the response structure before accessing it
- Test cases for error scenarios
🔗 Analysis chain
Line range hint 87-92: Verify error handling for chat responses
The code assumes the 'response' key will always be present. We should verify how the test handles cases where the response might be malformed or missing.
Let's check if there's error handling in the chat implementation:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for error handling in chat session implementation
ast-grep --pattern 'def send_message($$$) { try { $$$ } catch ($$$) { $$$ } }'
# Search for response structure validation
rg -A 5 "send_message.*response.*"
Length of output: 131
Script:
#!/bin/bash
# Let's try a broader search for the chat implementation and error handling
# First, find the chat implementation file
fd "kg_openai.py"
# Search for the send_message method implementation
rg -A 10 "def send_message"
# Search for any error handling related to response
rg -B 2 -A 5 "try.*response"
Length of output: 4827
Script:
#!/bin/bash
# Let's check the implementation of parse_generate_content_response and error handling
rg -B 2 -A 5 "parse_generate_content_response"
# Let's also check the GenerationResponse type definition
fd "generation_response" --type f
# And check its content
rg -A 5 "class GenerationResponse"
Length of output: 5320
Script:
#!/bin/bash
# Let's check for try-catch blocks in the OpenAI implementation since that's what's being used in the test
rg -B 5 -A 5 "try" graphrag_sdk/models/openai.py
# Also check the test file itself for any error handling
rg -B 5 -A 5 "try" tests/test_kg_openai.py
# And check the chat session implementation for error handling
rg -B 5 -A 5 "try" graphrag_sdk/chat_session.py
Length of output: 139
tests/test_kg_gemini.py (1)
93-94: Add error handling and type hints for the chat response.
While the change correctly handles the new dictionary response format, it could benefit from better error handling and type hints.
Consider applying this improvement:
-        answer = chat.send_message("How many actors acted in a movie?")
-        answer = answer['response']
+        response_dict = chat.send_message("How many actors acted in a movie?")
+        if not isinstance(response_dict, dict) or 'response' not in response_dict:
+            raise ValueError("Unexpected response format from chat session")
+        answer = response_dict['response']
graphrag_sdk/agents/kg_agent.py (1)
139-139: Document the output structure and consider exposing context
While extracting just the 'response' works, the output dictionary contains valuable context ('question', 'context', 'cypher') that might be useful for debugging or advanced use cases. Consider:
- Documenting the complete output structure in the method's docstring
- Providing a way to access the full context when needed
Here's a suggested improvement:
 def run(self, params: dict) -> str:
     """
     Ask the agent a question.

     Args:
         params (dict): The parameters for the agent.

     Returns:
-        str: The agent's response.
+        str: The agent's response. Note: The full output dictionary contains
+        additional context including 'question', 'context', and 'cypher'
+        which can be accessed by setting return_context=True.
     """
-    output = self.chat_session.send_message(params["prompt"])
-    return output['response']
+    output = self.chat_session.send_message(params["prompt"])
+    return output['response'] if not params.get('return_context') else output
graphrag_sdk/chat_session.py (1)
101-106: Add type hints and fix formatting
Consider:
- Adding a TypedDict for the return type
- Removing trailing whitespace
+from typing import TypedDict
+
+class ChatResponse(TypedDict):
+    question: str
+    response: str
+    context: list[str]
+    cypher: str
+
 def send_message(self, message: str) -> ChatResponse:
     # ...
     return {
-        "question": message,
-        "response": answer,
-        "context": context,
-        "cypher": cypher
-    }
+        "question": message,
+        "response": answer,
+        "context": context,
+        "cypher": cypher
+    }
graphrag_sdk/kg.py (2)
Line range hint 33-54: Enhance parameter documentation with examples
The parameter documentation is clear but would benefit from example values to help users understand the expected format and placeholders.
Consider adding examples like:
"""
Example:
    cypher_system_instruction: "Using the following ontology: {ontology}..."
    qa_prompt: "Question: {question}\nContext: {context}\nCypher: {cypher}\nAnswer:"
"""
185-187: Improve readability by breaking long line into multiple lines
The chat session initialization line is quite long. Consider reformatting for better readability:
-        chat_session = ChatSession(self._model_config, self.ontology, self.graph, self.cypher_system_instruction,
-                                   self.qa_system_instruction, self.cypher_gen_prompt, self.qa_prompt, self.cypher_gen_prompt_history)
+        chat_session = ChatSession(
+            self._model_config,
+            self.ontology,
+            self.graph,
+            self.cypher_system_instruction,
+            self.qa_system_instruction,
+            self.cypher_gen_prompt,
+            self.qa_prompt,
+            self.cypher_gen_prompt_history
+        )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
graphrag_sdk/agents/kg_agent.py(1 hunks)graphrag_sdk/chat_session.py(5 hunks)graphrag_sdk/kg.py(5 hunks)tests/test_kg_gemini.py(1 hunks)tests/test_kg_ollama.py(1 hunks)tests/test_kg_openai.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_kg_ollama.py
🧰 Additional context used
🪛 Ruff (0.7.0)
graphrag_sdk/kg.py
73-73: No explicit stacklevel keyword argument found
(B028)
90-90: No explicit stacklevel keyword argument found
(B028)
98-98: No explicit stacklevel keyword argument found
(B028)
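Ruff's B028 rule flags `warnings.warn` calls that omit `stacklevel`. The flagged lines in kg.py are not shown here, so the following is only a generic sketch of the usual fix; `deprecated_helper` and its message are hypothetical.

```python
import warnings


def deprecated_helper():
    """Hypothetical function that warns its caller about deprecation."""
    # stacklevel=2 attributes the warning to the caller's line rather than to
    # this warnings.warn call, which usually makes the report more useful.
    warnings.warn(
        "deprecated_helper is deprecated",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    value = deprecated_helper()

print(value, caught[0].category.__name__)
```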
🔇 Additional comments (3)
tests/test_kg_gemini.py (1)
93-94: Verify consistent response handling across test files.
Since this change is part of a broader update to chat response handling, let's verify that similar changes are applied consistently in other test files.
✅ Verification successful
Response handling is consistent across test files
The verification shows that all test files (test_kg_gemini.py, test_kg_openai.py, and test_kg_ollama.py) handle chat responses in exactly the same way:
- They all use answer = chat.send_message(...) to send a message
- They all extract the response using answer = answer['response']
- They all follow the same pattern of logging the answer afterward
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if other test files handle the chat response similarly
# Expected: Similar response extraction in test_kg_openai.py and test_kg_ollama.py
echo "Checking chat response handling in other test files:"
rg -A 2 "chat.send_message" "tests/test_kg_*.py"
Length of output: 222
Script:
#!/bin/bash
# Let's first find all test files related to knowledge graph (kg) testing
fd "test_kg.*\.py$" tests/
# Then examine the content of these files focusing on chat response handling
echo "Checking chat response handling patterns:"
rg -B 2 -A 2 "send_message|response" tests/test_kg*
Length of output: 1362
graphrag_sdk/agents/kg_agent.py (1)
Line range hint 39-39: Add required parameters to chat_session initialization
According to the changes in the codebase, ChatSession now requires four new parameters for custom instructions and prompts. These parameters should be passed through the chat_session() method call.
Let's verify the required parameters for ChatSession:
graphrag_sdk/chat_session.py (1)
48-48: 🛠️ Refactor suggestion
Validate format string placeholders
The switch to using format() is good, but ensure the cypher_system_instruction contains the correct placeholder {ontology}. Consider adding validation:
+ if "{ontology}" not in cypher_system_instruction:
+ raise ValueError("cypher_system_instruction must contain {ontology} placeholder")
cypher_system_instruction = cypher_system_instruction.format(ontology=str(ontology.to_json()))
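A substring check such as `"{ontology}" in template` also accepts an escaped `{{ontology}}`, which `str.format` treats as literal text. One possible alternative, sketched below, parses the template with the standard library's `string.Formatter`. The helper name `missing_placeholders` and the sample templates are assumptions for illustration, not part of `graphrag_sdk`.

```python
from string import Formatter


def missing_placeholders(template, required):
    """Hypothetical helper: return the required field names absent from a format template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for chunks that contain no replacement field.
    found = {name for _, name, _, _ in Formatter().parse(template) if name}
    return set(required) - found


print(missing_placeholders("Use this ontology: {ontology}", ["ontology"]))
print(missing_placeholders("Question: {question}", ["question", "context"]))
# Escaped braces are literal text, so the placeholder is reported as missing.
print(missing_placeholders("Literal {{ontology}} only", ["ontology"]))
```

A constructor could raise `ValueError` when the returned set is non-empty, which keeps the error message specific about which placeholder is absent.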
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
graphrag_sdk/chat_session.py (2)
51-53: Add type hints for prompt attributes
Add type hints to improve code maintainability and IDE support.
+        self.cypher_prompt: str = cypher_gen_prompt
+        self.qa_prompt: str = qa_prompt
+        self.cypher_prompt_with_history: str = cypher_gen_prompt_history
91-96: Define constants for response dictionary keys
Define constants for the dictionary keys to prevent typos and improve maintainability.
+    # Class-level constants for response keys
+    QUESTION_KEY = "question"
+    RESPONSE_KEY = "response"
+    CONTEXT_KEY = "context"
+    CYPHER_KEY = "cypher"
+
     def send_message(self, message: str):
         # ...
         if not cypher or len(cypher) == 0:
             return {
-                "question": message,
-                "response": "I am sorry, I could not find the answer to your question",
-                "context": None,
-                "cypher": None
+                self.QUESTION_KEY: message,
+                self.RESPONSE_KEY: "I am sorry, I could not find the answer to your question",
+                self.CONTEXT_KEY: None,
+                self.CYPHER_KEY: None
             }
         # ...
         return {
-            "question": message,
-            "response": answer,
-            "context": context,
-            "cypher": cypher
+            self.QUESTION_KEY: message,
+            self.RESPONSE_KEY: answer,
+            self.CONTEXT_KEY: context,
+            self.CYPHER_KEY: cypher
         }
Also applies to: 106-111
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (12)
ontology.json (5)
19-19: Add meaningful descriptions for entities
Each entity has an empty description field. Adding clear descriptions would improve schema documentation and help users understand the purpose of each entity.
Example improvements:
- "description": ""
+ "description": "Represents a fighter or participant in combat sports events"
- "description": ""
+ "description": "Represents a combat sports event that contains multiple bouts"
- "description": ""
+ "description": "Represents a single fight/match between two fighters"
Also applies to: 43-43, 85-85
31-35: Consider using proper date type
The date attribute is defined as a string. Consider using a proper date type and specifying the expected format (e.g., ISO 8601) to ensure data consistency.
 {
   "name": "date",
-  "type": "string",
+  "type": "date",
   "unique": false,
-  "required": true
+  "required": true,
+  "format": "ISO8601"
 },
73-77: Add validation for time_format
The time_format attribute should have constraints to ensure valid values (e.g., "3x5" for three five-minute rounds).
 {
   "name": "time_format",
   "type": "string",
   "unique": false,
-  "required": true
+  "required": true,
+  "pattern": "^\\d+x\\d+$",
+  "description": "Format: NxM where N is number of rounds and M is minutes per round"
 },
49-53: Add enumeration for weight_class
The weight_class attribute should have predefined values to ensure consistency.
 {
   "name": "weight_class",
   "type": "string",
   "unique": false,
-  "required": true
+  "required": true,
+  "enum": [
+    "Flyweight",
+    "Bantamweight",
+    "Featherweight",
+    "Lightweight",
+    "Welterweight",
+    "Middleweight",
+    "Light Heavyweight",
+    "Heavyweight"
+  ]
 },
105-109: Add range constraints for numerical statistics
Numerical attributes like knockdowns, significant_strikes, and takedowns should have minimum values to prevent negative numbers.
 {
   "name": "knockdowns",
   "type": "number",
   "unique": false,
-  "required": false
+  "required": false,
+  "minimum": 0
 },
Similar constraints should be added to other numerical statistics.
Also applies to: 111-115, 141-145
examples/movies/demo-movies.ipynb (5)
Line range hint 45-45: Avoid commenting out sensitive credentials in example code
The commented API key line could lead to accidental credential exposure if users uncomment it. Consider replacing it with a placeholder or environment variable reference.
-# OPENAI_API_KEY = "sk-"# OpenAI API key
+# Set OPENAI_API_KEY in your .env file or environment variables
Line range hint 62-73: Consider externalizing the URL list
To improve maintainability and allow for easier updates, consider moving the URLs to a separate configuration file or providing them as a parameter.
-urls = ["https://www.rottentomatoes.com/m/side_by_side_2012",
-"https://www.rottentomatoes.com/m/matrix",
-"https://www.rottentomatoes.com/m/matrix_revolutions",
-"https://www.rottentomatoes.com/m/matrix_reloaded",
-"https://www.rottentomatoes.com/m/speed_1994",
-"https://www.rottentomatoes.com/m/john_wick_chapter_4"]
+# Load URLs from a configuration file or environment
+from config import DEMO_URLS
+urls = DEMO_URLS
Line range hint 91-95: Consider making boundaries configurable
The ontology boundaries are hardcoded in the example. Consider making them configurable to demonstrate flexibility in ontology creation.
-boundaries = """
-    Extract only the most relevant information about all the movies, actors, and directors over the text.
-    Avoid creating entities for details that can be expressed as attributes.
-"""
+# Load boundaries from configuration or allow as parameter
+boundaries = get_ontology_boundaries() # or pass as parameter to the notebook
Line range hint 116-122: Add error handling for source processing
Consider adding try-catch blocks to handle potential errors during source processing and provide meaningful feedback.
-kg.process_sources(sources)
+try:
+    kg.process_sources(sources)
+except Exception as e:
+    print(f"Error processing sources: {e}")
+    raise
Line range hint 139-157: Improve output readability and add error handling
The chat session output could be more readable, and error handling should be added for robustness.
-print(chat.send_message("Who is the director of the movie The Matrix?"))
+def format_response(response):
+    """Format the chat response for better readability"""
+    return f"""
+Question: {response['question']}
+Answer: {response['response']}
+Context: {response['context']}
+Cypher: {response['cypher']}
+"""
+
+try:
+    response = chat.send_message("Who is the director of the movie The Matrix?")
+    print(format_response(response))
+except Exception as e:
+    print(f"Error in chat session: {e}")
examples/ufc/demo-ufc.ipynb (2)
Line range hint 95-100: Ensure sampled_sources is not empty to prevent errors
When calculating sampled_sources, if len(sources) * percent rounds down to zero, random.sample returns an empty list (it raises a ValueError only for a negative sample size or one larger than the population), so no sources would be processed. To avoid this, ensure that at least one source is selected, especially when the dataset is small or the percent value is low.
Consider modifying the code to guarantee a minimum sample size:
+sample_size = max(1, round(len(sources) * percent))
-sampled_sources = random.sample(sources, round(len(sources) * percent))
+sampled_sources = random.sample(sources, sample_size)
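The minimum-sample-size idea can be sketched as a small helper. Note that `random.sample(sources, 0)` returns an empty list, while a sample size larger than the population raises `ValueError`, so the sketch clamps both ends. `sample_sources` and its `seed` parameter are hypothetical and do not appear in the notebook.

```python
import random


def sample_sources(sources, percent, seed=None):
    """Hypothetical helper: sample roughly `percent` of sources, but at least one."""
    if not sources:
        # Nothing to sample; max(1, ...) on an empty list would raise ValueError.
        return []
    rng = random.Random(seed)
    # Clamp to [1, len(sources)] so the sample is never empty and never too large.
    sample_size = min(len(sources), max(1, round(len(sources) * percent)))
    return rng.sample(sources, sample_size)


print(len(sample_sources(["a", "b", "c"], 0.1)))
print(len(sample_sources(list(range(10)), 0.5)))
print(sample_sources([], 0.5))
```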
Model name "gpt-4o" appears to be incorrect and needs to be updated
The codebase consistently uses an invalid model name "gpt-4o" in multiple files, while the test files correctly use valid OpenAI model names like "gpt-3.5-turbo" and "gpt-3.5-turbo-0125". The model name "gpt-4o" is not a valid OpenAI model identifier.
Locations requiring updates:
- README.md (multiple occurrences)
- examples/ufc/demo-ufc.ipynb
- examples/movies/demo-movies.ipynb
- examples/trip/demo_orchestrator_trip.ipynb
The model name should be changed to a valid OpenAI model identifier such as "gpt-4" or "gpt-3.5-turbo".
🔗 Analysis chain
Line range hint 98-98: Verify the correctness of the model name in OpenAiGenerativeModel
The model name "gpt-4o" may be incorrect or unsupported. Please verify that it corresponds to a valid OpenAI model name. Common valid model names include "gpt-3.5-turbo" or "gpt-4".
Run the following script to check for valid OpenAI model names used in the codebase:
This script will list all the model names used with OpenAiGenerativeModel in the codebase, helping you verify the correct model names.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for OpenAI model names used in the codebase.
# Search for instances where OpenAiGenerativeModel is initialized.
rg 'OpenAiGenerativeModel\(model_name=["\'](.*?)["\']\)' -or '$1' | sort | uniq
Length of output: 262
Script:
#!/bin/bash
# Search for OpenAI model names using simpler patterns
rg -i "model_name.*['\"]gpt" -A 2
Length of output: 1112
Script:
#!/bin/bash
# Search for any OpenAI model references
rg -i "gpt-[34]" -A 2
Length of output: 1948
Script:
#!/bin/bash
# Look for OpenAiGenerativeModel usage
rg "OpenAiGenerativeModel" -A 3
Length of output: 6815
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
examples/movies/demo-movies.ipynb(7 hunks)examples/ufc/demo-ufc.ipynb(10 hunks)ontology.json(1 hunks)
🔇 Additional comments (4)
ontology.json (1)
1-199: Verify ontology integration with chat sessions
The AI summary indicates that this ontology is used in chat sessions and knowledge graph initialization. Let's verify this integration.
✅ Verification successful
Ontology integration is properly implemented and actively used
The ontology file is correctly integrated and utilized throughout the codebase:
- Used in chat sessions (chat_session.py) to format system instructions and handle queries
- Core component in knowledge graph operations (kg.py) for validating nodes, relations, and attributes
- Actively used in data extraction (extract_data_step.py) with proper placeholder replacement
- Supports ontology creation and fixes through create_ontology_step.py
The implementation shows proper validation, error handling, and integration with both chat and graph operations, confirming the AI summary's claims about its usage in chat sessions and knowledge graph initialization.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify how the ontology is used in the codebase
# Check for ontology usage in chat sessions and knowledge graph
echo "Checking for ontology usage in code..."
rg -l "ontology" --type py
# Check for specific placeholder mentioned in the summary
echo "Checking for #ONTOLOGY placeholder..."
rg "#ONTOLOGY" --type py
Length of output: 913
Script:
#!/bin/bash
# Let's check how the ontology is actually used in key files
# Check ontology usage in chat_session.py
echo "=== chat_session.py usage ==="
rg -A 5 "ontology" graphrag_sdk/chat_session.py
# Check ontology usage in kg.py
echo "=== kg.py usage ==="
rg -A 5 "ontology" graphrag_sdk/kg.py
# Check create_ontology_step.py to understand initialization
echo "=== create_ontology_step.py usage ==="
rg -A 5 "ontology" graphrag_sdk/steps/create_ontology_step.py
# Check extract_data_step.py since it uses #ONTOLOGY placeholder
echo "=== extract_data_step.py usage ==="
rg -A 5 "ontology" graphrag_sdk/steps/extract_data_step.py
Length of output: 11870
examples/movies/demo-movies.ipynb (2)
146-149: Changes align well with PR objectives
The updated output format successfully demonstrates the "reveal chat context" objective by including context and Cypher queries in the responses.
87-90:
Fix the model name typo
The model name "gpt-4o" appears to be incorrect. It should probably be "gpt-4".
-model = OpenAiGenerativeModel(model_name="gpt-4o")
+model = OpenAiGenerativeModel(model_name="gpt-4")
Likely invalid or redundant comment.
examples/ufc/demo-ufc.ipynb (1)
386-387: Verify the exposure of internal details in the chat responses
The chat responses include detailed context and Cypher queries, which may expose internal implementation details or sensitive information.
Consider whether this level of detail is appropriate for end-users. If not, you might want to modify the response handling to exclude or anonymize the context and cypher information.
To verify where the response structure is defined and adjust it accordingly, run:
This script helps locate the code responsible for the response formatting so you can make necessary adjustments.
Summary by CodeRabbit
New Features
- KnowledgeGraph class to support custom instructions and prompts for Cypher and QA sessions.
- ChatSession to return structured responses, including context and question.
- ontology.json for a combat sports context.
Bug Fixes
- GraphQueryGenerationStep and QAStep to streamline prompt generation.
Documentation