This project demonstrates how to build an agentic system with Large Language Models (LLMs) that can interact with multiple databases and use various tools. It highlights the use of SQL agents to query large databases efficiently. The key frameworks used in this project are OpenAI, LangChain, LangGraph, LangSmith, and Gradio. The end product is an end-to-end chatbot that performs these tasks, with LangSmith monitoring the agents' performance.
The following diagram illustrates the complete workflow of the AgentGraph system, showing how different components interact to process user queries and generate responses:
The system architecture consists of three main processing paths:
- RAG Tool Path (Top): Handles unstructured document queries using vector embeddings and similarity search
- SQL Agent Tool Path (Middle): Processes structured database queries using intelligent SQL generation
- Internet Search Tool Path (Bottom): Performs web searches using Tavily for external information retrieval
The Primary Agent (GPT-4o) orchestrates these tools recursively until the user request is completely fulfilled, with all interactions monitored through LangSmith for performance tracking.
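The orchestration loop above can be sketched in plain Python. This is a minimal, self-contained illustration: the tool functions, routing keywords, and `primary_agent` signature are hypothetical stand-ins for the real RAG, SQL-agent, and Tavily tools, and the toy keyword router stands in for GPT-4o's tool selection.

```python
# Hypothetical stub tools; in the real system these wrap a vector store,
# a SQL-generating agent, and the Tavily search API respectively.
def rag_tool(query: str) -> str:
    return f"[RAG answer for: {query}]"

def sql_agent_tool(query: str) -> str:
    return f"[SQL answer for: {query}]"

def search_tool(query: str) -> str:
    return f"[Search answer for: {query}]"

TOOLS = {"rag": rag_tool, "sql": sql_agent_tool, "search": search_tool}

def route(query: str) -> str:
    # Toy keyword routing; the real system lets the primary LLM pick the tool.
    if "document" in query.lower():
        return "rag"
    if "database" in query.lower():
        return "sql"
    return "search"

def primary_agent(query: str, max_steps: int = 3) -> str:
    """Call tools until the request is considered fulfilled (bounded loop)."""
    answer = ""
    for _ in range(max_steps):
        tool_name = route(query)
        answer = TOOLS[tool_name](query)
        if answer:  # the real agent asks the LLM whether more tool calls are needed
            break
    return answer
```

The bounded loop mirrors the recursive orchestration described above: each iteration picks a tool, runs it, and checks whether the request is fulfilled before continuing.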
- Operating System: Linux or Windows (Tested on Windows 11 with Python 3.9.11 or above)
- OpenAI API Key: Required for GPT functionality.
- Tavily Credentials: Required for search tools (Free from your Tavily profile).
- LangChain Credentials: Required for LangSmith (Free from your LangChain profile).
- Dependencies: The necessary libraries are listed in the `requirements.txt` file.
To set up the project, follow these steps:
- Clone the repository:

  ```
  git clone <repo_address>
  ```

- Install Python and create a virtual environment:

  ```
  python -m venv venv
  ```

- Activate the virtual environment:

  - On Windows:

    ```
    venv\Scripts\activate
    ```

  - On Linux/macOS:

    ```
    source venv/bin/activate
    ```

- Install the required dependencies:

  ```
  pip install -r requirements.txt
  ```
- Download a SQL database and place it in the `data` folder.
- Prepare the `.env` file and add your `OPEN_AI_API_KEY`, `TAVILY_API_KEY`, and `LANGCHAIN_API_KEY`.
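  A `.env` file with these variables might look like the following (placeholder values shown, not real keys):

  ```
  OPEN_AI_API_KEY=<your-openai-api-key>
  TAVILY_API_KEY=<your-tavily-api-key>
  LANGCHAIN_API_KEY=<your-langchain-api-key>
  ```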
- Run the `prepare_vector_db.py` module once to prepare the vector database:

  ```
  python src\prepare_vector_db.py
  ```
- Run the app:

  ```
  python src\app.py
  ```
Open the Gradio URL generated in the terminal and start chatting.
Sample questions are available in `SampleQuestions.txt`.
To use your own data:
- Place your data in the `data` folder.
- Update the configurations in `tools_config.yml`.
- Load the configurations in `src\agent_graph\load_tools_config.py`.
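The exact schema of `tools_config.yml` depends on the project, but a configuration entry for a new data source might look roughly like this (all keys and values are illustrative, not the project's actual schema):

```yaml
rag_tool:
  data_directory: data/my_documents
  vectordb_directory: data/vectordb
  embedding_model: text-embedding-3-small
  k: 2
sqldb_tool:
  db_path: data/my_database.db
  llm: gpt-4o
```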
For unstructured data using Retrieval-Augmented Generation (RAG):
- Run the following command with your data directory's configuration:

  ```
  python src\prepare_vector_db.py
  ```
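Under the hood, preparing a vector database for RAG typically means splitting documents into overlapping chunks before embedding and storing them. A minimal, self-contained sketch of such a chunking step (the function name and parameters are illustrative, not the module's actual API):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, as a RAG pipeline typically does
    before computing an embedding for each chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

# Each chunk would then be embedded (e.g., with an OpenAI embedding model)
# and written to the vector store that the RAG tool queries at runtime.
```

Overlap between consecutive chunks helps preserve context that would otherwise be cut at chunk boundaries, at the cost of some redundant storage.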
All configurations are managed through YAML files in the `configs` folder and loaded by `src\chatbot\load_config.py` and `src\agent_graph\load_tools_config.py`. These modules distribute the configurations cleanly throughout the project.
Once your databases are ready, you can either connect the current agents to the databases or create new agents.
- LangChain: Introduction
- LangGraph
- LangSmith
- Gradio: Documentation
- OpenAI: Developer Quickstart
- Tavily Search
