A simple yet effective framework designed for the specialized legal domain using Retrieval-Augmented Generation (RAG).

yix8/LawBot

LawBot: Enhancing LLMs with RAG for Legal Precision

Group Project of COMP0087: Statistical Natural Language Processing

Introduction

LawBot is a simple yet effective framework tailored to specialized legal domains that requires no model training. Using Chinese legal and regulatory documents as its knowledge base, LawBot broadens retrieval through multi-query generation and hybrid search, improves precision with metadata filtering, and checks the plausibility of retrieved knowledge through context-based reranking. All of these steps run via zero-shot prompting, so LawBot remains applicable even when the LLM is accessible only through a black-box API.

LawBot Pipeline
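
To make the retrieval stages described in the introduction more concrete, here is a minimal sketch of how hits from several query rewrites and from keyword and vector search might be merged and then filtered by metadata. This README does not state how LawBot fuses results, so reciprocal rank fusion (a common fusion technique) is used here as a stand-in. The function names, document IDs, and the `source` metadata field are all hypothetical and are not taken from the repository.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents found by several searches rise to the top.
    Ties are broken alphabetically by ID to keep the output deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def filter_by_metadata(doc_ids, metadata, **required):
    """Keep only documents whose metadata matches every required field."""
    return [
        doc_id
        for doc_id in doc_ids
        if all(metadata.get(doc_id, {}).get(key) == value
               for key, value in required.items())
    ]


# Hypothetical hits: one list per retriever or per generated query.
keyword_hits = ["law_a/art_1", "law_b/art_9", "law_a/art_2"]
vector_hits = ["law_a/art_2", "law_a/art_1", "interp_c/art_4"]
rewrite_hits = ["law_a/art_1", "law_a/art_3"]

# Hypothetical metadata attached to each chunk in the knowledge base.
metadata = {
    "law_a/art_1": {"source": "law_a"},
    "law_a/art_2": {"source": "law_a"},
    "law_a/art_3": {"source": "law_a"},
    "law_b/art_9": {"source": "law_b"},
    "interp_c/art_4": {"source": "interp_c"},
}

fused = reciprocal_rank_fusion([keyword_hits, vector_hits, rewrite_hits])
candidates = filter_by_metadata(fused, metadata, source="law_a")
print(candidates)
```

In a full pipeline, the filtered candidates would then go to a reranker before being placed in the LLM prompt. That step is omitted here because it depends on an external API.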

Installation

Follow these steps to set up the LawBot environment on your local machine:

  1. Clone the repository:
    git clone https://github.com/yix8/LawBot.git
  2. Navigate to the LawBot directory:
    cd LawBot
  3. Install the required packages:
    pip install -r requirements.txt
  4. Build the general vector store:
    python framework/embed_laws.py
  5. Build the specific vector store:
    python framework/finetune_data/embed_query.py

Configuration

Set up the necessary API keys in the .env file located in the framework folder:

  • OPENAI_API_KEY: Your OpenAI API key.
  • COHERE_RERANK_KEY: Your Cohere rerank API key.
  • LANGCHAIN_API_KEY: Your LangChain API key.
  • LANGCHAIN_PROJECT: Your LangChain project identifier.
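
Before launching the app, it can help to confirm that all four keys are present. The snippet below is a small standalone check, not part of the repository. It assumes the .env file uses the common `KEY=value` layout, one entry per line, and the helper names `parse_env` and `missing_keys` are hypothetical. The values in the sample are placeholders only.

```python
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "COHERE_RERANK_KEY",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
)


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in order."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


# Placeholder content standing in for framework/.env (assumed layout).
sample_env = """
# API credentials
OPENAI_API_KEY=your-openai-key
COHERE_RERANK_KEY="your-cohere-key"
LANGCHAIN_API_KEY=
"""

print(missing_keys(sample_env))
```

To check a real file, read it first, for example with `pathlib.Path("framework/.env").read_text()`, and pass the result to `missing_keys`.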

Running LawBot

To interact with the model via a web interface:

python framework/App.py

Interface

China Law Query Synthetic

We also propose an open-source QA dataset, the Chinese Legal Question Answering dataset (CLQS), which can be used as an instruction dataset.
