Group Project of COMP0087: Statistical Natural Language Processing
LawBot is a simple yet effective framework tailored to specialized legal domains that works without any training. Using Chinese legal and regulatory documents as its knowledge base, LawBot broadens retrieval through multi-query generation and hybrid search, improves precision with metadata filtering, and checks the plausibility of retrieved knowledge through context-based reranking. All of these steps are carried out via zero-shot prompting, so LawBot remains applicable even when LLMs are accessible only through a black-box API.
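To make the retrieval stages above more concrete, here is a minimal sketch of how ranked results from several queries and retrievers might be merged and then filtered by metadata. It is not LawBot's actual implementation: the description above does not say how results are combined, so reciprocal rank fusion is used here as an assumed stand-in, and the document IDs, the `doc_type` metadata field, and both function names are hypothetical. The reranking stage is omitted because it depends on an external reranker.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) for every list it appears in,
    with rank starting at 1. This fusion rule is an assumption, not
    something the LawBot description specifies.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def filter_by_metadata(doc_ids, metadata, **required):
    """Keep only documents whose metadata matches every required key/value."""
    return [
        doc_id
        for doc_id in doc_ids
        if all(metadata.get(doc_id, {}).get(key) == value
               for key, value in required.items())
    ]


# Hypothetical ranked results: keyword search and vector search on the
# original query, plus vector search on one generated query variant.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
variant_hits = ["doc_a", "doc_d"]

# Hypothetical metadata attached to each document in the knowledge base.
metadata = {
    "doc_a": {"doc_type": "law"},
    "doc_b": {"doc_type": "regulation"},
    "doc_c": {"doc_type": "law"},
    "doc_d": {"doc_type": "regulation"},
}

fused = reciprocal_rank_fusion([keyword_hits, vector_hits, variant_hits])
candidates = filter_by_metadata(fused, metadata, doc_type="law")
print(fused)
print(candidates)
```

Documents that appear near the top of several lists rise in the fused ranking, which is the intuition behind combining multiple queries with hybrid search; the metadata filter then narrows the candidates before any reranking takes place.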
Follow these steps to set up the LawBot environment on your local machine:
- Clone the repository:
git clone https://github.com/yix8/LawBot.git
- Navigate to the LawBot directory:
cd LawBot
- Install the required packages:
pip install -r requirements.txt
- Build the general vector store:
python framework/embed_laws.py
- Build the specific vector store:
python framework/finetune_data/embed_query.py
Set up the necessary API keys in the .env file located in the framework folder:
- OPENAI_API_KEY: Your OpenAI API key.
- COHERE_RERANK_KEY: Your Cohere rerank API key.
- LANGCHAIN_API_KEY: Your LangChain API key.
- LANGCHAIN_PROJECT: Your LangChain project identifier.
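Before launching anything, it can help to confirm that all four keys are present. The snippet below is a small, optional sketch rather than part of the repository: it assumes the .env file uses plain `KEY=VALUE` lines and lives at `framework/.env`, and the helper names (`parse_env_text`, `missing_keys`, `check_env_file`) are hypothetical.

```python
from pathlib import Path

# The four keys listed above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "COHERE_RERANK_KEY",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty, in listed order."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path="framework/.env"):
    """Return the missing keys for the given .env file (all keys if no file)."""
    env_path = Path(path)
    if not env_path.is_file():
        return list(REQUIRED_KEYS)
    return missing_keys(parse_env_text(env_path.read_text(encoding="utf-8")))
```

Calling `check_env_file()` from the repository root returns an empty list when every key has a value, and otherwise lists the keys that still need to be filled in.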
To interact with the model via a web interface:
python framework/App.py
We also propose an open-source QA dataset, the Chinese Legal Question Answering dataset (CLQS), which can be used as an instruction dataset.

