Learning Sequence:
1. RAG_Reformulate.py
2. RAG_Hybrid_Retrieval.py
3. RAG_Re-ranking.py
4. RAG-Part2/1-RAG-PDF-Split
5. RAG-Part2/2-RAG-LLM
6. RAG-Part2/3-RAG-Eval
Steps 1-3 are basic RAG techniques.
Steps 4-6: chunk the PDF, pass the retrieved text as context to the LLM (qwen2-0.5) to generate the response/answer, and finally evaluate the RAG pipeline with metrics such as hit rate.
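For reference, hit rate is usually the fraction of queries for which at least one relevant chunk appears in the top-k retrieved results. The sketch below is a minimal illustration under that assumption and is not taken from RAG-Part2/3-RAG-Eval; the function name `hit_rate`, the chunk IDs, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set


def hit_rate(retrieved: Dict[str, List[str]],
             relevant: Dict[str, Set[str]],
             k: int = 3) -> float:
    """Fraction of queries whose top-k retrieved chunk IDs contain a relevant chunk."""
    if not relevant:
        return 0.0
    hits = 0
    for query_id, gold_ids in relevant.items():
        # Only the first k retrieved chunks count toward a hit.
        top_k = retrieved.get(query_id, [])[:k]
        if any(chunk_id in gold_ids for chunk_id in top_k):
            hits += 1
    return hits / len(relevant)


# Toy data (hypothetical): ranked chunk IDs per query and the gold chunk IDs.
retrieved = {
    "q1": ["c3", "c7", "c1"],
    "q2": ["c2", "c9", "c4"],
    "q3": ["c5", "c6", "c8"],
}
relevant = {"q1": {"c1"}, "q2": {"c4"}, "q3": {"c0"}}

print(hit_rate(retrieved, relevant, k=3))
print(hit_rate(retrieved, relevant, k=1))
```

Reporting the metric at several values of k shows how far down the ranking the relevant chunk tends to sit.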
1. Reformulate: use a BART model to reformulate the query.
python RAG_Reformulate.py
2. Hybrid Retrieval: sparse (BM25) + dense (FAISS + Sentence Transformer embeddings)
pip install faiss-gpu
python RAG_Hybrid_Retrieval.py
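How the script combines the sparse and dense results is not described here. One common option, sketched below as an assumption rather than as what RAG_Hybrid_Retrieval.py actually does, is to min-max normalize each score set and take a weighted sum. The names `min_max`, `fuse`, and `alpha`, and the toy scores, are hypothetical; in practice the inputs would come from BM25 and from a FAISS search over Sentence Transformer embeddings.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(sparse: Dict[str, float],
         dense: Dict[str, float],
         alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; alpha weights the sparse side."""
    sparse_n, dense_n = min_max(sparse), min_max(dense)
    docs = set(sparse_n) | set(dense_n)
    fused = {
        # A document missing from one retriever contributes 0 from that side.
        d: alpha * sparse_n.get(d, 0.0) + (1 - alpha) * dense_n.get(d, 0.0)
        for d in docs
    }
    # Sort by score descending, then by document ID for a stable order.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy scores (hypothetical), keyed by document ID.
bm25_scores = {"d1": 2.0, "d2": 6.0, "d3": 4.0}
dense_scores = {"d1": 0.2, "d3": 0.8, "d4": 0.5}

ranking = fuse(bm25_scores, dense_scores, alpha=0.5)
print([doc for doc, _ in ranking])
```

Setting `alpha=1.0` falls back to pure BM25 ordering and `alpha=0.0` to pure dense ordering, which is handy for comparing the hybrid against each retriever alone.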
3. Re-ranking: use a cross-encoder (ms-marco-MiniLM-L-6-v2) to score each pair (query + initially retrieved result), then re-rank the results by score.
python RAG_Re-ranking.py
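The re-ranking step can be sketched as below. This is an illustration, not the contents of RAG_Re-ranking.py: `rerank`, `cross_encoder_score_fn`, and `toy_overlap_score` are hypothetical names, and the Hugging Face model ID `cross-encoder/ms-marco-MiniLM-L-6-v2` is assumed to be the checkpoint meant above. The scorer is passed in as a function so the toy example runs without downloading a model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

ScoreFn = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(query: str,
           candidates: List[str],
           score_fn: ScoreFn,
           top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Score each (query, passage) pair and sort passages by score, best first."""
    if not candidates:
        return []
    pairs = [(query, passage) for passage in candidates]
    scores = score_fn(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k] if top_k is not None else ranked


def cross_encoder_score_fn(
        model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2") -> ScoreFn:
    """Build a scorer backed by a sentence-transformers CrossEncoder (assumed model ID)."""
    from sentence_transformers import CrossEncoder  # imported lazily; needs a model download
    model = CrossEncoder(model_name)
    return lambda pairs: [float(s) for s in model.predict(pairs)]


def toy_overlap_score(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer for the demo: number of words shared by query and passage."""
    return [float(len(set(q.lower().split()) & set(p.lower().split())))
            for q, p in pairs]


query = "what is hybrid retrieval"
candidates = [
    "FAISS stores dense vectors",
    "hybrid retrieval combines BM25 and dense retrieval",
    "BM25 is a sparse method",
]
reranked = rerank(query, candidates, toy_overlap_score)
print([passage for passage, _ in reranked])
```

To use the real model, pass `cross_encoder_score_fn()` in place of `toy_overlap_score`; the rest of the function is unchanged.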


