Project Overview

This project is a Retrieval-Augmented Generation (RAG) system designed for research assistance and data retrieval. It integrates advanced retriever and generator models, supports vector store management, and provides an interactive interface for users. Below is an overview of the key components, features, and directories in the project.

Setup

Clone the repository:

git clone https://github.com/darkivist/FolkRAG.git

Install required dependencies:
```
pip install -r requirements.txt
```

Set up AWS Bedrock credentials in the config/config.ini file:

[BedRock_LLM_API]
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_SESSION_TOKEN=your_session_token

Point the application to that credentials file in the config/.env file:
```
CONFIG_FILE=config.ini
```
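The two configuration steps above could be read back in Python roughly as follows. This is a minimal sketch, not the project's actual loader: the function name `load_bedrock_credentials` is hypothetical, and it assumes `CONFIG_FILE` has already been exported into the process environment (how the project loads config/.env is not shown here).

```python
import configparser
import os
from pathlib import Path


def load_bedrock_credentials(config_dir):
    """Read the [BedRock_LLM_API] section from the configured .ini file.

    Hypothetical helper for illustration; not taken from the project source.
    """
    # Assumption: CONFIG_FILE is set in the environment, mirroring config/.env.
    config_name = os.environ.get("CONFIG_FILE", "config.ini")
    config_path = Path(config_dir) / config_name

    # Interpolation is disabled so a '%' inside a secret is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_path):
        raise FileNotFoundError(f"Could not read {config_path}")

    section = parser["BedRock_LLM_API"]
    return {
        "aws_access_key_id": section["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": section["AWS_SECRET_ACCESS_KEY"],
        "aws_session_token": section["AWS_SESSION_TOKEN"],
    }
```

The returned keys are chosen to match the keyword arguments that `boto3.client` accepts, so the dictionary could be unpacked into a Bedrock client call along with a region of your choice.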
Preconfigured vector stores should be available under the following names in the ../data/vector_stores directory. Vector stores that are not included in the repository will not work in the application unless they are built and stored locally or in a hosted environment.
- vectorstore_all_250_instruct
- vectorstore_all_1000_instruct
- vectorstore_all_250_titan
- vectorstore_sample_250_instruct (available in the repo)
- vectorstore_sample_1000_instruct (available in the repo)

Key Components

1. Application Files

generator.py: Manages text generation processes using AWS Bedrock's Claude.
retriever.py: Handles data retrieval using vector stores, embedding models (instructor-xl, amazon/titan), hypothetical document generation, and reranking techniques.
app.py: Contains the main logic for initializing and running the application.

2. Vector Stores

Vector stores are used for efficient data retrieval and are stored in the /data/vector_stores directory. Two smaller vector stores for testing are included in the repository. The larger vector stores are too big to include, so they must be built on a virtual or local machine.

3. Streamlit Application

Interactive UI: The application provides a user-friendly interface built with Streamlit.
Features:
- Query processing and document retrieval.
- Generative responses based on retrieved data.
- Parameter customization for top-k documents and vector store selection.

To launch the application, use:

streamlit run src/main.py

Features

Retriever Details

Vector Stores: Supports multiple vector store configurations using embedding models.
HyDE Generator: Generates hypothetical documents for enhanced query matching.
TF-IDF-based Reranking: Combines relevance, freshness, and keyword coverage for optimal document selection.
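To make the reranking idea concrete, here is a minimal, dependency-free sketch of scoring documents by a weighted sum of TF-IDF relevance, freshness, and keyword coverage. It is not the code in retriever.py: the function names, the weights, the one-year half-life, and the `text` and `date` fields are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from datetime import date


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (term -> weight) per token list."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF, so terms found in every text keep a small weight.
            term: (n / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, n in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query, docs, today, weights=(0.6, 0.2, 0.2), half_life_days=365.0):
    """Sort docs by a weighted mix of relevance, freshness, and coverage.

    Assumptions: each doc is a dict with 'text' (str) and 'date'
    (datetime.date); the weights and half-life are illustrative defaults.
    """
    q_tokens = tokenize(query)
    q_terms = set(q_tokens)
    token_lists = [tokenize(doc["text"]) for doc in docs]
    # The query is vectorized together with the documents (last position).
    vectors = tfidf_vectors(token_lists + [q_tokens])
    q_vec = vectors[-1]
    w_rel, w_fresh, w_cov = weights

    scored = []
    for doc, tokens, vec in zip(docs, token_lists, vectors):
        relevance = cosine(q_vec, vec)
        age_days = max((today - doc["date"]).days, 0)
        freshness = 0.5 ** (age_days / half_life_days)  # halves every half-life
        coverage = len(q_terms & set(tokens)) / len(q_terms) if q_terms else 0.0
        scored.append((w_rel * relevance + w_fresh * freshness + w_cov * coverage, doc))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

A weighted sum keeps each signal easy to tune on its own; a production reranker would more likely rely on a library vectorizer and calibrate the weights against evaluation queries.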

Generator Details

Prompt Formulation: Combines query context and metadata for precise responses.
Metadata Integration: Dynamically incorporates metadata into prompts to enhance specificity.
AWS Bedrock Integration: Leverages Claude models for text generation.
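As an illustration of how a query, retrieved context, and metadata might be combined into one prompt, consider the sketch below. The function `build_prompt`, the instruction wording, and the metadata field names in the test data are hypothetical; the actual prompt template in generator.py may differ.

```python
def build_prompt(query, documents):
    """Assemble a prompt from a query and retrieved documents.

    Assumption: each document is a dict with 'text' (str) and an optional
    'metadata' dict; empty metadata values are skipped.
    """
    blocks = []
    for i, doc in enumerate(documents, start=1):
        meta = "; ".join(
            f"{key}: {value}"
            for key, value in doc.get("metadata", {}).items()
            if value
        )
        # Metadata goes into the document header so the model can cite it.
        header = f"[Document {i}]" + (f" ({meta})" if meta else "")
        blocks.append(f"{header}\n{doc['text']}")

    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the context below. "
        "Cite documents by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

Keeping the metadata next to each document's text, rather than in a separate section, makes it easier for the model to attribute a statement to the right source.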

Customization

To customize the directory structure, update the paths in src/main.py:

data_dir = os.path.join(project_root, 'data')
demo_dir = os.path.join(project_root, 'demo')
vstore_dir = os.path.join(demo_dir, 'vector_stores')
config_dir = os.path.join(project_root, 'config')
src_dir = os.path.join(project_root, 'src')
components_dir = os.path.join(src_dir, 'components')

For adding new components, place files in src/components/ and ensure proper imports.

License

This project is licensed under the MIT License. See the LICENSE file in the root directory for details.

