README.md (38 additions, 0 deletions)

@@ -2,6 +2,38 @@

A web application that provides an AI-powered chatbot interface for dataset discovery, using the Google Gemini API on the backend and a React-based frontend.

## Architecture Overview

The KnowledgeSpace AI Agent is a Retrieval-Augmented Generation (RAG) system designed for neuroscience dataset discovery. It combines keyword search, semantic retrieval, and large language model reasoning to surface relevant datasets across heterogeneous sources.

The system consists of the following core components:

- **React Frontend**
  - Provides a chat-based interface for interacting with the dataset discovery agent.

- **FastAPI Backend**
  - Orchestrates LLM-based reasoning (Gemini), keyword search (Elasticsearch), and semantic search (Vertex AI Matching Engine, since renamed Vertex AI Vector Search).

- **Data Processing Pipeline**
  - Scrapes and normalizes neuroscience metadata, stores structured records in BigQuery, and indexes vector embeddings in Vertex AI for retrieval.
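The normalization step can be pictured as mapping each source's raw metadata onto one shared record shape before it is stored in BigQuery and embedded. The sketch below is illustrative only: the `DatasetRecord` fields, the source names (`source_a`, `source_b`), the raw key names, and the `embedding_text` helper are all hypothetical and do not describe the pipeline's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical common schema that every source is normalized into."""
    source: str
    dataset_id: str
    title: str
    description: str = ""
    keywords: list[str] = field(default_factory=list)

    def embedding_text(self) -> str:
        # Text that would be passed to an embedding model (assumption).
        parts = [self.title, self.description, ", ".join(self.keywords)]
        return "\n".join(p for p in parts if p)


# Hypothetical per-source key mappings: raw key -> common field name.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "source_a": {"id": "dataset_id", "name": "title", "summary": "description", "tags": "keywords"},
    "source_b": {"accession": "dataset_id", "title": "title", "abstract": "description", "keywords": "keywords"},
}


def normalize(source: str, raw: dict) -> DatasetRecord:
    """Map one raw metadata dict from `source` onto DatasetRecord.

    Raises KeyError if the source is unknown or the record has no ID field.
    """
    mapping = FIELD_MAPS[source]
    values = {common: raw[raw_key] for raw_key, common in mapping.items() if raw_key in raw}
    keywords = values.get("keywords", [])
    if isinstance(keywords, str):
        # Some sources may store keywords as one comma-separated string.
        keywords = [k.strip() for k in keywords.split(",") if k.strip()]
    return DatasetRecord(
        source=source,
        dataset_id=str(values["dataset_id"]),
        title=str(values.get("title", "")).strip(),
        description=str(values.get("description", "")).strip(),
        keywords=list(keywords),
    )
```

For example, `normalize("source_b", {"accession": "X1", "title": " EEG study ", "keywords": "eeg, sleep"})` yields a record whose `keywords` are `["eeg", "sleep"]`. Keeping the mapping in data rather than code means a new source only needs a new `FIELD_MAPS` entry.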

## System Flow (High-Level)

The following diagram shows the high-level request and data flow through the system:

```mermaid
flowchart LR
User --> Frontend
Frontend --> Backend
Backend -->|Keyword Search| Elasticsearch
Backend -->|Semantic Search| VertexAI
Backend -->|LLM Reasoning| Gemini
Elasticsearch --> Backend
VertexAI --> Backend
Gemini --> Backend
Backend --> Frontend
```
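In code, the flow above roughly corresponds to a backend handler that runs both retrievers, merges their rankings, and hands the top candidates to the LLM. The following is a minimal, framework-free sketch under stated assumptions: the retrievers and the LLM are injected as plain callables standing in for the Elasticsearch, Vertex AI, and Gemini clients, and the merge uses reciprocal rank fusion, which is one common choice and not necessarily what this backend does. `answer_query` and `reciprocal_rank_fusion` are hypothetical names.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical signatures: a retriever returns dataset IDs, best match first.
Retriever = Callable[[str, int], list[str]]
LLM = Callable[[str], str]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each ID scores the sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def answer_query(
    query: str,
    keyword_search: Retriever,   # stand-in for Elasticsearch
    semantic_search: Retriever,  # stand-in for Vertex AI vector search
    llm: LLM,                    # stand-in for Gemini
    top_k: int = 5,
) -> tuple[str, list[str]]:
    """Retrieve, fuse, and prompt the LLM; returns (answer, dataset IDs used)."""
    fused = reciprocal_rank_fusion([
        keyword_search(query, top_k),
        semantic_search(query, top_k),
    ])[:top_k]
    prompt = (
        "Recommend neuroscience datasets for the user's question.\n"
        f"Question: {query}\n"
        "Candidate dataset IDs: " + ", ".join(fused)
    )
    return llm(prompt), fused


if __name__ == "__main__":
    # Fake retrievers and LLM so the sketch runs without any cloud services.
    answer, ids = answer_query(
        "sleep EEG in rodents",
        keyword_search=lambda q, n: ["ds-1", "ds-2", "ds-3"][:n],
        semantic_search=lambda q, n: ["ds-2", "ds-4"][:n],
        llm=lambda prompt: f"(LLM answer would go here; prompt has {len(prompt)} chars)",
    )
    print(ids)  # ['ds-2', 'ds-1', 'ds-4', 'ds-3']
    print(answer)
```

Rank fusion only needs each retriever's ordering, not comparable scores, which is why it is a convenient way to combine keyword (BM25-style) and vector-similarity results.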

## Table of Contents

- [Prerequisites](#prerequisites)

@@ -65,6 +97,8 @@ Create a file named `.env` in the project root based on `.env.template`. You can

**Option 1: Google API Key (Recommended for development)**

- Set `GOOGLE_API_KEY` in your `.env` file.

> You can generate a Gemini API key from **Google AI Studio**:
> https://aistudio.google.com/app/apikey
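To confirm the key is actually picked up, a small standalone check like the one below may help. It is a sketch that assumes the `.env` file uses simple `KEY=value` lines; the backend itself may load the file differently (for example with the `python-dotenv` package), and `load_env_file` / `require_api_key` are hypothetical helper names.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    p = Path(path)
    if not p.exists():
        return values
    for raw_line in p.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop optional surrounding quotes: KEY="value" or KEY='value'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def require_api_key(env_path: str = ".env") -> str:
    """Return GOOGLE_API_KEY, preferring the process environment over .env."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(env_path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file.")
    return key


if __name__ == "__main__":
    key = require_api_key()
    # Print only a prefix so the full secret never lands in terminal history.
    print(f"GOOGLE_API_KEY found ({key[:4]}...)")
```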

**Option 2: Vertex AI (Recommended for production)**

@@ -139,6 +173,10 @@ The backend requires specific environment variables to connect to **Google Cloud


## Running the Application

> ⚠️ **Note on Port Configuration**
>
> - Local development: the React development server runs on port **5000**.
> - Docker/Nginx deployment: the containerized frontend is exposed on port **3000**.
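If you are unsure which frontend is running, a quick TCP connectivity check against these ports can tell you. This is a small standard-library sketch; the port table simply mirrors the note above plus the backend's port 8000, and `is_port_open` is a hypothetical helper name.

```python
import socket

# Ports mentioned in this README: React dev server, Docker/Nginx frontend, backend.
PORTS = {"frontend (dev)": 5000, "frontend (Docker/Nginx)": 3000, "backend": 8000}


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts (both are OSError subclasses).
        return False


if __name__ == "__main__":
    for name, port in PORTS.items():
        state = "listening" if is_port_open("localhost", port) else "not reachable"
        print(f"{name:<25} port {port}: {state}")
```

A listening port only shows that *something* is bound there; opening the URL in a browser is still the easiest way to confirm it is this app.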

#### Backend (port 8000)
