FHIR-AgentBench

This repository contains the code and dataset for FHIR-AgentBench.

📁 Project Structure

FHIR-AgentBench/
├── scripts/                          # Bash scripts for data setup, agent inference, and evaluation
├── agent/                            # Multiple agent implementations
├── tools/                            # Tools for agents
├── utils/                            # Utility modules
├── config.py                         # Configuration settings and constants
├── config.yml                        # YAML configuration file
├── create_db.py                      # Creates database for Q&A conversion to FHIR
├── create_question_answer_dataset.py # Creates Q&A dataset from EHRSQL
├── create_question_fhir_dataset.py   # Creates FHIR-compatible question dataset
├── evaluation_metrics.py             # Main evaluation script
├── fhir_client.py                    # FHIR client for Google Cloud Healthcare API
├── run_agent.py                      # Main script to run agents on datasets
├── question_fixes_complete.json      # Hard-coded question fixes
├── value_mapping_valid_natural.json  # Natural language value mappings 
├── requirements.txt                  # Python package dependencies
└── images/                           # Documentation images

🚀 Getting Started

Prerequisites

Install required packages:

# Create a conda environment
conda create -n fhir-agentbench python=3.11
conda activate fhir-agentbench

# Install dependencies
pip install -r requirements.txt

Data Preparation

1. Upload the MIMIC-IV FHIR data to a GCP FHIR store

Download the MIMIC-IV Clinical Database Demo on FHIR from PhysioNet and extract the .gz files.
Create a GCP account, then in the Google Cloud Console search for FHIR Viewer.
Click Browser on the left, then Create dataset.
Next, click Create data store to prepare for the data upload.
For Configure your FHIR store, select R4 as the FHIR Version. Keep other settings as default and click Create.
Separately, in Cloud Storage, upload your unzipped folder containing the MIMIC-IV FHIR data (*.ndjson) to a bucket.
Back in the FHIR store, click Actions in the upper right and choose Import.
Select the folder you uploaded. Under FHIR Import Settings, choose Resource for Content Structure. Click Import and grant permissions if prompted.
Open the Import operation to confirm success. It usually completes in about 10 minutes.
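
If you prefer to script the extraction step, the following is a minimal sketch using only the Python standard library. It assumes the downloaded files are named *.ndjson.gz; the function name gunzip_all and both directory arguments are hypothetical and not part of this repository.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(src_dir: str, dst_dir: str) -> list[Path]:
    """Decompress every *.ndjson.gz under src_dir into dst_dir as *.ndjson."""
    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Assumption: the PhysioNet download uses the *.ndjson.gz naming pattern.
    for gz_path in sorted(Path(src_dir).rglob("*.ndjson.gz")):
        target = out_dir / gz_path.name[: -len(".gz")]
        # Stream the copy so large files are never fully loaded into memory.
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

The resulting folder of .ndjson files is what you upload to the Cloud Storage bucket.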

2. Enable APIs and authenticate with gcloud

You can enable the required APIs and verify access using the gcloud CLI. This is often the fastest way to confirm your setup before running code.

Log in

# Authenticate with your Google account
gcloud auth login

# Set up Application Default Credentials (ADC)
gcloud auth application-default login --no-launch-browser

Check or set the current project and project number

# List all available projects to find your PROJECT_ID
gcloud projects list

# Set the quota project for ADC (to handle billing and quotas)
gcloud auth application-default set-quota-project <YOUR_PROJECT_ID>

# Set the default project for gcloud CLI
gcloud config set project <YOUR_PROJECT_ID>

# Get the current project ID and project number
PROJECT_ID="$(gcloud config get-value project)"
PROJECT_NUMBER="$(gcloud projects describe "$PROJECT_ID" --format="value(projectNumber)")"

# Print them for confirmation
echo "$PROJECT_ID"
echo "$PROJECT_NUMBER"

Enable required APIs

# Enable the Cloud Healthcare API (for FHIR, DICOM, HL7v2 resources)
gcloud services enable healthcare.googleapis.com --project="$PROJECT_ID"

# Enable the Cloud Asset API (needed for dataset and store discovery)
gcloud services enable cloudasset.googleapis.com --project="$PROJECT_ID"

# Enable the Cloud Resource Manager API (needed for project and resource management)
gcloud services enable cloudresourcemanager.googleapis.com --project="$PROJECT_ID"

# Enable the Service Usage API (needed to enable and check other APIs)
gcloud services enable serviceusage.googleapis.com --project="$PROJECT_ID"

Automatically discover dataset, FHIR store, and location

# Find the dataset ID and location (reads the first dataset returned if the project has several)
read DATASET_ID LOCATION <<<$(gcloud asset search-all-resources \
--scope="projects/$PROJECT_NUMBER" \
--asset-types="healthcare.googleapis.com/Dataset" \
--format="value(name.basename(), location)")

echo "LOCATION=$LOCATION"
echo "DATASET_ID=$DATASET_ID"

# Find the FHIR store ID (assumes the dataset contains a single FHIR store)
STORE_ID="$(gcloud healthcare fhir-stores list \
--dataset="$DATASET_ID" --location="$LOCATION" --project="$PROJECT_ID" \
--format="value(name.basename())")"

echo "STORE_ID=$STORE_ID"

Grant IAM permissions to your user (if not already granted)

# Get the current logged-in account
# (stored in ACCOUNT rather than USER so the shell's own USER variable is not overwritten)
ACCOUNT="$(gcloud config get-value account)"

# Grant FHIR resource read access
gcloud healthcare datasets add-iam-policy-binding "$DATASET_ID" \
--location="$LOCATION" --project="$PROJECT_ID" \
--member="user:$ACCOUNT" \
--role="roles/healthcare.fhirResourceReader"

# Grant FHIR store viewer access
gcloud healthcare datasets add-iam-policy-binding "$DATASET_ID" \
--location="$LOCATION" --project="$PROJECT_ID" \
--member="user:$ACCOUNT" \
--role="roles/healthcare.fhirStoreViewer"
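
To confirm from Python that the credentials and IAM bindings work, a one-resource Patient search against the store is usually enough. The sketch below is not the repository's fhir_client.py; the function names are hypothetical, and it assumes the google-auth and requests packages are installed and that Application Default Credentials are configured as above.

```python
def fhir_store_url(project_id: str, location: str, dataset_id: str, store_id: str) -> str:
    """Build the Cloud Healthcare API v1 base URL for a FHIR store."""
    return (
        "https://healthcare.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"datasets/{dataset_id}/fhirStores/{store_id}/fhir"
    )


def patient_search_status(base_url: str) -> int:
    """Search for at most one Patient and return the HTTP status code."""
    # Imported lazily so the URL helper works without google-auth installed.
    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    # Uses Application Default Credentials (gcloud auth application-default login).
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    session = AuthorizedSession(credentials)
    response = session.get(
        f"{base_url}/Patient",
        params={"_count": 1},
        headers={"Accept": "application/fhir+json"},
    )
    return response.status_code
```

A status of 200 indicates the search succeeded; a 403 typically points to a missing IAM role or a disabled API.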

Project configuration

Create a file named config.yml in the project root:

OPENAI_API_KEY: "your-api-key"
GEMINI_API_KEY: "your-api-key"
FHIR_CONFIG:
  PROJECT_ID: "your-gcp-project-id"
  LOCATION: "your-fhir-dataset-location"
  DATASET_ID: "your-dataset-id"
  STORE_ID: "your-fhir-store-id"  # usually the same as DATASET_ID
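
Before running an agent, it can help to check that config.yml contains every FHIR_CONFIG field. The following is a minimal sketch, not the repository's config.py; the helper names are hypothetical, and it assumes PyYAML is installed. The API keys are not validated because which one you need depends on the model you run.

```python
REQUIRED_FHIR_KEYS = ("PROJECT_ID", "LOCATION", "DATASET_ID", "STORE_ID")


def missing_config_keys(config: dict) -> list[str]:
    """Return the dotted names of FHIR_CONFIG keys that are absent or empty."""
    fhir = config.get("FHIR_CONFIG") or {}
    return [f"FHIR_CONFIG.{key}" for key in REQUIRED_FHIR_KEYS if not fhir.get(key)]


def load_config(path: str = "config.yml") -> dict:
    """Load config.yml and fail early if a FHIR_CONFIG field is missing."""
    # Assumption: PyYAML is available; imported lazily so the validator
    # above can be used on a plain dict without it.
    import yaml

    with open(path, encoding="utf-8") as fh:
        config = yaml.safe_load(fh) or {}
    missing = missing_config_keys(config)
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return config
```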

3. (Optional) Run the script to download and prepare the dataset:

If final_dataset/questions_answers_sql_fhir.csv already exists, you can skip this step.

bash scripts/setup_data.sh
python create_question_answer_dataset.py
python create_question_fhir_dataset.py
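
The skip condition can also be automated. This is a minimal sketch under the assumption that the three commands above are run from the repository root; the names pending_prep_commands and prepare_dataset are hypothetical.

```python
import subprocess
from pathlib import Path

DATASET_CSV = Path("final_dataset") / "questions_answers_sql_fhir.csv"
PREP_COMMANDS = [
    ["bash", "scripts/setup_data.sh"],
    ["python", "create_question_answer_dataset.py"],
    ["python", "create_question_fhir_dataset.py"],
]


def pending_prep_commands(repo_root: str = ".") -> list[list[str]]:
    """Return the preparation commands still to run, or [] if the CSV exists."""
    if (Path(repo_root) / DATASET_CSV).is_file():
        return []
    return [list(cmd) for cmd in PREP_COMMANDS]


def prepare_dataset(repo_root: str = ".") -> None:
    """Run the pending commands in order, stopping at the first failure."""
    for cmd in pending_prep_commands(repo_root):
        subprocess.run(cmd, check=True, cwd=repo_root)
```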

🤖 Agent Execution

The project includes several agent implementations:

# Single-turn agents
bash scripts/run_single_turn_request_agent.sh       # Single-turn FHIR RESTful API generation and retrieval → Natural language reasoning
bash scripts/run_single_turn_resource_agent.sh      # Single-turn FHIR resource retrieval → Natural language reasoning
bash scripts/run_single_turn_code_resource_agent.sh # Single-turn FHIR resource retrieval → Code-based reasoning

# Multi-turn agents
bash scripts/run_multi_turn_resource_agent.sh       # Multi-turn/iterative resource retrieval → Natural language reasoning
bash scripts/run_multi_turn_code_resource_agent.sh  # Multi-turn/iterative resource retrieval → Code-based reasoning

To use open-source models locally with vLLM, start the vLLM server and set base_url to http://localhost:<port>/v1.

CUDA_VISIBLE_DEVICES=<gpu_ids> python -m vllm.entrypoints.openai.api_server \
  --model <model> \
  --load-format safetensors \
  --max-model-len 32768 \
  --tensor-parallel-size <num_gpus> \
  --port <port> \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json

📊 Evaluation

Run the following command to normalize and evaluate the agent's answers and to visualize performance (FHIR resource retrieval recall and precision, and answer correctness):

python evaluation_metrics.py --input <agent_output_json_file_path>
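
For intuition about the retrieval metrics, precision and recall over sets of resource identifiers can be computed as in the sketch below. This illustrates the standard set-based definitions and is not necessarily how evaluation_metrics.py computes them; the function name and the convention of returning 0.0 for an empty set are assumptions.

```python
def retrieval_scores(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for retrieved vs. ground-truth resource IDs."""
    hits = len(retrieved & relevant)
    # Assumed convention: an empty denominator yields 0.0 rather than an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Example: 2 of the 3 retrieved resources are relevant, out of 4 relevant in total.
precision, recall = retrieval_scores(
    {"Patient/1", "Observation/2", "Observation/3"},
    {"Patient/1", "Observation/2", "Encounter/8", "Encounter/9"},
)
```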

Authorship

FHIR-AgentBench is a joint research effort between Verily Life Sciences, Korea Advanced Institute of Science & Technology (KAIST), and Massachusetts Institute of Technology (MIT).
