tabular-data

Benchmark

We use Poetry for dependency management. Please make sure that you have installed Poetry and set up the environment correctly before starting development.

setup environment

  • Install dependencies from the lock file: poetry install -E semantic -E claude

  • Use the environment: You can either run commands directly with poetry run <command>, or open a shell with poetry shell and run commands from there.

prepare environment variables

  • Copy .env.template and rename it to .env
  • In .env, set the API key and model for the desired LLM (OpenAI, Gemini, or Claude), for example:
GEMINI_15_API_KEY=AIxxx
GEMINI_15_MODEL=gemini-1.5-pro-latest
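The variables above are plain KEY=VALUE pairs. As a minimal stdlib-only sketch, this is how such a .env file can be parsed into the process environment (the loader function is illustrative; the project may well use a library such as python-dotenv instead):

```python
import os

def load_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# Example .env content, matching the snippet above
sample = """
GEMINI_15_API_KEY=AIxxx
GEMINI_15_MODEL=gemini-1.5-pro-latest
"""
config = load_env(sample)
os.environ.update(config)
```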

prepare assessing LLM

  • In benchmark/conftest.py, select the desired assessing LLM:
@pytest.fixture
def client():
    return GeminiClient()  # GptClient, GeminiClient, and ClaudeClient are available
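Because the fixture is the single place a client is constructed, every benchmark test stays model-agnostic. A hedged sketch of that pattern (the class and method names here are hypothetical stand-ins, not the project's actual client API):

```python
class MockClient:
    """Stand-in for GptClient / GeminiClient / ClaudeClient, which are
    assumed to share a common assessment interface."""
    def assess(self, baseline: str, target: str) -> float:
        # A real client would query the LLM; this stub returns a fixed score.
        return 1.0 if baseline == target else 0.0

def make_client(name: str):
    # Mirrors the role of the conftest.py fixture: swapping the returned
    # class changes the assessing LLM for every benchmark test at once.
    clients = {"mock": MockClient}
    return clients[name]()

client = make_client("mock")
score = client.assess("AUC=5.2", "AUC=5.2")
```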

run benchmark

  1. Run PK summary benchmark with Semantic assessment
  • Prepare baseline data in ./benchmark/data/pk-summary/baseline and target data in ./benchmark/data/pk-summary/{target}
  • Set the environment variable TARGET:
export TARGET={target}
  • Run the benchmark:
poetry run pytest benchmark/test_pk_summary_benchmark_with_semantic.py

After the benchmark completes, the results can be found in ./benchmark/result/pk-summary/{target}/result.log

  2. Run PK summary benchmark with LLM assessment
  • Prepare baseline data in ./benchmark/data/pk-summary/baseline and target data in ./benchmark/data/pk-summary/{target}
  • Set the environment variable TARGET:
export TARGET={target}
  • Run the benchmark:
poetry run pytest benchmark/test_pk_summary_benchmark_with_llm.py

After the benchmark completes, the results can be found in ./benchmark/result/pk-summary/{target}/result.log

  3. Run PE benchmark with Semantic assessment
  • Prepare baseline data in ./benchmark/data/pe/baseline and target data in ./benchmark/data/pe/{target}
  • Set the environment variable TARGET:
export TARGET={target}
  • Run the benchmark:
poetry run pytest benchmark/test_pe_benchmark_with_semantic.py

After the benchmark completes, the results can be found in ./benchmark/result/pe/{target}/result.log

  4. Run PE benchmark with LLM assessment
  • Prepare baseline data in ./benchmark/data/pe/baseline and target data in ./benchmark/data/pe/{target}
  • Set the environment variable TARGET:
export TARGET={target}
  • Run the benchmark:
poetry run pytest benchmark/test_pe_benchmark_with_llm.py

After the benchmark completes, the results can be found in ./benchmark/result/pe/{target}/result.log
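All four benchmarks write their output along the same pattern, ./benchmark/result/{task}/{target}/result.log. A small sketch for locating the log from the TARGET environment variable set before running pytest (the helper name is illustrative, not part of the project):

```python
import os
from pathlib import Path

def result_log_path(task: str) -> Path:
    """Build the result path for a benchmark task ("pk-summary" or "pe")
    using the TARGET environment variable."""
    target = os.environ["TARGET"]
    return Path("benchmark") / "result" / task / target / "result.log"

os.environ["TARGET"] = "demo"  # stands in for: export TARGET=demo
path = result_log_path("pk-summary")
```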

Streamlit UI

The Streamlit app provides an interactive workflow to retrieve papers, extract tables, and curate PK/PE/CT outputs.

  1. Start the app: poetry run streamlit run app.py
  2. In the sidebar, use Access Article to load content by PMID/PMCID or paste raw HTML. Only PMC-hosted articles are retrievable by PMID/PMCID.
  3. In Curation Settings, select the article and task (PK/PE/CT), then click Start Curation.
  4. Optional: use One Click Curation to automatically run multiple pipeline types on the selected article.
  5. Results appear in the main pane, including Article Preview, Curation Result, Follow-up Chat, and Manage Records.

Extract Data From Papers (CLI)

Use the scripts below for batch extraction without the UI. Outputs are written as CSV files per PMID, with error details saved alongside when applicable. Ensure .env is configured before running.

  1. Multi-pipeline extraction with a single model:
poetry run python app_script.py -i 29943508 -o ./out -m gpt4o
  2. Batch extraction from a CSV file:
poetry run python app_script.py -f ./data/pmids.csv -o ./out -m gemini25flash
  3. PK summary only:
poetry run python app_script_pk_summary.py -i 29943508 -o ./out -m gpt4o

bump version

This package uses bump2version to bump the version:

bump2version {major|minor|patch}

Curate data from literature

See Streamlit UI for interactive curation and Extract Data From Papers (CLI) for batch workflows.
