Qinyue Tong1 · Ziqian Lu2* · Jun Liu1 · Yangming Zheng1 · Zhe-ming Lu1

1Zhejiang University, 2Zhejiang Sci-Tech University, *Corresponding author

Project Leader: Prof. Zhe-ming Lu
- [2025/11/18] We released the Training Code of MediSee and the GPT prompts used in our paper.
- [2025/11/16] We released the Evaluation Code to help users conveniently test MediSee.
- [2025/11/16] We released the MediSee Model Weights.
- [2025/11/6] We released the Demo Code of MediSee.
- [2025/7/6] Our MediSee has been accepted by ACM Multimedia 2025!
- [2025/4/25] The video demo is live now!
- [2025/4/24] We've uploaded our paper MediSee: Reasoning-based Pixel-level Perception in Medical Images to arXiv and set up this repository! Welcome to watch this repository for the latest updates.
1. Prepare the code and the environment
Git clone our repository, create a Python environment, and activate it via the following commands:
git clone https://github.com/Edisonhimself/MediSee.git
cd MediSee
conda env create -f environment.yml
conda activate medisee

2. Prepare the pretrained MLLM weights
MediSee is based on llava-med-v1.5-mistral-7b. Please first download the MLLM weights from the following huggingface space: Download.
3. Prepare the pretrained MedSAM weights
MediSee uses MedSAM as the segmentation head. Please first download the version of medsam_vit_b from the following space: Download.
4. Prepare the pretrained CLIP weights
Due to frequent disconnections from Hugging Face, we recommend manually downloading the clip-vit-large-patch14-336 model from the following huggingface space: Download.
5. Prepare our MediSee weights
Download the MediSee pretrained model checkpoints at Download.
To facilitate a quick hands-on experience with MediSee, we provide a demo script for rapid start-up.
Please set the image you want to test here and set your query here. Next, fill in the paths of the downloaded models in the script in order. Specifically:
- Put the path of llava-med here
- Put the path of medsam here
- Put the path of clip here
- Put the path of MediSee here
Finally, run:
bash quick_demo.sh

We provide a test script that supports batch evaluation of MediSee's performance.
First, following the instructions in the previous section, fill in the paths of all pretrained weights in order in both evaluate.py and evaluate.sh.
To run batch evaluation or tests in MediSee, you need to prepare a .jsonl test file.
Each line in the file represents one test sample and should follow the format below:
{"image": "", "mask": "", "class_text": "", "long": "", "short": "", "bbox": []}
{"image": "", "mask": "", "class_text": "", "long": "", "short": "", "bbox": []}
Then, fill in the path to your .jsonl test file in utils/dataset.py at line 948.
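As a rough sketch, a test file in this format can be generated and sanity-checked with a few lines of Python. All paths, texts, and the [x1, y1, x2, y2] box order below are illustrative assumptions; match them to your own data:

```python
import json

# Illustrative sample; the paths, query texts, and the [x1, y1, x2, y2]
# box order are assumptions -- replace them with your own data.
sample = {
    "image": "data/images/case_001.png",              # input image path
    "mask": "data/masks/case_001.png",                # ground-truth mask path
    "class_text": "liver",                            # target class name
    "long": "Segment the organ that produces bile.",  # reasoning-style query
    "short": "Segment the liver.",                    # explicit query
    "bbox": [34, 52, 180, 210],                       # ground-truth box
}

# Write one JSON object per line, as the test script expects.
with open("test_samples.jsonl", "w") as f:
    f.write(json.dumps(sample) + "\n")

# Sanity-check: every line parses and carries exactly the expected keys.
required = {"image", "mask", "class_text", "long", "short", "bbox"}
with open("test_samples.jsonl") as f:
    for line in f:
        assert set(json.loads(line)) == required
```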
Finally, run:
bash evaluate.sh

We also provide a script for rapid testing on a single image, allowing you to quickly verify MediSee's performance without running a full batch evaluation.
Specifically, set your input image in inference_one_image.py at line 141, the ground-truth mask at line 142, your query at line 143, and the ground-truth bounding box at line 144.
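For illustration, the four settings might look like the following. All values here are hypothetical placeholders, not the actual contents of inference_one_image.py:

```python
# Hypothetical placeholder values for the four settings in
# inference_one_image.py -- replace them with your own files.
image_path = "data/images/case_001.png"          # line 141: input image
mask_path = "data/masks/case_001.png"            # line 142: ground-truth mask
query = "Segment the organ that produces bile."  # line 143: query text
bbox = [34, 52, 180, 210]                        # line 144: ground-truth box

# The box should carry four coordinates.
assert len(bbox) == 4
```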
Finally, run:
bash inference_one_image.sh

You can use our training code to adapt MediSee to your own data.
First, similar to the operations in the previous section, you need to fill in the corresponding model weight paths in both train_ds.py and train.sh.
Next, you need to construct data with a structure similar to the example below, and fill in your own data paths in utils/seg_med_2d_dataset.py at line 22:
{
"image_path_1": {
"mask_path_1": "class_1",
"mask_path_2": "class_2"
},
"image_path_2": {
"mask_path_3": "class_3",
"mask_path_4": "class_4"
}
}

Finally, run:

bash train.sh

Here, we present some of the prompts mentioned in the paper to provide additional inspiration and reference for further research.
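For reference, a training annotation file in the nested format shown above (image path mapping to {mask path: class}) can be assembled with a short script; all paths and class names below are hypothetical placeholders:

```python
import json

# Hypothetical image/mask paths and class labels for illustration.
train_data = {
    "data/images/ct_0001.png": {
        "data/masks/ct_0001_liver.png": "liver",
        "data/masks/ct_0001_kidney.png": "kidney",
    },
    "data/images/ct_0002.png": {
        "data/masks/ct_0002_spleen.png": "spleen",
    },
}

with open("train_annotations.json", "w") as f:
    json.dump(train_data, f, indent=4)

# Quick check: every image maps to at least one mask/class pair.
loaded = json.load(open("train_annotations.json"))
assert all(isinstance(masks, dict) and masks for masks in loaded.values())
```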

This project is developed on the codebase of LISA and data from SA-Med2D-20M Dataset. We appreciate their valuable contributions!
If you find our paper helpful for your research, please consider citing:
@article{tong2025medisee,
title={MediSee: Reasoning-based Pixel-level Perception in Medical Images},
author={Tong, Qinyue and Lu, Ziqian and Liu, Jun and Zheng, Yangming and Lu, Zheming},
journal={arXiv preprint arXiv:2504.11008},
year={2025}
}
