Qinyue Tong1 · Ziqian Lu2* · Jun Liu1 · Yangming Zheng1 · Zhe-ming Lu1

1Zhejiang University, 2Zhejiang Sci-Tech University, *Corresponding author

Project Leader: Prof. Zhe-ming Lu
- [2025/11/18] We released the Training Code of MediSee and the GPT prompts used in our paper.
- [2025/11/16] We released the Evaluation Code to help users conveniently test MediSee.
- [2025/11/16] We released the MediSee Model Weights.
- [2025/11/6] We released the Demo Code of MediSee.
- [2025/7/6] Our MediSee has been accepted by ACM Multimedia 2025!
- [2025/4/25] The video demo is live now!
- [2025/4/24] We've uploaded our paper MediSee: Reasoning-based Pixel-level Perception in Medical Images to arXiv and set up this repository! Welcome to watch this repository for the latest updates.
1. Prepare the code and the environment
Git clone our repository, create a Python environment, and activate it via the following commands:
git clone https://github.com/Edisonhimself/MediSee.git
cd MediSee
conda env create -f environment.yml
conda activate medisee

2. Prepare the pretrained MLLM weights
MediSee is based on llava-med-v1.5-mistral-7b. Please first download the MLLM weights from the following huggingface space: Download.
3. Prepare the pretrained MedSAM weights
MediSee uses MedSAM as the segmentation head. Please first download the version of medsam_vit_b from the following space: Download.
4. Prepare the pretrained CLIP weights
Due to frequent disconnections from Hugging Face, we recommend manually downloading the clip-vit-large-patch14-336 model from the following huggingface space: Download.
5. Prepare our MediSee weights
Download the MediSee pretrained model checkpoints at Download.
To facilitate a quick hands-on experience with MediSee, we provide a demo script for rapid start-up.
Please set the image you want to test here and set your query here. Next, fill in the paths of the downloaded models in the script in order. Specifically:
- Put the path of llava-med here
- Put the path of medsam here
- Put the path of clip here
- Put the path of MediSee here
Finally, run:
bash quick_demo.sh

We provide a test script that supports batch evaluation of MediSee's performance.
First, following the instructions in the previous section, fill in the paths of all pretrained weights in order in both evaluate.py and evaluate.sh.
To run batch evaluation or tests in MediSee, you need to prepare a .jsonl test file.
Each line in the file represents one test sample and should follow the format below:
{"image": "", "mask": "", "class_text": "", "long": "", "short": "", "bbox": []}
{"image": "", "mask": "", "class_text": "", "long": "", "short": "", "bbox": []}
Then, fill in the path to your .jsonl test file in utils/dataset.py at line 948.
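As a rough sketch, a test file in this format can be generated and sanity-checked with a few lines of Python. All paths, texts, and the [x1, y1, x2, y2] box order below are illustrative assumptions; match them to your own data:

```python
import json

# Illustrative sample; the paths, query texts, and the [x1, y1, x2, y2]
# box order are assumptions -- replace them with your own data.
sample = {
    "image": "data/images/case_001.png",              # input image path
    "mask": "data/masks/case_001.png",                # ground-truth mask path
    "class_text": "liver",                            # target class name
    "long": "Segment the organ that produces bile.",  # reasoning-style query
    "short": "Segment the liver.",                    # explicit query
    "bbox": [34, 52, 180, 210],                       # ground-truth box
}

# Write one JSON object per line, as the test script expects.
with open("test_samples.jsonl", "w") as f:
    f.write(json.dumps(sample) + "\n")

# Sanity-check: every line parses and carries exactly the expected keys.
required = {"image", "mask", "class_text", "long", "short", "bbox"}
with open("test_samples.jsonl") as f:
    for line in f:
        assert set(json.loads(line)) == required
```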
Finally, run:
bash evaluate.sh

We also provide a script for rapid testing on a single image, allowing you to quickly verify MediSee's performance without running a full batch evaluation.
Specifically, set your input image in inference_one_image.py at line 141, the ground-truth mask at line 142, your query at line 143, and the ground-truth bounding box at line 144.
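For illustration, the four settings might look like the following. All values here are hypothetical placeholders, not the actual contents of inference_one_image.py:

```python
# Hypothetical placeholder values for the four settings in
# inference_one_image.py -- replace them with your own files.
image_path = "data/images/case_001.png"          # line 141: input image
mask_path = "data/masks/case_001.png"            # line 142: ground-truth mask
query = "Segment the organ that produces bile."  # line 143: query text
bbox = [34, 52, 180, 210]                        # line 144: ground-truth box

# The box should carry four coordinates.
assert len(bbox) == 4
```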
Finally, run:
bash inference_one_image.sh

You can use our training code to adapt MediSee to your own data.
First, similar to the operations in the previous section, you need to fill in the corresponding model weight paths in both train_ds.py and train.sh.
Next, you need to construct data with a structure similar to the example below, and fill in your own data paths in utils/seg_med_2d_dataset.py at line 22:
{
"image_path_1": {
"mask_path_1": "class_1",
"mask_path_2": "class_2"
},
"image_path_2": {
"mask_path_3": "class_3",
"mask_path_4": "class_4"
}
}

Finally, run:

bash train.sh

Here, we present some of the prompts mentioned in the paper to provide additional inspiration and reference for further research.
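For reference, a training annotation file in the nested format shown above (image path mapping to {mask path: class}) can be assembled with a short script; all paths and class names below are hypothetical placeholders:

```python
import json

# Hypothetical image/mask paths and class labels for illustration.
train_data = {
    "data/images/ct_0001.png": {
        "data/masks/ct_0001_liver.png": "liver",
        "data/masks/ct_0001_kidney.png": "kidney",
    },
    "data/images/ct_0002.png": {
        "data/masks/ct_0002_spleen.png": "spleen",
    },
}

with open("train_annotations.json", "w") as f:
    json.dump(train_data, f, indent=4)

# Quick check: every image maps to at least one mask/class pair.
loaded = json.load(open("train_annotations.json"))
assert all(isinstance(masks, dict) and masks for masks in loaded.values())
```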

This project is developed on the codebase of LISA and data from SA-Med2D-20M Dataset. We appreciate their valuable contributions!
If you find our paper helpful for your research, please consider citing:
@article{tong2025medisee,
title={MediSee: Reasoning-based Pixel-level Perception in Medical Images},
author={Tong, Qinyue and Lu, Ziqian and Liu, Jun and Zheng, Yangming and Lu, Zheming},
journal={arXiv preprint arXiv:2504.11008},
year={2025}
}
