Prompt then Refine: Prompt-Free SAM-Enhanced Collaborative Learning Network for Detecting Salient Objects in Underwater Images
The USOD10K training set can be downloaded from the repository of the USOD10K authors, “USOD10K: A New Benchmark Dataset for Underwater Salient Object Detection”: https://github.com/LinHong-HIT/USOD10K.
The SAM fine-tuning framework is available on the release site of MDSAM, “Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection”. Due to our limited hardware, SAM_B could only be trained at 256 x 256 and SAM_L at 224 x 224. With better hardware, you may be able to reproduce this code and obtain better weights, for example by training at a higher input resolution or starting from a more powerful original SAM checkpoint.
Abstract: RGB–depth underwater salient object detection (USOD) poses considerable challenges, such as uneven lighting, visual interference, and image blur, which limit the effectiveness of traditional approaches. The segment anything model (SAM), known for its robust segmentation capabilities, offers a promising alternative. However, SAM depends on prompt labels (e.g., points, boxes, masks) to perform effectively, and such prompts are typically unavailable in USOD datasets. To address this, we propose SAM-CLNet, a prompt-free, SAM-enhanced collaborative learning network comprising three main components: (1) SAM, (2) a mask prompt generator (MPG), and (3) a region-aware attention collaborative learning loss (RCL). In our framework, pseudo-mask prompts generated by MPG were used as input prompts for SAM, helping to offset performance degradation due to the absence of manual labels. Simultaneously, RCL leveraged high-quality SAM predictions to refine MPG, enhancing its feature extraction while minimizing the impact of low-quality pseudo-prompts on SAM. This cyclic feedback mechanism facilitated mutual improvement in detection accuracy. In addition, we introduced a U-Adapter module to adapt SAM for underwater imagery and incorporated a frequency cross-attention fusion module in MPG to integrate RGB and depth information. The region-aware attention in RCL further targeted challenging regions by comparing SAM’s predictions with MPG’s pseudo-mask. Experiments on the USOD10K and USOD datasets demonstrated that SAM-CLNet outperformed existing methods and generalized effectively across five public salient object detection benchmarks.
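To make the region-aware idea in the abstract more concrete, the following is a minimal, dependency-free sketch of a loss that up-weights pixels where SAM's prediction and MPG's pseudo-mask disagree. It is not the RCL formulation from the paper, which is not given here. The weighting rule 1 + alpha * |SAM - MPG|, the alpha parameter, and all function names are assumptions made for illustration only.

```python
import math


def region_aware_weights(sam_pred, mpg_pred, alpha=1.0):
    """Per-pixel weights that grow where the two predictions disagree.

    Assumed rule (not from the paper): w = 1 + alpha * |sam - mpg|.
    Both inputs are 2D lists of probabilities in [0, 1].
    """
    return [
        [1.0 + alpha * abs(s - m) for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(sam_pred, mpg_pred)
    ]


def weighted_bce(pred, target, weights, eps=1e-7):
    """Weighted binary cross-entropy, normalized by the sum of weights."""
    total, norm = 0.0, 0.0
    for p_row, t_row, w_row in zip(pred, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            norm += w
    return total / norm


def collaborative_loss(sam_pred, mpg_pred, alpha=1.0):
    """Hypothetical loss: SAM's prediction acts as the target for MPG,
    with extra weight on regions where the two maps differ."""
    weights = region_aware_weights(sam_pred, mpg_pred, alpha)
    return weighted_bce(mpg_pred, sam_pred, weights)


# Toy 1 x 2 example: the maps agree on the first pixel and differ on the second.
sam = [[1.0, 0.0]]
mpg = [[0.9, 0.6]]
loss_uniform = collaborative_loss(sam, mpg, alpha=0.0)
loss_region_aware = collaborative_loss(sam, mpg, alpha=1.0)
```

In this toy case the second pixel receives the larger weight, so the region-aware loss exceeds the uniformly weighted one. In an actual training loop the SAM prediction would typically be treated as a fixed target when refining MPG, and the loss would be computed on tensors rather than nested lists.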
The diagram of our model is as follows:

The results of the comparison with other methods are as follows:

The weights will be uploaded in a timely manner once the paper is accepted.
See requirements.txt for the environment configuration.