
BeibeiIsFreshman/SAM-USOD


Prompt then Refine: Prompt-Free SAM-Enhanced Collaborative Learning Network for Detecting Salient Objects in Underwater Images

Data

The USOD10K training set can be downloaded from its publishers: "USOD10K: A New Benchmark Dataset for Underwater Salient Object Detection", https://github.com/LinHong-HIT/USOD10K.

SAM

The SAM fine-tuning framework is available on the release site of MDSAM ("Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection"). Due to our limited hardware, SAM_B could only be trained at 256 x 256 and SAM_L at 224 x 224. If you have more capable hardware, you can reproduce this code with a larger input size or stronger original SAM weights to obtain better results.
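As a minimal sketch of the hardware-driven resolution choice described above, the mapping from SAM backbone to trainable input size could be kept in a small config like the one below. The names `TRAIN_RESOLUTION` and `train_size` are illustrative, not taken from the repository's actual code.

```python
# Hypothetical config: largest square input size each SAM backbone
# could be trained at on our hardware (values from the README above).
TRAIN_RESOLUTION = {
    "sam_b": 256,  # SAM_B trained at 256 x 256
    "sam_l": 224,  # SAM_L trained at 224 x 224
}

def train_size(variant: str) -> tuple:
    """Return the (height, width) training resolution for a SAM variant."""
    side = TRAIN_RESOLUTION[variant.lower()]
    return (side, side)
```

With more GPU memory, raising these values (or adding an entry for a larger SAM checkpoint) is the only change needed at this layer.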

Abstract

RGB–depth underwater salient object detection (USOD) poses considerable challenges, such as uneven lighting, visual interference, and image blur, which limit the effectiveness of traditional approaches. The Segment Anything Model (SAM), known for its robust segmentation capabilities, offers a promising alternative. However, SAM depends on prompt labels (e.g., points, boxes, masks) to perform effectively, and such labels are typically unavailable in USOD datasets. To address this, we propose SAM-CLNet, a prompt-free, SAM-enhanced collaborative learning network comprising three main components: (1) SAM, (2) a mask prompt generator (MPG), and (3) a region-aware attention collaborative learning loss (RCL). In our framework, pseudo-mask prompts generated by the MPG serve as input prompts for SAM, offsetting the performance degradation caused by the absence of manual labels. Simultaneously, the RCL leverages high-quality SAM predictions to refine the MPG, enhancing its feature extraction while minimizing the impact of low-quality pseudo-prompts on SAM. This cyclic feedback mechanism facilitates mutual improvement in detection accuracy. In addition, we introduce a U-Adapter module to adapt SAM to underwater imagery and incorporate a frequency cross-attention fusion module in the MPG to integrate RGB and depth information. The region-aware attention in the RCL further targets challenging regions by comparing SAM's predictions with the MPG's pseudo-masks. Experiments on the USOD10K and USOD datasets demonstrate that SAM-CLNet outperforms existing methods and generalizes effectively across five public salient object detection benchmarks.
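The region-aware idea in the abstract, weighting learning toward regions where SAM's prediction and the MPG's pseudo-mask disagree, can be sketched as a weighted cross-entropy. This is a hypothetical NumPy illustration of the concept only; the function name, the soft-target formulation, and the `1 + |disagreement|` weighting are assumptions, not the paper's actual RCL definition.

```python
import numpy as np

def region_aware_loss(sam_pred, mpg_pred, eps=1e-7):
    """Illustrative region-aware collaborative loss.

    Treats SAM's prediction as a soft target for the MPG and weights the
    per-pixel binary cross-entropy more heavily where the two predictions
    disagree, focusing learning on challenging regions.
    """
    sam_pred = np.clip(sam_pred, eps, 1.0 - eps)
    mpg_pred = np.clip(mpg_pred, eps, 1.0 - eps)
    # Attention weight: 1 plus the per-pixel SAM/MPG disagreement.
    weight = 1.0 + np.abs(sam_pred - mpg_pred)
    # Binary cross-entropy of MPG against SAM as a soft target.
    bce = -(sam_pred * np.log(mpg_pred) + (1.0 - sam_pred) * np.log(1.0 - mpg_pred))
    return float((weight * bce).mean())
```

Under this sketch, pixels where the two branches agree contribute a baseline cross-entropy, while disagreeing pixels are up-weighted, which mirrors the cyclic refinement described above.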

The diagram of our model is as follows (see the architecture figure in the repository).

The results of our comparison with existing methods are as follows (see the results figures in the repository).

Weights

The weights will be uploaded promptly once the paper is accepted.

Environment

Refer to requirements.txt for the environment configuration file.
