Joy Caption is a ComfyUI node that uses the LLaVA model to generate stylized image captions. It supports batch processing and GGUF models.
Updated Dec 24, 2025 · Python
Cross-platform FlashAttention-2 implementation in Triton for Turing and newer GPUs, with a custom configuration mode.
Visual-Grounding-Anything is a comprehensive suite of applications designed for precise object detection, pointing, and tracking in both images and videos. Leveraging the Polaris-VGA-4B model, the system provides high-accuracy spatial reasoning and temporal association across various visual tasks.