[ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Updated Mar 11, 2026 - Python
[ICLR 2026] The official implementation associated with the paper "3DGEER: 3D Gaussian Rendering Made Exact and Efficient for Generic Cameras"
[ICLR'26] AutoGEO: a Generative Engine Optimization framework that automatically learns generative engine preferences and rewrites web content for more traction.
[ICLR 2026] Code for "gen2seg: Generative Models Enable Generalizable Instance Segmentation"
[ICLR 2026] Official code of "Segment any Events with Language"
[ICLR 2026] MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning
[ICLR 2026] Meta-RL Induces Exploration in Language Agents
[ICLR 2026] Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding
[ICLR 2026] This is the official PyTorch implementation of "QVGen: Pushing the Limit of Quantized Video Generative Models".
[ICLR 2026] Official implementation of the paper "Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs"
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts (ICLR 2026 Oral)
[ICLR 2026] MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
🦅 FALCON: an effective vision-language-action model that injects rich 3D spatial tokens into the action head, enabling robust spatial understanding and SOTA performance across diverse manipulation tasks. || Accepted at ICLR 2026.
[ICLR 2026] Official implementation (Claude Agent reproduction supported) of the paper "mtLoRA: Scalable Multi-Task Low-Rank Model Adaptation": +2.3% over SOTA with 47% fewer parameters
[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.
The official code of "TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization" (ICLR 2026)
[ICLR'26] Official code for "A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models"
[ICLR 2026] Official code of PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models.
Official Code for ICLR'26 Work LapFlow