fix: add modality defaults to prevent API errors when reading PDFs and other media#1982
Merged
tanzhenxin merged 9 commits intomainfrom Mar 2, 2026
Merged
fix: add modality defaults to prevent API errors when reading PDFs and other media#1982tanzhenxin merged 9 commits intomainfrom
tanzhenxin merged 9 commits intomainfrom
Conversation
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Contributor
📋 Review SummaryThis PR introduces a modality defaults system to prevent API errors when reading PDFs and other media types. It adds automatic detection of model input modalities based on model names, preventing unsupported media types from being sent to incompatible models. The changes are well-structured with comprehensive tests and address critical API error issues. 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
Contributor
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
This was referenced Feb 28, 2026
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add AuthDisplayType enum and helper for Coding Plan detection - Remove formatAuthType/titleizeAuthType functions - Update tests for new auth types Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add i18n keys for modality types and status labels - Update ModelDialog to use t() for user-facing strings Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Update error messages for unsupported image/PDF inputs with clearer guidance - Add `modalities` setting to override auto-detected input modalities - Document `modalities` config in model-providers.md and settings.md - Update converter tests to match new error message format This provides users with actionable alternatives when their selected model doesn't support certain input types, and allows manual modality overrides for models not recognized by auto-detection. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> - Add 5MB limit for image files to prevent API errors - Add 10MB limit for PDF files based on provider constraints - Return FILE_TOO_LARGE error with clear message when limits exceeded - Add tests for both image and PDF size limit enforcement This prevents errors when attempting to process large binary files that exceed provider API limits.
- Removed exact match assertion for help text that changed in UI - Test now only verifies the dialog title renders correctly The help text changed from 'Enter to select · Esc to close' to 'Enter to select, ↑↓ to navigate, Esc to close', causing the test to fail unnecessarily. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…heck - Reduce general file size limit from 20MB to 10MB (using 9.9MB threshold) - Remove per-type size limits (5MB images, 10MB PDFs) - Add base64 encoding size check for PDFs to prevent data URI limit errors - Update all tests to reflect new 10MB limit This fixes issue #1880 where large PDFs could exceed API data URI limits after base64 encoding, causing errors. The 9.9MB threshold provides margin for encoding overhead. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> Replace generic skill suggestion with specific guidance to install the document-skills extension for PDF processing.
This was referenced Mar 1, 2026
This was referenced Mar 3, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TLDR
This PR fixes a critical bug where reading PDF files (and other media types) would cause unrecoverable API 400 errors like:
Invalid value: file. Supported values are: 'text','image_url','video_url' and 'video'.The root cause was that models were being sent media types they don't support. This fix introduces a
modalityDefaultssystem that assigns appropriate input modalities to each model based on its name, along with default token limits for context window sizes.Dive Deeper
The issue occurred because Qwen Code would attempt to send PDF content to models that don't support it, resulting in unrecoverable API errors that required a full restart (issues #1832, #1888, #1803).
Key Changes:
New
modalityDefaults.tsmodule: Defines default input modalities (image, pdf, audio, video) for different model families using regex patterns. Unknown models default to text-only to prevent API errors.Enhanced
tokenLimits.ts: Updated with accurate context window size defaults for various model families, used when models don't explicitly configure them.Model Registry Integration: The
ModelRegistrynow automatically assigns:modalitiesbased on model ID if not explicitly configuredcontextWindowSizebased on model ID if not explicitly configuredUpdated Model Dialog: The
/modeldialog now displays:Updated Model Types: Added
modalities,baseUrl, andenvKeyfields to model configuration types.Comprehensive Test Coverage: Added unit tests for the new modality defaults system.
Model Support Matrix (Input Modalities):
Default Context Window Sizes:
Reviewer Test Plan
Test reading PDF files with different model types:
Test reading images with vision models vs non-vision models
Verify no API 400 errors occur when switching between model types
Test the
/modeldialog:/modelcommandLinked issues / bugs
🤖 Generated with Qwen Code