Migrate to litellm for model compatibility #24
trevormells wants to merge 10 commits into Pickle-Pixel:main
Conversation
Addresses a wide range of model compatibility issues that have been surfaced in issues.
- Add AgentBackend abstraction for Claude and OpenCode
- Implement backend detection and preference logic
- Add MCP server management for both backends
- Maintain compatibility with PR Pickle-Pixel#24 LiteLLM integration
- Update scoring/tailoring/wizard to use new backend system
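The commit above names an `AgentBackend` abstraction plus detection and preference logic. A minimal sketch of what that could look like (everything here except the names `AgentBackend`, Claude, and OpenCode is an assumption, not the PR's actual code):

```python
import shutil
from abc import ABC, abstractmethod


class AgentBackend(ABC):
    """Common interface for coding-agent backends (illustrative sketch)."""

    name: str

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether this backend's CLI is usable on this machine."""


class ClaudeBackend(AgentBackend):
    name = "claude"

    def is_available(self) -> bool:
        # Availability check via the CLI binary on PATH (assumed heuristic).
        return shutil.which("claude") is not None


class OpenCodeBackend(AgentBackend):
    name = "opencode"

    def is_available(self) -> bool:
        return shutil.which("opencode") is not None


def detect_backend(preferred=None, backends=(ClaudeBackend(), OpenCodeBackend())):
    """Return the preferred backend if available, else the first available one."""
    by_name = {b.name: b for b in backends}
    if preferred and preferred in by_name and by_name[preferred].is_available():
        return by_name[preferred]
    for backend in backends:
        if backend.is_available():
            return backend
    return None  # caller decides how to report "no backend found"
```

The preference parameter lets a config value pick a backend explicitly while still falling back gracefully when that backend is not installed.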
Adds LLM_STREAMING_MODE environment variable to enable streaming mode for LLM proxies that require it. When enabled, uses LiteLLM with stream=True and accumulates chunks into plain text response.
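Under that flag, the call path could look roughly like the sketch below. This is not the PR's code: `chat_completion` and `accumulate` are hypothetical names, but the chunk shape matches LiteLLM's OpenAI-style streaming deltas (`chunk.choices[0].delta.content`):

```python
import os


def accumulate(chunks) -> str:
    """Join streamed deltas into one plain-text response."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # final/keep-alive chunks can carry a None delta
            parts.append(delta)
    return "".join(parts)


def chat_completion(model: str, messages: list) -> str:
    """Hypothetical wrapper: stream only when LLM_STREAMING_MODE is set."""
    import litellm  # deferred so accumulate() stays importable without litellm

    if os.getenv("LLM_STREAMING_MODE"):
        stream = litellm.completion(model=model, messages=messages, stream=True)
        return accumulate(stream)
    resp = litellm.completion(model=model, messages=messages)
    return resp.choices[0].message.content
```

Because the chunks are folded back into a single string, downstream callers see the same plain-text response whether or not streaming was used.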
I created a PR to incorporate a fix I required to make this litellm branch work for me. I've been testing the combination of this litellm migration, my opencode support, greenhouse api support, and an improved iterative tailoring workflow in my dev integration branch. I'm trying to extract each feature out separately.
fix(llm): Add LLM_STREAMING_MODE option for custom endpoints
Issue Pickle-Pixel#4 (upstream): Gemini 2.5 thinking-token models silently consume the max_tokens budget before generating output, causing the tailoring stage to exhaust all retries with truncated/empty JSON. Community-confirmed fix: raise limits substantially.

- `tailor.py`: 2048 -> 8192 for generation, 512 -> 1024 for judge
- These were also too low for long academic/research CVs added in the previous commit.

From upstream PR Pickle-Pixel#24 (selective, without the full LiteLLM migration):

- `llm.py`: add `ANTHROPIC_API_KEY` detection and `_chat_native_anthropic()` handler using the Anthropic Messages API format (`x-api-key` header, top-level `system` field, `content[0]["text"]` response extraction)
- `config.py`: include `ANTHROPIC_API_KEY` in `get_tier()` and `check_tier()`
- `.env.example`: document `ANTHROPIC_API_KEY` option
- `cli.py`: route per-attempt tailor/cover logs to `~/.applypilot/logs/` instead of the terminal (reduces noise; details still available on disk)

Skipped from PR Pickle-Pixel#24: full LiteLLM migration (tight ~=1.63.0 pin on a fast-moving package, replaces working custom HTTP logic unnecessarily).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
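The `_chat_native_anthropic()` handler described in that commit follows the Anthropic Messages API shape. As a sketch of the request/response handling (the function and variable names here are illustrative, not the actual handler):

```python
import json
import urllib.request

ANTHROPIC_VERSION = "2023-06-01"  # value for the required anthropic-version header


def build_request(api_key: str, model: str, system: str, messages: list,
                  max_tokens: int = 8192):
    """Assemble a Messages API call: x-api-key header, top-level system field."""
    headers = {
        "x-api-key": api_key,          # Anthropic uses x-api-key, not Authorization
        "anthropic-version": ANTHROPIC_VERSION,
        "content-type": "application/json",
    }
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,              # system prompt is top-level, not a message
        "messages": messages,
    }
    return headers, body


def extract_text(response: dict) -> str:
    """Pull the reply out of content[0]["text"], as the commit notes describe."""
    return response["content"][0]["text"]


def chat(api_key, model, system, messages):
    headers, body = build_request(api_key, model, system, messages)
    req = urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps(body).encode(),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.loads(resp.read()))
```

The key differences from OpenAI-compatible endpoints are exactly the three the commit lists: the auth header name, where the system prompt lives, and where the text comes back.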
@rothnic thanks for this. Sort of new to this world but with some programming experience. How does it work to get a new version of ApplyPilot with your changes? Do we need to wait for your PR to be accepted and merged into a new release version?
Hi Team, Can we get the fix merged here? Also having the same issue, thx!
Summary
This PR migrates ApplyPilot’s LLM layer to a LiteLLM-based adapter and standardizes provider/model configuration across CLI, wizard, docs, and runtime checks. It reduces provider-specific logic, adds Anthropic support, and tightens test coverage around LLM resolution/client behavior.
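A sketch of the provider/model resolution the summary describes (the env var names and the example model strings come from the PR; the precedence order, return shape, and Anthropic default model are assumptions for illustration):

```python
import os


def resolve_llm_config(env=os.environ):
    """Resolve model, api_key, and base_url from the environment (sketch).

    Assumed precedence: an explicit LLM_URL endpoint wins, then the named
    provider keys, with LLM_API_KEY as a generic key fallback for LLM_URL.
    """
    model = env.get("LLM_MODEL")

    if env.get("LLM_URL"):  # OpenAI-compatible local endpoint
        return {
            "model": model or "openai/local-model",   # placeholder default
            "api_key": env.get("LLM_API_KEY", "none"),
            "base_url": env["LLM_URL"],
        }

    # (env var, default provider-prefixed model) pairs; defaults assumed
    providers = [
        ("GEMINI_API_KEY", "gemini-3.0-flash"),
        ("OPENAI_API_KEY", "openai/gpt-4o-mini"),
        ("ANTHROPIC_API_KEY", "anthropic/claude-sonnet-4"),
    ]
    for var, default in providers:
        if env.get(var):
            return {"model": model or default, "api_key": env[var], "base_url": None}

    raise RuntimeError("No LLM credentials found; set a *_API_KEY variable or LLM_URL")
```

Centralizing this in one function means the CLI, wizard, and runtime checks all agree on which provider and model are active.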
What Changed
- `llm.py`: new `resolve_llm_config()` contract for provider/model/API key resolution.
- Supported credentials: `GEMINI_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `LLM_URL` (OpenAI-compatible local endpoint), `LLM_API_KEY` (generic key fallback).
- Provider-prefixed model names (e.g. `openai/gpt-4o-mini`, `gemini-3.0-flash`), with a default applied when `LLM_MODEL` is not set.
- Call sites now use `client.chat(..., max_output_tokens=...)`; removed `ask()` usage.
- `applypilot init` flow now allows saving multiple provider credentials and an explicit `LLM_MODEL`.
- Updated `.env.example` for new provider options and model format.

Tests
Added
- `test_llm_resolution.py`
- `test_llm_client.py`
- `test_gemini_smoke.py` (optional smoke: `@pytest.mark.smoke`)

Suggested commands
Notes