Open
Description
Summary
This proposal suggests switching the vllm_router from using httpx to aiohttp for handling asynchronous HTTP requests. The goal is to improve responsiveness and reliability, especially when the system is under heavy load.
Motivation
Right now, vllm_router uses httpx to send asynchronous HTTP requests.
However, benchmarks show that aiohttp performs better than httpx in high-concurrency situations. For example, in a benchmark with 1000 parallel requests:
- `httpx`: around 10.22 seconds
- `aiohttp`: around 3.79 seconds
Source
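For anyone who wants to reproduce a comparison of this kind locally, a minimal timing harness might look like the sketch below. It is not the benchmark cited above. The helper name `time_concurrent` and the URL in the usage comment are hypothetical, and the harness is deliberately client-agnostic so that the same function can wrap either an `httpx` or an `aiohttp` call.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def time_concurrent(fetch: Callable[[], Awaitable[object]], n: int = 1000) -> float:
    """Run `fetch` n times concurrently and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # gather() schedules all n coroutines at once, so they overlap on the event loop.
    await asyncio.gather(*(fetch() for _ in range(n)))
    return time.perf_counter() - start


# Hypothetical usage with a shared aiohttp session (the URL is a placeholder):
#
#   async with aiohttp.ClientSession() as session:
#       async def fetch():
#           async with session.get("http://localhost:8000/health") as resp:
#               await resp.read()
#       print(await time_concurrent(fetch))
```

Reusing one client or session across all requests matters for a fair comparison, since creating a new client per request mostly measures connection setup.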
Goals
- Speed up request handling in `vllm_router` when under load.
- Make routing more reliable and reduce timeouts or failures.
- Keep all existing features working the same as before.
Non-Goals
- No changes will be made to the routing logic or algorithms.
- The public interfaces and APIs exposed by `vllm_router` will remain unchanged.
Proposed Changes
Implementation Details
- Swap out all usage of `httpx.AsyncClient` with `aiohttp.ClientSession`.
- Ensure request behavior (timeouts, retries, headers, etc.) stays the same.
- Re-test core features like health checks and request forwarding to ensure nothing breaks.
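Timeouts are one place where "stays the same" needs care, because the two libraries model them differently: `httpx` uses per-phase timeouts with no whole-request deadline, while `aiohttp.ClientSession` applies a default total timeout of 300 seconds. The sketch below shows one possible mapping. `RouterTimeouts`, `aiohttp_timeout_kwargs`, and `make_session` are hypothetical names, not existing `vllm_router` code, and the values the router uses today are not assumed here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class RouterTimeouts:
    """Hypothetical container for the router's httpx-style per-phase timeouts."""

    connect: Optional[float]  # seconds to establish the TCP connection
    read: Optional[float]     # seconds to wait for the next chunk of data


def aiohttp_timeout_kwargs(t: RouterTimeouts) -> Dict[str, Any]:
    """Translate per-phase timeouts into aiohttp.ClientTimeout keyword arguments."""
    return {
        # Disable aiohttp's default 300 s total deadline so that long
        # streaming responses are not cut off where httpx would not.
        "total": None,
        "sock_connect": t.connect,
        "sock_read": t.read,
    }


async def make_session(t: RouterTimeouts):
    """Create the shared session; call once at startup, inside the event loop."""
    import aiohttp  # imported here so the mapping above has no hard dependency

    return aiohttp.ClientSession(
        timeout=aiohttp.ClientTimeout(**aiohttp_timeout_kwargs(t))
    )
```

`httpx` also has `write` and `pool` timeouts, which have no one-to-one counterpart in `aiohttp.ClientTimeout`, so those would need a separate decision during the migration.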
Architecture / Components
- Affected Component: The `vllm_router` module in the `src` directory.
Interface Changes
- No changes to any public-facing API or configuration.
Performance Considerations
- Expect lower response latency and better performance under high concurrency.
Resource Constraints
- This should reduce CPU and memory usage slightly due to `aiohttp`’s efficient I/O model.
Test Plans
Unit Tests
- Update unit tests to support `aiohttp`.
- Add any new tests needed to cover changes in request handling.
Integration/E2E Tests
- Run full integration tests to confirm `vllm_router` behaves correctly with the new client.
Negative Tests
- Simulate network issues and timeouts to ensure proper error handling with `aiohttp`.
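One way to simulate a hung backend without touching the network is a local server that accepts connections and never replies. The fixture below is a sketch using only the standard library. The name `stalled_backend` is hypothetical, and the `aiohttp` usage in the trailing comment is an illustration of how a test might consume it, not existing test code.

```python
import asyncio
import contextlib
from typing import AsyncIterator, Tuple


@contextlib.asynccontextmanager
async def stalled_backend() -> AsyncIterator[Tuple[str, int]]:
    """Start a local TCP server that accepts connections but never answers."""
    release = asyncio.Event()

    async def handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        await release.wait()  # hold the connection open without sending a byte
        writer.close()

    # Port 0 asks the OS for any free port, which avoids clashes between tests.
    server = await asyncio.start_server(handler, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]
    try:
        yield host, port
    finally:
        release.set()  # let pending handlers finish so shutdown does not hang
        server.close()
        await server.wait_closed()


# Hypothetical aiohttp test body built on the fixture:
#
#   async with stalled_backend() as (host, port):
#       timeout = aiohttp.ClientTimeout(total=0.5)
#       async with aiohttp.ClientSession(timeout=timeout) as session:
#           with pytest.raises(asyncio.TimeoutError):
#               await session.get(f"http://{host}:{port}/health")
```

The same pattern extends to other failure modes, for example closing the connection immediately in `handler` to exercise the disconnect path.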
Drawbacks
- API Differences: `aiohttp` has a different style and error model compared to `httpx`, so code will need to be adjusted carefully.
- Maintenance Overhead: If the upstream project does not accept this change, we may need to manage and test it separately.
Alternatives
- Stick With `httpx`: Keep using `httpx` and try tuning its configuration for better performance.
- Use Aiohttp As Transport for Httpx: A more balanced option is to keep `httpx` but replace its transport layer with `aiohttp` using a custom `AiohttpTransport`. This gives some of the performance benefits while preserving the `httpx` API.
More details: link
Implementation Timeline / Phases
- Week 1: Start implementation and testing of the `aiohttp` integration in `vllm_router`.
References