Code Review
This pull request implements a mechanism to bypass weight caching by using timestamped symlinks for sampler checkpoints and ensures the prefix cache is reset upon loading new weights. It also increases HTTP timeouts and simplifies the sampler initialization to exclusively use vLLMSampler. Feedback was provided regarding the removal of the sampler_type branching logic, which leaves a dead parameter, and suggestions were made to improve symlink robustness by using relative paths and higher-resolution timestamps.
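To make the symlink suggestions concrete, here is a minimal sketch of a timestamped link with a relative target and a nanosecond-resolution name. The function name `publish_latest` and the `ts-` prefix are hypothetical and not taken from the PR; it assumes weights have already been written to a `latest` directory.

```python
import os
import time
from pathlib import Path


def publish_latest(sampler_dir: Path) -> Path:
    """Create a uniquely named symlink that points at ``latest``.

    Hypothetical helper: assumes the weights already live in
    ``sampler_dir / "latest"`` (created here only so the sketch runs).
    """
    latest = sampler_dir / "latest"
    latest.mkdir(parents=True, exist_ok=True)
    # A nanosecond timestamp makes name collisions between rapid saves unlikely.
    link = sampler_dir / f"ts-{time.time_ns()}"
    # A relative target keeps the link valid if sampler_dir is moved
    # or mounted at a different absolute path.
    os.symlink("latest", link, target_is_directory=True)
    return link
```

Because the target is the bare name `latest`, the link resolves relative to its own directory, so the returned path stays usable on any host that mounts the same checkpoint tree.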
Pull request overview
Fixes issues around sampler weight saving/loading and GRPO-related sampling behavior by adjusting checkpoint naming, cache invalidation, and request timeouts.
Changes:
- Increase HTTP client request timeouts for Twinkle client requests.
- Change sampler checkpoint saving to always write weights under sampler_weights/latest while returning a per-save timestamp path (via symlink) to avoid path-based caching.
- Reset vLLM prefix cache when new sampler weights/adapters are used, and simplify sampler deployment initialization to always use vLLM.
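One way to picture the cache reset is a small tracker that fires a reset hook whenever the adapter path changes. This is a sketch under assumptions, not the PR's implementation: `AdapterTracker` and `reset_cache` are hypothetical names, and the PR may instead reset on every request that carries an adapter URI.

```python
from typing import Callable, Optional


class AdapterTracker:
    """Hypothetical helper: calls a reset hook when the adapter path changes."""

    def __init__(self, reset_cache: Callable[[], None]) -> None:
        # reset_cache stands in for whatever prefix-cache reset the engine exposes.
        self._reset_cache = reset_cache
        self._current: Optional[str] = None

    def use(self, adapter_path: Optional[str]) -> bool:
        """Record the adapter for a request; return True if the cache was reset."""
        if adapter_path is None or adapter_path == self._current:
            return False
        self._current = adapter_path
        self._reset_cache()
        return True
```

With per-save timestamp paths, each new checkpoint arrives under a new path, so a comparison like this sees every save as a change even though the weights are always written to the same directory.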
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/twinkle_client/http/http_utils.py | Increases default request timeout for GET/POST/DELETE helpers. |
| src/twinkle/server/utils/checkpoint_base.py | Adjusts sampler checkpoint naming/storage and adds timestamp symlink return path; improves sampler-weight cleanup for symlinks. |
| src/twinkle/server/sampler/twinkle_handlers.py | Resets sampler prefix cache when an adapter URI is provided for Twinkle sampling requests. |
| src/twinkle/server/sampler/tinker_handlers.py | Resets sampler prefix cache during Tinker sampling flow. |
| src/twinkle/server/sampler/app.py | Removes sampler-type branching and always initializes vLLMSampler. |
| src/twinkle/server/model/twinkle_handlers.py | Ensures sampler weight files are saved under latest/ to match checkpoint manager behavior. |
| src/twinkle/server/model/tinker_handlers.py | Ensures sampler weight files are saved under latest/ for Tinker sampler checkpoints. |
| src/twinkle/sampler/vllm_sampler/vllm_sampler.py | Adds logging when attempting to load LoRA from a path during sampling. |
| src/twinkle/sampler/vllm_sampler/vllm_engine.py | Adds logging when returning a cached LoRA request. |
PR type
PR information
Describe the details of this PR.
Experiment results
Paste your experiment results here (if needed).