Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
|
||
| messages = body["messages"] | ||
|
|
||
| # HACK: tiny-agents sends requests ending with assistant message — skip |
There was a problem hiding this comment.
Maybe worth elaborating this comment a bit more in the future
There was a problem hiding this comment.
yeah i kept it as it was there in the orignal serve.py but i didn't do any testing. Not sure if it is worth keeping
There was a problem hiding this comment.
why do we need an additional file for this btw? IMO we should focus on the existing test file, with the addition of other tests there. I would still make sure all existing tests in the existing module pass.
There was a problem hiding this comment.
same as serve.py. I kept the original file to make sure that all the tests are covered as I iterate through the different features. I will update the name at the end of the refactor !
CI ResultsCommit Info
Model CI Report❌ 3 new failed tests from this PR 😭
|
|
run-slow: cli |
|
This comment contains models: ["cli"] |
|
run-slow: cli |
|
This comment contains models: ["cli"] |
24fb171 to
052cbc7
Compare
* new serve file * app * model_manager done * update serve * style * poc done * renaming * fix * new tests * update metrics and processor * hardcode n_batch for now * add response api + compile * more tests * add it for now but we will move it * remove cache impl * add back load_model * fix naming * add transcription * tool calls better ! * vlm support for both response and chat endpoints * update bench * fix vl test * first iteration of cb * cb tests * typing + review * update test * better benchmark * better stream * update bench * fix * serve refactored * merge * update * fix * style * simpler * style * update warmup * remove llamacpp integration for now * styke * styke * style again * remove annoattion * review ! * style * much cleaner * renamed * remove bench for now * batch output * style * type * better tests * update test * queue draining * some logs * readd nathan feature + some minor fixes * fix * guard transcription * better now * fix * adding lock to see if this helps * remove locks * lock again * update bench and remove lock for now
* new serve file * app * model_manager done * update serve * style * poc done * renaming * fix * new tests * update metrics and processor * hardcode n_batch for now * add response api + compile * more tests * add it for now but we will move it * remove cache impl * add back load_model * fix naming * add transcription * tool calls better ! * vlm support for both response and chat endpoints * update bench * fix vl test * first iteration of cb * cb tests * typing + review * update test * better benchmark * better stream * update bench * fix * serve refactored * merge * update * fix * style * simpler * style * update warmup * remove llamacpp integration for now * styke * styke * style again * remove annoattion * review ! * style * much cleaner * renamed * remove bench for now * batch output * style * type * better tests * update test * queue draining * some logs * readd nathan feature + some minor fixes * fix * guard transcription * better now * fix * adding lock to see if this helps * remove locks * lock again * update bench and remove lock for now
* new serve file * app * model_manager done * update serve * style * poc done * renaming * fix * new tests * update metrics and processor * hardcode n_batch for now * add response api + compile * more tests * add it for now but we will move it * remove cache impl * add back load_model * fix naming * add transcription * tool calls better ! * vlm support for both response and chat endpoints * update bench * fix vl test * first iteration of cb * cb tests * typing + review * update test * better benchmark * better stream * update bench * fix * serve refactored * merge * update * fix * style * simpler * style * update warmup * remove llamacpp integration for now * styke * styke * style again * remove annoattion * review ! * style * much cleaner * renamed * remove bench for now * batch output * style * type * better tests * update test * queue draining * some logs * readd nathan feature + some minor fixes * fix * guard transcription * better now * fix * adding lock to see if this helps * remove locks * lock again * update bench and remove lock for now
What does this PR do?
This PR refactors transformers serve so that it is not in a single file. We split it into multiple files with clear responsabilities. There were 2,293 lines initially in the serve.py file.
I've added and fix some features that there missing or broken from the orignal serve. It was easier to fix those in this refactor as it helped me find the right design.
For CB to work correctly, we need to first merge this PR: #45063 -> merged !
I'll add benchmarks in a follow-up as this PR is already big enough !