Problem
I'm playing with DeepSeek-R1-Distill-Llama-70B through Page Assist. When I ask the LLM a question, it just generates a bunch of text without splitting it into a thinking part and an answer part. However, when I use other backends like vLLM, it first thinks inside a folded thinking box and then gives the answer outside the box.
Solution
I have looked through endpoint/OAI, but I'm not familiar with async programming 🥲. The feature could be implemented by adding a reasoning-content parser function that detects <think> or </think> tags in the LLM's response and separates the text into reasoning_content and content. Note that <think> and </think> tags do not necessarily come in matched pairs: the DeepSeek-R1-Distill chat template already ends with a <think> tag, so the model only generates the closing </think> tag. In that case, we can simply split the content at the </think> tag.
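To illustrate, here is a minimal sketch of such a parser (the function name `split_reasoning` is hypothetical, not an existing API in this repo). It splits at the first </think> tag and tolerates a missing opening <think>, which is the DeepSeek-R1-Distill case described above:

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning_content, content).

    Hypothetical helper: handles the DeepSeek-R1-Distill case where the
    chat template already emits <think>, so only </think> appears in
    the generated text.
    """
    open_tag, close_tag = "<think>", "</think>"
    if close_tag in text:
        reasoning, _, answer = text.partition(close_tag)
        # Drop a leading <think> tag if the model did emit one itself.
        reasoning = reasoning.removeprefix(open_tag)  # Python 3.9+
        return reasoning.strip(), answer.strip()
    # No closing tag at all: treat the whole response as the answer.
    return "", text.strip()
```

A streaming version would need to buffer tokens until the closing tag is seen, but the splitting logic is the same.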
Alternatives
No response
Explanation
When DeepSeek released their new R1 models, a reasoning_content field was added to their API reference. Enabling reasoning-content parsing is very helpful for chat frontends, because the reasoning can be folded or unfolded to keep the UI clean and let users focus on the answer. As far as I know, vLLM and llama.cpp already support this feature.
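For reference, a chat completion response that separates the two fields looks roughly like this (shape abbreviated, values illustrative; field names follow DeepSeek's API reference):

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "The user asks for 2 + 2. Adding gives 4.",
        "content": "2 + 2 = 4."
      }
    }
  ]
}
```

A frontend can then render reasoning_content in a collapsible box and content as the main answer.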
Examples
No response
Additional context
Thank you for reading; I would be grateful if you could add this feature to the repo.
Acknowledgements