
Conversation

@hankbeasley

@hankbeasley hankbeasley commented Jan 31, 2025

Description

Some hosts for DeepSeek R1 return <Think> tags in the assistant responses. These reasoning tokens do not get classified by RooCode correctly.

I have noticed this behavior with both Azure and Kluster.ai.

This pull request parses these tokens correctly. I added one toggle to the UI to turn the parsing on or off, although I think it could safely be left on all the time unless someone needs to use <Think> tags for another purpose.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

I have tested with Kluster.ai.

Needs review

Note that if a response has both delta.content and delta.reasoning_content, the behavior changes with this pull request. Previously, the chunk would have been returned as "text." Now, it is returned as "reasoning." I think it's fine, as a delta should never have both properties.
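The new precedence can be sketched as follows. This is a minimal illustration, not the code in this PR; the `Delta` shape is a simplified stand-in for the OpenAI streaming delta, with `reasoning_content` as the DeepSeek-style extension:

```typescript
// Simplified stand-in for the streaming delta; reasoning_content is a
// DeepSeek-style extension, not part of the official OpenAI spec.
interface Delta {
	content?: string
	reasoning_content?: string
}

type Chunk = { type: "text" | "reasoning"; text: string }

// After this PR: reasoning_content wins when both fields are present.
function classifyDelta(delta: Delta): Chunk | undefined {
	if (delta.reasoning_content !== undefined) {
		return { type: "reasoning", text: delta.reasoning_content }
	}
	if (delta.content !== undefined) {
		return { type: "text", text: delta.content }
	}
	return undefined
}
```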


Important

Adds parsing for <think> tags in OpenAiHandler to classify reasoning tokens, with a UI toggle to enable/disable this feature.

  • Behavior:
    • Parses <think> tags in responses to classify reasoning tokens in OpenAiHandler in openai.ts.
    • Changes behavior when both delta.content and delta.reasoning_content are present, now classifying as "reasoning".
  • Classes:
    • Adds ThinkingTokenSeparator and PassThroughTokenSeparator to handle token parsing in openai.ts.
  • UI:
    • Adds a toggle in ApiOptions.tsx to enable/disable <think> tag parsing.
  • Models:
    • Adds thinkTokensInResponse to ModelInfo in api.ts.
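The ModelInfo addition can be sketched as below. This is a hypothetical excerpt; the real interface in api.ts has many more fields:

```typescript
// Hypothetical, trimmed-down sketch of ModelInfo with the new flag
// from this PR; the real interface in src/shared/api.ts is larger.
interface ModelInfo {
	maxTokens?: number
	contextWindow?: number
	// Whether the model emits <think> tags in its responses that
	// should be re-classified as reasoning chunks.
	thinkTokensInResponse?: boolean
}
```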

This description was created by Ellipsis for 3921e8e6eb0c9b3f1e8ba356ebcbe7694942925b. It will automatically update as commits are pushed.

@changeset-bot

changeset-bot bot commented Jan 31, 2025

⚠️ No Changeset found

Latest commit: 74308e6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Contributor

Consider extracting common styles and repetitive logic for handling input changes into reusable components or functions. This will improve code readability and maintainability. Refer to our Development Standards: https://www.notion.so/Development-Standards-59febcf8ead647fd9c2ec3f60c22f3df?pvs=4#11869ad2d5818094a05ef707e188c0d5

@mrubens
Collaborator

mrubens commented Jan 31, 2025

@Szpadel do you have time to take a look at this? You've been way closer to this than me. Thanks for opening the PR @hankbeasley!

@Mushoz

Mushoz commented Jan 31, 2025

Will this work for the distilled models as well? They suffer from the same issue.

@hankbeasley
Author

Will this work for the distilled models as well? They suffer from the same issue.

It should as they were also trained to respond with think tags.

@Szpadel
Contributor

Szpadel commented Jan 31, 2025

@hankbeasley I believe this will only work when the whole <think> tag is encoded as a single token, but I'm not sure if that's always true.

I believe we should check whether the chunk contains <think> (then split the chunk into a reasoning chunk and a text chunk) or ends with any prefix of it, like <thin, <, etc. In that case, save it to a buffer and do not emit anything, so the next chunk can tell us whether it completes the expected tag.

Second, I wonder if some models output <thinking> instead. https://github.com/RooVetGit/Roo-Code/blob/e8f0b35860506ea6a43418a1da3e5f9596fabafc/src/core/Cline.ts#L1006-L1007
The code currently strips such tags in post-processing.
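The buffering idea above can be sketched like this. The `holdPartialTag` helper is a hypothetical illustration, not code from this PR: it splits a stream buffer into text that is safe to emit now and a suffix that might be the start of the tag and must be held until the next chunk arrives:

```typescript
// Split a stream buffer into text that is safe to emit now and a
// suffix that could be the start of `tag` (e.g. "<think>"), which
// must be held back until the next chunk can disambiguate it.
// Hypothetical sketch of the buffering idea discussed in this thread.
function holdPartialTag(buffer: string, tag: string): { emit: string; hold: string } {
	// Find the longest suffix of `buffer` that is a proper prefix of `tag`.
	const max = Math.min(tag.length - 1, buffer.length)
	for (let len = max; len > 0; len--) {
		if (buffer.endsWith(tag.slice(0, len))) {
			return {
				emit: buffer.slice(0, buffer.length - len),
				hold: buffer.slice(buffer.length - len),
			}
		}
	}
	return { emit: buffer, hold: "" }
}
```

A full tag in the buffer is not held, since it can be processed immediately; only ambiguous prefixes are deferred.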

@hankbeasley hankbeasley reopened this Jan 31, 2025
@hankbeasley
Author

hankbeasley commented Jan 31, 2025

I am pretty sure that <think> is a single token in the model and therefore would never be output in pieces. I thought about parsing partial tags, but I would most likely introduce edge-case bugs on my first try and would need lots of unit tests.

Perhaps there is another model that outputs <thinking> instead of <think>? Instead of a boolean to turn the parsing on and off, we could use a string for the tag name.


@hankbeasley
Author

Second, I wonder if some models output <thinking> instead.

It looks like the <thinking> tags you are referring to are an attempt to get non-thinking models to reason via prompt engineering. See the prompt below.

https://github.com/RooVetGit/Roo-Code/blob/main/src/core/prompts/sections/objective.ts

@Szpadel
Contributor

Szpadel commented Jan 31, 2025

What do you think about doing it this way?
Szpadel@f3192d7

This should not have any corner cases, regardless of whether <think> is encoded as a single token or not.

Why I believe this might not be a single token: distilled models were trained to use reasoning in their responses, but their embeddings were kept as they were. This means there was no dedicated special token to start thinking, and the tag will most likely be constructed from multiple tokens.

@hankbeasley
Author

hankbeasley commented Jan 31, 2025

Link to partial token parser created by o1. I don't trust it though without tests for edge cases. Including here for reference.

https://gist.github.com/hankbeasley/ebc474f66cc489625f496effc607ac80

@hankbeasley
Author

What do you think about doing it this way? Szpadel@f3192d7

This should not have any corner cases, regardless of whether <think> is encoded as a single token or not.

Why I believe this might not be a single token: distilled models were trained to use reasoning in their responses, but their embeddings were kept as they were. This means there was no dedicated special token to start thinking, and the tag will most likely be constructed from multiple tokens.

Looks good to me. Good point about distilled models.

@Szpadel
Contributor

Szpadel commented Jan 31, 2025

Feel free to rebase on top of my commit and add the settings option; I do not want to steal the contribution from you :)

@hankbeasley
Author

Feel free to rebase on top of my commit and add the settings option; I do not want to steal the contribution from you :)

What do you think about where I put the UI option? I only started working with the project today, so I'm not sure if it's aligned with existing patterns.

@Szpadel
Contributor

Szpadel commented Feb 1, 2025

I'm no expert on UI, so I won't comment on that.
While thinking about stream handling, there is one more corner case: when the stream ends with a partial <think> tag, an incomplete response will be returned.

We need to flush the buffer when the stream ends.

I'll prepare code for this when I have access to a computer.
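The end-of-stream corner case can be sketched as a small generator. This is a hypothetical illustration of the flush idea, not the actual commit: any text held back because it looked like the start of a tag must still be emitted when the stream closes, or the response is silently truncated:

```typescript
// Sketch of the end-of-stream flush: text held back as a possible
// partial <think> tag is emitted once the stream closes.
// Hypothetical illustration, not the code from this PR.
function* flushOnEnd(chunks: Iterable<string>, tag: string): Generator<string> {
	let held = ""
	for (const chunk of chunks) {
		const buffer = held + chunk
		// Hold back the longest suffix that is a proper prefix of `tag`.
		let hold = ""
		for (let len = Math.min(tag.length - 1, buffer.length); len > 0; len--) {
			if (buffer.endsWith(tag.slice(0, len))) {
				hold = buffer.slice(buffer.length - len)
				break
			}
		}
		held = hold
		const emit = buffer.slice(0, buffer.length - hold.length)
		if (emit) yield emit
	}
	// The partial match was not a tag after all; flush it.
	if (held) yield held
}
```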

@Szpadel
Contributor

Szpadel commented Feb 1, 2025

One more idea: could we assume either of these:
a) a reasoning model always starts its output with the <think> token
b) there can be only one <think> tag containing reasoning in the output.

With those assumptions we could greatly reduce the chances of incorrect handling in case the output legitimately contains such a tag.

We would only be affected if a closing tag appeared inside the reasoning.
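Under those two assumptions, the whole parse collapses to finding the first closing tag, which can be sketched like this (hypothetical, non-streaming illustration of the simplification, not code from this PR):

```typescript
// If reasoning always starts at the very beginning of the output and
// there is at most one <think>...</think> block, parsing reduces to
// locating the first closing tag. Hypothetical sketch.
function splitReasoning(full: string): { reasoning: string; text: string } {
	if (!full.startsWith("<think>")) {
		return { reasoning: "", text: full }
	}
	const close = full.indexOf("</think>")
	if (close === -1) {
		// Unterminated reasoning: treat the rest of the output as reasoning.
		return { reasoning: full.slice("<think>".length), text: "" }
	}
	return {
		reasoning: full.slice("<think>".length, close),
		text: full.slice(close + "</think>".length),
	}
}
```

As noted above, this only goes wrong if a literal </think> appears inside the reasoning itself.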

@Szpadel
Contributor

Szpadel commented Feb 1, 2025

Flushing: Szpadel@e0e2bbe

@Szpadel
Contributor

Szpadel commented Feb 4, 2025

@hankbeasley could you pull my commit and rebase (or fix conflicts)?
It would be nice to merge this.

@hankbeasley hankbeasley closed this Feb 6, 2025
@hankbeasley
Author

hankbeasley commented Feb 6, 2025

@Szpadel I merged your flush change. I haven't had time to test these changes, though. I am fine with you closing this pull request and opening a new one.

@hankbeasley hankbeasley reopened this Feb 6, 2025
@jcbdev
Contributor

jcbdev commented Feb 10, 2025

Don't know if it's still relevant to the conversation above, but the tag is a specific token for the R1 distill models, as you can see from the tokenizer.json on Hugging Face - https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/tokenizer.json

{
  "id": 128013,
  "content": "<think>",
  "single_word": false,
  "lstrip": false,
  "rstrip": false,
  "normalized": false,
  "special": false
},
{
  "id": 128014,
  "content": "</think>",
  "single_word": false,
  "lstrip": false,
  "rstrip": false,
  "normalized": false,
  "special": false
},

@hannesrudolph hannesrudolph moved this to To triage in Roo Code Roadmap Mar 5, 2025
@hannesrudolph hannesrudolph moved this from To triage to PR - Needs Approval in Roo Code Roadmap Mar 6, 2025
@mrubens mrubens moved this from PR [Unverified] to PR [Deferred] in Roo Code Roadmap Mar 10, 2025
@mrubens
Collaborator

mrubens commented Mar 14, 2025

Pretty sure #1566 covers this?

@mrubens mrubens closed this Mar 14, 2025
@github-project-automation github-project-automation bot moved this from PR [Deferred] to Done in Roo Code Roadmap Mar 14, 2025