I've been using koboldcpp with a 200-token limit, and I've noticed that every model falls back to generating a conversation with itself to fill that limit, even when I have multiline responses disabled. The option doesn't stop the generation, it only hides the extra lines from the UI, so I still have to wait through the entire imaginary conversation. If the first line is only a few words, that's all I receive, even after waiting close to a minute, and on top of that the prompt (1000-2000 tokens in my case) has to be processed every time, which adds up to huge wait times.
I think it would be better if the multiline replies option stopped generation outright instead of just hiding the extra output, but I'm not sure whether that's possible, so I figured I'd ask.
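To illustrate the kind of behavior I mean, here's a minimal sketch of early stopping done client-side, assuming a local koboldcpp instance exposing the KoboldAI-compatible /api/v1/generate endpoint and its stop_sequence parameter (the prompt and port are just placeholders):

```python
import requests

# Minimal sketch: ask koboldcpp to stop at the first newline instead of
# generating a full 200-token imaginary conversation and trimming it later.
# Assumes koboldcpp running locally on its default port with the
# KoboldAI-compatible /api/v1/generate endpoint.
payload = {
    "prompt": "You: Hello there.\nBot:",  # placeholder prompt
    "max_length": 200,        # hard token cap, as in my current setup
    "stop_sequence": ["\n"],  # cut generation at the first line break
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(resp.json()["results"][0]["text"])
```

Something equivalent built into the multiline replies toggle would avoid both the wasted generation time and the repeated prompt processing.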