
Request: Stop generating at new line #38

@Enferlain

Description


I've been using koboldcpp with a 200-token limit, and I've noticed that every model defaults to generating a conversation with itself to fill the limit, even when I have multiline responses disabled. That option doesn't stop the generation; it only hides the extra lines from the UI. This means I still have to wait through the entire imaginary conversation, and if the first line is only a few words, that's all I receive even after waiting a minute or so. On top of that, my prompt (around 1000-2000 tokens) has to be processed every time, which results in huge wait times.

I think it would be beneficial if the multiline replies option stopped the generation altogether instead of just hiding the output, but I'm not sure if that's possible, so I figured I'd ask about it.
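To illustrate the requested behavior, here is a minimal sketch (not koboldcpp's actual code; `sample_next_token` is a hypothetical stand-in for the model's sampler) of a generation loop that halts the moment a newline appears, rather than generating to the token limit and trimming afterwards:

```python
# Hypothetical sketch: treat "\n" as a stop sequence and break out of the
# generation loop immediately, instead of post-filtering the finished text.

def generate_until_newline(sample_next_token, max_tokens=200, stop="\n"):
    """Collect tokens until the stop string appears or the limit is hit."""
    out = []
    for _ in range(max_tokens):
        tok = sample_next_token()
        if stop in tok:
            # Keep any text before the newline, then stop generating.
            out.append(tok.split(stop, 1)[0])
            break
        out.append(tok)
    return "".join(out)

# Simulated token stream standing in for model output; the model would
# otherwise continue with an imaginary "User:" turn after the newline.
tokens = iter(["Sure", ",", " here", " you", " go", ".\n", "User:", " more"])
print(generate_until_newline(lambda: next(tokens)))  # → Sure, here you go.
```

With a check like this, the wait ends at the first newline, so a short first line no longer costs the full 200-token generation time.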

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request)

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
