[Feature request] Ability to cache context between runs for faster initial generation of the same history (after app restart) #445

@aleksusklim

I mean the context of the "current" generation, the one that makes regenerating the latest action and making minor edits to the history super fast.

I propose adding an option/parameter that points to a local binary file. If the file exists, koboldcpp should read the context from it.
While this option is active, koboldcpp should update the file with each new context after generation (read once at start, rewrite/append during runtime).
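
To make the "read once at start, rewrite after each generation" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: `ContextCache`, `save_cache`, `load_cache`, the file layout, and the opaque `state` bytes only stand in for whatever koboldcpp would actually serialize. None of these are existing koboldcpp options or functions.

```python
import json
import os
import struct
import tempfile
from dataclasses import dataclass


@dataclass
class ContextCache:
    """Hypothetical snapshot: the evaluated token ids plus an opaque state blob."""
    tokens: list
    state: bytes


def save_cache(path, cache):
    """Rewrite the cache file atomically, so a crash cannot leave it half-written."""
    header = json.dumps({"tokens": cache.tokens}).encode("utf-8")
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(struct.pack("<Q", len(header)))  # 8-byte header length
            f.write(header)                          # token ids as JSON
            f.write(cache.state)                     # opaque model state
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # swap the new file in place of the old one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_cache(path):
    """Read the cache once at startup; return None if no file exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
        state = f.read()
    return ContextCache(tokens=header["tokens"], state=state)
```

Writing to a temporary file and swapping it in with `os.replace` matters for the crash scenario: if the machine dies mid-write, the previous cache file is still intact.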

I can name two main reasons why this would be extremely useful:

  • I have a long initial context (personality) in my template that I want to play from. But for larger models (most notably Llama 2 70B) it takes far too long to digest it! After that, I can just cut the history to start a new story, and that would be instant. Only the initial loading suffers. I believe that even if the cache-context file were updated after each turn, the fast start would be worth the cost.
  • Sometimes my computer crashes for various reasons while I'm playing with a large history turn by turn. Each turn by itself is fast, because the context is cached. But after a crash, the browser remembers the whole history, yet the first generation is REALLY slow, much slower than my normal gameplay speed (sometimes taking up to 30 minutes if I was using a heavy model and my context was almost at its limit).

Yes, I understand that with any accidental move I can easily destroy the cache (loading the wrong history, adding a space at the start of the text, etc.), which would ultimately lead to full regeneration. But if I play locally and alone, that would be only my own fault. If I avoid that, I can restart my system anytime and continue playing instantly later.
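
To illustrate why an edit at the very start is so costly, here is a small sketch of prefix-based reuse. It assumes the cache can only be reused up to the first token that differs from the new prompt. The function name and token ids are made up for the example, and this is not koboldcpp's actual code.

```python
def reusable_prefix_len(cached_tokens, new_tokens):
    """Count how many leading tokens match between the cache and the new prompt."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n


cached = [10, 11, 12, 13, 14]
appended = [10, 11, 12, 13, 14, 15]      # a new turn added at the end
edited_start = [99, 10, 11, 12, 13, 14]  # one token inserted at the start

print(reusable_prefix_len(cached, appended))      # 5
print(reusable_prefix_len(cached, edited_start))  # 0
```

In the first case only the one new token needs to be processed. In the second case nothing matches, so the whole history has to be processed again.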

Also, for the use case of big story templates, you might provide an additional option to make this context read-only, as in ggml-org#1640
