SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, Settings UI, Streaming mode, OpenAi Compat (Model, Authorization Bearer), Save/Restore session, Auto Settings UI by hanishkvc · Pull Request #7548 · ggml-org/llama.cpp

hanishkvc · 2024-05-26T22:13:59Z

garbage trimming

Given the limited context size of local LLMs, the context often gets filled up between the prompt and the response, which can lead to the generation of repeating garbage text. And often, setting a repetition penalty just leads to over-intelligent garbage repetition with slight variations. This garbage in turn overloads the available model context, leading to less valuable responses for subsequent prompts/queries, if the chat history is sent to the AI model.

So two simple-minded garbage trimming approaches are tried:

one based on repeat-matching of progressively larger substrings, with partial skip, and
another based on char-histogram/frequency-driven garbage trimming.

The char-histogram driven one is a bit more flexible, in that it allows for some variation in the repetition. It tracks the chars and their frequencies in a substring of specified length at the end of the generated text, and then checks whether moving further back into the generated text from the end stays within the same char subset or goes beyond it; based on that, it either trims the string at the end or leaves it alone. This allows garbage at the end to be filtered out, even if there are certain kinds of small variations in the repeated text with respect to the positions of the seen chars.
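A minimal JavaScript sketch of the char-histogram approach as described above, assuming a fixed-length learning window at the end of the text. The function name and the `learnLen`, `maxUniq` and `minExtraLen` parameters are hypothetical, and their defaults are illustrative only. The squashed commit message notes that the PR later switched to ending the learn phase once `maxUniq` distinct chars are seen, and added a per-char-type `maxType` limit. This sketch omits both.

```javascript
/**
 * Hypothetical sketch: trim repeated garbage at the end of `text` using a
 * char histogram learned from the last `learnLen` chars.
 * Returns { trimmed, data }.
 */
function trimGarbageByHistogram(text, learnLen = 16, maxUniq = 8, minExtraLen = 16) {
  if (text.length < learnLen) {
    return { trimmed: false, data: text };
  }
  // Learn phase: char -> frequency over the tail window.
  const hist = new Map();
  for (let i = text.length - learnLen; i < text.length; i++) {
    hist.set(text[i], (hist.get(text[i]) ?? 0) + 1);
  }
  // Too many distinct chars in the tail: looks like normal text, not garbage.
  if (hist.size > maxUniq) {
    return { trimmed: false, data: text };
  }
  // Probe phase: keep moving towards the start while chars stay within the
  // learned char subset (the frequencies are only informational here).
  let start = text.length - learnLen;
  while (start > 0 && hist.has(text[start - 1])) {
    start--;
  }
  // Only treat it as garbage if the run extends well beyond the window.
  if (text.length - start < learnLen + minExtraLen) {
    return { trimmed: false, data: text };
  }
  return { trimmed: true, data: text.slice(0, start) };
}
```

For example, `trimGarbageByHistogram("The answer is 42.\n" + "ab ".repeat(20))` returns `{ trimmed: true, data: "The answer is 42.\n" }`. The probe only checks membership in the learned subset, so a reordered variant such as "ab ba ab" would be caught as well.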

The repeat-matching based trimming can be let loose on longer substring probing, given its more well-defined characteristics.
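A minimal sketch of repeat-matching trimming that probes progressively larger substring lengths at the end of the text. It reads "partial skip" as the commit titled "TrimGarbage if unable try skip char and retry": if no repeat is found, one more trailing char is skipped and the probe is retried. The function name, parameters and defaults are hypothetical.

```javascript
/**
 * Hypothetical sketch: trim repeated substrings at the end of `text`,
 * probing progressively larger substring lengths. If no repeat is found,
 * skip one more trailing char and retry (skipped chars are trimmed too).
 * Returns { trimmed, data }.
 */
function trimGarbageByRepeatMatch(text, maxSubLen = 64, minRepeats = 3, minGarbageLen = 16, maxSkip = 8) {
  for (let skip = 0; skip <= maxSkip; skip++) {
    const end = text.length - skip; // ignore `skip` trailing chars
    for (let subLen = 1; subLen <= maxSubLen; subLen++) {
      if (end - subLen * minRepeats < 0) break; // not enough text left
      const unit = text.slice(end - subLen, end);
      // Count how many times `unit` repeats back-to-back, going backwards.
      let start = end - subLen;
      let count = 1;
      while (start - subLen >= 0 && text.slice(start - subLen, start) === unit) {
        start -= subLen;
        count++;
      }
      // Require enough repeats and enough total length to avoid trimming "...".
      if (count >= minRepeats && count * subLen >= minGarbageLen) {
        return { trimmed: true, data: text.slice(0, start) };
      }
    }
  }
  return { trimmed: false, data: text };
}
```

With `"Final answer: yes.\n" + "spam ".repeat(5)`, the probe matches at a substring length of 5 and returns `"Final answer: yes.\n"`. If a stray `"#"` is appended, the skip-and-retry step reaches the same result. Because exact matches are required, this variant is stricter than the histogram one. That strictness is why it can safely probe longer substrings.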

settings ui

A simple UI is added to change some of the behaviour, without needing to open the browser's developer tools/console.

Setting the option ChatHistoryInCtxt to Last0 stops any chat history from being sent to the server/AI model, thus ensuring that the response is based purely on the set system prompt and the current query.

Keeping ChatHistoryInCtxt at the default "Last1" mode and bTrimGarbage at its default enabled state often allows the user to recover/continue a previous large response that had garbage at the end, from the point where the garbage was automatically removed, by asking the AI to continue the last response, e.g. "please continue".
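A minimal sketch of how a ChatHistoryInCtxt-style setting could choose what gets sent. It assumes the following:

* each message is `{ role, content }` in the OpenAI chat style;
* LastN keeps the current query plus the N previous user queries, together with their replies;
* Full maps to -1.

The function name and this mapping are hypothetical, not SimpleChat's actual code.

```javascript
/**
 * Hypothetical sketch of a ChatHistoryInCtxt-style filter.
 * lastN < 0 : send the full history.
 * lastN >= 0: send the latest system prompt, plus the current user query and
 *             the lastN user queries before it (with their assistant replies).
 */
function recentChat(messages, lastN) {
  if (lastN < 0) return messages.slice();
  const system = messages.filter((m) => m.role === "system").slice(-1);
  const rest = messages.filter((m) => m.role !== "system");
  let start = rest.length;
  let users = 0;
  // Walk back from the newest message, stopping once lastN + 1 user
  // messages are included; start always points at a user message.
  for (let i = rest.length - 1; i >= 0 && users <= lastN; i--) {
    if (rest[i].role === "user") {
      users++;
      start = i;
    }
  }
  return system.concat(rest.slice(start));
}
```

With `lastN = 0` only the system prompt and the current query go out. With `lastN = 1` the previous exchange stays in context, and that exchange may end in a trimmed response. This is what lets a "please continue" request pick up from where the garbage was cut.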

Streaming mode

Allows the user to choose between oneshot (shown at the end) and streamed (shown as it is being generated) viewing of the AI model's generated text response. Streaming mode pushes more packets over the network, but lets one view the response as it is being generated. For long responses, this means the user can read the response as it becomes available, instead of having to wait until the generation finishes.
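In streamed mode, OpenAI-compatible servers send the response as server-sent events. Each `data:` line carries a JSON chunk whose `choices[0].delta.content` holds the next piece of text, and OpenAI's API ends the stream with `data: [DONE]`. The sketch below pulls the text out of lines that have already been split. It assumes that chat-completions shape; other endpoints may use a different JSON layout. The function name is hypothetical.

```javascript
/**
 * Hypothetical sketch: collect generated text from SSE data lines in the
 * OpenAI chat-completions streaming shape.
 * Returns { text, done }, where done is true once "data: [DONE]" is seen.
 */
function extractDeltas(lines) {
  let text = "";
  let done = false;
  for (const raw of lines) {
    const line = raw.trim(); // also drops a trailing "\r"
    // Skip blank separator lines and anything that is not a data field.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const chunk = JSON.parse(payload);
    // A chunk may carry only the role, so content can be absent.
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return { text, done };
}
```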

OpenAi Compat

A basic skeleton is implemented to chat with an OpenAI/equivalent (including llama.cpp's) server, at a basic level.
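A minimal sketch of the request side of OpenAI compatibility. The Authorization bearer header and the `model` field are only added when the user has set them, mirroring the settings described in this PR. The function name and option names are hypothetical.

```javascript
/**
 * Hypothetical sketch: build the URL and fetch() options for an
 * OpenAI-compatible chat/completions request.
 */
function buildChatRequest(baseURL, messages, { model = "", apiKey = "", stream = false } = {}) {
  const headers = { "Content-Type": "application/json" };
  // Send Authorization only when the user has set a key.
  if (apiKey !== "") {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  const body = { messages, stream };
  // Add model only when set, for services that require the field.
  if (model !== "") {
    body.model = model;
  }
  // Resolve against a base ending in "/" so a path prefix like /v1 is kept.
  const base = baseURL.endsWith("/") ? baseURL : baseURL + "/";
  const url = new URL("chat/completions", base).toString();
  return { url, init: { method: "POST", headers, body: JSON.stringify(body) } };
}
```

Usage: `const { url, init } = buildChatRequest("http://127.0.0.1:8080", msgs, { stream: true });` followed by `await fetch(url, init)`. Resolving against a base URL that ends in `/` lets the same code target a local server or a base such as `https://api.openai.com/v1`.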

Save/Restore

Auto-saves the chat session locally using the browser's localStorage, as the chat progresses. In turn, on a fresh start, the user is given the option to restore the corresponding previously saved chat session.
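A minimal sketch of the save/restore idea. It is written against any Storage-like object, so it can be tried outside a browser; in a browser, pass `window.localStorage`. The `"SimpleChat-"` key prefix and the function names are hypothetical. The PR only says that a prefix is added to the chat id when building the storage key.

```javascript
// Hypothetical key prefix, so chat entries are distinguishable in storage.
const KEY_PREFIX = "SimpleChat-";

/** Save the chat's messages under a prefixed key (JSON-encoded). */
function saveChat(storage, chatId, messages) {
  storage.setItem(KEY_PREFIX + chatId, JSON.stringify({ messages }));
}

/** Load a previously saved chat, or null if nothing was saved. */
function loadChat(storage, chatId) {
  const raw = storage.getItem(KEY_PREFIX + chatId);
  // getItem returns null when the key does not exist.
  if (raw === null) return null;
  return JSON.parse(raw).messages;
}
```

Calling `saveChat` whenever a message is added gives the auto-save behaviour. Showing the restore option only when `loadChat` returns non-null means restore is offered just for a previously saved session.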

ChatRequestOptions auto Settings UI

String/numeric fields (including any added by the user at runtime) in gMe.chatRequestOptions automatically get entries in the Settings UI.
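A minimal, DOM-free sketch of deriving settings entries from a chatRequestOptions-like object. The fields are enumerated each time, so fields added at runtime show up automatically, and numeric fields are kept numeric when edited values are written back. The function names and the sample options are hypothetical. A real UI would create an input element per returned entry.

```javascript
/**
 * Hypothetical sketch: list the string/number fields of an options object
 * as { key, type, value } entries for a settings UI.
 */
function settingsFields(options) {
  const fields = [];
  for (const [key, value] of Object.entries(options)) {
    const type = typeof value;
    // Only strings and numbers get an auto-generated entry in this sketch.
    if (type === "string" || type === "number") {
      fields.push({ key, type, value });
    }
  }
  return fields;
}

/** Write an edited value back, keeping numeric fields numeric. Returns false if rejected. */
function applySetting(options, key, inputText) {
  if (typeof options[key] === "number") {
    const num = Number(inputText);
    if (inputText.trim() === "" || Number.isNaN(num)) return false;
    options[key] = num;
  } else {
    options[key] = inputText;
  }
  return true;
}
```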

cleanup/structure

Move to a multi-file JS code structure, so that some of the helpers can be moved into their own files. Also bring a bit more of the request and response handling into the SimpleChat class itself.

hanishkvc · 2024-05-28T05:49:23Z

@mofosyne

Streaming support has now been added, so the end user can view the AI model's response as it is being generated, instead of having to wait until the generation finishes. The user can toggle between stream and oneshot mode in the settings.

As part of this, the code has been cleaned up and structured to better match the flow. The helpers for handling the server response have been moved into the SimpleChat class itself.

mofosyne · 2024-05-28T10:39:22Z

Ah, good to hear. Is this still in draft?

hanishkvc · 2024-05-28T14:08:52Z

> Ah, good to hear. Is this still in draft?

The current commits were an initial check of streaming, which seems to work well enough. They also gave a reason to consolidate response handling into the SimpleChat class itself, making things cleaner from an overall structure perspective.

However, I may clean up the multipart/stream handling a bit. I am not sure that, under all circumstances and across all software stacks (OS + lib + browser), the data will arrive and bubble up cleanly cut at line/data-block boundaries, as I seem to be seeing on my machine. At core I am not a web developer; I rather keep jumping between sys / hw / sw (asm to high-level, preboot apps). So I don't have enough data on how well behaved the different JS implementations are across different browsers and platforms. But I can think of enough reasons why clean boundaries may not always be maintained, so I may decouple that aspect a bit (in some ways similar to ChatParts in my ChatOn chat-templating PR).
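The concern above can be handled by never assuming that a network chunk ends on a line boundary. The sketch below is in the spirit of the PR's NewLines helper, but the class and method names are hypothetical, not the PR's API. Bytes are decoded with TextDecoder's `stream` option, so a multi-byte UTF-8 char split across chunks is completed by the next chunk. Only complete lines are handed out, and any partial tail is kept until more data arrives.

```javascript
/**
 * Hypothetical NewLines-style helper: feed raw network chunks (Uint8Array),
 * get back only complete lines; a partial last line is kept for later.
 */
class LineBuffer {
  constructor() {
    this.decoder = new TextDecoder("utf-8");
    this.pending = "";
  }

  /** Add one chunk and return the lines it completed (without "\n"). */
  push(chunk) {
    // stream: true holds back an incomplete multi-byte UTF-8 sequence
    // so it can be completed by the next chunk.
    this.pending += this.decoder.decode(chunk, { stream: true });
    const parts = this.pending.split("\n");
    this.pending = parts.pop(); // incomplete tail (may be "")
    return parts;
  }

  /** Call once the stream is done: returns a final line that had no "\n". */
  finish() {
    this.pending += this.decoder.decode(); // flush any bytes still held back
    const last = this.pending;
    this.pending = "";
    return last === "" ? [] : [last];
  }
}
```

When the stream reports done, `finish()` returns a last line that had no trailing newline. This covers the "last-data-line without newline" case noted in the squashed commit message. Lines may still end in `\r` if the server uses CRLF, so trim them before parsing.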

hanishkvc · 2024-05-29T18:10:46Z

Made the streaming mode data line handling more robust.

Added support in the settings UI for changing the server base URL (including port) and the OpenAI/equivalent compat fields (model, authorization bearer), along with the corresponding enabling logic.

hanishkvc · 2024-05-30T00:54:38Z

Very long text generation can mean a long period without user interaction, so the machine may go into power saving mode. On some platforms this can stop the network connection and raise a corresponding exception. The logic now traps this in streaming mode, so that the text generated up to that moment is not lost.
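A minimal sketch of that trapping. It assumes a hypothetical `readChunk()` that resolves to `{ done, value }`, with `value` already decoded to text. In a browser, this could wrap `response.body.getReader().read()` plus a TextDecoder. The text collected before a failure is returned together with the error. The squashed commit message describes the PR saving the collated response internally and then rethrowing; returning both is just this sketch's choice.

```javascript
/**
 * Hypothetical sketch: accumulate streamed text and, if reading fails midway
 * (e.g. the connection drops), keep whatever arrived before the failure.
 * Resolves to { text, error }, where error is null on a clean finish.
 */
async function collectStream(readChunk, onText) {
  let collected = "";
  try {
    while (true) {
      const { done, value } = await readChunk();
      if (done) break;
      collected += value;
      // Let the UI show the text so far, if a callback was given.
      if (onText) onText(collected);
    }
    return { text: collected, error: null };
  } catch (err) {
    // Do not lose the partial response; report the error alongside it.
    return { text: collected, error: err };
  }
}
```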

hanishkvc · 2024-05-30T19:37:51Z

Added support for saving and restoring the chat session across browsing sessions, using the browser's localStorage.

mofosyne · 2024-05-30T23:58:47Z

@hanishkvc ah, editorconfig caught some trailing whitespace you should delete

hanishkvc · 2024-05-31T09:36:43Z

Hi @mofosyne

I'm a bit surprised by that editorconfig failure. It wasn't clear which readme file had the issue, or the line number. I cross-checked both the readmes I had modified and didn't notice any issue.

I also created a simple script to auto-check all files in this PR; it also didn't flag any ' ' or '\t' at the end of lines.

So I have rebased the code onto the latest upstream/master (ggerganov/llama.cpp) and force-pushed for now.
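A check like the one mentioned above can be as small as the following hypothetical helper (not the script actually used). It reports the 1-based line numbers that end in a space or tab.

```javascript
/** Hypothetical helper: 1-based line numbers ending in a space or tab. */
function findTrailingWhitespace(text) {
  const hits = [];
  // Split on LF or CRLF so a "\r" is not mistaken for content.
  text.split(/\r?\n/).forEach((line, idx) => {
    if (/[ \t]$/.test(line)) hits.push(idx + 1);
  });
  return hits;
}
```

In Node, it can be run per file with `findTrailingWhitespace(require("fs").readFileSync(path, "utf8"))`.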

mofosyne · 2024-05-31T11:54:13Z

@hanishkvc if you get strange errors like that readme.md editorconfig one, which don't come from anything you edited, don't forget to check the main branch. I'm guessing the [no ci] policy we recently encouraged is letting documentation lint failures slip through. So I'll stop promoting that policy, and see if there is at least a way to reduce the time taken to process documentation-only commits.

FYI: the latest branch seems to have failing CIs, so keep going, as it might not be on your side. When ready, I'll assess whether it's a CI issue on your side or on master.

hanishkvc · 2024-05-31T13:09:43Z

Hi @mofosyne @ngxson @ggerganov

There seems to be a bug in the way things are set up with respect to CI testing and the server.

The examples/server folder's deps.sh updates public/index.js from the net, while at the same time the index.js bundled by default is committed as part of the git repo.

And as of now, one or more of the JS modules downloaded by deps.sh has potentially been updated on the net. So the CI test is failing, because the file it creates won't match what is committed to git.

So either the committed index.js needs to be updated, or, better still, this CI test structure itself needs to be modified. Just updating index.js from the net and committing it into the git repo won't stop the same issue from recurring whenever any of the JS modules gets updated on the net, which is outside this project's control.

mofosyne · 2024-05-31T13:40:39Z

@hanishkvc ah, if you are talking about the JS CI issues, gg thinks it's something upstream, and is considering #7670 to fix it.

hanishkvc · 2024-06-01T11:48:04Z

Hi @mofosyne, now that the CI issue is fixed, those 4 failing CI tests should pass. This is ready for merging.

mofosyne · 2024-06-01T11:52:14Z

Rerunning the "Server / server (ADDRESS, RelWithDebInfo) (pull_request_target)" test, as it failed, just in case it's something else; the error doesn't make sense. If it fails again, try rebasing against the last known functional CI commit in the main branch (a323ec6).

mofosyne · 2024-06-01T12:26:30Z

Ah, I see it's failing at the same point, @hanishkvc, and it's actually the same error that gg already fixed in the main branch. Please rebase against a323ec6.


…ettings UI, Streaming mode, OpenAi Compat (Model, Authorization Bearer), Save/Restore session, Auto Settings UI (ggml-org#7548)

* SimpleChat:DU:BringIn local helper js modules using importmap
  Use it to bring in a simple trim garbage at end logic, which is used to trim received response. Also given that importmap assumes esm / standard js modules, so also global variables arent implicitly available outside the modules. So add it has a member of document for now
* SimpleChat:DU: Add trim garbage at end in loop helper
* SimpleChat:DU:TrimGarbage if unable try skip char and retry
* SimpleChat:DU: Try trim using histogram based info
  TODO: May have to add max number of uniq chars in histogram at end of learning phase.
* SimpleChat:DU: Switch trim garbage hist based to maxUniq simple
  Instead of blindly building histogram for specified substring length, and then checking if any new char within specified min garbage length limit, NOW exit learn state when specified maxUniq chars are found. Inturn there should be no new chars with in the specified min garbage length required limit. TODO: Need to track char classes like alphabets, numerals and special/other chars.
* SimpleChat:DU: Bring in maxType to the mix along with maxUniq
  Allow for more uniq chars, but then ensure that a given type of char ie numerals or alphabets or other types dont cross the specified maxType limit. This allows intermixed text garbage to be identified and trimmed.
* SimpleChat:DU: Cleanup debug log messages
* SimpleChat:UI: Move html ui base helpers into its own module
* SimpleChat:DU:Avoid setting frequence/Presence penalty
  Some models like llama3 found to try to be over intelligent by repeating garbage still, but by tweaking the garbage a bit so that it is not exactly same. So avoid setting these penalties and let the model's default behaviour work out, as is. Also the simple minded histogram based garbage trimming from end, works to an extent, when the garbage is more predictable and repeatative.
* SimpleChat:UI: Add and use a para-create-append helper
  Also update the config params dump to indicate that now one needs to use document to get hold of gMe global object, this is bcas of moving to module type js. Also add ui.mjs to importmap
* SimpleChat:UI: Helper to create bool button and use it wrt settings
* SimpleChat:UI: Add Select helper and use it wrt ChatHistoryInCtxt
* SimpleChat:UI:Select: dict-name-value, value wrt default, change
  Take a dict/object of name-value pairs instead of just names. Inturn specify the actual value wrt default, rather than the string representing that value. Trap the needed change event rather than click wrt select.
* SimpleChat:UI: Add Div wrapped label+element helpers
  Move settings related elements to use the new div wrapped ones.
* SimpleChat:UI:Add settings button and bring in settings ui
* SimpleChat:UI:Settings make boolean button text show meaning
* SimpleChat: Update a bit wrt readme and notes in du
* SimpleChat: GarbageTrim enable/disable, show trimmed part ifany
* SimpleChat: highlight trim, garbage trimming bitmore aggressive
  Make it easy for end user to identified the trimmed text. Make garbage trimming logic, consider a longer repeat garbage substring.
* SimpleChat: Cleanup a bit wrt Api end point related flow
  Consolidate many of the Api end point related basic meta data into ApiEP class. Remove the hardcoded ApiEP/Mode settings from html+js, instead use the generic select helper logic, inturn in the settings block. Move helper to generate the appropriate request json string based on ApiEP into SimpleChat class itself.
* SimpleChat:Move extracting assistant response to SimpleChat class
  so also the trimming of garbage.
* SimpleChat:DU: Bring in both trim garbage logics to try trim
* SimpleChat: Cleanup readme a bit, add one more chathistory length
* SimpleChat:Stream:Initial handshake skeleton
  Parse the got stream responses and try extract the data from it. It allows for a part read to get a single data line or multiple data line. Inturn extract the json body and inturn the delta content/message in it.
* SimpleChat: Move handling oneshot mode server response
  Move handling of the oneshot mode server response into SimpleChat. Also add plumbing for moving multipart server response into same.
* SimpleChat: Move multi part server response handling in
* SimpleChat: Add MultiPart Response handling, common trimming
  Add logic to call into multipart/stream server response handling. Move trimming of garbage at the end into the common handle_response helper. Add new global flag to control between oneshot and multipart/stream mode of fetching response. Allow same to be controlled by user. If in multipart/stream mode, send the stream flag to the server.
* SimpleChat: show streamed generative text as it becomes available
  Now that the extracting of streamed generated text is implemented, add logic to show the same on the screen.
* SimpleChat:DU: Add NewLines helper class
  To work with an array of new lines. Allow adding, appending, shifting, ...
* SimpleChat:DU: Make NewLines shift more robust and flexible
* SimpleChat:HandleResponseMultiPart using NewLines helper
  Make handle_response_multipart logic better and cleaner. Now it allows for working with the situation, where the delta data line got from server in stream mode, could be split up when recving, but still the logic will handle it appropriately. ALERT: Rather except (for now) for last data line wrt a request's response.
* SimpleChat: Disable console debug by default by making it dummy
  Parallely save a reference to the original func.
* SimpleChat:MultiPart/Stream flow cleanup
  Dont try utf8-decode and newlines-add_append if no data to work on. If there is no more data to get (ie done is set), then let NewLines instance return line without newline at end, So that we dont miss out on any last-data-line without newline kind of scenario. Pass stream flag wrt utf-8 decode, so that if any multi-byte char is only partly present in the passed buffer, it can be accounted for along with subsequent buffer. At sametime, bcas of utf-8's characteristics there shouldnt be any unaccounted bytes at end, for valid block of utf8 data split across chunks, so not bothering calling with stream set to false at end. LATER: Look at TextDecoder's implementation, for any over intelligence, it may be doing.. If needed, one can use done flag to account wrt both cases.
* SimpleChat: Move baseUrl to Me and inturn gMe
  This should allow easy updating of the base url at runtime by the end user.
* SimpleChat:UI: Add input element helper
* SimpleChat: Add support for changing the base url
  This ensures that if the user is running the server with a different port or wants to try connect to server on a different machine, then this can be used.
* SimpleChat: Move request headers into Me and gMe
  Inturn allow Authorization to be sent, if not empty.
* SimpleChat: Rather need to use append to insert headers
* SimpleChat: Allow Authorization header to be set by end user
* SimpleChat:UI+: Return div and element wrt creatediv helpers
  use it to set placeholder wrt Authorization header. Also fix copy-paste oversight.
* SimpleChat: readme wrt authorization, maybe minimal openai testing
* SimpleChat: model request field for openai/equivalent compat
  May help testing with openai/equivalent web services, if they require this field.
* SimpleChat: readme stream-utf-8 trim-english deps, exception2error
* Readme: Add a entry for simplechat in the http server section
* SimpleChat:WIP:Collate internally, Stream mode Trap exceptions
  This can help ensure that data fetched till that point, can be made use of, rather than losing it. On some platforms, the time taken wrt generating a long response, may lead to the network connection being broken when it enters some user-no-interaction related power saving mode.
* SimpleChat:theResp-origMsg: Undo a prev change to fix non trim
  When the response handling was moved into SimpleChat, I had changed a flow bit unnecessarily and carelessly, which resulted in the non trim flow, missing out on retaining the ai assistant response. This has been fixed now.
* SimpleChat: Save message internally in handle_response itself
  This ensures that throwing the caught exception again for higher up logic, doesnt lose the response collated till that time. Go through theResp.assistant in catch block, just to keep simple consistency wrt backtracing just in case. Update the readme file.
* SimpleChat:Cleanup: Add spacing wrt shown req-options
* SimpleChat:UI: CreateDiv Divs map to GridX2 class
  This allows the settings ui to be cleaner structured.
* SimpleChat: Show Non SettingsUI config field by default
* SimpleChat: Allow for multiline system prompt
  Convert SystemPrompt into a textarea with 2 rows. Reduce user-input-textarea to 2 rows from 3, so that overall vertical space usage remains same. Shorten usage messages a bit, cleanup to sync with settings ui.
* SimpleChat: Add basic skeleton for saving and loading chat
  Inturn when ever a chat message (system/user/model) is added, the chat will be saved into browser's localStorage.
* SimpleChat:ODS: Add a prefix to chatid wrt ondiskstorage key
* SimpleChat:ODS:WIP:TMP: Add UI to load previously saved chat
  This is a temporary flow
* SimpleChat:ODS:Move restore/load saved chat btn setup to Me
  This also allows being able to set the common system prompt ui element to loaded chat's system prompt.
* SimpleChat:Readme updated wrt save and restore chat session info
* SimpleChat:Show chat session restore button, only if saved session
* SimpleChat: AutoCreate ChatRequestOptions settings to an extent
* SimpleChat: Update main README wrt usage with server

github-actions bot added the examples and server labels May 26, 2024

hanishkvc marked this pull request as draft May 26, 2024 22:18

hanishkvc mentioned this pull request May 26, 2024 in: SimpleChat Completion Mode flexibility and cleanup, Settings gMe, Optional sliding window #7480 (Merged)

mofosyne added the "Review Complexity : Medium" label (Generally require more time to grok but manageable by beginner to medium expertise level) May 27, 2024

hanishkvc changed the title from "SimpleChat: Simple histogram/freq driven garbage trimming, Settings UI" to "SimpleChat: Simple histogram and repeat matching driven garbage trimming, Settings UI" May 27, 2024

mofosyne approved these changes May 28, 2024


hanishkvc changed the title from "SimpleChat: Simple histogram and repeat matching driven garbage trimming, Settings UI" to "SimpleChat: Simple histogram and repeat matching driven garbage trimming, Settings UI, Streaming mode support" May 28, 2024

hanishkvc marked this pull request as ready for review May 30, 2024 19:38

hanishkvc force-pushed the hkvc_webfrontend_simplechat_v3 branch from 47214ea to 48fc60e May 31, 2024 09:26

hanishkvc added 3 commits June 1, 2024 18:18

SimpleChat:DU: Add trim garbage at end in loop helper

54802dc

SimpleChat:DU:TrimGarbage if unable try skip char and retry

6390f34

hanishkvc added 25 commits June 1, 2024 18:18

SimpleChat:UI: Add input element helper

ebf978d

SimpleChat: Add support for changing the base url

f54e000

This ensures that if the user is running the server with a different port or wants to try connect to server on a different machine, then this can be used.

SimpleChat: Move request headers into Me and gMe

dce4e6a

Inturn allow Authorization to be sent, if not empty.

SimpleChat: Rather need to use append to insert headers

c9559d2

SimpleChat: Allow Authorization header to be set by end user

af342b3

SimpleChat:UI+: Return div and element wrt creatediv helpers

7a0399e

use it to set placeholder wrt Authorization header. Also fix copy-paste oversight.

SimpleChat: readme wrt authorization, maybe minimal openai testing

85fd2d0

SimpleChat: model request field for openai/equivalent compat

0e7880a

May help testing with openai/equivalent web services, if they require this field.

SimpleChat: readme stream-utf-8 trim-english deps, exception2error

48f02e0

Readme: Add a entry for simplechat in the http server section

009563d

SimpleChat:theResp-origMsg: Undo a prev change to fix non trim

cdb4f6d

When the response handling was moved into SimpleChat, I had changed a flow bit unnecessarily and carelessly, which resulted in the non trim flow, missing out on retaining the ai assistant response. This has been fixed now.

SimpleChat:Cleanup: Add spacing wrt shown req-options

ec79b8d

SimpleChat:UI: CreateDiv Divs map to GridX2 class

803ee72

This allows the settings ui to be cleaner structured.

SimpleChat: Show Non SettingsUI config field by default

3d925cb

SimpleChat: Allow for multiline system prompt

1d7739b

Convert SystemPrompt into a textarea with 2 rows. Reduce user-input-textarea to 2 rows from 3, so that overall vertical space usage remains same. Shorten usage messages a bit, cleanup to sync with settings ui.

SimpleChat: Add basic skeleton for saving and loading chat

e2efcb4

Inturn when ever a chat message (system/user/model) is added, the chat will be saved into browser's localStorage.

SimpleChat:ODS: Add a prefix to chatid wrt ondiskstorage key

a15d4dc

SimpleChat:ODS:WIP:TMP: Add UI to load previously saved chat

5d40866

This is a temporary flow

SimpleChat:ODS:Move restore/load saved chat btn setup to Me

4abcfde

This also allows being able to set the common system prompt ui element to loaded chat's system prompt.

SimpleChat:Readme updated wrt save and restore chat session info

6ef57cc

SimpleChat:Show chat session restore button, only if saved session

bc68803

SimpleChat: AutoCreate ChatRequestOptions settings to an extent

bb0f0c8

SimpleChat: Update main README wrt usage with server

c4141a5

hanishkvc force-pushed the hkvc_webfrontend_simplechat_v3 branch from 5831ab5 to c4141a5 June 1, 2024 12:48

mofosyne merged commit 2ac95c9 into ggml-org:master Jun 1, 2024

hanishkvc mentioned this pull request Jun 1, 2024 in: Server UI Improvement - New Try #7633 (Merged)


