server : display token probabilities in the UI #2489
jhen0409 merged 41 commits into ggml-org:master from
Conversation
Ah, this is fun. It works as expected on Android too.
Very nice! But maybe you could round the probabilities to 2 decimals or so?
Sure, I'll do it later. Thanks! Maybe just change the numbers to percentages.
I just had the craziest idea. I'm not requesting anything, just wondering about the viability of this idea: do you think this could be used to create spelling suggestions while typing, like on an Android keyboard?
If it were percentages, it would be cool. The byte thing should be fixed at the API level, but at the time we added the probabilities, we couldn't figure it out.
I have read PR #1962, and I'm a bit confused about this: shouldn't we improve it by converting the bytes on the UI side? I'm thinking maybe we can merge the bytes to get a readable result (also helpful for Chinese and other language users), but I'm not sure if it will have other problems.
I've confirmed that the other byte pairs cannot be decoded successfully, so I just hide them. I see that the OpenAI playground does the same thing.
I'm not very sure why it happened; maybe the completion_probabilities of some partial responses is not an array, but as far as I know, server.cpp should ensure that it is an array. I just removed the array check on completion_probabilities for messages and only check params.n_probs > 0 instead, which should avoid this problem.
```cpp
{
    // Always send partial response
    // so we can get the correct partial response of the last to_send in the client
    const json data = format_partial_response(llama, to_send, probs_output);
```
I also made the last to_send produce a partial response, so we can correctly get the probabilities of the last message (the final response includes all probabilities).
Thank you for the fixes, my testing shows it's working.
Generally, the partial response includes a single … but things like …
Confirmed it was a problem. Log: … I'll fix this later. UPDATED: The fix is here, but it was a problem with sent_token_probs_index; the above log is expected, as we need to wait for possible stop words.
examples/server/server.cpp:

```cpp
// before:
const std::string to_send = llama.generated_text.substr(pos, stop_pos);

// after:
const std::string to_send = stop_pos == std::string::npos
    ? llama.generated_text.substr(pos, std::string::npos)
    : ""; // just don't send anything if we're not done
```
Before this fix, to_send was a whitespace when it got stop_pos 1, and then sent_token_probs_index would be incorrect.
Also, I merged the master branch, so we need to use GGUF models for testing here.
I got the same in master; in this case the model responds with content like …
After fixing the newline issue, I think this can be merged. Thank you guys!
Nice one. What I'd like to have in the future is a notebook mode (so basic completion instead of chat). Do you have any plans for that? I could maybe hack it together a few weeks from now when I'm a bit less busy.
I'm also thinking about having pure text completion in the web UI, but the plan is not very clear yet. Currently I'm using the vim plugin for that, but the web UI could provide more visual capabilities. It's a low priority for me, but interesting.







#2423
This is a simple implementation for displaying the probabilities of the llama response.
It renders a popover for each token. The popover is based on preact-portal; it's short, so I made some modifications and copied it into index.html.
Dark mode: (screenshot)
Light mode: (screenshot)
For bytes, I just add a bottom border line to split them: https://github.com/ggerganov/llama.cpp/assets/3001525/ad92444e-58cc-445a-b8a9-44704236e285 (screenshots updated after 04b6f2c)
We can set `More options` -> `Show Probabilities` to use the `n_probs` param.