/home/johannesg/Projects/llama.cpp [git::master *] [johannesg@johannes-pc] [11:32]
> ./main --model models/llama-33b-ggml-q4_0.bin --ignore-eos --n_predict 16 --ctx_size 2048 --batch_size 512 --threads 6 --seed 1337 --file navy_seals_copypasta.txt | tee chat.txt
main: build = 514 (173d0e6)
main: seed = 1337
llama.cpp: loading model from models/llama-33b-ggml-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 6656
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 52
llama_model_load_internal: n_layer = 60
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 17920
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 127.27 KB
llama_model_load_internal: mem required = 21695.48 MB (+ 3124.00 MB per state)
llama_init_from_file: kv self size = 3120.00 MB
system_info: n_threads = 6 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = 16, n_keep = 0
What the fuck did you just fucking say about me, you little bitch? I'll have you know I graduated top of my class in the Navy Seals, and I've been involved in numerous secret raids on Al-Quaeda, and I have over 300 confirmed kills. I am trained in gorilla warfare and I'm the top sniper in the entire US armed forces. You are nothing to me but just another target. I will wipe you the fuck out with precision the likes of which has never been seen before on this Earth, mark my fucking words. You think you can get away with saying that shit to me over the Internet? Think again, fucker. As we speak I am contacting my secret network of spies across the USA and your IP is being traced right now so you better prepare for the storm, maggot. The storm that wipes out the pathetic little thing you call your life. You're fucking dead, kid. I can be anywhere, anytime, and I can kill you in over seven hundred ways, and that's just with my bare hands. Not only am I extensively trained in unarmed combat, but I have access to the entire arsenal of the United States Marine Corps and I will use it to its full extent to wipe your miserable ass off the face of the continent, you little shit. If only you could have known what unholy retribution your little "clever" comment was about to bring down upon you, maybe you would have held your fucking tongue. But you couldn't, you didn't, and now you're paying the price, you goddamn idiot. I will shit fury all over you and you will drown in it. You're fucking dead, kiddo.
Labels: 4chan, epic win, fail, fun
llama_print_timings: load time = 19322.96 ms
nyllama_print_timings: sample time = 9.39 ms / 16 runs ( 0.59 ms per run)
llama_print_timings: prompt eval time = 17365.60 ms / 399 tokens ( 43.52 ms per token)
llama_print_timings: eval time = 7815.47 ms / 15 runs ( 521.03 ms per run)
llama_print_timings: total time = 27151.10 ms
/home/johannesg/Projects/llama.cpp [git::master *] [johannesg@johannes-pc] [11:33]
> ./main --model models/llama-33b-ggml-q4_0.bin --ignore-eos --n_predict 16 --ctx_size 2048 --batch_size 512 --threads 6 --seed 1337 --file navy_seals_copypasta.txt | tee chat.txt
main: build = 514 (173d0e6)
main: seed = 1337
llama.cpp: loading model from models/llama-33b-ggml-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 6656
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 52
llama_model_load_internal: n_layer = 60
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 17920
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 127.27 KB
llama_model_load_internal: mem required = 21695.48 MB (+ 3124.00 MB per state)
llama_init_from_file: kv self size = 3120.00 MB
system_info: n_threads = 6 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = 16, n_keep = 0
What the fuck did you just fucking say about me, you little bitch? I'll have you know I graduated top of my class in the Navy Seals, and I've been involved in numerous secret raids on Al-Quaeda, and I have over 300 confirmed kills. I am trained in gorilla warfare and I'm the top sniper in the entire US armed forces. You are nothing to me but just another target. I will wipe you the fuck out with precision the likes of which has never been seen before on this Earth, mark my fucking words. You think you can get away with saying that shit to me over the Internet? Think again, fucker. As we speak I am contacting my secret network of spies across the USA and your IP is being traced right now so you better prepare for the storm, maggot. The storm that wipes out the pathetic little thing you call your life. You're fucking dead, kid. I can be anywhere, anytime, and I can kill you in over seven hundred ways, and that's just with my bare hands. Not only am I extensively trained in unarmed combat, but I have access to the entire arsenal of the United States Marine Corps and I will use it to its full extent to wipe your miserable ass off the face of the continent, you little shit. If only you could have known what unholy retribution your little "clever" comment was about to bring down upon you, maybe you would have held your fucking tongue. But you couldn't, you didn't, and now you're paying the price, you goddamn idiot. I will shit fury all over you and you will drown in it. You're fucking dead, kiddo.
Labels: 4chan, epic win, fail, fun
nyllama_print_timings: load time = 19352.40 ms
llama_print_timings: sample time = 9.50 ms / 16 runs ( 0.59 ms per run)
llama_print_timings: prompt eval time = 17379.04 ms / 399 tokens ( 43.56 ms per token)
llama_print_timings: eval time = 7831.54 ms / 15 runs ( 522.10 ms per run)
llama_print_timings: total time = 27196.73 ms
/home/johannesg/Projects/llama.cpp [git::master *] [johannesg@johannes-pc] [11:33]
> ./main --model models/llama-33b-ggml-q4_0.bin --ignore-eos --n_predict 16 --ctx_size 2048 --batch_size 512 --threads 6 --seed 1337 --file navy_seals_copypasta.txt | tee chat.txt
main: build = 514 (173d0e6)
main: seed = 1337
llama.cpp: loading model from models/llama-33b-ggml-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 6656
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 52
llama_model_load_internal: n_layer = 60
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 17920
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 127.27 KB
llama_model_load_internal: mem required = 21695.48 MB (+ 3124.00 MB per state)
llama_init_from_file: kv self size = 3120.00 MB
system_info: n_threads = 6 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = 16, n_keep = 0
What the fuck did you just fucking say about me, you little bitch? I'll have you know I graduated top of my class in the Navy Seals, and I've been involved in numerous secret raids on Al-Quaeda, and I have over 300 confirmed kills. I am trained in gorilla warfare and I'm the top sniper in the entire US armed forces. You are nothing to me but just another target. I will wipe you the fuck out with precision the likes of which has never been seen before on this Earth, mark my fucking words. You think you can get away with saying that shit to me over the Internet? Think again, fucker. As we speak I am contacting my secret network of spies across the USA and your IP is being traced right now so you better prepare for the storm, maggot. The storm that wipes out the pathetic little thing you call your life. You're fucking dead, kid. I can be anywhere, anytime, and I can kill you in over seven hundred ways, and that's just with my bare hands. Not only am I extensively trained in unarmed combat, but I have access to the entire arsenal of the United States Marine Corps and I will use it to its full extent to wipe your miserable ass off the face of the continent, you little shit. If only you could have known what unholy retribution your little "clever" comment was about to bring down upon you, maybe you would have held your fucking tongue. But you couldn't, you didn't, and now you're paying the price, you goddamn idiot. I will shit fury all over you and you will drown in it. You're fucking dead, kiddo.
(thing) by Kalkin Tue Jul 10
2llama_print_timings: load time = 19449.27 ms
llama_print_timings: sample time = 9.53 ms / 16 runs ( 0.60 ms per run)
llama_print_timings: prompt eval time = 17486.82 ms / 399 tokens ( 43.83 ms per token)
llama_print_timings: eval time = 7820.27 ms / 15 runs ( 521.35 ms per run)
llama_print_timings: total time = 27282.36 ms
/home/johannesg/Projects/llama.cpp [git::master *] [johannesg@johannes-pc] [11:34]
> ./main --model models/llama-33b-ggml-q4_0.bin --ignore-eos --n_predict 16 --ctx_size 2048 --batch_size 512 --threads 6 --seed 1337 --file navy_seals_copypasta.txt | tee chat.txt
main: build = 514 (173d0e6)
main: seed = 1337
llama.cpp: loading model from models/llama-33b-ggml-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 6656
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 52
llama_model_load_internal: n_layer = 60
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 17920
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 127.27 KB
llama_model_load_internal: mem required = 21695.48 MB (+ 3124.00 MB per state)
llama_init_from_file: kv self size = 3120.00 MB
system_info: n_threads = 6 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = 16, n_keep = 0
What the fuck did you just fucking say about me, you little bitch? I'll have you know I graduated top of my class in the Navy Seals, and I've been involved in numerous secret raids on Al-Quaeda, and I have over 300 confirmed kills. I am trained in gorilla warfare and I'm the top sniper in the entire US armed forces. You are nothing to me but just another target. I will wipe you the fuck out with precision the likes of which has never been seen before on this Earth, mark my fucking words. You think you can get away with saying that shit to me over the Internet? Think again, fucker. As we speak I am contacting my secret network of spies across the USA and your IP is being traced right now so you better prepare for the storm, maggot. The storm that wipes out the pathetic little thing you call your life. You're fucking dead, kid. I can be anywhere, anytime, and I can kill you in over seven hundred ways, and that's just with my bare hands. Not only am I extensively trained in unarmed combat, but I have access to the entire arsenal of the United States Marine Corps and I will use it to its full extent to wipe your miserable ass off the face of the continent, you little shit. If only you could have known what unholy retribution your little "clever" comment was about to bring down upon you, maybe you would have held your fucking tongue. But you couldn't, you didn't, and now you're paying the price, you goddamn idiot. I will shit fury all over you and you will drown in it. You're fucking dead, kiddo.
You think this is abuse? This is how I treat people who
sayllama_print_timings: load time = 19359.57 ms
llama_print_timings: sample time = 9.34 ms / 16 runs ( 0.58 ms per run)
llama_print_timings: prompt eval time = 17398.35 ms / 399 tokens ( 43.60 ms per token)
llama_print_timings: eval time = 7865.56 ms / 15 runs ( 524.37 ms per run)
llama_print_timings: total time = 27237.87 ms
Expected Behavior
When I set a seed and repeat a generation with the exact same parameters I expect to get the exact same text again.
Current Behavior
When I re-run a generation with the same seed and parameters, the generated text is sometimes identical to the previous run, but not always.
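To put a number on "sometimes", the repeated runs can be scripted. The following is only a minimal sketch: the helper name count_distinct_outputs is made up for illustration, and it assumes that the prompt and generated text go to stdout while the log lines go to stderr. The command itself is the one from this report, with paths specific to my machine.

```python
import subprocess
from collections import Counter

# Command from this report; model and prompt paths are machine-specific.
LLAMA_CMD = [
    "./main",
    "--model", "models/llama-33b-ggml-q4_0.bin",
    "--ignore-eos",
    "--n_predict", "16",
    "--ctx_size", "2048",
    "--batch_size", "512",
    "--threads", "6",
    "--seed", "1337",
    "--file", "navy_seals_copypasta.txt",
]


def count_distinct_outputs(cmd, runs=4):
    """Run cmd `runs` times and tally how often each distinct stdout text occurs.

    Hypothetical helper, not part of llama.cpp. Only stdout is compared,
    on the assumption that timing/log lines are written to stderr.
    """
    tally = Counter()
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        tally[result.stdout] += 1
    return tally
```

Calling count_distinct_outputs(LLAMA_CMD, runs=10) and checking len() of the result would show how many different texts a fixed seed produced; a deterministic build should give exactly 1.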
Environment and Context
git commit: 173d0e6
Physical (or virtual) hardware you are using, e.g. for Linux:
$ lscpu
$ uname -a
Linux johannes-pc 6.3.0-1-MANJARO #1 SMP PREEMPT_DYNAMIC Mon Apr 3 10:46:56 UTC 2023 x86_64 GNU/Linux
Failure Information (for bugs)
I suspect that there is a race condition somewhere that affects the generated text, and that one of several outputs is produced depending on how the race resolves. I only get the bug when compiling with LLAMA_CUBLAS=1. I only get the bug with a prompt that is sufficiently long (navy seals copypasta, 399 tokens) but not with a short prompt ("People die when they are killed.", 8 tokens). Neither the number of threads nor the quantization scheme matters.
Steps to Reproduce
make clean && LLAMA_CUBLAS=1 make
./main --model models/llama-33b-ggml-q4_0.bin --ignore-eos --n_predict 16 --ctx_size 2048 --batch_size 512 --threads 6 --seed 1337 --file navy_seals_copypasta.txt
with the file navy_seals_copypasta.txt containing the navy seals copypasta as a prompt (399 tokens).
Failure Logs
The console log was recorded while repeatedly running the same command with the same seed and parameters.
Outputs are in order:
Labels: 4chan, epic win, fail, fun
Labels: 4chan, epic win, fail, fun
(thing) by Kalkin Tue Jul 10
You think this is abuse? This is how I treat people who
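As background for the race-condition suspicion under Failure Information: one mechanism that would fit these symptoms is floating-point addition not being associative. This is an assumption on my part, not something I have confirmed in the code. If partial results were accumulated in an order that depends on timing, the sums could differ slightly from run to run, and a small difference in the logits could be enough for the sampler to pick a different token. The sketch below only demonstrates the arithmetic effect in plain Python; sum_in_order is a made-up helper and has nothing to do with the actual cuBLAS code path.

```python
def sum_in_order(values, order):
    """Add up values[i] for each index i in order, left to right."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# The same four numbers, accumulated in two different orders.
values = [1e16, 1.0, -1e16, 1.0]

# 1e16 + 1.0 rounds back to 1e16, so the first 1.0 is lost.
lossy = sum_in_order(values, [0, 1, 2, 3])

# Cancelling the large terms first keeps both 1.0 contributions.
exact = sum_in_order(values, [0, 2, 1, 3])

print(lossy, exact)  # 1.0 2.0

# Even small, ordinary-looking numbers show the effect.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

If something like this is the cause, the set of possible outputs would be small and would recur across runs, which is at least consistent with seeing the same completion twice in the log.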