lock instead of spinlock #3

Draft
bogdad wants to merge 2 commits into master from locking_for_threads

Conversation

@bogdad
Owner

@bogdad bogdad commented Mar 31, 2023

An attempt to replace the busy-wait spinlock with a mutex + condition variable, as described in ggml-org#633.

vladimir@FT751F6N7D ~/w/llama.cpp (locking_for_threads) [SIGINT]> ./build/bin/main -m ./models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -n 64 -t 6
main: seed = 1680291012
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/7B/ggml-model-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


 Below is an instruction that describes a task. Write a response that appropriately completes the request.
Alice is planning on going to dinner with her boyfriend, and she wants to look particularly pretty for him. She asks you what she should wear so that he will think she looks great.
“I’ll probably be wearing black jeans, blue top, pink necklace and silver shoes
llama_print_timings:        load time =   726.89 ms
llama_print_timings:      sample time =    46.19 ms /    64 runs   (    0.72 ms per run)
llama_print_timings: prompt eval time =  1302.22 ms /    21 tokens (   62.01 ms per token)
llama_print_timings:        eval time =  8034.21 ms /    63 runs   (  127.53 ms per run)
llama_print_timings:       total time =  9589.29 ms

vs. master:

vladimir@FT751F6N7D ~/w/llama.cpp (master)> ./build/bin/main -m ./models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -n 64 -t 6        (base)
main: seed = 1680291260
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/7B/ggml-model-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


 Below is an instruction that describes a task. Write a response that appropriately completes the request.
How to write a response: (1) Identify which type of writing you are expected to complete; and (2) describe your work and any additional steps necessary in order to complete the task at hand. After your initial draft, revise and edit as needed to ensure your work is clear and succinct.
llama_print_timings:        load time =   690.21 ms
llama_print_timings:      sample time =    46.70 ms /    64 runs   (    0.73 ms per run)
llama_print_timings: prompt eval time =  1121.04 ms /    21 tokens (   53.38 ms per token)
llama_print_timings:        eval time =  3738.95 ms /    63 runs   (   59.35 ms per run)
llama_print_timings:       total time =  5131.97 ms
vladimir@FT751F6N7D ~/w/llama.cpp (master)> ./build/bin/main -m ./models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -n 64 -t 6        (base)
main: seed = 1680291274
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/7B/ggml-model-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


 Below is an instruction that describes a task. Write a response that appropriately completes the request.
Sir, I have done my assignment, but i got some points incorrect and they are all 5 points. Can you please help me to identify where it went wrong so that i can rectify them?
In your essay on "Masculinity" why do you consider Masculinities a
llama_print_timings:        load time =   658.70 ms
llama_print_timings:      sample time =    46.60 ms /    64 runs   (    0.73 ms per run)
llama_print_timings: prompt eval time =  1114.95 ms /    21 tokens (   53.09 ms per token)
llama_print_timings:        eval time =  3744.40 ms /    63 runs   (   59.43 ms per run)
llama_print_timings:       total time =  5102.00 ms
vladimir@FT751F6N7D ~/w/llama.cpp (master)>                                                                                                       (base)

@bogdad bogdad force-pushed the locking_for_threads branch 5 times, most recently from 32316c5 to 316d873 Compare March 31, 2023 19:51
Comment thread ggml.c Outdated
Owner Author

@bogdad bogdad Mar 31, 2023


Did not work on Windows; fixed, it should work now.

@bogdad bogdad force-pushed the locking_for_threads branch 2 times, most recently from 67d52da to 1eb0553 Compare April 1, 2023 15:35
