
Add Jina Chinese Embedding model#2

Draft
JoanFM wants to merge 423 commits into feat-jina-v2-base-code from feat-jina-embeddings-v2-zh

Conversation

@JoanFM (Owner) commented on Apr 30, 2024

No description provided.

@JoanFM force-pushed the feat-jina-embeddings branch from da96368 to d9b8dd6 on April 30, 2024 12:34
@JoanFM force-pushed the feat-jina-embeddings-v2-zh branch from 72e93b2 to 3269efe on May 11, 2024 09:53
@JoanFM changed the title from Feat jina embeddings v2 zh to Add Jina Chinese Embedding model on May 13, 2024
@JoanFM changed the base branch from feat-jina-embeddings to master on May 13, 2024 07:46
@JoanFM force-pushed the feat-jina-embeddings-v2-zh branch 2 times, most recently from fa527a5 to ea0f7df on May 13, 2024 08:31
@JoanFM changed the base branch from master to feat-jina-v2-base-code on June 5, 2024 14:31
@github-actions bot added the python label on Jun 5, 2024
@JoanFM force-pushed the feat-jina-embeddings-v2-zh branch 2 times, most recently from 4dc0fe9 to 605a619 on June 6, 2024 08:31
hanishkvc and others added 7 commits June 25, 2024 21:27
…rompt (ggml-org#7950)

* SimpleChat: Allow for chat req bool options to be user controlled

* SimpleChat: Allow user to control cache_prompt flag in request

* SimpleChat: Add sample GUI images to readme file

Show the chat screen and the settings screen

* SimpleChat:Readme: Add quickstart block, title to image, cleanup

* SimpleChat: Reposition contents of the Info and Settings UI

Make it more logically structured and improve the flow.

* SimpleChat: Rename to apiRequestOptions from chatRequestOptions

So that it is not wrongly assumed that these request options apply only to the
chat/completions endpoint. They are used for both endpoints, so the new name
matches the semantics better.

* SimpleChat: Update image included with readme wrt settings ui

* SimpleChat:ReadMe: Switch to webp screen image to reduce size
* add chat template support for llama-cli

* add help message

* server: simplify format_chat

* more consistent naming

* improve

* add llama_chat_format_example

* fix server

* code style

* code style

* Update examples/main/main.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* remove completions file

* fix inverted vector

* add mean method

* code style

* remove inverted pca hotfix
…ggml-org#8054)

* gguf-dump: add --data-offset

* gguf-dump: add tensor data offset table

* gguf-dump: refactor GGUFReader for clarity

* gguf-dump: add --data-alignment

* gguf-dump.py: Rename variables and adjust comments

start_data_offset --> data_offset

_build_tensors_info_fields --> _build_tensor_info
* added healthcheck

* moved curl to base
…Maximum (ggml-org#7797)

* json: support minimum for positive integer values

* json: fix min 0

* json: min + max integer constraints

* json: handle negative min / max integer bounds

* json: fix missing paren min/max bug

* json: proper paren fix

* json: integration test for schemas

* json: fix bounds tests

* Update json-schema-to-grammar.cpp

* json: fix negative max

* json: fix negative min (w/ more than 1 digit)

* Update test-grammar-integration.cpp

* json: nit: move string rules together

* json: port min/max integer support to Python & JS

* nit: move + rename _build_min_max_int

* fix min in [1, 9]

* Update test-grammar-integration.cpp

* add C++11-compatible replacement for std::string_view

* add min/max constrained int field to pydantic json schema example

* fix merge

* json: add integration tests for min/max bounds

* reshuffle/merge min/max integ test cases

* nits / cleanups

* defensive code against string out of bounds (apparently different behaviour of libstdc++ vs. clang's libc++, can't read final NULL char w/ former)
hankeke303 and others added 26 commits July 22, 2024 19:43
* llama : fix codeshell support

* llama : move codeshell after smollm below to respect the enum order
* contrib : clarify PR squashing

* contrib : fix typo + add list of modules
The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that is not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, so use it.
* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
…g#8508)

* llama : move sampling code into llama-sampling

ggml-ci

* llama : move grammar code into llama-grammar

ggml-ci

* cont

ggml-ci

* cont : pre-fetch rules

* cont

ggml-ci

* llama : deprecate llama_sample_grammar

* llama : move tokenizers into llama-vocab

ggml-ci

* make : update llama.cpp deps [no ci]

* llama : redirect external API to internal APIs

ggml-ci

* llama : suffix the internal APIs with "_impl"

ggml-ci

* llama : clean-up
* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
* fix export-lora example

* add more logging

* reject merging subset

* better check

* typo
* fix `llama_chat_format_single` for mistral

* fix typo

* use printf
Ensure SYCL CI builds both static & dynamic libs for testing purposes

Signed-off-by: Joe Todd <joe.todd@codeplay.com>
Added link to game I made that depends on llama
* use sliding window for phi3

* fix typo, "data_swa" -> "data"

* [convert_hf_to_gguf.py] add phi3 sliding window
* docfix: imatrix readme, quantum models -> quantized models.

* docfix: server readme: quantum models -> quantized models.
…8669)

* examples : remove finetune and train-text-from-scratch

* fix build

* update help message

* fix small typo for export-lora

---------

Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
* Improvements for Windows with Snapdragon X

* Revert "Improvements for Windows with Snapdragon X"

This reverts commit bf21397.

* Improvements for Windows with Snapdragon X

* WOA build clarifications

* Windows on ARM build clarifications

* cmake build for Windows clarifications

* Update docs/build.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref happens later in the code, during a call to `ggml_set_no_alloc`.

This fixes it by bailing out if no context is found.
* server : add Speech Recognition & Synthesis to UI

* server : add Speech Recognition & Synthesis to UI (fixes)
@JoanFM force-pushed the feat-jina-embeddings-v2-zh branch from 2cfcbbe to e2a91ef on July 26, 2024 07:19
@JoanFM force-pushed the feat-jina-embeddings-v2-zh branch from e2a91ef to 201559d on July 26, 2024 07:22
JoanFM pushed a commit that referenced this pull request Oct 8, 2024
* vulkan : do not use tensor->extra

This patch allows using the Vulkan backend with the RPC backend as
tensor->extra is no longer used.

Ref: ggml-org#8536

* Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (#2)

---------

Co-authored-by: 0cc4m <picard12@live.de>