Implement DeepSeek3B-MoE-A570M (LM component) #2
sfallah merged 5 commits into sfallah:sf/deepseek-ocr from
Conversation

@sfallah I've got DeepSeek3B-MoE-A570M running with llama-cli. It generates responses, but sometimes it just outputs nonsense. This doesn't happen with the original model on text-only prompts. Something is still off even though I have double-checked the configuration and the architecture; my guess is the tokenizer.

@sfallah Fixed the bug. The LM is ready now. I'm back to working on the vision model. Let me know if you need me to focus on any particular part.

@bluebread The original PyTorch impl is actually simple, and the same goes for get_rel_pos. FYI: I have fixed some issues that I will push before merging.
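For readers without the upstream source at hand: the SAM-style `get_rel_pos` selects rows of a learned relative-position table indexed by the (scaled) query/key coordinate difference. Below is a minimal NumPy sketch of those semantics, paraphrased from the SAM reference implementation rather than copied from the DeepSeek-OCR code; the upstream step that interpolates an undersized table to length `2 * max(q_size, k_size) - 1` is omitted.

```python
import numpy as np

def get_rel_pos(q_size: int, k_size: int, rel_pos: np.ndarray) -> np.ndarray:
    """Gather relative positional embeddings for a q_size x k_size attention map.

    rel_pos: table of shape (2 * max(q_size, k_size) - 1, C).
    Returns an array of shape (q_size, k_size, C).
    """
    # Scale coordinates when q and k have different resolutions.
    q_coords = np.arange(q_size)[:, None] * max(k_size / q_size, 1.0)
    k_coords = np.arange(k_size)[None, :] * max(q_size / k_size, 1.0)
    # Shift so the most negative offset maps to table index 0.
    rel_idx = (q_coords - k_coords) + (k_size - 1) * max(q_size / k_size, 1.0)
    return rel_pos[rel_idx.astype(np.int64)]
```

For equal `q_size` and `k_size` this reduces to indexing the table with `q - k + (k_size - 1)`, i.e. offsets `-(k_size-1) … +(k_size-1)` mapped onto rows `0 … 2*k_size-2`.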

@sfallah I've implemented these operations in the CUDA backend and opened PR ggml-org#17383 against the main repository. You can get this feature from the op-dsocr-clean branch.

@bluebread I am still investigating/experimenting with this.

@sfallah nice! good idea to work around it. where are we at with the vision model? does it run yet? fyi you can copy-paste the code from examples/eval-callback.cpp and set the cb_eval parameter to verify the model runs as expected.

@bluebread FYI: I have been using https://github.com/ggml-org/ggml/blob/master/examples/sam/sam.cpp for testing the replacement of ggml_win_part. This way I can test the SAM changes in isolation.
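For context, `ggml_win_part` implements the window partitioning SAM uses in its ViT blocks: pad the feature map up to a multiple of the window size, then split it into non-overlapping windows. A NumPy sketch of those semantics (my paraphrase of the SAM-style operation, not the ggml or sam.cpp code):

```python
import numpy as np

def window_partition(x: np.ndarray, w: int):
    """Split a (B, H, W, C) feature map into non-overlapping w x w windows.

    Pads H and W up to multiples of w with zeros, as ggml_win_part does.
    Returns (windows, (Hp, Wp)) with windows of shape (B * nH * nW, w, w, C);
    (Hp, Wp) is needed later to undo the padding (the ggml_win_unpart side).
    """
    B, H, W, C = x.shape
    pad_h = (w - H % w) % w
    pad_w = (w - W % w) % w
    x = np.pad(x, ((0, 0), (0, pad_h), (0, pad_w), (0, 0)))
    Hp, Wp = H + pad_h, W + pad_w
    x = x.reshape(B, Hp // w, w, Wp // w, w, C)
    windows = x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, w, w, C)
    return windows, (Hp, Wp)
```

Testing this against sam.cpp in isolation, as described above, avoids having to debug it inside the full clip.cpp graph.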

@bluebread https://github.com/sfallah/llama.cpp/blob/sf/deepseek-ocr/tools/mtmd/clip.cpp#L2473 The functions are working, but clip.cpp still has an issue, so the latest commit doesn't run yet.

@sfallah No problem. I'll take a look.
Implemented DeepSeek3B-MoE-A570M (the LM component of DeepSeek-OCR), but haven't tested it end to end.

Todo