Added CUDA support #228
Conversation
I'm sorry, but there are some conflicts.

Thank you for your outstanding contribution. I'm really looking forward to merging this PR.

I'll try to have those conflicts worked out ASAP 🫡

Merge conflicts should be worked out!
- CUDA detection
- Memory handling
- Ollama model release after training
Added a CUDA support flag so the log accurately reflects the CUDA toggle
Changed llama.cpp to check only whether CUDA support is enabled and, if so, rebuild during the first build rather than on each run
Enabled memory pinning and optimizer state offload
Rewrote the llama.cpp rebuild logic and added a manual y/n toggle for enabling CUDA support
Added make docker-restart-backend-fast to restart the backend and pick up code changes without a full llama.cpp rebuild
Fixed the make docker-check-cuda command to correctly report CUDA support
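The first-build-only rebuild described above can be sketched like this (a minimal Python illustration of the stamp-file idea; the PR's actual logic lives in the Makefile, and the build command here is a placeholder, not the real target):

```python
import os
import subprocess

STAMP = ".llama_cpp_cuda_built"

def maybe_rebuild_llama_cpp(cuda_enabled, build=None):
    """Rebuild llama.cpp with CUDA only on the first CUDA-enabled build.

    A stamp file records that the CUDA build already ran, so later
    startups skip the expensive rebuild. Returns True if a build ran.
    """
    if build is None:
        # Placeholder build step; the target name here is hypothetical.
        build = lambda: subprocess.run(["make", "build-llama-cuda"], check=True)
    if cuda_enabled and not os.path.exists(STAMP):
        build()
        with open(STAMP, "w") as f:
            f.write("built\n")
        return True
    return False
```

Deleting the stamp file forces a rebuild on the next run, which mirrors how a manual y/n toggle can override the cached decision.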
Force-pushed 05c10aa to ac10939
Let's ignore these new conflicts for now. I tested your branch, and it seems there's a bit of an issue. [+] Building 2/2

I probably haven't installed the GPU driver for Docker. Maybe this PR should also mention that in the README.
Yeah, sorry, I should've added that. It needs the NVIDIA Container Toolkit installed. Can I ask whether you're running this on Windows or Linux, and if Windows, are you using the WSL 2 backend?
@yingapple I tried replicating your issue by completely resetting my Docker and WSL environments, but everything started fine for me. Please try installing the NVIDIA Container Toolkit with:
then test that your GPU is detected:
Please let me know if this works for you, along with your system details. Thank you!
It ran successfully on my A100 server! We are testing it on a non-GPU machine.

One error: we should add a "\n" before modifying the .env, otherwise the Docker start fails.
@zpitroda Additionally:
Added docker-compose.gpu.yml to fix an error on machines without an NVIDIA GPU and ensured "\n" is added before the .env modification
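The "\n" fix above can be sketched as follows (an illustrative Python helper for appending variables to .env; the function name is made up, not the PR's actual code):

```python
def append_env_var(path, line):
    # Ensure the file ends with a newline before appending; otherwise the
    # new variable fuses with the last existing line and docker compose
    # fails to parse the .env file on startup.
    with open(path, "ab+") as f:
        f.seek(0, 2)                  # jump to end of file
        if f.tell() > 0:
            f.seek(-1, 2)             # inspect the last byte
            if f.read(1) != b"\n":
                f.write(b"\n")        # guard newline before the new entry
        f.write(line.encode() + b"\n")
```

Opening in binary mode makes the relative seek from the end portable; in text mode, seeking to arbitrary offsets is not guaranteed to work.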
Force-pushed 0489efe to e1ae6f5
…cond-Me into feature/cuda-support
Last push accidentally broke cuda toggle
@yingapple thanks for the suggestions! I've incorporated both of them and synced the branch with main!
Use cuda by default when using docker-restart-backend-fast
nvm, it was fixed with the last merge
Removed unnecessary makefile command and fixed gpu logging
I'm testing.

Yeah, you beat me to it; I was about to push a fix as well lol

The problem has been solved. I've found the model location. Thank you.
- Removed dtype setting to let torch handle it automatically
- Removed VRAM logging
- Removed unnecessary/old comments
Made "make docker-use-gpu/cpu" command work with .gpu_selected flag and changed "make docker-restart-backend-fast" command to respect flag instead of always using gpu
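The .gpu_selected mechanism might work roughly like this (a Python sketch; only the flag-file name comes from the commit message, the helper names are made up):

```python
import os

FLAG_FILE = ".gpu_selected"

def select_device(mode):
    # Persist the user's explicit gpu/cpu choice so that fast restarts
    # can respect it instead of always assuming GPU.
    if mode not in ("gpu", "cpu"):
        raise ValueError(f"unknown mode: {mode}")
    with open(FLAG_FILE, "w") as f:
        f.write(mode)

def selected_device():
    # Fall back to CPU when no explicit selection has been made.
    if os.path.exists(FLAG_FILE):
        with open(FLAG_FILE) as f:
            return f.read().strip()
    return "cpu"
```

A flag file works well here because both Make targets and the Python backend can read it without sharing any other state.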
Working on figuring out the cause, but training is currently failing with a

When calling local Ollama, the following problem occurs:

Added a custom exception class for Ollama embeddings, which appeared to be raising errors with keyword arguments while the base Python exception class only accepts positional ones
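The embedding fix described in that commit can be sketched as follows (illustrative only; the class name and fields are assumptions, not the PR's actual code):

```python
class OllamaEmbeddingError(Exception):
    """Exception that tolerates the keyword arguments the embedding
    client passes when raising, which a plain Exception rejects."""

    def __init__(self, *args, **kwargs):
        # Fold keyword details into the message so nothing is lost,
        # and keep them accessible for structured error handling.
        message = " ".join(str(a) for a in args)
        detail = ", ".join(f"{k}={v!r}" for k, v in kwargs.items())
        super().__init__(f"{message} ({detail})" if detail else message)
        self.details = kwargs
```

Accepting `**kwargs` is what distinguishes this from a plain `Exception` subclass, whose constructor only takes positional arguments.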
@3050226203 Please let me know if it works after my last push!
Fixed training defaulting to 0.5B model regardless of selection and fixed "free(): double free detected in tcache 2" error caused by cuda flag being passed incorrectly
I finished my test. Let's merge and do regression testing. Your contribution will be in our first release version!

Love to hear it! The last thing I'm working out right now is making sure large models like the 7B work regardless of available memory; I'll hopefully be able to push that tomorrow.
* Add CUDA support: CUDA detection, memory handling, Ollama model release after training
* Fix logging issue: added CUDA support flag so the log accurately reflects the CUDA toggle
* Update llama.cpp rebuild: check only whether CUDA support is enabled and, if so, rebuild during the first build rather than on each run
* Improved VRAM management: enabled memory pinning and optimizer state offload
* Fix CUDA check: rewrote llama.cpp rebuild logic, added manual y/n toggle for enabling CUDA support
* Added fast restart and fixed CUDA check command: added make docker-restart-backend-fast to restart the backend and pick up code changes without a full llama.cpp rebuild; fixed make docker-check-cuda to correctly report CUDA support
* Added docker-compose.gpu.yml to fix an error on machines without an NVIDIA GPU and ensured "\n" is added before the .env modification
* Fixed cuda toggle: last push accidentally broke the CUDA toggle
* Code review fixes: added return save_path to the end of save_hf_model; rolled back download_file_with_progress
* Update Makefile: use CUDA by default when using docker-restart-backend-fast
* Minor cleanup: removed unnecessary Makefile command and fixed GPU logging
* Delete .gpu_selected
* Simplified CUDA training code: removed dtype setting to let torch handle it automatically; removed VRAM logging; removed unnecessary/old comments
* Fixed gpu/cpu selection: made make docker-use-gpu/cpu work with the .gpu_selected flag and changed make docker-restart-backend-fast to respect the flag instead of always using GPU
* Fix Ollama embedding error: added custom exception class for Ollama embeddings, which appeared to pass keyword arguments while the base Python exception class only accepts positional ones
* Fixed model selection & memory error: fixed training defaulting to the 0.5B model regardless of selection and fixed the "free(): double free detected in tcache 2" error caused by the CUDA flag being passed incorrectly

* feature: use uv to set up the Python environment
* TrainProcessService: add singleton method get_instance
* feat: fix code
* Added CUDA support (#228), squashing the commits listed above
* fix: train service singleton

Co-authored-by: Zachary Pitroda <30330004+zpitroda@users.noreply.github.com>
Added:
Current issues:
To do:
Technical changes:
- make docker-up
- /api/kernel2/cuda/available, which uses torch.cuda.is_available() to check if CUDA is usable at runtime
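The runtime check behind that endpoint can be sketched as follows (the route path and torch.cuda.is_available() come from the description above; the handler shape and the injectable probes are assumptions, the latter so the sketch also runs on CPU-only machines):

```python
def cuda_status(probe=None, count=None):
    # Report whether PyTorch can actually reach a CUDA device at runtime,
    # rather than trusting build-time flags. The probes default to torch
    # but are injectable so this can be exercised without a GPU.
    if probe is None:
        import torch
        probe, count = torch.cuda.is_available, torch.cuda.device_count
    available = probe()
    return {
        "cuda_available": available,
        "device_count": count() if available else 0,
    }
```

Checking availability at request time matters here because a container can be built with CUDA support yet still land on a host where no GPU is exposed to Docker.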