[Draft] Tensor Parallel support to llama.cpp #9648
ClarkChin08 wants to merge 2 commits into ggml-org:master
Conversation
Signed-off-by: Chen Xi <xi2chen@intel.com>
Signed-off-by: Chen Xi <xi2chen@intel.com>
Refer to issue #9086 for the detailed design.

@ClarkChin08 Is it possible to update the guide/docs to explain how to use this feature?
Thank you!
    list(APPEND GGML_EXTRA_LIBS_PRIVATE DNNL::dnnl)
endif()

set(oneCCL_DIR "/opt/intel/oneapi/ccl/latest/lib/cmake/oneCCL")
The real oneAPI path is not always /opt/intel/oneapi/.
Please use $ENV{ONEAPI_ROOT}, which is a mandatory environment variable, in the CMake file.
The same applies to the following script.
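A minimal sketch of the suggested change, assuming the standard oneAPI directory layout under ONEAPI_ROOT (the exact subpath is illustrative):

```cmake
# Hypothetical sketch: derive the oneCCL CMake config path from the
# ONEAPI_ROOT environment variable instead of hard-coding /opt/intel/oneapi/.
if(NOT DEFINED ENV{ONEAPI_ROOT})
    message(FATAL_ERROR "ONEAPI_ROOT is not set; please source the oneAPI environment (setvars.sh) first")
endif()
set(oneCCL_DIR "$ENV{ONEAPI_ROOT}/ccl/latest/lib/cmake/oneCCL")
```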
find_library(MPI_LIBRARY mpi HINTS ${MPI_LIBRARY_PATH})
find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
# find_package(oneCCL REQUIRED)
message("-- oneCCL found")
Add handling for the case where oneCCL is not found.
oneCCL is not included in the oneAPI Base Toolkit, so please print a message that guides the user on how to install it.
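A minimal sketch of such a not-found check, assuming the find_library call shown above (the installation hint wording is illustrative):

```cmake
# Hypothetical sketch: stop configuration with a helpful hint when oneCCL is missing.
find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
if(NOT ONECCL_LIBRARY)
    message(FATAL_ERROR
        "oneCCL not found. oneCCL is not part of the oneAPI Base Toolkit; "
        "please install it separately and source its environment before configuring.")
else()
    message(STATUS "oneCCL found: ${ONECCL_LIBRARY}")
endif()
```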
    return -1;
}

inline int get_rank() { return _rank; }
These new functions have no relationship with DPCT.
It's better to move them to ggml-sycl/src.
I recommend reducing the dependence on the DPCT code.
        _cpu_device = _devs.size() - 1;
    }
}
init_ccl();
Move this init() function to ggml-sycl/src.
enum tensor_parallel_mode {
    TENSOR_NO_CHANGE,
    TENSOR_SPLIT_BY_ROW,
    TENSOR_SPLIT_BY_COLUMN,
    TENSOR_KEEPED_ON_MASTER
};
Changes to the common ggml code should not be made unless absolutely necessary, which is not likely to be the case here. We already have a way to handle this with custom buffer types, such as the existing CUDA and SYCL split buffer types. You can extend this model instead by creating a different buffer type for tensors split by column. "Tensors kept on master" is just the default buffer type.
I don't think it is possible to do #9086 (comment) with only a backend extra buffer.
find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
# find_package(oneCCL REQUIRED)
message("-- oneCCL found")
set(GGML_EXTRA_LIBS ${GGML_EXTRA_LIBS} ${MPI_LIBRARY_PATH} ${ONECCL_LIBRARY_PATH})
GGML_EXTRA_LIBS was recently split into GGML_EXTRA_LIBS_PUBLIC and GGML_EXTRA_LIBS_PRIVATE, so I think the line above won't work anymore.
Also, why does this variable contain the paths to the lib directories instead of the MPI/oneCCL libraries located by find_library?
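A minimal sketch of what the reviewer is pointing at, assuming the current GGML_EXTRA_LIBS_PRIVATE convention and the find_library results from the snippet above:

```cmake
# Hypothetical sketch: append the libraries found by find_library (not the
# search paths) to the private link list used by ggml's CMake build.
find_library(MPI_LIBRARY mpi HINTS ${MPI_LIBRARY_PATH})
find_library(ONECCL_LIBRARY ccl HINTS ${ONECCL_LIBRARY_PATH})
list(APPEND GGML_EXTRA_LIBS_PRIVATE ${MPI_LIBRARY} ${ONECCL_LIBRARY})
```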
llama_model_loader ml(fname, params.use_mmap, params.check_tensors, params.kv_overrides);

model.hparams.vocab_only = params.vocab_only;
if (params.tensor_split == LLAMA_SPLIT_MODE_TENSOR) {
Shouldn't it be params.split_mode instead of params.tensor_split?
Hello - was this feature completed?
@ClarkChin08, hello - was this feature completed?
Hi, thanks, and I appreciate the work. It would be great to have this feature added/completed; it would bring great performance for multi-GPU setups, similar to what vLLM already has.
This looks really interesting! Having TP support like vLLM does would bring some great speedups!
Looking forward to having this feature.
Just a bump: this feature would be really great for the community.
I suspect the OP has abandoned development and this feature is incomplete. |
Add tensor parallel support to llama.cpp; the code is still a draft.