【Hackathon 9th No.109】[CppExtension] Support build Custom OP in setuptools 80+ -part #4977

Merged
luotao1 merged 3 commits into PaddlePaddle:develop from megemini:setuptools80
Nov 17, 2025

megemini commented Nov 12, 2025

Motivation

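Related to PaddlePaddle/Paddle#76008.

After the framework added compatibility with setuptools 80+, the way cpp_extension packages custom operators changed, so the copy_ops logic in FD (FastDeploy) needs to be updated accordingly.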

Modifications

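  1. Add two variables, WHEEL_MODERN_NAME and WHEEL_MODERN_CPU_NAME, which name the directories where the operators are generated under the new packaging scheme.
  2. First check whether the old WHEEL_NAME and WHEEL_CPU_NAME directories exist. If they do, the old framework code is in use (note: not an old setuptools version; the new framework code does not distinguish between setuptools versions), and the subsequent logic is left unchanged.
  3. If they do not exist, check whether the WHEEL_MODERN_NAME and WHEEL_MODERN_CPU_NAME directories exist. If they do, the new framework code is in use, and each of these directories is copied as a whole (the directory itself, not just the files inside it) into WHEEL_NAME or WHEEL_CPU_NAME respectively, which must be created first. The logic after that is also unchanged.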
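
The three steps above can be sketched roughly as follows. This is only a Python illustration of logic that the PR implements in build.sh (a shell script). The helper name `normalize_ops_tmp_dir` and the `demo_ops` directory names are hypothetical stand-ins for WHEEL_NAME and WHEEL_MODERN_NAME, not names taken from the repository.

```python
import shutil
import tempfile
from pathlib import Path


def normalize_ops_tmp_dir(tmp_dir: Path, wheel_name: str, modern_name: str) -> str:
    """Return which layout was found: "legacy", "modern", or "missing".

    Hypothetical helper mirroring the steps described above.
    """
    legacy_dir = tmp_dir / wheel_name
    modern_dir = tmp_dir / modern_name

    # Step 2: the legacy directory exists, so the old framework code produced
    # it and nothing needs to change.
    if legacy_dir.is_dir():
        return "legacy"

    # Step 3: only the modern directory exists. Create the legacy directory
    # and copy the modern directory itself (not just its files) into it.
    if modern_dir.is_dir():
        legacy_dir.mkdir(parents=True)
        shutil.copytree(modern_dir, legacy_dir / modern_name)
        return "modern"

    return "missing"


# Demo with made-up names in a throwaway directory.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "demo_ops").mkdir()
(demo_root / "demo_ops" / "__init__.py").write_text("")
layout = normalize_ops_tmp_dir(demo_root, "demo_ops-0.0.0.egg", "demo_ops")
print(layout, sorted(p.name for p in (demo_root / "demo_ops-0.0.0.egg").iterdir()))
```

After the call, the compatibility directory contains the modern directory as a subdirectory, so later steps that expect the old location can stay as they are.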

Usage or Command

bash build.sh

Accuracy Tests

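Tested locally by modifying custom_ops/setup_ops.py so that only the CPU operators are built (the GPU operators could not be compiled because the local CUDA version is too old):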

use_bf16 = False # envs.FD_CPU_USE_BF16 == "True"

# cc flags
paddle_extra_compile_args = [
    "-std=c++17",
    "-shared",
    "-fPIC",
    "-Wno-parentheses",
    "-DPADDLE_WITH_CUSTOM_KERNEL",
    "-DPADDLE_ON_INFERENCE",
    "-Wall",
    "-O3",
    "-g",
    "-lstdc++fs",
    "-D_GLIBCXX_USE_CXX11_ABI=1",
    "-DPy_LIMITED_API=0x03090000",
]

setup(
    name="fastdeploy_cpu_ops",
    ext_modules=CppExtension(
        sources=[
            "gpu_ops/save_with_output_msg.cc",
            "gpu_ops/get_output.cc",
            "gpu_ops/get_output_msg_with_topk.cc",
            "gpu_ops/save_output_msg_with_topk.cc",
            "gpu_ops/transfer_output.cc",
            "cpu_ops/rebuild_padding.cc",
            # "cpu_ops/simd_sort.cc",
            # "cpu_ops/set_value_by_flags.cc",
            # "cpu_ops/token_penalty_multi_scores.cc",
            "cpu_ops/stop_generation_multi_ends.cc",
            # "cpu_ops/update_inputs.cc",
            # "cpu_ops/get_padding_offset.cc",
        ],
        extra_link_args=[
            "-Wl,-rpath,$ORIGIN/x86-simd-sort/builddir",
            "-Wl,-rpath,$ORIGIN/xFasterTransformer/build",
        ],
        extra_compile_args=paddle_extra_compile_args,
    ),
    packages=find_namespace_packages(where="third_party"),
    package_dir={"": "third_party"},
    include_package_data=True,
)

Install:

➜  custom_ops git:(setuptools80) ✗ pip show setuptools           
Name: setuptools
Version: 57.1.0
Summary: Easily download, build, install, upgrade, and uninstall Python packages
Home-page: https://github.com/pypa/setuptools
Author: Python Packaging Authority
Author-email: distutils-sig@python.org
License: UNKNOWN
Location: /usr/local/lib/python3.9/dist-packages
Requires: 
Required-by: astroid, nodeenv, wandb
➜  custom_ops git:(setuptools80) ✗ python setup_ops.py install --install-lib /paddle/Paddle/build/tmp_setuptools/tmp_install
running install
running build
running build_py
package init file 'third_party/cutlass/__init__.py' not found (or not a regular file)
package init file 'third_party/nlohmann_json/__init__.py' not found (or not a regular file)
package init file 'third_party/DeepGEMM/__init__.py' not found (or not a regular file)
running egg_info
writing third_party/fastdeploy_cpu_ops.egg-info/PKG-INFO
writing dependency_links to third_party/fastdeploy_cpu_ops.egg-info/dependency_links.txt
writing top-level names to third_party/fastdeploy_cpu_ops.egg-info/top_level.txt
reading manifest file 'third_party/fastdeploy_cpu_ops.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.so' under directory 'third_party/x86-simd-sort'
warning: no files found matching '*.h' under directory 'third_party/x86-simd-sort'
warning: no files found matching '*.so' under directory 'third_party/xFasterTransformer'
warning: no files found matching '*.h' under directory 'third_party/xFasterTransformer'
writing manifest file 'third_party/fastdeploy_cpu_ops.egg-info/SOURCES.txt'
running build_ext
Compiling user custom op, it will cost a few seconds.....
building 'fastdeploy_cpu_ops' extension
creating /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build
creating /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops
creating /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9
creating /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/cpu_ops
creating /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops
/usr/local/bin/ccache x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.9/dist-packages/paddle/include -I/usr/local/lib/python3.9/dist-packages/paddle/include/third_party -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat/torch/csrc/api/include -I/usr/include/python3.9 -I/usr/include/python3.9 -c /paddle/FastDeploy/custom_ops/cpu_ops/rebuild_padding.cc -o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/cpu_ops/rebuild_padding.o -std=c++17 -shared -fPIC -Wno-parentheses -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_ON_INFERENCE -Wall -O3 -g -lstdc++fs -D_GLIBCXX_USE_CXX11_ABI=1 -DPy_LIMITED_API=0x03090000 -w -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_EXTENSION_NAME=fastdeploy_cpu_ops -D_GLIBCXX_USE_CXX11_ABI=1
/usr/local/bin/ccache x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.9/dist-packages/paddle/include -I/usr/local/lib/python3.9/dist-packages/paddle/include/third_party -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat/torch/csrc/api/include -I/usr/include/python3.9 -I/usr/include/python3.9 -c /paddle/FastDeploy/custom_ops/cpu_ops/stop_generation_multi_ends.cc -o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/cpu_ops/stop_generation_multi_ends.o -std=c++17 -shared -fPIC -Wno-parentheses -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_ON_INFERENCE -Wall -O3 -g -lstdc++fs -D_GLIBCXX_USE_CXX11_ABI=1 -DPy_LIMITED_API=0x03090000 -w -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_EXTENSION_NAME=fastdeploy_cpu_ops -D_GLIBCXX_USE_CXX11_ABI=1
/usr/local/bin/ccache x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.9/dist-packages/paddle/include -I/usr/local/lib/python3.9/dist-packages/paddle/include/third_party -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat/torch/csrc/api/include -I/usr/include/python3.9 -I/usr/include/python3.9 -c /paddle/FastDeploy/custom_ops/gpu_ops/get_output.cc -o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/get_output.o -std=c++17 -shared -fPIC -Wno-parentheses -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_ON_INFERENCE -Wall -O3 -g -lstdc++fs -D_GLIBCXX_USE_CXX11_ABI=1 -DPy_LIMITED_API=0x03090000 -w -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_EXTENSION_NAME=fastdeploy_cpu_ops -D_GLIBCXX_USE_CXX11_ABI=1
/usr/local/bin/ccache x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.9/dist-packages/paddle/include -I/usr/local/lib/python3.9/dist-packages/paddle/include/third_party -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat/torch/csrc/api/include -I/usr/include/python3.9 -I/usr/include/python3.9 -c /paddle/FastDeploy/custom_ops/gpu_ops/get_output_msg_with_topk.cc -o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/get_output_msg_with_topk.o -std=c++17 -shared -fPIC -Wno-parentheses -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_ON_INFERENCE -Wall -O3 -g -lstdc++fs -D_GLIBCXX_USE_CXX11_ABI=1 -DPy_LIMITED_API=0x03090000 -w -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_EXTENSION_NAME=fastdeploy_cpu_ops -D_GLIBCXX_USE_CXX11_ABI=1
/usr/local/bin/ccache x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.9/dist-packages/paddle/include -I/usr/local/lib/python3.9/dist-packages/paddle/include/third_party -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat/torch/csrc/api/include -I/usr/include/python3.9 -I/usr/include/python3.9 -c /paddle/FastDeploy/custom_ops/gpu_ops/save_output_msg_with_topk.cc -o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/save_output_msg_with_topk.o -std=c++17 -shared -fPIC -Wno-parentheses -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_ON_INFERENCE -Wall -O3 -g -lstdc++fs -D_GLIBCXX_USE_CXX11_ABI=1 -DPy_LIMITED_API=0x03090000 -w -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_EXTENSION_NAME=fastdeploy_cpu_ops -D_GLIBCXX_USE_CXX11_ABI=1
/usr/local/bin/ccache x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.9/dist-packages/paddle/include -I/usr/local/lib/python3.9/dist-packages/paddle/include/third_party -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat/torch/csrc/api/include -I/usr/include/python3.9 -I/usr/include/python3.9 -c /paddle/FastDeploy/custom_ops/gpu_ops/save_with_output_msg.cc -o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/save_with_output_msg.o -std=c++17 -shared -fPIC -Wno-parentheses -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_ON_INFERENCE -Wall -O3 -g -lstdc++fs -D_GLIBCXX_USE_CXX11_ABI=1 -DPy_LIMITED_API=0x03090000 -w -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_EXTENSION_NAME=fastdeploy_cpu_ops -D_GLIBCXX_USE_CXX11_ABI=1
/usr/local/bin/ccache x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.9/dist-packages/paddle/include -I/usr/local/lib/python3.9/dist-packages/paddle/include/third_party -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat -I/usr/local/lib/python3.9/dist-packages/paddle/include/paddle/phi/api/include/compat/torch/csrc/api/include -I/usr/include/python3.9 -I/usr/include/python3.9 -c /paddle/FastDeploy/custom_ops/gpu_ops/transfer_output.cc -o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/transfer_output.o -std=c++17 -shared -fPIC -Wno-parentheses -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_ON_INFERENCE -Wall -O3 -g -lstdc++fs -D_GLIBCXX_USE_CXX11_ABI=1 -DPy_LIMITED_API=0x03090000 -w -DPADDLE_WITH_CUSTOM_KERNEL -DPADDLE_EXTENSION_NAME=fastdeploy_cpu_ops -D_GLIBCXX_USE_CXX11_ABI=1
/paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/cpu_ops/stop_generation_multi_ends.o is compiled
/paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/get_output.o is compiled
/paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/save_output_msg_with_topk.o is compiled
/paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/get_output_msg_with_topk.o is compiled
/paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/save_with_output_msg.o is compiled
/paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/transfer_output.o is compiled
/paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/cpu_ops/rebuild_padding.o is compiled
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/cpu_ops/rebuild_padding.o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/cpu_ops/stop_generation_multi_ends.o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/get_output.o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/get_output_msg_with_topk.o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/save_output_msg_with_topk.o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/save_with_output_msg.o /paddle/FastDeploy/custom_ops/build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/transfer_output.o -L/usr/local/lib/python3.9/dist-packages/paddle/libs -L/usr/local/lib/python3.9/dist-packages/paddle/base -Wl,--enable-new-dtags,-R/usr/local/lib/python3.9/dist-packages/paddle/libs -Wl,--enable-new-dtags,-R/usr/local/lib/python3.9/dist-packages/paddle/base -o build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/fastdeploy_cpu_ops.so -Wl,-rpath,$ORIGIN/x86-simd-sort/builddir -Wl,-rpath,$ORIGIN/xFasterTransformer/build -l:libpaddle.so
Received len(custom_op) = 9, using custom operator
Removed: build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/get_output.o
Removed: build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/save_with_output_msg.o
Removed: build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/save_output_msg_with_topk.o
Removed: build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/transfer_output.o
Removed: build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops/get_output_msg_with_topk.o
Removed: build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/cpu_ops/rebuild_padding.o
Removed: build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/cpu_ops/stop_generation_multi_ends.o
running install_lib
copying build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/fastdeploy_cpu_ops.py -> /paddle/Paddle/build/tmp_setuptools/tmp_install
creating /paddle/Paddle/build/tmp_setuptools/tmp_install/build
creating /paddle/Paddle/build/tmp_setuptools/tmp_install/build/fastdeploy_cpu_ops
creating /paddle/Paddle/build/tmp_setuptools/tmp_install/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9
creating /paddle/Paddle/build/tmp_setuptools/tmp_install/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/gpu_ops
creating /paddle/Paddle/build/tmp_setuptools/tmp_install/build/fastdeploy_cpu_ops/temp.linux-x86_64-3.9/cpu_ops
copying build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/fastdeploy_cpu_ops.so -> /paddle/Paddle/build/tmp_setuptools/tmp_install
copying build/fastdeploy_cpu_ops/lib.linux-x86_64-3.9/version.txt -> /paddle/Paddle/build/tmp_setuptools/tmp_install
byte-compiling /paddle/Paddle/build/tmp_setuptools/tmp_install/fastdeploy_cpu_ops.py to fastdeploy_cpu_ops.cpython-39.pyc
running install_egg_info
Copying third_party/fastdeploy_cpu_ops.egg-info to /paddle/Paddle/build/tmp_setuptools/tmp_install/fastdeploy_cpu_ops-0.0.0-py3.9.egg-info
running install_scripts

At this point the operators are installed in the tmp_install directory and can be loaded:

# This is the install directory
➜  tmp_install git:(setuptools80) ✗ l
total 28K
drwxr-xr-x 6 root root 4.0K Nov 12 06:27 .
drwxr-xr-x 3 1000 1000 4.0K Nov 12 05:29 ..
drwxr-xr-x 2 root root 4.0K Nov 12 06:27 __pycache__
drwxr-xr-x 3 root root 4.0K Nov 12 06:27 build
drwxr-xr-x 3 root root 4.0K Nov 12 06:27 fastdeploy_cpu_ops
drwxr-xr-x 2 root root 4.0K Nov 12 06:27 fastdeploy_cpu_ops-0.0.0-py3.9.egg-info
-rw-r--r-- 1 root root 1.7K Nov 12 06:27 version.txt

# Remove the other directories to mimic the effect of cp-ing only the fastdeploy_cpu_ops directory
➜  tmp_install git:(setuptools80) ✗ rm -rf build fastdeploy_cpu_ops-0.0.0-py3.9.egg-info
➜  tmp_install git:(setuptools80) ✗ l
total 20K
drwxr-xr-x 4 root root 4.0K Nov 12 06:27 .
drwxr-xr-x 3 1000 1000 4.0K Nov 12 05:29 ..
drwxr-xr-x 2 root root 4.0K Nov 12 06:27 __pycache__
drwxr-xr-x 3 root root 4.0K Nov 12 06:27 fastdeploy_cpu_ops
-rw-r--r-- 1 root root 1.7K Nov 12 06:27 version.txt

# At this point the operator cannot be used from other directories
➜  tmp_install git:(setuptools80) ✗ cd ..
➜  tmp_setuptools git:(setuptools80) ✗ ipython
Python 3.9.18 (main, Aug 25 2023, 13:20:04) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.18.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import fastdeploy_cpu_ops
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 import fastdeploy_cpu_ops

ModuleNotFoundError: No module named 'fastdeploy_cpu_ops'

In [2]: exit

# The operator loads normally from the install directory
➜  tmp_setuptools git:(setuptools80) ✗ cd tmp_install
➜  tmp_install git:(setuptools80) ✗ l
total 20K
drwxr-xr-x 4 root root 4.0K Nov 12 06:27 .
drwxr-xr-x 3 1000 1000 4.0K Nov 12 05:29 ..
drwxr-xr-x 2 root root 4.0K Nov 12 06:27 __pycache__
drwxr-xr-x 3 root root 4.0K Nov 12 06:27 fastdeploy_cpu_ops
-rw-r--r-- 1 root root 1.7K Nov 12 06:27 version.txt
➜  tmp_install git:(setuptools80) ✗ ipython
Python 3.9.18 (main, Aug 25 2023, 13:20:04) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.18.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import fastdeploy_cpu_ops

In [2]: fastdeploy_cpu_ops.static_op_get_output_dynamic
Out[2]: <function fastdeploy_cpu_ops.static_op_get_output_dynamic(x, rank_id, wait_flag, msg_queue_id)>

In [3]: pip show setuptools
Name: setuptools
Version: 57.1.0
Summary: Easily download, build, install, upgrade, and uninstall Python packages
Home-page: https://github.com/pypa/setuptools
Author: Python Packaging Authority
Author-email: distutils-sig@python.org
License: UNKNOWN
Location: /usr/local/lib/python3.9/dist-packages
Requires: 
Required-by: astroid, nodeenv, wandb
Note: you may need to restart the kernel to use updated packages.

In [4]: 

The result is the same with setuptools 80.9.0.
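
The session above is consistent with how Python resolves imports: an interactive interpreter has the current directory on `sys.path`, so the module is only found when the session is started from the install directory. A minimal sketch of that behavior, using a throwaway package named `demo_ops_pkg` (a hypothetical name, not the real operator library):

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Build a tiny package in a temporary "install" directory.
install_dir = Path(tempfile.mkdtemp())
pkg_dir = install_dir / "demo_ops_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("ANSWER = 1\n")

# The install directory is not on sys.path yet, so the package is not found.
found_before = importlib.util.find_spec("demo_ops_pkg") is not None

# Adding the directory to sys.path has the same effect on import lookup as
# starting the interpreter from inside that directory.
sys.path.insert(0, str(install_dir))
importlib.invalidate_caches()
found_after = importlib.util.find_spec("demo_ops_pkg") is not None

print(found_before, found_after)
```

In the real build this is presumably why the generated files are copied into the fastdeploy package tree rather than left in a temporary install directory.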

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Nov 12, 2025

Thanks for your contribution!

@SigureMo
Member

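> only the CPU operators are built (the GPU operators could not be compiled because the CUDA version is too old ...)

I tested the GPU OPs in FD today and there are still problems. I need to test a few other things in FD first; once that is done I will find time to look into what is going wrong.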

SigureMo left a comment

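I re-tested and, without deep_gemm, there should be no major problems. However, **/fastdeploy_ops/__init__.py needs to be added to .gitignore so that the newly generated stub file is ignored.

To summarize:

Previously, the directory structure generated under tmp was: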

custom_ops/tmp
├── fastdeploy_ops-0.0.0-py3.10-linux-x86_64.egg
│   ├── deep_gemm    # note that deep_gemm is here
│   │   └── ...
│   ├── EGG-INFO
│   │   ├── dependency_links.txt
│   │   ├── native_libs.txt
│   │   ├── not-zip-safe
│   │   ├── PKG-INFO
│   │   ├── SOURCES.txt
│   │   └── top_level.txt
│   ├── fastdeploy_ops_pd_.so
│   ├── fastdeploy_ops.py
│   └── version.txt
└── version.txt

After copying to fastdeploy/model_executor/ops/gpu, the structure was:

fastdeploy/model_executor/ops/gpu
├── deep_gemm
│   └── ...
├── EGG-INFO
│   ├── dependency_links.txt
│   ├── native_libs.txt
│   ├── not-zip-safe
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   └── top_level.txt
├── fastdeploy_ops_pd_.so
├── fastdeploy_ops.py
├── __init__.py
└── version.txt

The entry point of the fastdeploy_ops module is fastdeploy_ops.py.

With this PR combined with PaddlePaddle/Paddle#76008, the modernized directory structure is as follows.

The tmp directory:

custom_ops/tmp
├── deep_gemm
│   └── ...
├── fastdeploy_ops
│   ├── fastdeploy_ops_pd_.so
│   └── __init__.py
├── fastdeploy_ops-0.0.0-py3.10.egg-info
│   ├── dependency_links.txt
│   ├── not-zip-safe
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   └── top_level.txt
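├── fastdeploy_ops-0.0.0-py3.10-linux-x86_64.egg  # copied over by the newly added shell logic for backward compatibility; the two directories above are the natively generated modern layout
│   ├── deep_gemm  # deep_gemm needs to be copied here as well; not implemented yet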
│   │   └── ...
│   └── fastdeploy_ops
│       ├── fastdeploy_ops_pd_.so
│       └── __init__.py
└── version.txt

After copying to fastdeploy/model_executor/ops/gpu, the structure is:

fastdeploy/model_executor/ops/gpu
├── deep_gemm  # same as above, deep_gemm is not copied over yet
│   └── ...
├── fastdeploy_ops
│   ├── fastdeploy_ops_pd_.so
│   └── __init__.py
└── __init__.py

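Another option: when the modern dir is detected, simply set WHEEL_NAME="". The copy then works without further changes, since the old directory structure is essentially the new one with an extra WHEEL_NAME level on top.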

    if [ -d "./${OPS_TMP_DIR}/${WHEEL_MODERN_NAME}" ]; then
        WHEEL_NAME=""
    fi

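The entry point of the fastdeploy_ops module then becomes the fastdeploy_ops directory. The newly added __init__.py is the new stub file and needs to be ignored.

Once the issues above are resolved, I think this is good to go.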
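
The WHEEL_NAME="" idea amounts to treating the tmp directory itself as the copy root when the modern layout is present. A small sketch under that reading: the helper `copy_source_root` is hypothetical, and the paths are taken from the directory trees listed above.

```python
from pathlib import Path


def copy_source_root(tmp_dir: Path, wheel_name: str) -> Path:
    """Directory whose contents would be copied into the ops package.

    Hypothetical helper: an empty wheel_name means "use tmp_dir itself".
    """
    return tmp_dir / wheel_name if wheel_name else tmp_dir


tmp_dir = Path("custom_ops/tmp")
legacy_root = copy_source_root(tmp_dir, "fastdeploy_ops-0.0.0-py3.10-linux-x86_64.egg")
modern_root = copy_source_root(tmp_dir, "")

# Legacy layout: the library sits directly under the .egg directory.
print(legacy_root / "fastdeploy_ops_pd_.so")
# Modern layout: the library sits inside the fastdeploy_ops package directory.
print(modern_root / "fastdeploy_ops" / "fastdeploy_ops_pd_.so")
```

With the copy root chosen this way, deep_gemm and fastdeploy_ops are both direct children of the root in the modern layout, which matches the observation that the old structure only adds one extra level.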

build.sh Outdated
WHEEL_MODERN_CPU_NAME="fastdeploy_cpu_ops"

# Handle GPU ops directories (WHEEL_NAME and WHEEL_MODERN_NAME)
if [ -d "./${OPS_TMP_DIR}/${WHEEL_NAME}" ]; then

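Could this check for WHEEL_MODERN_NAME instead? If it checks WHEEL_NAME and the tmp directory was not cleaned before a second run (I recall there is an option for that, and the cleanup line may also be commented out while debugging), the script can fall into the old logic and end up installing the old package.

When I tested earlier I probably hit this path and mistakenly concluded that the change had not taken effect.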
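
The ordering concern can be illustrated with a short sketch. It is a Python stand-in for the shell check, assuming a hypothetical helper `detect_layout` and made-up directory names; the point is only that testing for the modern directory first keeps a leftover compatibility directory from an earlier run from being mistaken for the legacy layout.

```python
import tempfile
from pathlib import Path


def detect_layout(tmp_dir: Path, wheel_name: str, modern_name: str) -> str:
    """Prefer the modern directory; fall back to the legacy one.

    Hypothetical helper illustrating the ordering suggested above.
    """
    if (tmp_dir / modern_name).is_dir():
        return "modern"
    if (tmp_dir / wheel_name).is_dir():
        return "legacy"
    return "missing"


# Simulate a tmp directory that was not cleaned between two builds: a
# compatibility directory from an earlier run sits next to the fresh output.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "demo_ops-0.0.0.egg").mkdir()  # left over from an earlier run
(tmp_dir / "demo_ops").mkdir()            # produced by the current run

print(detect_layout(tmp_dir, "demo_ops-0.0.0.egg", "demo_ops"))
```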

mkdir -p "./${OPS_TMP_DIR}/${WHEEL_CPU_NAME}"
cp -r "./${OPS_TMP_DIR}/${WHEEL_MODERN_CPU_NAME}" "./${OPS_TMP_DIR}/${WHEEL_CPU_NAME}/"
fi
fi

Also note that the GPU ops include not only fastdeploy_ops but also deep_gemm:

setup(
    name="fastdeploy_ops",
    ext_modules=CUDAExtension(
        sources=sources,
        extra_compile_args={"cxx": cc_compile_args, "nvcc": nvcc_compile_args},
        libraries=["cublasLt"],
        extra_link_args=["-lcuda", "-lnvidia-ml"],
    ),
    packages=find_packages(where="third_party/DeepGEMM"),
    package_dir={"": "third_party/DeepGEMM"},
    package_data={
        "deep_gemm": [
            "include/deep_gemm/**/*",
            "include/cute/**/*",
            "include/cutlass/**/*",
        ]
    },
    include_package_data=True,
)

So the deep_gemm directory needs to be copied over as well; see the overall review for details.
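
One way to picture this requirement is a copy step that handles a list of package directories instead of a single one. The sketch below is a Python illustration, not the build.sh change itself; `copy_packages` is a hypothetical helper and the temporary directories stand in for the tmp and target directories.

```python
import shutil
import tempfile
from pathlib import Path


def copy_packages(src_root: Path, dst_root: Path, package_names) -> list:
    """Copy each named package directory that exists; return the names copied."""
    copied = []
    for name in package_names:
        src = src_root / name
        if src.is_dir():
            shutil.copytree(src, dst_root / name)
            copied.append(name)
    return copied


# Throwaway source tree with two package directories, as in the GPU build.
src_root = Path(tempfile.mkdtemp())
dst_root = Path(tempfile.mkdtemp())
for name in ("fastdeploy_ops", "deep_gemm"):
    (src_root / name).mkdir()
    (src_root / name / "__init__.py").write_text("")

copied = copy_packages(src_root, dst_root, ["fastdeploy_ops", "deep_gemm"])
print(copied)
```

Skipping names that do not exist keeps the same step usable for builds that do not produce deep_gemm.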


megemini commented Nov 16, 2025

Update 20251116

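> Another option: when the modern dir is detected, simply set WHEEL_NAME="". The copy then works without further changes, since the old directory structure is essentially the new one with an extra WHEEL_NAME level on top.

That should work~ Just one small question: GPU (or other) builds and CPU builds should never both be compiled into the tmp directory at the same time, right? And if they are not, why distinguish WHEEL_NAME from WHEEL_CPU_NAME at all?

p.s. Is something wrong with FD's CI? #4998 was meant for debugging, but the CI never starts ...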


SigureMo commented Nov 16, 2025

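> That should work~ Just one small question: GPU (or other) builds and CPU builds should never both be compiled into the tmp directory at the same time, right? And if they are not, why distinguish WHEEL_NAME from WHEEL_CPU_NAME at all?

I'm not sure about that either 😂

> p.s. Is something wrong with FD's CI? #4998 was meant for debugging, but the CI never starts ...

A contributor's first PR only runs CI after someone with write access approves it. I don't have write access here either; I'll ask someone else on Monday.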

SigureMo left a comment

LGTMeow 🐾

luotao1 merged commit c35e540 into PaddlePaddle:develop on Nov 17, 2025
15 of 16 checks passed
luotao1 changed the title from "【Hackathon 9th No.109】[CppExtension] Support build Custom OP in setuptools 80+" to "【Hackathon 9th No.109】[CppExtension] Support build Custom OP in setuptools 80+ -part" on Nov 17, 2025