From 7f802bbf7e4b623d3f5b93287cbb2b5aa0d03774 Mon Sep 17 00:00:00 2001 From: Jinzhe Zeng Date: Thu, 25 May 2023 18:50:45 -0400 Subject: [PATCH 1/2] improve docs and scripts to install TF 2.12 In TF 2.11, TensorFlow adds a new directory "tensorflow/tsl" where the headers should be copied. 1. add docs to install TF 2.12 2. build_tf.py: bump to TF 2.12, and copy tensorflow/tsl 3. output the detailed errors when checking ABIs. Sometimes the reason is missing header files instead of ABI. Signed-off-by: Jinzhe Zeng --- doc/install/install-from-source.md | 2 +- doc/install/install-tf.2.12.md | 118 +++++++++++++++++++++++++++++ source/cmake/Findtensorflow.cmake | 7 +- source/install/build_tf.py | 15 +++- 4 files changed, 139 insertions(+), 3 deletions(-) create mode 100644 doc/install/install-tf.2.12.md diff --git a/doc/install/install-from-source.md b/doc/install/install-from-source.md index 372de8225b..5d3a434f0a 100644 --- a/doc/install/install-from-source.md +++ b/doc/install/install-from-source.md @@ -156,7 +156,7 @@ Since TensorFlow 2.12, TensorFlow C++ library (`libtensorflow_cc`) is packaged i The C++ interface of DeePMD-kit was tested with compiler GCC >= 4.8. It is noticed that the I-Pi support is only compiled with GCC >= 4.8. Note that TensorFlow may have specific requirements for the compiler version. -First, the C++ interface of Tensorflow should be installed. It is noted that the version of Tensorflow should be consistent with the python interface. You may follow [the instruction](install-tf.2.8.md) or run the script `$deepmd_source_dir/source/install/build_tf.py` to install the corresponding C++ interface. +First, the C++ interface of Tensorflow should be installed. It is noted that the version of Tensorflow should be consistent with the python interface. You may follow [the instruction](install-tf.2.12.md) or run the script `$deepmd_source_dir/source/install/build_tf.py` to install the corresponding C++ interface. 
### Install DeePMD-kit's C++ interface diff --git a/doc/install/install-tf.2.12.md b/doc/install/install-tf.2.12.md new file mode 100644 index 0000000000..dce0c224d5 --- /dev/null +++ b/doc/install/install-tf.2.12.md @@ -0,0 +1,118 @@ +# Install TensorFlow's C++ interface +TensorFlow's C++ interface will be compiled from the source code. In this manual, we install TensorFlow 2.12.0. It is noted that the source code of TensorFlow 2.12.0 uses C++17, so one needs a C++ compiler that supports C++17. + +First, install Bazel. [bazelisk](https://github.com/bazelbuild/bazelisk) can be launched to use [bazel](https://github.com/bazelbuild/bazel). + +```bash +wget https://github.com/bazelbuild/bazelisk/releases/download/v1.11.0/bazelisk-linux-amd64 -O /some/workspace/bazel/bin/bazel +chmod +x /some/workspace/bazel/bin/bazel +export PATH=/some/workspace/bazel/bin:$PATH +``` + +Next, get the source code of TensorFlow: +```bash +git clone https://github.com/tensorflow/tensorflow tensorflow -b v2.12.0 --depth=1 +cd tensorflow +./configure +``` + +You will be asked a list of questions that configure the build of TensorFlow. You may answer them as follows. If you do not want to add CUDA support, please answer no. + +``` +Please specify the location of python. [Default is xxx]: + +Found possible Python library paths: + xxx +Please input the desired Python library path to use. Default is [xxx] + +Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: +No OpenCL SYCL support will be enabled for TensorFlow. + +Do you wish to build TensorFlow with ROCm support? [y/N]: +No ROCm support will be enabled for TensorFlow. + +Do you wish to build TensorFlow with CUDA support? [y/N]: y +CUDA support will be enabled for TensorFlow. + +Do you wish to build TensorFlow with TensorRT support? [y/N]: +No TensorRT support will be enabled for TensorFlow. 
+ +Found CUDA 10.2 in: + /usr/local/cuda/lib64 + /usr/local/cuda/include +Found cuDNN 7 in: + /usr/local/cuda/lib64 + /usr/local/cuda/include + +Please specify a list of comma-separated CUDA compute capabilities you want to build with. +You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. +Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 7.5,7.5]: + +Do you want to use clang as CUDA compiler? [y/N]: +nvcc will be used as CUDA compiler. + +Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: + +Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: + +Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: +Not configuring the WORKSPACE for Android builds. + +Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details. + --config=mkl # Build with MKL support. + --config=monolithic # Config for mostly static monolithic build. + --config=ngraph # Build with Intel nGraph support. + --config=numa # Build with NUMA support. + --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects. + --config=v2 # Build TensorFlow 2.x instead of 1.x. +Preconfigured Bazel build configs to DISABLE default on features: + --config=noaws # Disable AWS S3 filesystem support. + --config=nogcp # Disable GCP support. + --config=nohdfs # Disable HDFS support. + --config=nonccl # Disable NVIDIA NCCL support. +Configuration finished +``` + +The library path for Python should be set accordingly. 
+ +Now build the shared library of TensorFlow: +```bash +bazel build -c opt --verbose_failures //tensorflow:libtensorflow_cc.so +``` +You may want to add options `--copt=-msse4.2`, `--copt=-mavx`, `--copt=-mavx2` and `--copt=-mfma` to enable SSE4.2, AVX, AVX2 and FMA SIMD accelerations, respectively. It is noted that these options should be chosen according to the CPU architecture. If the RAM becomes an issue for your machine, you may limit the RAM usage by using `--local_resources 2048,.5,1.0`. If you want to enable [oneDNN optimization](https://www.oneapi.io/blog/tensorflow-and-onednn-in-partnership/), add `--config=mkl`. + +Assume you want to install TensorFlow in the directory `$tensorflow_root`. Create the directory if it does not exist: +```bash +mkdir -p $tensorflow_root +``` +Now, copy the libraries to TensorFlow's installation directory: +```bash +mkdir -p $tensorflow_root/lib +cp -d bazel-bin/tensorflow/libtensorflow_cc.so* $tensorflow_root/lib/ +cp -d bazel-bin/tensorflow/libtensorflow_framework.so* $tensorflow_root/lib/ +cp -d $tensorflow_root/lib/libtensorflow_framework.so.2 $tensorflow_root/lib/libtensorflow_framework.so +``` +Then copy the headers: +```bash +mkdir -p $tensorflow_root/include/tensorflow +rsync -avzh --exclude '_virtual_includes/' --include '*/' --include '*.h' --include '*.inc' --exclude '*' bazel-bin/ $tensorflow_root/include/ +rsync -avzh --include '*/' --include '*.h' --include '*.inc' --exclude '*' tensorflow/cc $tensorflow_root/include/tensorflow/ +rsync -avzh --include '*/' --include '*.h' --include '*.inc' --exclude '*' tensorflow/core $tensorflow_root/include/tensorflow/ +rsync -avzh --include '*/' --include '*.h' --include '*.inc' --exclude '*' tensorflow/tsl $tensorflow_root/include/tensorflow/ +rsync -avzh --include '*/' --include '*' --exclude '*.cc' third_party/ $tensorflow_root/include/third_party/ +rsync -avzh --include '*/' --include '*' --exclude '*.txt' bazel-tensorflow/external/eigen_archive/Eigen/
$tensorflow_root/include/Eigen/ +rsync -avzh --include '*/' --include '*' --exclude '*.txt' bazel-tensorflow/external/eigen_archive/unsupported/ $tensorflow_root/include/unsupported/ +rsync -avzh --include '*/' --include '*.h' --include '*.inc' --exclude '*' bazel-tensorflow/external/com_google_protobuf/src/google/ $tensorflow_root/include/google/ +rsync -avzh --include '*/' --include '*.h' --include '*.inc' --exclude '*' bazel-tensorflow/external/com_google_absl/absl/ $tensorflow_root/include/absl/ +``` + +If you've enabled oneDNN, also copy `libiomp5.so`: +```bash +cp -d bazel-out/k8-opt/bin/external/llvm_openmp/libiomp5.so $tensorflow_root/lib/ +``` + +## Troubleshooting +```bash +git: unknown command -C ... +``` +This may be an issue with your Git version. Early versions of Git do not support this command; upgrading Git to a newer version may resolve the issue. diff --git a/source/cmake/Findtensorflow.cmake b/source/cmake/Findtensorflow.cmake index f9ba0184a6..584cc7f77b 100644 --- a/source/cmake/Findtensorflow.cmake +++ b/source/cmake/Findtensorflow.cmake @@ -337,12 +337,14 @@ elseif(NOT DEFINED OP_CXX_ABI) try_compile( CPP_CXX_ABI_COMPILE_RESULT_VAR0 ${CMAKE_CURRENT_BINARY_DIR}/tf_cxx_abi0 "${CMAKE_CURRENT_LIST_DIR}/test_cxx_abi.cpp" + OUTPUT_VARIABLE CPP_CXX_ABI_COMPILE_OUTPUT_VAR0 LINK_LIBRARIES ${TensorFlowFramework_LIBRARY} CMAKE_FLAGS "-DINCLUDE_DIRECTORIES:STRING=${TensorFlow_INCLUDE_DIRS}" COMPILE_DEFINITIONS -D_GLIBCXX_USE_CXX11_ABI=0) try_compile( CPP_CXX_ABI_COMPILE_RESULT_VAR1 ${CMAKE_CURRENT_BINARY_DIR}/tf_cxx_abi1 "${CMAKE_CURRENT_LIST_DIR}/test_cxx_abi.cpp" + OUTPUT_VARIABLE CPP_CXX_ABI_COMPILE_OUTPUT_VAR1 LINK_LIBRARIES ${TensorFlowFramework_LIBRARY} CMAKE_FLAGS "-DINCLUDE_DIRECTORIES:STRING=${TensorFlow_INCLUDE_DIRS}" COMPILE_DEFINITIONS -D_GLIBCXX_USE_CXX11_ABI=1) @@ -360,9 +362,12 @@ elseif(NOT DEFINED OP_CXX_ABI) ) set(OP_CXX_ABI 1) else() + # print results of try_compile + message(WARNING "Output with 
_GLIBCXX_USE_CXX11_ABI=0:" ${CPP_CXX_ABI_COMPILE_OUTPUT_VAR0}) + message(WARNING "Output with _GLIBCXX_USE_CXX11_ABI=1:" ${CPP_CXX_ABI_COMPILE_OUTPUT_VAR1}) message( FATAL_ERROR - "Both _GLIBCXX_USE_CXX11_ABI=0 and 1 do not work. The reason may be that your C++ compiler (e.g. Red Hat Developer Toolset) does not support the custom cxx11 abi flag." + "Both _GLIBCXX_USE_CXX11_ABI=0 and 1 do not work. The reason may be that your C++ compiler (e.g. Red Hat Developer Toolset) does not support the custom cxx11 abi flag. Please check the above outputs." ) endif() else() diff --git a/source/install/build_tf.py b/source/install/build_tf.py index 883c654b5c..de5e71a3c7 100755 --- a/source/install/build_tf.py +++ b/source/install/build_tf.py @@ -410,6 +410,13 @@ def call(commands: List[str], env={}, **kwargs): "b5a1bb04c84b6fe1538377e5a1f649bb5d5f0b2e3625a3c526ff3a8af88633e8", gzip="tensorflow", ), + "tensorflow-2.12.0": OnlineResource( + "tensorflow-2.12.0.tar.gz", + "https://github.com/tensorflow/tensorflow/archive/refs/tags/v2.12.0.tar.gz", + "c030cb1905bff1d2446615992aad8d8d85cbe90c4fb625cee458c63bf466bc8e", + gzip="tensorflow", + ), + } @@ -583,7 +590,7 @@ class BuildTensorFlow(Build): def __init__( self, - version: str = "2.9.1", + version: str = "2.12.0", enable_mkl: bool = True, enable_cuda: bool = False, enable_rocm: bool = False, @@ -666,6 +673,12 @@ def build(self): include_dst / "tensorflow" / "core", ignore=include_patterns("*.h", "*.inc"), ) + if tuple([int(x) for x in self.version.split(".")[:2]]) >= (2, 11): + copytree2( + src / "tensorflow" / "tsl", + include_dst / "tensorflow" / "tsl", + ignore=include_patterns("*.h", "*.inc"), + ) # bazel-bin includes generated headers like version, pb.h, .. 
copytree2( src / "bazel-bin", include_dst, ignore=include_patterns("*.h", "*.inc") From 909e23db7ea3b03737469175248c3c66108c4f83 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu, 25 May 2023 22:52:04 +0000 Subject: [PATCH 2/2] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- source/cmake/Findtensorflow.cmake | 6 ++++-- source/install/build_tf.py | 1 - 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/source/cmake/Findtensorflow.cmake b/source/cmake/Findtensorflow.cmake index 584cc7f77b..b39f6ab215 100644 --- a/source/cmake/Findtensorflow.cmake +++ b/source/cmake/Findtensorflow.cmake @@ -363,8 +363,10 @@ elseif(NOT DEFINED OP_CXX_ABI) set(OP_CXX_ABI 1) else() # print results of try_compile - message(WARNING "Output with _GLIBCXX_USE_CXX11_ABI=0:" ${CPP_CXX_ABI_COMPILE_OUTPUT_VAR0}) - message(WARNING "Output with _GLIBCXX_USE_CXX11_ABI=1:" ${CPP_CXX_ABI_COMPILE_OUTPUT_VAR1}) + message(WARNING "Output with _GLIBCXX_USE_CXX11_ABI=0:" + ${CPP_CXX_ABI_COMPILE_OUTPUT_VAR0}) + message(WARNING "Output with _GLIBCXX_USE_CXX11_ABI=1:" + ${CPP_CXX_ABI_COMPILE_OUTPUT_VAR1}) message( FATAL_ERROR "Both _GLIBCXX_USE_CXX11_ABI=0 and 1 do not work. The reason may be that your C++ compiler (e.g. Red Hat Developer Toolset) does not support the custom cxx11 abi flag. Please check the above outputs." diff --git a/source/install/build_tf.py b/source/install/build_tf.py index de5e71a3c7..7dfa8f4ffb 100755 --- a/source/install/build_tf.py +++ b/source/install/build_tf.py @@ -416,7 +416,6 @@ def call(commands: List[str], env={}, **kwargs): "c030cb1905bff1d2446615992aad8d8d85cbe90c4fb625cee458c63bf466bc8e", gzip="tensorflow", ), - }
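The version gate added to `build_tf.py` above only compares the major and minor components of the TensorFlow version string, since `tensorflow/tsl` first appeared in TF 2.11. A minimal standalone sketch of that check (the helper name `needs_tsl_headers` is ours, not part of the patch):

```python
def needs_tsl_headers(version: str) -> bool:
    """True if this TensorFlow version ships the tensorflow/tsl header tree."""
    # Compare only the (major, minor) components, e.g. "2.12.0" -> (2, 12).
    major_minor = tuple(int(x) for x in version.split(".")[:2])
    return major_minor >= (2, 11)

# TF 2.9.x predates tensorflow/tsl; 2.11 and later require copying it.
assert not needs_tsl_headers("2.9.1")
assert needs_tsl_headers("2.12.0")
```

Comparing tuples rather than the raw string avoids lexicographic pitfalls such as `"2.9" > "2.11"`.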