WIP: add bolt backend in gluten #11261

Open: WangGuangxin wants to merge 34 commits into apache:main from WangGuangxin:add_bolt_backend

Commits (34):

- 06b80ce add bolt backend in gluten
- 0fd3370 Add parent project information to pom.xml (taiyang-li)
- 92119e6 test commit (taiyang-li)
- 8ef118c fix style (taiyang-li)
- 1ad78da Refine build script (kexianda)
- 0fa674f register SparkExprToSubfieldFilterParser (guhaiyan0221)
- e74e874 fix style (guhaiyan0221)
- 762b340 Add Dockerfile for bolt backend (kexianda)
- 03ce1b8 Refine Makefile (kexianda)
- 4e8e1cb Optimize out-of-the-box parameters (guhaiyan0221)
- 3aaa02c overwrite batchsize default value (WangGuangxin)
- d4ee706 add make arrow instruction (taiyang-li)
- c37a0e6 add docker instructions (taiyang-li)
- f7c7d30 fix S3 compile error (guhaiyan0221)
- bfb8fb1 add Bolt.md (guhaiyan0221)
- 0988a35 add bolt-spark-configuration.md (guhaiyan0221)
- 1870bcf add BoltStageResourceAdj.md (guhaiyan0221)
- 67ee1d9 bolt-backend-generator-function-support.md (guhaiyan0221)
- 315ff2b add aggregate-function/scalar-function/window-function/write-configur… (guhaiyan0221)
- 4eacd71 add bolt-function-development-guide.md BoltFileSystem.md BoltLocalCac… (guhaiyan0221)
- 5f0d7f6 add velox-to-bolt-migration-guide.md (guhaiyan0221)
- 407f7be add bolt-quick-start.md (guhaiyan0221)
- cf64eb3 align gtest version wiht bolt (kexianda)
- 6f75c3d Remove outputType init logic (WangGuangxin)
- f021682 fix: symbols conflicts with other JNI libraries (kexianda)
- 68f26b0 fix: avoid flatten twice in write (fzhedu)
- 847a271 fix compilation error caused by renaming RegisterGCSFileSystem.h to R… (guhaiyan0221)
- 47eff98 [VL] Support mapping columns by position index for ORC and Parquet fi… (kevinwilfong)
- 970ab69 [fix] pass the full tables schema when creating HiveTableHandle for o… (markjin1990)
- ad59e35 Remove __cxa_throw hook in gluten (kexianda)
- 6dbdc7c Add support for paimon from master (ZacBlanco)
- f54de19 Remove unused data type cases in BoltBackend (VvanFalleaves)
- faf6d0d fix: dont push down paimon metadata column filters (ZacBlanco)
- 0229207 remove unused sort_before_repartition for round robin shuffle (zhangxffff)

New file (`@@ -0,0 +1,216 @@`), a GNU Makefile that drives the Conan/CMake build of the Bolt backend and the Maven packaging of the Gluten jar:

    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements. See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License. You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    ROOT_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
    BUILD_DIR := ${ROOT_DIR}/cpp/build
    CONAN_FILE_DIR := ${ROOT_DIR}/cpp/
    BUILD_TYPE=Debug
    ENABLE_ASAN ?= False
    LDB_BUILD ?= False
    BUILD_BENCHMARKS ?= False
    BUILD_TESTS ?= False
    BUILD_EXAMPLES ?= False
    BUILD_ORC ?= False
    ENABLE_PROTON ?= False

    # conan package info
    GLUTEN_BUILD_VERSION ?= main
    BOLT_BUILD_VERSION ?= main
    BUILD_USER ?=
    BUILD_CHANNEL ?=

    ENABLE_HDFS ?= True
    ENABLE_S3 ?= False
    RSS_PROFILE ?= ''

    ifeq ($(BUILD_BENCHMARKS),True)
    BUILD_ORC = True
    endif

    ARCH := $(shell arch)
    ifeq ($(ARCH), x86_64)
    ARCH := amd64
    endif

    SHARED_LIBRARY ?= True

    # Manually specify the number of bolt compilation threads by setting the BOLT_NUM_THREADS environment variable.
    # e.g. export BOLT_NUM_THREADS=50
    ifndef CI_NUM_THREADS
    ifdef BOLT_NUM_THREADS
    NUM_THREADS ?= $(BOLT_NUM_THREADS)
    else
    NUM_THREADS ?= $$(( $(shell grep -c ^processor /proc/cpuinfo) / 2 ))
    endif
    else
    NUM_THREADS ?= $(CI_NUM_THREADS)
    endif

    ALLOWED_VERSIONS := 11 17
    ifeq ($(JAVA_HOME),)
    $(error ERROR: JAVA_HOME is not set)
    endif
    ifneq ($(wildcard $(JAVA_HOME)/bin/java),)
    ifneq ($(wildcard $(JAVA_HOME)/bin/javac),)
    JDK_VERSION := $(shell $(JAVA_HOME)/bin/java -version 2>&1 | sed -n 's/.*version "\(1\.\)\{0,1\}\([0-9]\+\).*/\2/p')
    ifneq ($(filter $(JDK_VERSION),$(ALLOWED_VERSIONS)),$(JDK_VERSION))
    $(error ERROR: JDK version $(JDK_VERSION) is not supported, only 11 and 17 are allowed now)
    endif
    endif
    endif

    .PHONY: clean clean_cpp debug release RelWithDebInfo build bolt-recipe arrow install_debug install_release release-with-tests debug-with-tests release-with-benchmarks debug-with-benchmarks release-with-tests-and-benchmarks debug-with-tests-and-benchmarks jar jar-skip-check spark32-las fast-jar zip fast-zip jar_spark33 jar_spark34 jar_spark35 test test_spark35 cpp-test-release cpp-test-debug

    bolt-recipe:
    	@echo "Install Bolt recipe into local cache"
    	rm -rf ep/bolt
    	git clone --depth=1 --branch ${BOLT_BUILD_VERSION} https://github.com/bytedance/bolt.git ep/bolt &&\
    	bash ep/bolt/scripts/install-bolt-deps.sh && \
    	conan export ep/bolt/conanfile.py --name=bolt --version=${BOLT_BUILD_VERSION} --user=${BUILD_USER} --channel=${BUILD_CHANNEL}
    	@echo "Bolt recipe has been installed"

    build:
    	mkdir -p ${BUILD_DIR} && mkdir -p ${BUILD_DIR}/releases &&\
    	cd ${CONAN_FILE_DIR} && export BOLT_BUILD_VERSION=${BOLT_BUILD_VERSION} &&\
    	ALL_CONAN_OPTIONS=" -o gluten/*:shared=${SHARED_LIBRARY} \
    	    -o gluten/*:enable_hdfs=${ENABLE_HDFS} \
    	    -o gluten/*:enable_s3=${ENABLE_S3} \
    	    -o gluten/*:enable_asan=${ENABLE_ASAN} \
    	    -o gluten/*:build_benchmarks=${BUILD_BENCHMARKS} \
    	    -o gluten/*:build_tests=${BUILD_TESTS} \
    	    -o gluten/*:build_examples=${BUILD_EXAMPLES} " && \
    	conan graph info . --name=gluten --version=${GLUTEN_BUILD_VERSION} --user=${BUILD_USER} --channel=${BUILD_CHANNEL} -c "arrow/*:tools.build:download_source=True" $${ALL_CONAN_OPTIONS} --format=html > gluten.conan.graph.html && \
    	NUM_THREADS=$(NUM_THREADS) conan install . --name=gluten --version=${GLUTEN_BUILD_VERSION} --user=${BUILD_USER} --channel=${BUILD_CHANNEL} \
    	    -s llvm-core/*:build_type=Release -s build_type=${BUILD_TYPE} --build=missing $${ALL_CONAN_OPTIONS} && \
    	cmake --preset `echo conan-${BUILD_TYPE} | tr A-Z a-z` && \
    	cmake --build build/${BUILD_TYPE} -j $(NUM_THREADS) && \
    	if [ "${SHARED_LIBRARY}" = "True" ]; then cmake --build ${BUILD_DIR}/${BUILD_TYPE} --target install ; fi && \
    	if [ "${SHARED_LIBRARY}" = "False" ]; then \
    	    conan export-pkg . --name=gluten --version=${GLUTEN_BUILD_VERSION} --user=${BUILD_USER} --channel=${BUILD_CHANNEL} -s build_type=${BUILD_TYPE} \
    	    $${ALL_CONAN_OPTIONS} ; \
    	fi && cd -

    release :
    	$(MAKE) build BUILD_TYPE=Release GLUTEN_BUILD_VERSION=${GLUTEN_BUILD_VERSION} BOLT_BUILD_VERSION=${BOLT_BUILD_VERSION} BUILD_USER=${BUILD_USER} BUILD_CHANNEL=${BUILD_CHANNEL}

    debug:
    	$(MAKE) build BUILD_TYPE=Debug GLUTEN_BUILD_VERSION=${GLUTEN_BUILD_VERSION} BOLT_BUILD_VERSION=${BOLT_BUILD_VERSION} BUILD_USER=${BUILD_USER} BUILD_CHANNEL=${BUILD_CHANNEL}

    RelWithDebInfo:
    	$(MAKE) build BUILD_TYPE=RelWithDebInfo GLUTEN_BUILD_VERSION=${GLUTEN_BUILD_VERSION} BOLT_BUILD_VERSION=${BOLT_BUILD_VERSION} BUILD_USER=${BUILD_USER} BUILD_CHANNEL=${BUILD_CHANNEL}

    clean_cpp:
    	rm -rf ${ROOT_DIR}/cpp/build &&\
    	rm -f cpp/conan.lock cpp/conaninfo.txt cpp/graph_info.json CMakeCache.txt

    install_debug:
    	$(MAKE) clean_cpp
    	$(MAKE) debug SHARED_LIBRARY=False

    install_release:
    	$(MAKE) clean_cpp
    	$(MAKE) release SHARED_LIBRARY=False

    release-with-tests :
    	$(MAKE) build BUILD_TYPE=Release GLUTEN_BUILD_VERSION=${GLUTEN_BUILD_VERSION} BOLT_BUILD_VERSION=${BOLT_BUILD_VERSION} BUILD_USER=${BUILD_USER} BUILD_CHANNEL=${BUILD_CHANNEL} BUILD_TESTS=True

    debug-with-tests :
    	$(MAKE) build BUILD_TYPE=Debug GLUTEN_BUILD_VERSION=${GLUTEN_BUILD_VERSION} BOLT_BUILD_VERSION=${BOLT_BUILD_VERSION} BUILD_USER=${BUILD_USER} BUILD_CHANNEL=${BUILD_CHANNEL} BUILD_TESTS=True

    release-with-benchmarks :
    	$(MAKE) build BUILD_TYPE=Release GLUTEN_BUILD_VERSION=${GLUTEN_BUILD_VERSION} BOLT_BUILD_VERSION=${BOLT_BUILD_VERSION} BUILD_USER=${BUILD_USER} BUILD_CHANNEL=${BUILD_CHANNEL} BUILD_BENCHMARKS=True

    debug-with-benchmarks :
    	$(MAKE) build BUILD_TYPE=Debug GLUTEN_BUILD_VERSION=${GLUTEN_BUILD_VERSION} BOLT_BUILD_VERSION=${BOLT_BUILD_VERSION} BUILD_USER=${BUILD_USER} BUILD_CHANNEL=${BUILD_CHANNEL} BUILD_BENCHMARKS=True

    release-with-tests-and-benchmarks :
    	$(MAKE) build BUILD_TYPE=Release GLUTEN_BUILD_VERSION=${GLUTEN_BUILD_VERSION} BOLT_BUILD_VERSION=${BOLT_BUILD_VERSION} BUILD_USER=${BUILD_USER} BUILD_CHANNEL=${BUILD_CHANNEL} BUILD_BENCHMARKS=True BUILD_TESTS=True

    debug-with-tests-and-benchmarks :
    	$(MAKE) build BUILD_TYPE=Debug GLUTEN_BUILD_VERSION=${GLUTEN_BUILD_VERSION} BOLT_BUILD_VERSION=${BOLT_BUILD_VERSION} BUILD_USER=${BUILD_USER} BUILD_CHANNEL=${BUILD_CHANNEL} BUILD_BENCHMARKS=True BUILD_TESTS=True

    arrow:
    	bash dev/build_bolt_arrow.sh

    # build gluten jar
    jar:
    	java -version && mvn package -Pbackends-bolt -Pspark-3.2 -Pceleborn -DskipTests -Denforcer.skip=true -Pjava-8 -Ppaimon &&\
    	mkdir -p output && \
    	rm -rf output/gluten-spark*.jar
    	mv package/target/gluten-package-1.6.0-SNAPSHOT.jar output/gluten-spark3.2_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.jar

    jar-skip-check:
    	java -version && mvn package -Pbackends-bolt -Pspark-3.2 -Pceleborn -DskipTests -Denforcer.skip=true -Pjava-8 -Ppaimon -Dcheckstyle.skip=true -Dspotless.check.skip=true &&\
    	mkdir -p output && \
    	rm -rf output/gluten-spark*.jar
    	mv package/target/gluten-package-1.6.0-SNAPSHOT.jar output/gluten-spark3.2_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.jar

    spark32-las:
    	java -version && mvn package -Pbackends-bolt -Pspark-3.2-las -Pceleborn -DskipTests -Denforcer.skip=true -Pjava-8 -Ppaimon &&\
    	mkdir -p output && \
    	rm -rf output/gluten-spark*.jar
    	mv package/target/gluten-package-1.6.0-SNAPSHOT.jar output/gluten-spark3.2_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.jar

    fast-jar:
    	if [ ! -f "output/gluten-spark3.2_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.jar" ] ; then \
    	    $(MAKE) jar; \
    	else \
    	    jar uf output/gluten-spark3.2_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.jar -C cpp/build/releases/ libbolt_backend.so; \
    	fi

    zip:
    	$(MAKE) jar
    	rm -rf output/gluten-spark*.zip
    	zip -j output/gluten-spark3.2_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.zip output/gluten-spark3.2_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.jar

    fast-zip:
    	$(MAKE) fast-jar
    	rm -rf output/gluten-spark*.zip
    	zip -j output/gluten-spark3.2_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.zip output/gluten-spark3.2_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.jar

    jar_spark33:
    	java -version && mvn -T32 clean package -Pbackends-bolt -Pspark-3.3 -Pceleborn -Piceberg -DskipTests -Denforcer.skip=true -Ppaimon && \
    	mkdir -p output && \
    	rm -rf output/gluten-spark*.jar
    	mv package/target/gluten-package-1.6.0-SNAPSHOT.jar output/gluten-spark3.3_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.jar

    jar_spark34:
    	java -version && mvn clean package -Pbackends-bolt -Pspark-3.4 -Pceleborn -Piceberg -DskipTests -Denforcer.skip=true -Ppaimon && \
    	mkdir -p output && \
    	rm -rf output/gluten-spark*.jar
    	mv package/target/gluten-package-1.6.0-SNAPSHOT.jar output/gluten-spark3.4_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.jar

    jar_spark35:
    	java -version && mvn -T32 clean package -Pbackends-bolt -Pspark-3.5 -Phadoop-3.2 -Pceleborn -Piceberg -DskipTests -Denforcer.skip=true -Ppaimon && \
    	mkdir -p output && \
    	rm -rf output/gluten-spark*.jar
    	mv package/target/gluten-package-1.6.0-SNAPSHOT.jar output/gluten-spark3.5_2.12-1.0.0-SNAPSHOT-jar-with-dependencies.jar

    test:
    	mvn -Pbackends-bolt -Pspark-3.2 -Pceleborn -Ppaimon package -Denforcer.skip=true

    test_spark35:
    	mvn -Pbackends-bolt -Pspark-3.5 -Ppaimon -Phadoop-3.2 -Pceleborn -Piceberg package -Denforcer.skip=true

    cpp-test-release: release-with-tests
    	cd $(BUILD_DIR)/Release && ctest --timeout 7200 -j $(NUM_THREADS) --output-on-failure -V

    cpp-test-debug: debug-with-tests
    	cd $(BUILD_DIR)/Debug && ctest --timeout 7200 -j $(NUM_THREADS) --output-on-failure -V

    clean :
    	$(MAKE) clean_cpp
    	mvn clean -Pbackends-bolt -Pspark-3.2 -Pceleborn -Ppaimon -DskipTests -Denforcer.skip=true && \
    	rm -rf ${ROOT_DIR}/output/gluten-*.jar

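
Two pieces of logic in this Makefile decide whether the build starts and how parallel it runs: the JDK version gate and the `NUM_THREADS` precedence. A minimal Python sketch that re-expresses both as plain functions, assuming the same inputs the Makefile sees; `jdk_major_version`, `is_supported` and `resolve_num_threads` are hypothetical helpers (not part of Gluten or Bolt), and a `cpu_count` argument stands in for `grep -c ^processor /proc/cpuinfo`.

```python
import os
import re
from typing import Mapping, Optional

# Roughly mirrors the Makefile's sed expression
#   s/.*version "\(1\.\)\{0,1\}\([0-9]\+\).*/\2/p
# by skipping an optional legacy "1." prefix and keeping the next run of digits.
_VERSION_RE = re.compile(r'version "(?:1\.)?([0-9]+)')

# Same values as ALLOWED_VERSIONS in the Makefile.
ALLOWED_VERSIONS = {"11", "17"}


def jdk_major_version(java_version_output: str) -> Optional[str]:
    """Return the JDK major version from `java -version` output, or None."""
    for line in java_version_output.splitlines():
        match = _VERSION_RE.search(line)
        if match:
            return match.group(1)
    return None


def is_supported(version: Optional[str]) -> bool:
    """Unlike the Makefile's filter comparison, an unparsed version is rejected."""
    return version in ALLOWED_VERSIONS


def resolve_num_threads(env: Mapping[str, str], cpu_count: int) -> int:
    """Pick the build parallelism using the Makefile's precedence order."""
    # A non-empty NUM_THREADS (environment or command line) wins, because every
    # NUM_THREADS assignment in the Makefile uses `?=`.
    if env.get("NUM_THREADS"):
        return int(env["NUM_THREADS"])
    # CI_NUM_THREADS is checked first (`ifndef CI_NUM_THREADS` ... `else`).
    if env.get("CI_NUM_THREADS"):
        return int(env["CI_NUM_THREADS"])
    if env.get("BOLT_NUM_THREADS"):
        return int(env["BOLT_NUM_THREADS"])
    # Fallback: half the processor count, integer division like `$(( n / 2 ))`.
    return cpu_count // 2


if __name__ == "__main__":
    sample = 'openjdk version "17.0.2" 2022-01-18\nOpenJDK Runtime Environment'
    version = jdk_major_version(sample)
    print(version, is_supported(version))
    print(resolve_num_threads(dict(os.environ), os.cpu_count() or 1))
```

Running it prints `17 True` on the first line. One deliberate difference: in the Makefile an empty `JDK_VERSION` makes both sides of the `filter` comparison empty, so `$(error ...)` is not triggered; the sketch treats an unparsed version as unsupported instead.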

backends-bolt/benchmark/ColumnarTableCacheBenchmark-results.txt (23 changes: 23 additions & 0 deletions)

New file (`@@ -0,0 +1,23 @@`):

    OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 13.5
    Apple M1 Pro
    table cache count:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------------------------------
    disable columnar table cache                      16773          17024         401          1.2         838.7       1.0X
    enable columnar table cache                        9985          10051          65          2.0         499.3       1.0X


    OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 13.5
    Apple M1 Pro
    table cache column pruning:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------------------------------
    disable columnar table cache                      16429          16873         688          1.2         821.5       1.0X
    enable columnar table cache                       15118          15495         456          1.3         755.9       1.0X


    OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 13.5
    Apple M1 Pro
    table cache filter:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------------------------------
    disable columnar table cache                      22895          23527         722          0.9        1144.7       1.0X
    enable columnar table cache                       16673          17462         765          1.2         833.7       1.0X

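
The `Relative` column reads `1.0X` on every row, so the effect of the columnar table cache is easier to read by dividing the `Best Time(ms)` of the disabled case by that of the enabled case. A minimal Python sketch of that arithmetic, assuming the rows are copied verbatim from the results file above; `best_time_ms` and `speedups` are hypothetical helpers, not part of the Gluten benchmark code.

```python
from typing import Dict, Tuple

# Rows copied from ColumnarTableCacheBenchmark-results.txt, one
# (cache disabled, cache enabled) pair per benchmark case.
ROWS: Dict[str, Tuple[str, str]] = {
    "table cache count": (
        "disable columnar table cache 16773 17024 401 1.2 838.7 1.0X",
        "enable columnar table cache 9985 10051 65 2.0 499.3 1.0X",
    ),
    "table cache column pruning": (
        "disable columnar table cache 16429 16873 688 1.2 821.5 1.0X",
        "enable columnar table cache 15118 15495 456 1.3 755.9 1.0X",
    ),
    "table cache filter": (
        "disable columnar table cache 22895 23527 722 0.9 1144.7 1.0X",
        "enable columnar table cache 16673 17462 765 1.2 833.7 1.0X",
    ),
}


def best_time_ms(row: str) -> float:
    """Return the Best Time(ms) column of one benchmark result row."""
    # The last six whitespace-separated fields are the numeric columns; the
    # case name before them contains spaces, hence rsplit with maxsplit=6.
    _name, best, *_others = row.rsplit(maxsplit=6)
    return float(best)


def speedups(rows: Dict[str, Tuple[str, str]]) -> Dict[str, float]:
    """Ratio of best times, disabled over enabled (>1 means the cache helped)."""
    return {
        case: best_time_ms(disabled) / best_time_ms(enabled)
        for case, (disabled, enabled) in rows.items()
    }


if __name__ == "__main__":
    for case, ratio in speedups(ROWS).items():
        print(f"{case}: {ratio:.2f}x")
```

With these rows it prints 1.68x for the count query, 1.09x for column pruning and 1.37x for the filter query. Only best times are compared here; averages and standard deviations are ignored.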