
feat: implement scripts for binary release build #932

Merged: andygrove merged 10 commits into apache:main from parthchandra:binary-build on Sep 19, 2024



@parthchandra (Contributor)

Which issue does this PR close?

Closes #721

Rationale for this change

Allows us to publish artifacts to Maven.

What changes are included in this PR?

Scripts and a Dockerfile that perform the native binary build in a Docker container and include the resulting libraries in an uber jar.

How are these changes tested?

Locally.

parthchandra marked this pull request as draft on September 10, 2024 17:48
@parthchandra (Contributor, Author)

@andygrove FYI: this builds an uber jar but does not include the deploy script. That script can come in a separate PR.
Note: this includes support for macOS binaries, but that part does not work correctly yet because the build breaks while compiling BLAKE3. The macOS build is skipped if the macOS SDK (from Xcode) is not provided.
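A minimal sketch of how the skip could be decided, assuming the script takes an optional MACOS_SDK path and derives HAS_MACOS_SDK from it before passing both as --build-arg values. The function name detect_macos_sdk is hypothetical and not taken from the PR:

```shell
#!/bin/sh
# Hypothetical sketch: decide whether the macOS part of the build can run.
# MACOS_SDK is assumed to hold the path to an SDK archive supplied by the user.
detect_macos_sdk() {
  if [ -n "${MACOS_SDK:-}" ] && [ -f "${MACOS_SDK}" ]; then
    HAS_MACOS_SDK=true
  else
    # No SDK supplied (or the path does not exist): skip the macOS build.
    HAS_MACOS_SDK=false
  fi
  echo "${HAS_MACOS_SDK}"
}
```

If MACOS_SDK is unset or points at a missing file, the result stays false, which matches the ARG HAS_MACOS_SDK="false" default in the Dockerfile hunk reviewed later in this PR.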

@parthchandra (Contributor, Author)

The macOS build hits BLAKE3-team/BLAKE3#180. I will try the suggested solutions.

Comment thread dev/release/build-release-comet.sh (outdated)
-t "comet-rm:$IMGTAG" \
--build-arg HAS_MACOS_SDK=${HAS_MACOS_SDK} \
--build-arg MACOS_SDK=${MACOS_SDK} \
--load \
Member

I had to make this change to get this to work (or at least get further along) on Linux, based on a comment in docker/buildx#59.

Suggested change
--load \
--push \

Member

The next issue I hit was:

------
 > exporting to image:
------
ERROR: failed to solve: failed to push comet-rm:latest: push access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Cleaning up ...
Error response from daemon: No such container: comet-arm64-builder-container

Contributor, Author

--push will try to push to Docker Hub, and we don't want that since this is a temporary image. --load will load the image into your local containerd store, but if that does not work for you, let me look for a workaround.

@parthchandra (Contributor, Author)

@andygrove I changed the script to build a separate image for each architecture instead of a single multi-arch image. This makes things much simpler at the cost of having multiple images (and a small increase in build time). It also removes the need for the local container store and works with a custom Docker backend, as long as the backend supports docker build.
I'm hoping this addresses some of the authentication issues you are seeing.
Also, I have noticed that I get some network errors when building inside a container while on a VPN. Maybe we could try without the VPN?
Anyway, could you take this for a spin?
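The per-architecture approach could look roughly like this sketch: one docker build --platform invocation per architecture, each tagged separately. The image names (comet-rm-amd64, comet-rm-arm64), the BUILD_CONTEXT path and the DOCKER override are illustrative assumptions, not the script's actual values:

```shell
#!/bin/sh
# Hypothetical sketch: one single-architecture image per platform instead of
# a multi-arch image. DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
# The image names and the build-context path are illustrative assumptions.
DOCKER=${DOCKER:-docker}
IMGTAG=${IMGTAG:-latest}
BUILD_CONTEXT=${BUILD_CONTEXT:-dev/release/comet-rm}

build_arch_images() {
  for arch in amd64 arm64; do
    # A plain "docker build" per platform produces an ordinary single-platform
    # image, so no multi-platform manifest has to be pushed or loaded.
    $DOCKER build \
      --platform "linux/${arch}" \
      -t "comet-rm-${arch}:${IMGTAG}" \
      "${BUILD_CONTEXT}" || return 1
  done
}
```

Setting DOCKER=echo turns this into a dry run that just prints the two commands. Building the non-native architecture this way relies on the Docker host having emulation (for example QEMU registered via binfmt_misc) available.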

@parthchandra (Contributor, Author)

@andygrove @viirya For the binary builder I chose to use Ubuntu 20.04 as the base image because that is the image we currently use for our published docker images.
Ubuntu 20.04 has glibc 2.31, which means that many Red Hat-based releases will be incompatible because they ship an older glibc. CentOS 7, for instance, has glibc 2.17.
(See: https://gist.github.com/wagenet/35adca1a032cec2999d47b6c40aa45b1)

Should we consider using an older version of Ubuntu?
(BTW, I tried to build against an older glibc, but the build kept failing for one reason or another, so I abandoned that effort.)
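To make the compatibility question concrete: a shared library built on Ubuntu 20.04 can reference versioned glibc symbols up to GLIBC_2.31, and the dynamic loader generally refuses to load it on a system whose glibc is older than the highest version it references. The sketch below (helper names are hypothetical) extracts that highest version from objdump -T output and compares it with a target system's glibc version:

```shell
#!/bin/sh
# Hypothetical helpers for checking glibc compatibility of a built libcomet.so.

# Print the highest GLIBC_x.y[.z] symbol version found in "objdump -T" output
# read from stdin (GLIBC_PRIVATE and other non-numeric tags are ignored).
max_required_glibc() {
  grep -o 'GLIBC_[0-9][0-9.]*' \
    | sed 's/^GLIBC_//' \
    | sort -t. -k1,1n -k2,2n -k3,3n \
    | tail -n 1
}

# Succeed if dotted version $1 >= $2, comparing numeric components left to right.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}
```

For example, required=$(objdump -T libcomet.so | max_required_glibc) on the build side, then version_ge "$(getconf GNU_LIBC_VERSION | cut -d' ' -f2)" "$required" on the target host. On CentOS 7 that host version is 2.17, so a library requiring 2.31 would fail the check.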

@viirya (Member) commented Sep 13, 2024

Hmm, I don't think OSS Comet has any restriction on the supported glibc version for platform compatibility. glibc 2.31 was released in 2020, so I think it is old enough for our binary release. CentOS 7, for example, is already EOL (https://blog.centos.org/2023/04/end-dates-are-coming-for-centos-stream-8-and-centos-linux-7/).

Ubuntu 20.04 looks like a reasonable choice.

I personally wouldn't want to spend too much effort resolving build issues on older versions of Ubuntu.

Comment thread dev/release/comet-rm/build-comet-native-libs.sh
@andygrove (Member)

I ran the scripts locally and they seem to have worked.

I ran this command:

./dev/release/build-release-comet.sh -r https://github.com/parthchandra/datafusion-comet.git -b binary-build
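The command above passes -r (apparently the repository URL) and -b (the branch to build). A minimal getopts-based sketch of that option handling follows; the function name and the default values are hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch of parsing the -r (repository URL) and -b (branch)
# options used in the command above; the defaults are illustrative only.
parse_release_args() {
  REPO="https://github.com/apache/datafusion-comet.git"
  BRANCH="main"
  OPTIND=1   # reset so the function can be called more than once
  while getopts "r:b:" opt; do
    case "${opt}" in
      r) REPO="${OPTARG}" ;;
      b) BRANCH="${OPTARG}" ;;
      *) echo "usage: build-release-comet.sh [-r repo_url] [-b branch]" >&2
         return 1 ;;
    esac
  done
}
```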

The resulting jar file contains the following native libs:

% jar tvf  spark/target/comet-spark-spark3.4_2.12-0.3.0-SNAPSHOT.jar | grep libcomet
149504624 Wed Jan 22 15:10:16 MST 2020 org/apache/comet/darwin/aarch64/libcomet.dylib
52964152 Wed Jan 22 15:10:16 MST 2020 org/apache/comet/linux/aarch64/libcomet.so
56773320 Wed Jan 22 15:10:16 MST 2020 org/apache/comet/linux/amd64/libcomet.so
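A quick way to check a built jar is to scan its listing for the expected native libraries. This sketch (a hypothetical helper; the expected paths are taken from the listing above) reads jar tf or jar tvf output on stdin and fails if either Linux library is missing:

```shell
#!/bin/sh
# Hypothetical sketch: verify that a jar listing (from "jar tf" or "jar tvf")
# contains the Linux native libraries the release build is expected to produce.
check_native_libs() {
  listing=$(cat)
  status=0
  for lib in org/apache/comet/linux/amd64/libcomet.so \
             org/apache/comet/linux/aarch64/libcomet.so; do
    if printf '%s\n' "${listing}" | grep -qF "${lib}"; then
      echo "found: ${lib}"
    else
      echo "MISSING: ${lib}"
      status=1
    fi
  done
  # The script does not build macOS libraries yet, so flag any that appear.
  if printf '%s\n' "${listing}" | grep -qF 'org/apache/comet/darwin/'; then
    echo "note: darwin library present (possibly left over from a manual build)"
  fi
  return "${status}"
}
```

Usage: jar tf spark/target/comet-spark-spark3.4_2.12-0.3.0-SNAPSHOT.jar | check_native_libs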

@parthchandra (Contributor, Author)

The artifact

149504624 Wed Jan 22 15:10:16 MST 2020 org/apache/comet/darwin/aarch64/libcomet.dylib

seems to be a leftover from a manual run. The script does not build macOS binaries at the moment.
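One way to keep leftovers like this out of a release jar is to delete any previously built libraries before the fresh ones are copied in. A sketch, assuming a staging root that contains org/apache/comet/<os>/<arch>/ (the function and the directory argument are hypothetical):

```shell
#!/bin/sh
# Hypothetical sketch: delete previously built native libraries from the
# directory that gets packaged, so artifacts from earlier manual runs (such as
# a darwin/aarch64 libcomet.dylib) cannot leak into the release jar.
# $1 is assumed to be the root containing org/apache/comet/<os>/<arch>/.
clean_stale_native_libs() {
  lib_root="$1/org/apache/comet"
  [ -d "${lib_root}" ] || return 0
  # Print each stale library, then remove them in one batch.
  find "${lib_root}" -type f \( -name 'libcomet.so' -o -name 'libcomet.dylib' \) \
    -print -exec rm -f {} +
}
```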

parthchandra marked this pull request as ready for review on September 16, 2024 20:40
@parthchandra (Contributor, Author)

@andygrove thank you for testing! This is ready for review.

@andygrove (Member) left a comment

LGTM. Thanks @parthchandra!

andygrove requested a review from viirya on September 16, 2024 22:18
Comment thread Makefile
endif

# build native libs for arm64 architecture Linux/MacOS on a Linux/arm64 machine/container
core-arm64-libs:
Member

There are two core-arm64-libs targets?

Contributor, Author

Removed.

Comment thread Makefile (outdated)
Comment on lines +52 to +55
ifdef $(HAS_OSXCROSS)
	cd native && cargo zigbuild -j 1 --target aarch64-apple-darwin --release
endif
	cd native && cargo build -j 2 --release
Member

So for the macOS build, we need to run both cargo zigbuild and cargo build?

Contributor, Author

Oops, that was a mistake: I had been experimenting with zigbuild for macOS. Removed.

Comment thread Makefile (outdated)
# build native libs for arm64 architecture Linux/MacOS on a Linux/arm64 machine/container
core-arm64-libs:
# if the environment variable HAS_OSXCROSS is defined
ifdef $(HAS_OSXCROSS)
Member

Do we need the macOS SDK installed for the HAS_OSXCROSS case?

Contributor, Author

This part is a placeholder for future work to enable macOS. The macOS SDK has to be provided to the Dockerfile as input, and the build-release-comet.sh script will copy it into the release builder's Docker image.
I removed the option because the build did not succeed, but left the groundwork in place so we can fix it later. I can remove it if that makes things clearer.
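The flow described here (SDK supplied as input, then copied into the release builder's image) could be sketched as follows. The build-arg names match HAS_MACOS_SDK and MACOS_SDK from this PR; passing only the archive's file name in MACOS_SDK, and the function itself, are assumptions:

```shell
#!/bin/sh
# Hypothetical sketch: copy a user-supplied macOS SDK archive into the Docker
# build context and print the matching --build-arg flags.
stage_macos_sdk() {
  sdk_path=$1
  context_dir=$2
  if [ -n "${sdk_path}" ] && [ -f "${sdk_path}" ]; then
    # Docker can only COPY files that live inside the build context.
    cp "${sdk_path}" "${context_dir}/" || return 1
    printf '%s\n' "--build-arg HAS_MACOS_SDK=true --build-arg MACOS_SDK=$(basename "${sdk_path}")"
  else
    printf '%s\n' "--build-arg HAS_MACOS_SDK=false"
  fi
}
```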

# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG HAS_MACOS_SDK="false"
Member

https://hub.docker.com/r/messense/cargo-zigbuild claims to have the macOS SDK preinstalled in its Docker image. Can we reuse it to get the macOS SDK for the Comet build?

Contributor, Author

I haven't tried the zigbuild Docker image yet. I will try it, and if it works, we can remove the HAS_OSXCROSS portions entirely.
Follow-up issue: #947

Contributor, Author

I tried the zigbuild Docker image and the build failed. I'll investigate the failure in the follow-up.

@parthchandra (Contributor, Author)

@viirya Any further comments?

Comment thread Makefile

# build native libs for amd64 architecture Linux/MacOS on a Linux/amd64 machine/container
core-amd64-libs:
	cd native && cargo build -j 2 --release
Member

Don't we need to specify a target for this?

Contributor, Author

This builds the binary for the same architecture as the machine (or container) it runs on, so there is no need to specify a target.
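To illustrate the reply: without --target, cargo builds for rustc's host triple, so inside an amd64 or arm64 Linux container the output already matches that container's architecture. A hypothetical mapping from uname -m to the host triple, and to the jar directory used in the listing earlier in this PR:

```shell
#!/bin/sh
# Hypothetical sketch: map a machine architecture (as printed by "uname -m")
# to the Rust host triple of a glibc Linux builder and to the directory name
# used for the library inside the jar (linux/amd64, linux/aarch64).
rust_host_triple() {
  case "$1" in
    x86_64|amd64)  echo "x86_64-unknown-linux-gnu" ;;
    aarch64|arm64) echo "aarch64-unknown-linux-gnu" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

comet_resource_dir() {
  case "$1" in
    x86_64|amd64)  echo "org/apache/comet/linux/amd64" ;;
    aarch64|arm64) echo "org/apache/comet/linux/aarch64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}
```

Inside a builder container, rust_host_triple "$(uname -m)" should agree with the host: line printed by rustc -vV.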

Comment thread Makefile
Comment on lines +52 to +55
ifdef HAS_OSXCROSS
	rustup target add x86_64-apple-darwin
	cd native && cargo build -j 2 --target x86_64-apple-darwin --release
endif
Member

So since L51 is not in an else block, if HAS_OSXCROSS is defined we will additionally build the library for x86_64-apple-darwin? I.e., two libraries for core-amd64-libs?

Contributor, Author

Yes. For the amd64 architecture there are two libraries: one for Linux and one for macOS.
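The two builds do not overwrite each other because cargo places --target builds under a per-triple directory, while the host build lands directly in target/release. A sketch of the resulting paths, assuming the default native/target directory and a Linux host (cargo_lib_path is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical sketch of where each build in core-amd64-libs leaves its output,
# assuming the default cargo target directory (native/target) and a Linux host.
# With --target, cargo nests output under target/<triple>/; without it, the
# host build lands directly in target/release/.
cargo_lib_path() {
  triple=${1:-}
  if [ -n "${triple}" ]; then
    dir="native/target/${triple}/release"
  else
    dir="native/target/release"
  fi
  case "${triple}" in
    *-apple-darwin) echo "${dir}/libcomet.dylib" ;;
    *)              echo "${dir}/libcomet.so" ;;
  esac
}
```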

./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-3.5 -P scala-2.13 -DskipTests install

echo "Installed to local repo: ${LOCAL_REPO}"
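The install step above shows a single profile combination. Installing every combination into one temporary local repository could be sketched as below; the profile list is illustrative (only spark-3.4/scala-2.12 and spark-3.5/scala-2.13 appear in this PR), and MVNW can be set to echo for a dry run:

```shell
#!/bin/sh
# Hypothetical sketch: install every Spark/Scala profile combination into one
# temporary local Maven repository. The profile list below is illustrative.
MVNW=${MVNW:-./mvnw}

install_all_profiles() {
  LOCAL_REPO=${LOCAL_REPO:-$(mktemp -d)}
  for combo in "spark-3.4 scala-2.12" "spark-3.4 scala-2.13" \
               "spark-3.5 scala-2.12" "spark-3.5 scala-2.13"; do
    # Split "spark-x scala-y" into $1 and $2 (intentional word splitting).
    set -- ${combo}
    $MVNW "-Dmaven.repo.local=${LOCAL_REPO}" -P "$1" -P "$2" -DskipTests install || return 1
  done
  echo "Installed to local repo: ${LOCAL_REPO}"
}
```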

Member

Do we need to remove the created Docker image/container after installation?

Contributor, Author

The container is removed in the script's cleanup step, which is invoked on exit or on error.
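Cleanup that runs on both normal exit and errors is typically done with an EXIT trap. A minimal sketch, assuming container names like the comet-arm64-builder-container seen in the error log earlier in this PR; CONTAINERS and the DOCKER override are hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch of an exit/error cleanup hook. Container names follow the
# pattern seen in the error log earlier in this PR (comet-<arch>-builder-container);
# DOCKER can be overridden (e.g. DOCKER=true) to skip real docker calls.
DOCKER=${DOCKER:-docker}
CONTAINERS=""

cleanup() {
  echo "Cleaning up ..."
  for c in ${CONTAINERS}; do
    echo "removing ${c}"
    # Ignore errors: the container may never have been created.
    $DOCKER rm -f "${c}" >/dev/null 2>&1 || true
  done
}

# Run cleanup on any exit, including failures under "set -e".
trap cleanup EXIT
```

Because the trap fires on any exit, a script running under set -e also cleans up when a build step fails, and discarding docker rm errors avoids messages like "No such container" when a container was never created.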

@viirya (Member) commented Sep 19, 2024

Looks good to me, with a few minor questions.

@dpengpeng

@parthchandra A question: in the current build script dev/release/build-release-comet.sh, the final compilation step invokes the core-amd64-libs and core-arm64-libs targets in the Makefile. Unlike release-nogit, these two targets do not set RUSTFLAGS="-Ctarget-cpu=native", a flag that enables CPU-specific optimizations. With dev/release/build-release-comet.sh, both the x86 and ARM libraries are compiled through Docker in one run, and the ARM build runs under emulation. Does this mean we lose the CPU-specific optimizations, and would that cause a performance drop for Comet? Should we add RUSTFLAGS="-Ctarget-cpu=native" to both core-amd64-libs and core-arm64-libs? And even if we did, would a Comet library compiled with RUSTFLAGS="-Ctarget-cpu=native" in the emulated ARM environment still perform worse than one compiled on a physical ARM machine?

@parthchandra (Contributor, Author)


Per https://rust-lang.github.io/packed_simd/perf-guide/target-feature/rustflags.html#target-cpu, I'm not entirely sure we want to set target-cpu=native for the distribution.
I don't know enough about the Rust compiler's optimizations to say whether the performance difference is significant.
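One way to quantify the concern: -C target-cpu=native enables every feature of the build machine's CPU, and a distributed binary then needs those features on every machine that loads it. The sketch below (helper names are hypothetical) compares the target_feature lines printed by rustc --print cfg with and without -C target-cpu=native:

```shell
#!/bin/sh
# Hypothetical helpers: list which target features "-C target-cpu=native"
# would add on top of the default target, using "rustc --print cfg" output.

# Extract the feature names from lines like: target_feature="sse2"
target_features() {
  sed -n 's/^target_feature="\(.*\)"$/\1/p' | LC_ALL=C sort -u
}

# $1: cfg output for the default target; $2: cfg output with target-cpu=native.
extra_native_features() {
  base=$(mktemp)
  native=$(mktemp)
  target_features < "$1" > "${base}"
  target_features < "$2" > "${native}"
  # Lines only in the native list are extra features a release would require.
  LC_ALL=C comm -13 "${base}" "${native}"
  rm -f "${base}" "${native}"
}
```

Usage: rustc --print cfg > default.cfg; rustc --print cfg -C target-cpu=native > native.cfg; extra_native_features default.cfg native.cfg. An empty result means native adds nothing over the default target on that machine.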

coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
* feat: implement scripts for binary release build

* Install to temp local maven repo and updates for MacOS

* newline

* Use independent docker images for different architectures instead of
a multi-arch image

* update docs and cleanup

* remove unused code

* fail build script on error

* Build all profiles

* remove duplicate target from makefile

---------

Co-authored-by: Andy Grove <agrove@apache.org>
parthchandra deleted the binary-build branch on January 14, 2026 21:59


Development

Successfully merging this pull request may close these issues.

Create binary releases

4 participants