312 changes: 312 additions & 0 deletions dev/release/README.md
@@ -0,0 +1,312 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Release Process

## Sub-projects

The Datafusion repo contains three releasable sub-projects: Datafusion, Ballista and the Datafusion Python binding.

We use the Datafusion release to drive the releases of the other sub-projects.
As a result, a Datafusion version bump is required for every release, while
version bumps for the Python binding and Ballista are optional. In other words,
we can release a new version of Datafusion without releasing a new version of
the Python binding or Ballista. On the other hand, releasing a new version of
the Python binding or Ballista always requires a new Datafusion version release.

## Branching

Datafusion currently only releases from the `master` branch. Because the project
is still in an early stage of development, we do not maintain an active stable
release backport branch.

## Prerequisite

- Have the upstream git repo `git@github.com:apache/arrow-datafusion.git` added as the git remote `apache`.
- Create a personal access token on GitHub for the changelog automation script.
  - The GitHub PAT should be created with `repo` access.
- Make sure your signing key is added to the following files in SVN:
  - https://dist.apache.org/repos/dist/dev/arrow/KEYS
  - https://dist.apache.org/repos/dist/release/arrow/KEYS
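
The remote requirement could be set up with a small helper along these lines. This is only a sketch: the function name `ensure_apache_remote` is hypothetical, and it assumes it is run from inside a clone of the repository.

```shell
# Add the upstream repository as the `apache` remote unless it already exists.
# `ensure_apache_remote` is a hypothetical helper name used for illustration.
ensure_apache_remote() {
  local url="git@github.com:apache/arrow-datafusion.git"
  if git remote get-url apache >/dev/null 2>&1; then
    echo "remote 'apache' already points to $(git remote get-url apache)"
  else
    git remote add apache "$url"
    echo "added remote 'apache' -> $url"
  fi
}
```

Calling `ensure_apache_remote` a second time leaves the existing remote untouched, so it is safe to rerun.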

## Process Overview

As part of the Apache governance model, official releases consist of signed
source tarballs approved by the PMC.

We then use the code in the approved source tarball to release to crates.io and
PyPI.

### Change Log

We maintain a `CHANGELOG.md` for each sub-project so our users know what has
changed between releases.

The CHANGELOG is managed automatically using
[update_change_log.sh](https://github.com/apache/arrow-datafusion/blob/master/dev/release/update_change_log.sh).

This script creates a changelog from GitHub PRs and issues based on the labels
associated with them.

## Prepare release commits and PR

Prepare a PR to update `CHANGELOG.md` and versions to reflect the planned
release.

See [#801](https://github.com/apache/arrow-datafusion/pull/801) for an example.

Here are the commands that could be used to prepare the `5.1.0` release:

### Update Version

Check out the master commit to be released:

```
git fetch apache
git checkout apache/master
```

Update datafusion version in `datafusion/Cargo.toml` to `5.1.0`.

If there is a Ballista release, update the versions in the Ballista `Cargo.toml` files by running:

```
./dev/update_ballista_versions.py 0.5.0
```

If there is a datafusion python binding release, update versions in
`./python/Cargo.toml`.
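
Before committing, it may help to confirm that each manifest carries the intended version. The sketch below reuses the `grep | head | awk | tr` pipeline that `dev/release/update_change_log-datafusion.sh` uses to read the version; the function names `cargo_version` and `check_version` are hypothetical.

```shell
# Print the first `version = "..."` value found in a Cargo.toml file.
cargo_version() {
  grep version "$1" | head -n 1 | awk '{print $3}' | tr -d '"'
}

# Return non-zero if the manifest does not carry the expected version.
check_version() {
  local file="$1" expected="$2" actual
  actual="$(cargo_version "$file")"
  if [ "$actual" != "$expected" ]; then
    echo "version mismatch in $file: expected $expected, found $actual" >&2
    return 1
  fi
  echo "$file is at $expected"
}
```

For the example release this would be invoked as `check_version datafusion/Cargo.toml 5.1.0`, and likewise for the Ballista and Python manifests when they are part of the release.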

Lastly, commit the version change:

```
git commit -a -m 'Update version'
```

### Update CHANGELOG.md

Create local release rc tags:

```
git tag -f 5.1.0-rc-local
# if there is ballista release
git tag -f ballista-0.5.0-rc-local
# if there is python binding release
git tag -f python-0.3.0-rc-local
```

Manually edit the previous release version tag in
`dev/release/update_change_log-{ballista,datafusion,python}.sh`. Commits
between the previous version tag and the new rc tag will be used to
populate the changelog content.
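
To get a rough idea of what falls inside that range before running the script, the commits between the two tags can be listed with plain `git log`. This is only a preview sketch with a hypothetical function name; the changelog script itself works from GitHub PRs, issues and labels, so its output will not match this list line for line.

```shell
# List commits reachable from the new rc tag but not from the previous
# release tag. `preview_changelog_range` is a hypothetical helper name.
preview_changelog_range() {
  local previous_tag="$1" rc_tag="$2"
  git log --oneline "${previous_tag}..${rc_tag}"
}
```

With the tags used in this document, the call would be `preview_changelog_range 4.0.0 5.1.0-rc-local`.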

```bash
# create the changelog
CHANGELOG_GITHUB_TOKEN=<TOKEN> ./dev/release/update_change_log-all.sh
# review change log / edit issues and labels if needed, rerun until you are happy with the result
git commit -a -m 'Create changelog for release'
```

Note that when reviewing the change log, rather than editing the
`CHANGELOG.md`, it is preferred to update the issues and their labels.

You can add the `invalid` or `development-process` label to exclude items from
the release notes. Add the `datafusion`, `ballista` and `python` labels to group
items into each sub-project's change log.

Send a PR to get these changes merged into the `master` branch. If new commits
that could change the change log content land in the `master` branch before the
PR is merged, you need to rerun the changelog update script to regenerate
the changelog and update the PR accordingly.
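
One way to notice that `master` has moved is to count the upstream commits missing from the PR branch. The sketch below assumes the `apache` remote from the prerequisites and that `git fetch apache` has been run first; the function name `commits_behind` is hypothetical.

```shell
# Count commits on the upstream branch that are missing from the current branch.
# Defaults to apache/master; pass another ref to compare against it instead.
commits_behind() {
  local upstream="${1:-apache/master}"
  git rev-list --count "HEAD..${upstream}"
}
```

A result other than `0` means new commits have landed and the changelog may need to be regenerated.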

## Prepare release candidate tarball

After the PR gets merged, you are ready to create a release tarball from the
merged commit.

(Note that you need to be a committer to run these scripts, as they upload to the Apache SVN distribution servers.)

### Pick a Release Candidate (RC) number

Pick numbers in sequential order, with `0` for `rc0`, `1` for `rc1`, etc.

### Create git tag for the release:

While the official release artifact is a signed tarball, we also tag the commit it was created from, for convenience and code archaeology.

Using a string such as `5.1.0` as the `<version>`, create and push the tag thusly:

```shell
git fetch apache
git tag <version>-<rc> apache/master
# push the release candidate tag to the GitHub remote
git push apache <version>-<rc>
```
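
Since the tarball is built from this tag, a quick check that the tag resolves to the intended commit can catch a stale local checkout. A minimal sketch, with the hypothetical helper name `tag_points_at`:

```shell
# Succeed only if both refs exist and resolve to the same commit.
tag_points_at() {
  local tag_commit ref_commit
  tag_commit="$(git rev-parse --verify --quiet "$1^{commit}")" || return 1
  ref_commit="$(git rev-parse --verify --quiet "$2^{commit}")" || return 1
  [ "$tag_commit" = "$ref_commit" ]
}
```

For example, `tag_points_at <version>-<rc> apache/master && echo ok` prints `ok` only when the tag and `apache/master` are the same commit.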

### Create, sign, and upload tarball

Run `create-tarball.sh` with the `<version>` tag and the `<rc>` number you chose in the previous steps:

```shell
./dev/release/create-tarball.sh 5.1.0 0
```

The `create-tarball.sh` script:

1. creates and uploads a release candidate tarball to the [arrow
   dev](https://dist.apache.org/repos/dist/dev/arrow) location on the
   Apache distribution SVN server

2. provides an email template to
   send to dev@arrow.apache.org for release voting.

### Vote on Release Candidate tarball

Send the email output from the script to dev@arrow.apache.org. The email should look like this:

```
To: dev@arrow.apache.org
Subject: [VOTE][Datafusion] Release Apache Arrow Datafusion 5.1.0 RC0

Hi,

I would like to propose a release of Apache Arrow Datafusion Implementation,
version 5.1.0.

This release candidate is based on commit: a5dd428f57e62db20a945e8b1895de91405958c4 [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Datafusion 5.1.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow Datafusion 5.1.0 because...

[1]: https://github.com/apache/arrow-datafusion/tree/a5dd428f57e62db20a945e8b1895de91405958c4
[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.1.0
[3]: https://github.com/apache/arrow-datafusion/blob/a5dd428f57e62db20a945e8b1895de91405958c4/CHANGELOG.md
```

For the release to become "official" it needs at least three PMC members to vote +1 on it.

### Verifying Release Candidates

The `dev/release/verify-release-candidate.sh` script in this repository can assist in the verification process. Run it like this:

```
./dev/release/verify-release-candidate.sh 5.1.0 0
```
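
For reviewers who want to see what checksum and signature verification amounts to, the sketch below shows the manual equivalent. It is an illustration only and not a description of what `verify-release-candidate.sh` does internally: the `.sha256` and `.asc` file suffixes and all function names are assumptions.

```shell
# Compute the SHA-256 digest of a file with whichever tool is installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Compare a file's digest with the first field of its checksum file
# (assumed to be named <file>.sha256 unless a second argument is given).
verify_sha256() {
  local file="$1" checksum_file="${2:-$1.sha256}"
  local expected actual
  expected="$(awk '{print $1; exit}' "$checksum_file")"
  actual="$(sha256_of "$file")"
  [ "$expected" = "$actual" ]
}

# Check a detached signature (assumed to be named <file>.asc). The signer's
# public key from the KEYS file must already be imported into the keyring.
verify_signature() {
  gpg --verify "$1.asc" "$1"
}
```

Both checks return a non-zero exit status on failure, so they can be chained with `&&` before running the unit tests.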

#### If the release is not approved

If the release is not approved, fix whatever the problem is, merge changelog
changes into master if there are any, and try again with the next RC number.

## Finalize the release

### After the release is approved

Move the tarball to the release location in SVN, e.g.
https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-5.1.0/, using
the `release-tarball.sh` script:

```shell
./dev/release/release-tarball.sh 5.1.0 0
```

Congratulations! The release is now official!

### Create release git tags

Tag the same release candidate commit with the final release tag:

```
# check out the release candidate tag created earlier
git checkout <version>-<rc>
git tag 5.1.0
git push apache 5.1.0
```

If there is a Ballista release, also push the Ballista tag:

```
git tag ballista-0.5.0
git push apache ballista-0.5.0
```

If there is a Datafusion Python binding release, also push the Python tag:

```
git tag python-0.3.0
git push apache python-0.3.0
```
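
Because all final tags are meant to point at the release candidate commit, they can also be created without checking anything out. A minimal sketch, with the hypothetical helper name `tag_release_from_rc`; pushing the tags remains a separate, explicit step.

```shell
# Create one or more final release tags on the commit of an existing RC tag.
tag_release_from_rc() {
  local rc_tag="$1" commit tag
  shift
  commit="$(git rev-parse --verify --quiet "${rc_tag}^{commit}")" || return 1
  for tag in "$@"; do
    git tag "$tag" "$commit" || return 1
  done
}
```

For example, `tag_release_from_rc <version>-<rc> 5.1.0 ballista-0.5.0 python-0.3.0` creates all three tags locally.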

### Publish on Crates.io

Only approved releases of the tarball should be published to
crates.io, in order to conform to Apache Software Foundation
governance standards.

After an official project release has been made, an Arrow committer can publish
the crates to crates.io using the following instructions.

Follow [these
instructions](https://doc.rust-lang.org/cargo/reference/publishing.html) to
create an account and login to crates.io before asking to be added as an owner
of the following crates:

- [datafusion](https://crates.io/crates/datafusion)
- [ballista](https://crates.io/crates/ballista)
- [ballista-core](https://crates.io/crates/ballista-core)
- [ballista-executor](https://crates.io/crates/ballista-executor)
- [ballista-scheduler](https://crates.io/crates/ballista-scheduler)

Download and unpack the official release tarball.

Verify that the `Cargo.toml` in the tarball contains the correct version
(e.g. `version = "5.1.0"`) and then publish the crate with the
following command:

```shell
(cd datafusion && cargo publish)
```

If there is a ballista release, run

```shell
# publish ballista-core first because the other Ballista crates depend on it
(cd ballista/rust/core && cargo publish)
(cd ballista/rust/executor && cargo publish)
(cd ballista/rust/scheduler && cargo publish)
(cd ballista/rust/client && cargo publish)
```

### Publish on PyPI

TODO

Member Author

@jorgecarleitao Are you using maturin or twine to do the PyPI release?

Member

the actual job is still present in the repo here. It uses a github action (based on twine I believe).

Member

@jorgecarleitao, Aug 14, 2021

driven by tags. AFAIK currently we can't release binaries without first voting on them (as they constitute artifacts beyond source code). This means that we also need to sign and push them to the apache server, i.e. we need to at least download them from github, sign them, upload them, and then upload them to pypi?

In summary, it is difficult to automate. A way to go here would be for Apache to offer a shared key pair that we could place in github secrets that would be used to sign artifacts. However, this requires restrictions on who can push tags to the repos to prevent just anyone from releasing.

Another issue is that the binary contains software that is not necessarily Apache 2.0 licensed, as the binary is compiled using all dependencies of the crates (e.g. Tokio is only MIT). I think that the same applies for Ballista docker images (not the Dockerfile), since the image now contains software beyond our direct licensing control (thinking about this notice in apache/arrow repo)

(admittedly, I did not think about this when I donated python-datafusion)

Member Author

Ha, ok, let me think more about how to best handle pypi release tomorrow. I totally missed the binary release part as well :(

Member

I will ping @wesm and @kszucs as they may be able to offer guidance here.

Member Author

@houqp, Aug 15, 2021

> we need to at least download them from github, sign them, upload them, and then upload them to pypi?

I believe this is not too bad to get started for now. Including the wheel binaries as part of the datafusion release artifacts for voting won't further complicate the current release process. The only difference is we will need to update create-tarball.sh to download the artifact from github and sign these wheels together with the datafusion source tarball.

I will also look into whether it's possible to automate the signing with Github Action by provisioning a Github Action key. Today we are already giving committers access to create signed release tarballs. I believe only committers and PMC members have access to create tags today? If so, tag push based release signing should work.

> Another issue is that the binary contains software that is not necessarily Apache 2.0 licensed

I will prepare a NOTICE file similar to what arrow has tomorrow for the python binding.


### Call the vote

Call the vote on the Arrow dev list by replying to the RC voting thread. The
reply should have a new subject constructed by adding a `[RESULT]` prefix to the
old subject line.
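
Read literally, the rule above is a plain string prefix. The sketch below shows that reading with a hypothetical `result_subject` helper; whether a space should follow `[RESULT]` is not specified here, so the sketch adds none.

```shell
# Build the subject for the result email from the original voting subject.
result_subject() {
  printf '[RESULT]%s\n' "$1"
}

result_subject "[VOTE][Datafusion] Release Apache Arrow Datafusion 5.1.0 RC0"
# prints: [RESULT][VOTE][Datafusion] Release Apache Arrow Datafusion 5.1.0 RC0
```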

TODO: add example mail
5 changes: 5 additions & 0 deletions dev/release/create-tarball.sh
@@ -86,6 +86,11 @@ The changelog is located at [3].
Please download, verify checksums and signatures, run the unit tests, and vote
on the release. The vote will be open for at least 72 hours.

Only votes from PMC members are binding, but all members of the community are
encouraged to test the release and vote with "(non-binding)".

The standard verification procedure is documented at https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates.

[ ] +1 Release this as Apache Arrow Datafusion ${version}
[ ] +0
[ ] -1 Do not release this as Apache Arrow Datafusion ${version} because...
2 changes: 1 addition & 1 deletion dev/release/update_change_log-ballista.sh
@@ -25,4 +25,4 @@ SOURCE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)"

CURRENT_VER=$(grep version "${SOURCE_TOP_DIR}/ballista/rust/client/Cargo.toml" | head -n 1 | awk '{print $3}' | tr -d '"')
-${SOURCE_DIR}/update_change_log.sh ballista 4.0.0 "ballista-${CURRENT_VER}"
+${SOURCE_DIR}/update_change_log.sh ballista 4.0.0 "ballista-${CURRENT_VER}-rc-local"
2 changes: 1 addition & 1 deletion dev/release/update_change_log-datafusion.sh
@@ -25,4 +25,4 @@ SOURCE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)"

CURRENT_VER=$(grep version "${SOURCE_TOP_DIR}/datafusion/Cargo.toml" | head -n 1 | awk '{print $3}' | tr -d '"')
-${SOURCE_DIR}/update_change_log.sh datafusion 4.0.0 "${CURRENT_VER}"
+${SOURCE_DIR}/update_change_log.sh datafusion 4.0.0 "${CURRENT_VER}-rc-local"
2 changes: 1 addition & 1 deletion dev/release/update_change_log-python.sh
@@ -25,4 +25,4 @@ SOURCE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)"

CURRENT_VER=$(grep version "${SOURCE_TOP_DIR}/python/Cargo.toml" | head -n 1 | awk '{print $3}' | tr -d '"')
-${SOURCE_DIR}/update_change_log.sh python 4.0.0 "python-${CURRENT_VER}"
+${SOURCE_DIR}/update_change_log.sh python 4.0.0 "python-${CURRENT_VER}-rc-local"