diff --git a/dev/release/README.md b/dev/release/README.md
new file mode 100644
index 0000000000000..7a515732df652
--- /dev/null
+++ b/dev/release/README.md
@@ -0,0 +1,312 @@
+
+
+# Release Process
+
+## Sub-projects
+
+The Datafusion repo contains 3 different releasable sub-projects: Datafusion, Ballista and the Datafusion python binding.
+
+We use the Datafusion release to drive the releases of the other sub-projects. As a
+result, a Datafusion version bump is required for every release while version
+bumps for the Python binding and Ballista are optional. In other words, we can
+release a new version of Datafusion without releasing a new version of the
+Python binding or Ballista. On the other hand, releasing a new version of the
+Python binding or Ballista always requires a new Datafusion version release.
+
+## Branching
+
+Datafusion currently only releases from the `master` branch. Given that the project
+is still in an early development state, we are not maintaining an active stable
+release backport branch.
+
+## Prerequisites
+
+- Have the upstream git repo `git@github.com:apache/arrow-datafusion.git` added as git remote `apache`.
+- Create a personal access token (PAT) in GitHub for the changelog automation script.
+  - The GitHub PAT should be created with `repo` access.
+- Make sure your signing key is added to the following files in SVN:
+  - https://dist.apache.org/repos/dist/dev/arrow/KEYS
+  - https://dist.apache.org/repos/dist/release/arrow/KEYS
+
+## Process Overview
+
+As part of the Apache governance model, official releases consist of signed
+source tarballs approved by the PMC.
+
+We then use the code in the approved source tarball to release to crates.io and
+PyPI.
+
+### Change Log
+
+We maintain `CHANGELOG.md` for each sub-project so our users know what has been
+changed between releases.
+
+The CHANGELOG is managed automatically using
+[update_change_log.sh](https://github.com/apache/arrow-datafusion/blob/master/dev/release/update_change_log.sh)
+
+This script creates a changelog using GitHub PRs and issues based on the labels
+associated with them.
+
+## Prepare release commits and PR
+
+Prepare a PR to update `CHANGELOG.md` and versions to reflect the planned
+release.
+
+See [#801](https://github.com/apache/arrow-datafusion/pull/801) for an example.
+
+Here are the commands that could be used to prepare the `5.1.0` release:
+
+### Update Version
+
+Check out the master commit to be released:
+
+```
+git fetch apache
+git checkout apache/master
+```
+
+Update the datafusion version in `datafusion/Cargo.toml` to `5.1.0`.
+
+If there is a ballista release, update the versions in the ballista `Cargo.toml` files by running:
+
+```
+./dev/update_ballista_versions.py 0.5.0
+```
+
+If there is a datafusion python binding release, update versions in
+`./python/Cargo.toml`.
+
+Lastly, commit the version change:
+
+```
+git commit -a -m 'Update version'
+```
+
+### Update CHANGELOG.md
+
+Create local release rc tags:
+
+```
+git tag -f 5.1.0-rc-local
+# if there is ballista release
+git tag -f ballista-0.5.0-rc-local
+# if there is python binding release
+git tag -f python-0.3.0-rc-local
+```
+
+Manually edit the previous release version tag in
+`dev/release/update_change_log-{ballista,datafusion,python}.sh`. Commits
+between the previous version tag and the new rc tag will be used to
+populate the changelog content.
+
+```bash
+# create the changelog
+CHANGELOG_GITHUB_TOKEN= ./dev/release/update_change_log-all.sh
+# review change log / edit issues and labels if needed, rerun until you are happy with the result
+git commit -a -m 'Create changelog for release'
+```
+
+Note that when reviewing the change log, rather than editing the
+`CHANGELOG.md`, it is preferred to update the issues and their labels.
+
+You can add the `invalid` or `development-process` label to exclude items from
+release notes. Add `datafusion`, `ballista` and `python` labels to group items
+into each sub-project's change log.
+
+Send a PR to get these changes merged into the `master` branch. If new commits that
+could change the change log content land in the `master` branch before you
+can merge the PR, you need to rerun the changelog update script to regenerate
+the changelog and update the PR accordingly.
+
+## Prepare release candidate tarball
+
+After the PR gets merged, you are ready to create a release tarball from the
+merged commit.
+
+(Note you need to be a committer to run these scripts as they upload to the apache svn distribution servers)
+
+### Pick a Release Candidate (RC) number
+
+Pick numbers in sequential order, with `0` for `rc0`, `1` for `rc1`, etc.
+
+### Create git tag for the release:
+
+While the official release artifact is a signed tarball, we also tag the commit it was created from, for convenience and code archaeology.
+
+Using a string such as `5.1.0` as the ``, create and push the tag thusly:
+
+```shell
+git fetch apache
+git tag - apache/master
+# push tag to GitHub remote
+git push apache
+```
+
+### Create, sign, and upload tarball
+
+Run `create-tarball.sh` with the `` tag and `` and you found in previous steps:
+
+```shell
+./dev/release/create-tarball.sh 5.1.0 0
+```
+
+The `create-tarball.sh` script
+
+1. creates and uploads a release candidate tarball to the [arrow
+   dev](https://dist.apache.org/repos/dist/dev/arrow) location on the
+   apache distribution svn server
+
+2. provides you with an email template to
+   send to dev@arrow.apache.org for release voting.
+
+### Vote on Release Candidate tarball
+
+Send the email output from the script to dev@arrow.apache.org. The email should look like:
+
+```
+To: dev@arrow.apache.org
+Subject: [VOTE][Datafusion] Release Apache Arrow Datafusion 5.1.0 RC0
+
+Hi,
+
+I would like to propose a release of Apache Arrow Datafusion Implementation,
+version 5.1.0.
+
+This release candidate is based on commit: a5dd428f57e62db20a945e8b1895de91405958c4 [1]
+The proposed release tarball and signatures are hosted at [2].
+The changelog is located at [3].
+
+Please download, verify checksums and signatures, run the unit tests,
+and vote on the release.
+
+The vote will be open for at least 72 hours.
+
+[ ] +1 Release this as Apache Arrow Datafusion 5.1.0
+[ ] +0
+[ ] -1 Do not release this as Apache Arrow Datafusion 5.1.0 because...
+
+[1]: https://github.com/apache/arrow-datafusion/tree/a5dd428f57e62db20a945e8b1895de91405958c4
+[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.1.0
+[3]: https://github.com/apache/arrow-datafusion/blob/a5dd428f57e62db20a945e8b1895de91405958c4/CHANGELOG.md
+```
+
+For the release to become "official" it needs at least three PMC members to vote +1 on it.
+
+### Verifying Release Candidates
+
+The `dev/release/verify-release-candidate.sh` script in this repository can assist in the verification process. Run it like:
+
+```
+./dev/release/verify-release-candidate.sh 5.1.0 0
+```
+
+#### If the release is not approved
+
+If the release is not approved, fix whatever the problem is, merge changelog
+changes into master if there are any, and try again with the next RC number.
+
+## Finalize the release
+
+### After the release is approved
+
+Move the tarball to the release location in SVN, e.g.
+https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-5.1.0/, using
+the `release-tarball.sh` script:
+
+```shell
+./dev/release/release-tarball.sh 5.1.0 0
+```
+
+Congratulations! The release is now official!
+
+### Create release git tags
+
+Tag the same release candidate commit with the final release tag:
+
+```
+git checkout apache/5.1.0-RC0
+git tag 5.1.0
+git push apache 5.1.0
+```
+
+If there is a ballista release, also push the ballista tag:
+
+```
+git tag ballista-0.5.0
+git push apache ballista-0.5.0
+```
+
+If there is a datafusion python binding release, also push the python tag:
+
+```
+git tag python-0.3.0
+git push apache python-0.3.0
+```
+
+### Publish on Crates.io
+
+Only approved releases of the tarball should be published to
+crates.io, in order to conform to Apache Software Foundation
+governance standards.
+
+After an official project release has been made, an Arrow committer can
+publish the crates to crates.io using the following instructions.
+
+Follow [these
+instructions](https://doc.rust-lang.org/cargo/reference/publishing.html) to
+create an account and log in to crates.io before asking to be added as an owner
+of the following crates:
+
+- [datafusion](https://crates.io/crates/datafusion)
+- [ballista](https://crates.io/crates/ballista)
+- [ballista-core](https://crates.io/crates/ballista-core)
+- [ballista-executor](https://crates.io/crates/ballista-executor)
+- [ballista-scheduler](https://crates.io/crates/ballista-scheduler)
+
+Download and unpack the official release tarball.
+
+Verify that the Cargo.toml in the tarball contains the correct version
+(e.g. `version = "5.1.0"`) and then publish the crate with the
+following commands:
+
+```shell
+(cd datafusion && cargo publish)
+```
+
+If there is a ballista release, run
+
+```shell
+(cd ballista/rust/client && cargo publish)
+(cd ballista/rust/core && cargo publish)
+(cd ballista/rust/executor && cargo publish)
+(cd ballista/rust/scheduler && cargo publish)
+```
+
+### Publish on PyPI
+
+TODO
+
+### Call the vote
+
+Call the vote on the Arrow dev list by replying to the RC voting thread. The
+reply should have a new subject constructed by adding a `[RESULT]` prefix to the
+old subject line.
+ +TODO: add example mail diff --git a/dev/release/create-tarball.sh b/dev/release/create-tarball.sh index ffcb430b5c7c1..94318d0777700 100755 --- a/dev/release/create-tarball.sh +++ b/dev/release/create-tarball.sh @@ -86,6 +86,11 @@ The changelog is located at [3]. Please download, verify checksums and signatures, run the unit tests, and vote on the release. The vote will be open for at least 72 hours. +Only votes from PMC members are binding, but all members of the community are +encouraged to test the release and vote with "(non-binding)". + +The standard verification procedure is documented at https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates. + [ ] +1 Release this as Apache Arrow Datafusion ${version} [ ] +0 [ ] -1 Do not release this as Apache Arrow Datafusion ${version} because... diff --git a/dev/release/update_change_log-ballista.sh b/dev/release/update_change_log-ballista.sh index 68193156622a2..05c5f6fe69849 100755 --- a/dev/release/update_change_log-ballista.sh +++ b/dev/release/update_change_log-ballista.sh @@ -25,4 +25,4 @@ SOURCE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)" CURRENT_VER=$(grep version "${SOURCE_TOP_DIR}/ballista/rust/client/Cargo.toml" | head -n 1 | awk '{print $3}' | tr -d '"') -${SOURCE_DIR}/update_change_log.sh ballista 4.0.0 "ballista-${CURRENT_VER}" +${SOURCE_DIR}/update_change_log.sh ballista 4.0.0 "ballista-${CURRENT_VER}-rc-local" diff --git a/dev/release/update_change_log-datafusion.sh b/dev/release/update_change_log-datafusion.sh index f0f455ad1c9b5..1570c91252756 100755 --- a/dev/release/update_change_log-datafusion.sh +++ b/dev/release/update_change_log-datafusion.sh @@ -25,4 +25,4 @@ SOURCE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)" CURRENT_VER=$(grep version "${SOURCE_TOP_DIR}/datafusion/Cargo.toml" | head -n 1 | awk '{print $3}' | tr -d '"') 
-${SOURCE_DIR}/update_change_log.sh datafusion 4.0.0 "${CURRENT_VER}" +${SOURCE_DIR}/update_change_log.sh datafusion 4.0.0 "${CURRENT_VER}-rc-local" diff --git a/dev/release/update_change_log-python.sh b/dev/release/update_change_log-python.sh index a48a5b657c5f3..6b864f9be1b2e 100755 --- a/dev/release/update_change_log-python.sh +++ b/dev/release/update_change_log-python.sh @@ -25,4 +25,4 @@ SOURCE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)" CURRENT_VER=$(grep version "${SOURCE_TOP_DIR}/python/Cargo.toml" | head -n 1 | awk '{print $3}' | tr -d '"') -${SOURCE_DIR}/update_change_log.sh python 4.0.0 "python-${CURRENT_VER}" +${SOURCE_DIR}/update_change_log.sh python 4.0.0 "python-${CURRENT_VER}-rc-local"
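
Note on the three `update_change_log-*.sh` hunks above: each script reads the current version from a `Cargo.toml` and now appends `-rc-local`, so the changelog range ends at the local RC tag created in the README's "Update CHANGELOG.md" step. The following is a minimal sketch of that derivation, not code from the repository. The helper name `rc_local_tag` and the throwaway manifest are assumptions made for illustration; only the `grep | head | awk | tr` pipeline and the `-rc-local` suffix come from the patch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): build the local RC tag name
# from a Cargo.toml the same way the wrapper scripts in this patch do.
rc_local_tag() {
  local cargo_toml="$1"   # path to a Cargo.toml
  local prefix="${2:-}"   # "ballista-" or "python-"; empty for datafusion
  local ver
  # Same pipeline as the scripts: take the first line containing "version",
  # print its third whitespace-separated field, and strip the double quotes.
  ver=$(grep version "${cargo_toml}" | head -n 1 | awk '{print $3}' | tr -d '"')
  echo "${prefix}${ver}-rc-local"
}

# Demo against a throwaway manifest (illustrative content only).
demo_manifest=$(mktemp)
printf '[package]\nname = "datafusion"\nversion = "5.1.0"\n' > "${demo_manifest}"
rc_local_tag "${demo_manifest}"              # prints 5.1.0-rc-local
rc_local_tag "${demo_manifest}" "ballista-"  # prints ballista-5.1.0-rc-local
rm -f "${demo_manifest}"
```

Because `head -n 1` keeps only the first line that contains `version`, the pipeline relies on the package version appearing before any dependency version in the manifest.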