3 changes: 3 additions & 0 deletions .compatibility
@@ -0,0 +1,3 @@
1.12.0-11.3.0
1.11.0-11.3.0
1.10.1-11.3.0
52 changes: 32 additions & 20 deletions .github/workflows/README.md
@@ -14,6 +14,7 @@
- [Dispatch Example Test](#dispatch-example-test)
- [Compatibility Test](#compatibility-test)
- [User Friendliness](#user-friendliness)
- [Configuration](#configuration)
- [Progress Log](#progress-log)

## Overview
@@ -37,30 +38,32 @@ In the section below, we will dive into the details of different workflows avail

### Regular Checks

| Workflow Name | File name | Description |
| ----------------------- | ------------------------ | ---------------------------------------------------------------------------------------------------------------------- |
| `Test example` | `auto_example_check.yml` | This workflow will test all examples every Sunday. |
| `Build on 8 GPUs` | `build_gpu_8.yml` | This workflow will run the unit tests every day with 8 GPUs. |
| `Synchronize submodule` | `submodule.yml` | This workflow will check if any git submodule is updated. If so, it will create a PR to update the submodule pointers. |
| `Close inactive issues` | `close_inactive.yml` | This workflow will close issues that have been stale for 14 days. |
| Workflow Name | File name | Description |
| ----------------------- | ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Test example` | `auto_example_check.yml` | This workflow will test all examples every Sunday. |
| `Compatibility Test` | `auto_compatibility_test.yml` | This workflow will check the compatibility of Colossal-AI against PyTorch and CUDA every Sunday. The PyTorch and CUDA versions are specified in `.compatibility`. |
| `Build on 8 GPUs` | `build_gpu_8.yml` | This workflow will run the unit tests every day with 8 GPUs. |
| `Synchronize submodule` | `submodule.yml` | This workflow will check if any git submodule is updated. If so, it will create a PR to update the submodule pointers. |
| `Close inactive issues` | `close_inactive.yml` | This workflow will close issues that have been stale for 14 days. |

### Release

| Workflow Name | File name | Description |
| --------------------------- | ------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| `Draft GitHub Release Post` | `draft_github_release_post.yml` | Compose a GitHub release post draft based on the commit history. Triggered when `version.txt` is updated. |
| `Release to PyPI` | `release_pypi.yml` | Build and release the wheel to PyPI. Triggered when `version.txt` is updated. |
| `Release Nightly to PyPI` | `release_nightly.yml` | Build and release the nightly wheel to PyPI as `colossalai-nightly`. Automatically executed every Sunday. |
| `Release Docker` | `release_docker.yml` | Build and release the Docker image to DockerHub. Triggered when `version.txt` is updated. |
| `Release bdist wheel` | `release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. Manually dispatched. See more details in the next section. |
| Workflow Name | File name | Description |
| --------------------------- | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Draft GitHub Release Post` | `draft_github_release_post.yml` | Compose a GitHub release post draft based on the commit history. Triggered when a change to `version.txt` is merged. |
| `Release to PyPI` | `release_pypi.yml` | Build and release the wheel to PyPI. Triggered when a change to `version.txt` is merged. |
| `Release Nightly to PyPI` | `release_nightly.yml` | Build and release the nightly wheel to PyPI as `colossalai-nightly`. Automatically executed every Sunday. |
| `Release Docker` | `release_docker.yml` | Build and release the Docker image to DockerHub. Triggered when a change to `version.txt` is merged. |
| `Release bdist wheel` | `release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. Manually dispatched. See more details in the next section. |
| `Auto Compatibility Test` | `auto_compatibility_test.yml` | Check Colossal-AI's compatibility against the PyTorch and CUDA versions specified in `.compatibility`. Triggered when `version.txt` is changed in a PR. |

### Manual Dispatch

| Workflow Name | File name | Description |
| ----------------------- | ---------------------------- | ------------------------------------------------------ |
| `Release bdist wheel` | `release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. |
| `Dispatch Example Test` | `dispatch_example_check.yml` | Manually test a specified example. |
| `Compatibility Test` | `compatiblity_test.yml` | Test PyTorch and Python compatibility. |
| Workflow Name | File name | Description |
| ---------------------------- | -------------------------------- | ------------------------------------------------------ |
| `Release bdist wheel` | `release_bdist.yml` | Build binary wheels with pre-built PyTorch extensions. |
| `Dispatch Example Test` | `dispatch_example_check.yml` | Manually test a specified example. |
| `Dispatch Compatibility Test` | `dispatch_compatiblity_test.yml` | Test PyTorch and Python compatibility. |

Refer to this [documentation](https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow) on how to manually trigger a workflow.
I will provide the details of each workflow below.
@@ -93,6 +96,15 @@ Parameters:
| ----------------- | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `issue-translate` | `translate_comment.yml` | This workflow is triggered when a new issue comment is created. The comment will be translated into English if not written in English. |


## Configuration

This section lists the files used to configure the workflows.

1. `.compatibility`

The `.compatibility` file tells GitHub Actions which PyTorch and CUDA versions to test against. Each line has the format `${torch-version}-${cuda-version}` and serves as a Docker image tag, so the tag must exist in the [docker registry](https://hub.docker.com/r/pytorch/conda-cuda) for the test to run.
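
As a rough illustration of the line format, the sketch below splits one `.compatibility` entry into its PyTorch and CUDA parts and prints the image reference it maps to. The function name `parse_tag` is hypothetical and not part of the repository. The `hpcaitech/pytorch-cuda` image prefix is an assumption taken from `auto_compatibility_test.yml` in this PR.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): split a `.compatibility`
# line such as "1.12.0-11.3.0" into its PyTorch and CUDA versions.
parse_tag() {
    tag="$1"
    torch_version="${tag%%-*}"   # everything before the first '-'
    cuda_version="${tag#*-}"     # everything after the first '-'
    # Image prefix assumed from auto_compatibility_test.yml.
    echo "torch=${torch_version} cuda=${cuda_version} image=hpcaitech/pytorch-cuda:${tag}"
}

parse_tag "1.12.0-11.3.0"
# prints: torch=1.12.0 cuda=11.3.0 image=hpcaitech/pytorch-cuda:1.12.0-11.3.0
```

Running a check like this before adding a new line can catch a malformed entry early, although it does not confirm that the tag exists in the registry.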

## Progress Log

- [x] unit testing
@@ -112,9 +124,9 @@ Parameters:
- [x] check on PR
- [x] regular check
- [x] manual dispatch
- [ ] compatibility check
- [x] compatibility check
- [x] manual dispatch
- [ ] auto test when release
- [x] auto test when release
- [x] helpers
- [x] comment translation
- [x] submodule update
74 changes: 74 additions & 0 deletions .github/workflows/auto_compatibility_test.yml
@@ -0,0 +1,74 @@
name: Compatibility Test

on:
  pull_request:
    paths:
      - 'version.txt'
      - '.compatibility'
  # run at 03:00 every Sunday (Singapore time), which is Saturday 19:00 UTC
  schedule:
    - cron: '0 19 * * 6'

jobs:
  matrix_preparation:
    name: Prepare Container List
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - uses: actions/checkout@v3
      - id: set-matrix
        run: |
          IFS=','
          DOCKER_IMAGE=()

          while read tag; do
            DOCKER_IMAGE+=("\"hpcaitech/pytorch-cuda:${tag}\"")
          done <.compatibility

          container=$( IFS=',' ; echo "${DOCKER_IMAGE[*]}" )
          container="[${container}]"
          echo "$container"
          echo "::set-output name=matrix::{\"container\":$(echo "$container")}"

  build:
    name: Test for PyTorch Compatibility
    needs: matrix_preparation
    if: github.repository == 'hpcaitech/ColossalAI'
    runs-on: [self-hosted, gpu]
    strategy:
      fail-fast: false
      matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
    container:
      image: ${{ matrix.container }}
      options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
    timeout-minutes: 120
    steps:
      - name: Install dependencies
        run: |
          pip install -U pip setuptools wheel --user
      - uses: actions/checkout@v2
        with:
          repository: hpcaitech/TensorNVMe
          ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
          path: TensorNVMe
      - name: Install tensornvme
        run: |
          cd TensorNVMe
          conda install cmake
          pip install -r requirements.txt
          pip install -v .
      - uses: actions/checkout@v2
        with:
          ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
      - name: Install Colossal-AI
        run: |
          pip install -v --no-cache-dir .
          pip install -r requirements/requirements-test.txt
      - name: Unit Testing
        run: |
          PYTHONPATH=$PWD pytest tests
        env:
          DATA: /data/scratch/cifar-10
          NCCL_SHM_DISABLE: 1
          LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
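
The matrix-building logic of the `set-matrix` step can be tried outside of GitHub Actions. The following is a minimal POSIX-shell sketch of it, not the workflow's actual script. The function name `build_matrix` and the demo file are hypothetical, and skipping blank lines is an addition the original step does not make. It appends to the file named by `$GITHUB_OUTPUT`, the mechanism GitHub recommends in place of the deprecated `::set-output` command, only when that variable is set.

```shell
#!/bin/sh
# Hypothetical helper: turn a tag file into the JSON matrix read by fromJson().
build_matrix() {
    file="$1"
    list=""
    # `|| [ -n "$tag" ]` keeps a final line that lacks a trailing newline.
    while IFS= read -r tag || [ -n "$tag" ]; do
        [ -z "$tag" ] && continue   # skip blank lines (not in the original step)
        list="${list:+${list},}\"hpcaitech/pytorch-cuda:${tag}\""
    done <"$file"
    printf '{"container":[%s]}\n' "$list"
}

# Demo input mirroring the `.compatibility` file added in this PR.
demo_file="$(mktemp)"
printf '1.12.0-11.3.0\n1.11.0-11.3.0\n1.10.1-11.3.0\n' >"$demo_file"

matrix="$(build_matrix "$demo_file")"
echo "$matrix"

# Inside a workflow run, GITHUB_OUTPUT names the file that collects step outputs.
if [ -n "${GITHUB_OUTPUT:-}" ]; then
    echo "matrix=${matrix}" >>"$GITHUB_OUTPUT"
fi
rm -f "$demo_file"
```

Building the list with a plain string instead of a bash array keeps the sketch portable to `sh`. The workflow itself runs under bash, where the array form works as written.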
@@ -1,4 +1,4 @@
name: Compatibility Test
name: Dispatch Compatibility Test

on:
workflow_dispatch: