Merged

Ra #2

517 commits
d6cc8f3
[workflow] added test-pypi check before release (#2591)
FrankLeeeee Feb 6, 2023
fd90245
[workflow] hooked docker release with lark (#2594)
FrankLeeeee Feb 6, 2023
0c03802
[workflow] hooked pypi release with lark (#2596)
FrankLeeeee Feb 6, 2023
4d58289
[workflow] added cuda extension build test before release (#2598)
FrankLeeeee Feb 6, 2023
719c4d5
[doc] updated readme for CI/CD (#2600)
FrankLeeeee Feb 6, 2023
f7458d3
[release] v0.2.1 (#2602)
FrankLeeeee Feb 6, 2023
f566b0c
[workflow] fixed broken rellease workflows (#2604)
FrankLeeeee Feb 6, 2023
ae86be1
Automated submodule synchronization (#2607)
github-actions[bot] Feb 7, 2023
b3973b9
[workflow] fixed test coverage report (#2611)
FrankLeeeee Feb 7, 2023
aa7e9e4
[workflow] fixed the test coverage report (#2614)
FrankLeeeee Feb 7, 2023
8518263
[test] fixed the triton version for testing (#2608)
FrankLeeeee Feb 7, 2023
93fdd35
[build] fixed the doc build process (#2618)
FrankLeeeee Feb 7, 2023
0556f5d
[tutorial] add video link (#2619)
binmakeswell Feb 7, 2023
291b051
[doc] fixed broken badge (#2623)
FrankLeeeee Feb 7, 2023
6ba8364
[autochunk] support diffusion for autochunk (#2621)
oahzxl Feb 7, 2023
4ae02c4
[tutorial] added energonai to opt inference requirements (#2625)
FrankLeeeee Feb 7, 2023
90a9fdd
[autoparallel] Patch meta information of `torch.matmul` (#2584)
Cypher30 Feb 8, 2023
d348039
[doc] updated the sphinx theme (#2635)
FrankLeeeee Feb 8, 2023
292c81e
fix/transformer-verison (#2581)
Fazziekey Feb 8, 2023
c375563
[doc] removed pre-built wheel installation from readme (#2637)
FrankLeeeee Feb 8, 2023
cb3d1be
[autoparallel] adapt autoparallel tests with latest api (#2626)
YuliangLiu0306 Feb 8, 2023
28398f1
add overlap option (#2613)
YuliangLiu0306 Feb 8, 2023
37df666
[autoparallel] refactor handlers which reshape input tensors (#2615)
YuliangLiu0306 Feb 8, 2023
a020eec
[doc] fix typo of BLOOM (#2643)
binmakeswell Feb 8, 2023
85b2303
[doc] migrate the markdown files (#2652)
FrankLeeeee Feb 9, 2023
a4ae43f
[doc] added docusaurus-based version control (#2656)
FrankLeeeee Feb 9, 2023
cd4f02b
[doc] fixed compatiblity with docusaurus (#2657)
FrankLeeeee Feb 9, 2023
a255a38
[example] Polish README.md (#2658)
JThh Feb 9, 2023
94f87f9
[workflow] fixed gpu memory check condition (#2659)
FrankLeeeee Feb 10, 2023
b673e5f
[release] v0.2.2 (#2661)
FrankLeeeee Feb 10, 2023
0385b26
[autoparallel] Patch meta information of `torch.nn.LayerNorm` (#2647)
Cypher30 Feb 10, 2023
8de8505
[Docs] layout converting management (#2665)
YuliangLiu0306 Feb 10, 2023
85bd298
Update README-zh-Hans.md
binmakeswell Feb 10, 2023
9ab14b2
[doc] add CVPR tutorial (#2666)
binmakeswell Feb 10, 2023
81ea66d
[release] v0.2.3 (#2669)
FrankLeeeee Feb 13, 2023
6d60634
[doc] added documentation sidebar translation (#2670)
FrankLeeeee Feb 13, 2023
0966008
[dooc] fixed the sidebar itemm key (#2672)
FrankLeeeee Feb 13, 2023
8213f89
[gemini] add fake_release_chunk for keep-gathered chunk in the infere…
1SAA Feb 13, 2023
327bc06
[workflow] added doc build test (#2675)
FrankLeeeee Feb 13, 2023
40c916b
[autoparallel] Patch meta information of `torch.nn.functional.softmax…
Cypher30 Feb 13, 2023
c44fd0c
[workflow] added trigger to build doc upon release (#2678)
FrankLeeeee Feb 13, 2023
5cd8cae
[workflow] fixed communtity report ranking (#2680)
FrankLeeeee Feb 13, 2023
f0aa191
[gemini] fix colo_init_context (#2683)
ver217 Feb 13, 2023
df4f020
[zero1&2] only append parameters with gradients (#2681)
1SAA Feb 13, 2023
8841601
Automated submodule synchronization (#2648)
github-actions[bot] Feb 13, 2023
46f20ba
[doc] update auto parallel paper link (#2686)
binmakeswell Feb 13, 2023
1712da2
[NFC] polish colossalai/gemini/gemini_context.py code style (#2690)
Shawn-Kong Feb 14, 2023
56ff192
[NFC] polish colossalai/context/moe_context.py code style (#2693)
Gy-Lu Feb 14, 2023
534f68c
[NFC] polish pipeline process group code style (#2694)
kurisusnowdeng Feb 14, 2023
c3abdd0
[release] update version (#2691)
ver217 Feb 14, 2023
6427c40
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_hand…
MaruyamaAya Feb 14, 2023
1b34701
[app] add chatgpt application (#2698)
ver217 Feb 14, 2023
8408c85
[app] fix ChatGPT requirements (#2704)
binmakeswell Feb 14, 2023
6a8cd68
[doc] add ChatGPT (#2703)
binmakeswell Feb 14, 2023
71deddc
[doc] resize figure (#2705)
binmakeswell Feb 14, 2023
94f0005
[doc] add Quick Preview (#2706)
binmakeswell Feb 14, 2023
d701ef8
Automated submodule synchronization (#2707)
github-actions[bot] Feb 15, 2023
4ac8bfb
[NFC] polish colossalai/engine/gradient_handler/utils.py code style (…
CZYCW Feb 15, 2023
7fa6be4
[autoparallel] test compatibility for gemini and auto parallel (#2700)
YuliangLiu0306 Feb 15, 2023
0b2a738
[autoparallel] remove deprecated codes (#2664)
YuliangLiu0306 Feb 15, 2023
d03f442
add ci (#2641)
Fazziekey Feb 15, 2023
b3d10db
[NFC] polish colossalai/cli/launcher/__init__.py code style (#2709)
zengzh95 Feb 15, 2023
89f8975
[workflow] fixed tensor-nvme build caching (#2711)
FrankLeeeee Feb 15, 2023
cb2c6a2
[autoparallel] refactor runtime pass (#2644)
YuliangLiu0306 Feb 15, 2023
4603538
[NFC] posh colossalai/context/process_group_initializer/initializer_s…
Wesley-Jzy Feb 15, 2023
f6b4ca4
[devops] add chatgpt ci (#2713)
ver217 Feb 15, 2023
d4d3387
[doc] add open-source contribution invitation (#2714)
binmakeswell Feb 15, 2023
2045d45
[doc] updated documentation version list (#2715)
FrankLeeeee Feb 15, 2023
5b24987
[autoparallel] fix parameters sharding bug (#2716)
YuliangLiu0306 Feb 15, 2023
21d6a48
[autoparallel] add shard option (#2696)
YuliangLiu0306 Feb 15, 2023
9c0943e
[chatgpt] optimize generation kwargs (#2717)
ver217 Feb 15, 2023
7aacfad
fix typo (#2721)
lich99 Feb 15, 2023
51c45c2
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_hand…
yuxuan-lou Feb 15, 2023
e81caeb
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/cost_gr…
XueFuzhao Feb 15, 2023
d344313
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_hand…
ziyuhuang123 Feb 15, 2023
c5be83a
Update version.txt (#2727)
binmakeswell Feb 15, 2023
5479fdd
[doc] updated documentation version list (#2730)
FrankLeeeee Feb 15, 2023
43dffda
[doc] fixed a typo in GPT readme (#2736)
cloudhuang Feb 15, 2023
8331420
[NFC] polish colossalai/cli/cli.py code style (#2734)
wangbo-zhao Feb 15, 2023
1819373
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_hand…
Feb 15, 2023
c9e3ee3
[NFC] polish colossalai/context/process_group_initializer/initializer…
ziruizhu Feb 15, 2023
ae86a29
Refact method of grad store (#2687)
yhna940 Feb 15, 2023
1dc003c
[autoparallel] distinguish different parallel strategies (#2699)
YuliangLiu0306 Feb 15, 2023
2fd528b
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_a…
xyupeng Feb 15, 2023
93b788b
Merge branch 'main' into fix/format
binmakeswell Feb 15, 2023
30aee9c
[NFC] polish code format
binmakeswell Feb 15, 2023
b6e3b95
Update README.md
fastalgo Feb 15, 2023
648183a
[chatgpt]fix train_rm bug with lora (#2741)
ht-zhou Feb 16, 2023
613efeb
[chatgpt] support colossalai strategy to train rm (#2742)
ht-zhou Feb 16, 2023
e376954
[doc] add opt service doc (#2747)
FrankLeeeee Feb 16, 2023
d6d6dec
[doc] update example and OPT serving link (#2769)
binmakeswell Feb 16, 2023
a88bc82
[chatgpt] disable shard init for colossalai (#2767)
ver217 Feb 16, 2023
0106615
Don't use `torch._six` (#2775)
malfet Feb 17, 2023
ba84cd8
fix pip install colossal (#2764)
Fazziekey Feb 17, 2023
8e3f66a
[zero] fix wrong import (#2777)
Cypher30 Feb 17, 2023
a2b43e3
[autoparallel] Patch meta information of `torch.nn.Embedding` (#2760)
Cypher30 Feb 17, 2023
4ee311c
[chatgpt] startegy add prepare method (#2766)
ver217 Feb 17, 2023
a619a19
[chatgpt] update readme about checkpoint (#2792)
ver217 Feb 17, 2023
56ddc9c
[hotfix] add correct device for fake_param (#2796)
1SAA Feb 17, 2023
09f4574
[doc] update OPT serving (#2804)
binmakeswell Feb 17, 2023
8593ae1
[autoparallel] rotor solver refactor (#2813)
Cypher30 Feb 18, 2023
dbd0fd1
[CI/CD] fix nightly release CD running on forked repo (#2812)
Gy-Lu Feb 18, 2023
2059fdd
[hotfix] add copyright for solver and device mesh (#2803)
YuliangLiu0306 Feb 18, 2023
cf6409d
Hotfix/auto parallel zh doc (#2820)
YuliangLiu0306 Feb 19, 2023
bf02046
[exmaple] add bert and albert (#2824)
feifeibear Feb 20, 2023
89f0017
Typo (#2826)
gothicx Feb 20, 2023
58abde2
Update README.md (#2791)
mickogoin Feb 20, 2023
c008d4a
[NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code st…
MichelleMa8 Feb 20, 2023
b6a108c
[chatgpt] add test checkpoint (#2797)
ver217 Feb 20, 2023
47ecb22
[example] add LoRA support (#2821)
haofanwang Feb 20, 2023
a572122
Automated submodule synchronization (#2740)
github-actions[bot] Feb 20, 2023
7ea6bc7
[autoparallel] Patch tensor related operations meta information (#2789)
Cypher30 Feb 20, 2023
918bc94
[triton] added copyright information for flash attention (#2835)
FrankLeeeee Feb 21, 2023
3eebc4d
[chatgpt] fix rm eval (#2829)
ht-zhou Feb 21, 2023
9353464
[cli] handled version check exceptions (#2848)
FrankLeeeee Feb 21, 2023
5979143
[doc] fix typo in opt inference tutorial (#2849)
Aiemu Feb 21, 2023
34ca324
[chatgpt] Support saving ckpt in examples (#2846)
ht-zhou Feb 22, 2023
fcc4097
[autoparallel] Patch meta information of `torch.tanh()` and `torch.nn…
Cypher30 Feb 22, 2023
c7764d3
[autoparallel] Patch meta information of `torch.where` (#2822)
Cypher30 Feb 22, 2023
eae77c8
[autoparallel] Patch meta information for nodes that will not be hand…
Cypher30 Feb 22, 2023
55424a1
[doc] fix GPT tutorial (#2860)
dawei-wang Feb 22, 2023
a4fc125
Fix typos (#2863)
koking0 Feb 22, 2023
6e4ac08
[hotfix] fix chunk size can not be divided (#2867)
1SAA Feb 22, 2023
c52edcf
Rename class method of ZeroDDP (#2692)
junxu Feb 22, 2023
2e16f84
[chatgpt]support opt & gpt for rm training (#2876)
ht-zhou Feb 22, 2023
0f392d7
[autoparallel] find repeat blocks (#2854)
YuliangLiu0306 Feb 23, 2023
819e25d
[hotfix] fix autoparallel compatibility test issues (#2754)
YuliangLiu0306 Feb 23, 2023
8c8a39b
[hotfix]: Remove math.prod dependency (#2837)
JThh Feb 23, 2023
e33c043
[workflow] moved pre-commit to post-commit (#2895)
FrankLeeeee Feb 24, 2023
dbc01b9
Update README.md
fastalgo Feb 25, 2023
7b13f7d
[zero] trivial zero optimizer refactoring (#2869)
yhna940 Feb 27, 2023
0afb55f
[doc] add os scope, update tutorial install and tips (#2914)
binmakeswell Feb 27, 2023
12bafe0
[doc] update installation for GPT (#2922)
binmakeswell Feb 27, 2023
da05628
[format] applied code formatting on changed files in pull request 292…
github-actions[bot] Feb 27, 2023
eb5cf94
Automated submodule synchronization (#2927)
github-actions[bot] Feb 28, 2023
61e6878
fixed using zero with tp cannot access weight correctly
kurisusnowdeng Feb 27, 2023
a848091
Fix port exception type (#2925)
yhna940 Feb 28, 2023
197d0bf
[autoparallel] apply repeat block to reduce solving time (#2912)
YuliangLiu0306 Feb 28, 2023
77b88a3
[workflow] added auto doc test on PR (#2929)
FrankLeeeee Feb 28, 2023
9e3b8b7
[doc] removed read-the-docs (#2932)
FrankLeeeee Feb 28, 2023
b8804aa
[doc] added readme for documentation (#2935)
FrankLeeeee Feb 28, 2023
8264cd7
[doc] add env scope (#2933)
binmakeswell Feb 28, 2023
dca9893
[format] applied code formatting on changed files in pull request 293…
github-actions[bot] Feb 28, 2023
090f14f
[misc] add reference (#2930)
ver217 Feb 28, 2023
47fb214
[hotfix] add shard dim to aviod backward communication error (#2954)
YuliangLiu0306 Mar 1, 2023
489a956
[chatgpt]add inference example (#2944)
ht-zhou Mar 1, 2023
e414e40
[DTensor] implementation of dtensor (#2946)
YuliangLiu0306 Mar 1, 2023
0d07514
Automated submodule synchronization (#2951)
github-actions[bot] Mar 2, 2023
b0a8766
[doc] fix chatgpt inference typo (#2964)
binmakeswell Mar 2, 2023
bbf9c82
[ChatGPT] fix README (#2966)
Fazziekey Mar 2, 2023
82149e9
[chatgpt] fix inference demo loading bug (#2969)
ht-zhou Mar 2, 2023
c9e27f0
[chatgpt]fix lora bug (#2974)
ht-zhou Mar 2, 2023
9b4ceef
[doc] update news (#2983)
binmakeswell Mar 3, 2023
827a0af
Automated submodule synchronization (#2982)
github-actions[bot] Mar 3, 2023
19ad49f
[chatgpt] making experience support dp (#2971)
ver217 Mar 3, 2023
f5ca039
[chatgpt] fix lora gemini conflict in RM training (#2984)
ht-zhou Mar 3, 2023
0ff8406
[chatgpt] allow shard init and display warning (#2986)
ver217 Mar 3, 2023
3a5d93b
[kernel] cached the op kernel and fixed version check (#2886)
FrankLeeeee Mar 3, 2023
19fa0e5
Remove extraneous comma (#2993)
yasyf Mar 4, 2023
e0a1c13
[doc] added reference to related works (#2994)
FrankLeeeee Mar 4, 2023
823f3b9
[doc] add deepspeed citation and copyright (#2996)
ver217 Mar 4, 2023
35c8f4c
[refactor] restructure configuration files (#2977)
SauravMaheshkar Mar 5, 2023
52a5078
[doc] add ISC tutorial (#2997)
binmakeswell Mar 6, 2023
82503a9
[format] applied code formatting on changed files in pull request 299…
github-actions[bot] Mar 6, 2023
e588703
[chatgpt]fix inference model load (#2988)
ht-zhou Mar 7, 2023
287d604
[chatgpt] Add saving ckpt callback for PPO (#2880)
Gy-Lu Mar 7, 2023
55dcd30
[chatgpt] fix readme (#3025)
ht-zhou Mar 7, 2023
b42d3d2
[fx] remove depreciated algorithms. (#2312) (#2313)
super-dainiu Mar 7, 2023
400f630
[pipeline] Add Simplified Alpa DP Partition (#2507)
Wesley-Jzy Mar 7, 2023
cd2b0ea
[DTensor] refactor sharding spec (#2987)
YuliangLiu0306 Mar 7, 2023
e86d9bb
[format] applied code formatting on changed files in pull request 302…
github-actions[bot] Mar 7, 2023
2e427dd
[revert] recover "[refactor] restructure configuration files (#2977)"…
FrankLeeeee Mar 7, 2023
2cd6ba3
[workflow] fixed the post-commit failure when no formatting needed (#…
FrankLeeeee Mar 7, 2023
8fedc87
[workflow] supported conda package installation in doc test (#3028)
FrankLeeeee Mar 7, 2023
4269196
[hotfix] skip auto checkpointing tests (#3029)
YuliangLiu0306 Mar 7, 2023
c21b11e
change nn to models (#3032)
Fazziekey Mar 7, 2023
378d827
[doc] update nvme offload doc (#3014)
ver217 Mar 7, 2023
ea0b52c
[doc] specified operating system requirement (#3019)
FrankLeeeee Mar 7, 2023
29386a5
[DTensor] refactor CommSpec (#3034)
YuliangLiu0306 Mar 8, 2023
2ef855c
support shardinit option to avoid OPT OOM initializing problem (#3037)
nemoramo Mar 8, 2023
b51bfec
[chatgpt] change critic input as state (#3042)
wenjunyang Mar 8, 2023
2ca9728
[autochunk] refactor chunk memory estimation (#2762)
oahzxl Mar 8, 2023
af38884
[example] fixed opt model downloading from huggingface
tomekrut Mar 9, 2023
3606742
[example] fix redundant note (#3065)
binmakeswell Mar 9, 2023
faa8526
Automated submodule synchronization (#3062)
github-actions[bot] Mar 9, 2023
f19b49e
[booster] init module structure and definition (#3056)
FrankLeeeee Mar 9, 2023
91ccf97
[workflow] fixed doc build trigger condition (#3072)
FrankLeeeee Mar 9, 2023
416a50d
[doc] moved doc test command to bottom (#3075)
FrankLeeeee Mar 9, 2023
89aa792
[release] v0.2.6 (#3057)
FrankLeeeee Mar 10, 2023
8e4e860
[DTensor] implement layout converter (#3055)
YuliangLiu0306 Mar 10, 2023
e58a3c8
Fix the version of lightning and colossalai in Stable Diffusion envir…
Camille7777 Mar 10, 2023
10c61de
[autochunk] support vit (#3084)
oahzxl Mar 10, 2023
3213347
[doc] fixed typos in docs/README.md (#3082)
FrankLeeeee Mar 10, 2023
5d5f475
[diffusers] fix ci and docker (#3085)
Fazziekey Mar 10, 2023
fff98f0
[analyzer] a minimal implementation of static graph analyzer (#2852)
super-dainiu Mar 10, 2023
95a36ea
[kernel] added kernel loader to softmax autograd function (#3093)
FrankLeeeee Mar 10, 2023
02ae80b
[chatgpt]add flag of action mask in critic(#3086)
Fazziekey Mar 10, 2023
26db1cb
[release] v0.2.7 (#3094)
FrankLeeeee Mar 10, 2023
65a4dbd
[NVIDIA] Add FP8 example using TE (#3080)
ksivaman Mar 10, 2023
018936a
[tutorial] update notes for TransformerEngine (#3098)
binmakeswell Mar 10, 2023
c9dd036
[chatgpt] fix lora save bug (#3099)
ht-zhou Mar 10, 2023
145ccfd
[doc] add Intel cooperation for biomedicine (#3108)
binmakeswell Mar 11, 2023
191daf7
[chatgpt] type miss of kwargs (#3107)
hiko2msp Mar 12, 2023
453f7ae
prevent op_builder being installed in site-pkgs (#3104)
jeffra Mar 13, 2023
0aa92c0
Automated submodule synchronization (#3105)
github-actions[bot] Mar 13, 2023
0672b5a
[chatgpt] fix lora support for gpt (#3113)
ht-zhou Mar 13, 2023
68577fb
[chatgpt]Fix examples (#3116)
ht-zhou Mar 13, 2023
30dd13c
[autochunk] support complete benchmark (#3121)
oahzxl Mar 13, 2023
169ed4d
[workflow] purged extension cache before GPT test (#3128)
FrankLeeeee Mar 14, 2023
23cd5e2
[chatgpt]update ci (#3087)
ht-zhou Mar 14, 2023
86ac782
[test] added timm models to test model zoo (#3129)
FrankLeeeee Mar 14, 2023
ed8f60b
[lazyinit] refactor lazy tensor and lazy init ctx (#3131)
ver217 Mar 14, 2023
2eca4cd
[DTensor] refactor dtensor with new components (#3089)
YuliangLiu0306 Mar 14, 2023
1a46e71
[docker] Add opencontainers image-spec to `Dockerfile` (#3006)
SauravMaheshkar Mar 14, 2023
1216d1e
[tests] diffuser models in model zoo (#3136)
1SAA Mar 14, 2023
a674c63
[test] added torchvision models to test model zoo (#3132)
FrankLeeeee Mar 15, 2023
6d48eb0
[test] added transformers models to test model zoo (#3135)
FrankLeeeee Mar 15, 2023
14a1150
[tests] model zoo add torchaudio models (#3138)
ver217 Mar 15, 2023
ecd643f
[test] add torchrec models to test model zoo (#3139)
YuliangLiu0306 Mar 15, 2023
ed19290
[booster] implemented mixed precision class (#3151)
FrankLeeeee Mar 17, 2023
3c01280
[doc] add community contribution guide (#3153)
binmakeswell Mar 17, 2023
6ae8ed0
[lazyinit] add correctness verification (#3147)
ver217 Mar 17, 2023
c474fda
[chatgpt] fix ppo training hanging problem with gemini (#3162)
ver217 Mar 17, 2023
1e58d31
[chatgpt] fix trainer generate kwargs (#3166)
ver217 Mar 17, 2023
7548ca5
[chatgpt]Reward Model Training Process update (#3133)
ht-zhou Mar 20, 2023
20d1c99
[refactor] update docs (#3174)
SauravMaheshkar Mar 20, 2023
1ad3a63
[test] fixed torchrec model test (#3167)
FrankLeeeee Mar 20, 2023
a9b8402
[booster] added the accelerator implementation (#3159)
FrankLeeeee Mar 20, 2023
4e921cf
[examples] Solving the diffusion issue of incompatibility issue#3169 …
NatalieC323 Mar 20, 2023
085e7f4
[test] fixed torchrec registration in model zoo (#3177)
FrankLeeeee Mar 20, 2023
7bc0afc
updated flash attention usage
kurisusnowdeng Mar 17, 2023
9d644ff
Fix docstr for zero statedict (#3185)
yhna940 Mar 21, 2023
80aed29
[zero] Refactor ZeroContextConfig class using dataclass (#3186)
yhna940 Mar 21, 2023
258b433
[hotfix] layout converting issue (#3188)
YuliangLiu0306 Mar 21, 2023
18dbe76
[auto-parallel] add auto-offload feature (#3154)
zengzh95 Mar 21, 2023
e5f668f
[dreambooth] fixing the incompatibity in requirements.txt (#3190)
NatalieC323 Mar 21, 2023
e7f3bed
[booster] added the plugin base and torch ddp plugin (#3180)
FrankLeeeee Mar 21, 2023
b429529
[chatgpt] add supervised learning fine-tune code (#3183)
pgzhang Mar 22, 2023
f57d349
[FX] refactor experimental tracer and adapt it with hf models (#3157)
YuliangLiu0306 Mar 22, 2023
019a847
[Analyzer] fix analyzer tests (#3197)
YuliangLiu0306 Mar 22, 2023
e3ad88f
[booster] implemented the cluster module (#3191)
FrankLeeeee Mar 22, 2023
1e1b9d2
[chatgpt]support llama (#3070)
Fazziekey Mar 22, 2023
9998d5e
[chatgpt]add reward model code for deberta (#3199)
chengeharrison Mar 22, 2023
1893479
[auto] fix requirements typo for issue #3125 (#3209)
Suffoquer-fang Mar 23, 2023
f8289d4
[lazyinit] combine lazy tensor with dtensor (#3204)
ver217 Mar 23, 2023
cd142fb
[api] implemented the checkpoint io module (#3205)
FrankLeeeee Mar 23, 2023
4fd4bd9
[chatgpt] support instuct training (#3216)
Fazziekey Mar 23, 2023
fa97a9c
[chatgpt] unnify datasets (#3218)
Fazziekey Mar 23, 2023
bbac676
fix torch version (#3225)
Fazziekey Mar 23, 2023
7fb95c5
Merge pull request #1 from hpcaitech/main
jamesthesnake Mar 23, 2023
3 changes: 3 additions & 0 deletions .compatibility
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
1.12.0-11.3.0
1.11.0-11.3.0
1.10.1-11.3.0
16 changes: 16 additions & 0 deletions .cuda_ext.json
@@ -0,0 +1,16 @@
{
    "build": [
        {
            "torch_command": "pip install torch==1.12.1+cu102 torchvision==0.13.1+cu102 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu102",
            "cuda_image": "hpcaitech/cuda-conda:10.2"
        },
        {
            "torch_command": "pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113",
            "cuda_image": "hpcaitech/cuda-conda:11.3"
        },
        {
            "torch_command": "pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116",
            "cuda_image": "hpcaitech/cuda-conda:11.6"
        }
    ]
}
36 changes: 36 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,36 @@
## 📌 Checklist before creating the PR

- [ ] I have created an issue for this PR for traceability
- [ ] The title follows the standard format: `[doc/gemini/tensor/...]: A concise description`
- [ ] I have added relevant tags if possible for us to better distinguish different PRs


## 🚨 Issue number

> Link this PR to your issue with words like fixed to automatically close the linked issue upon merge
>
> e.g. `fixed #1234`, `closed #1234`, `resolved #1234`



## 📝 What does this PR do?

> Summarize your work here.
> If you have any plots/diagrams/screenshots/tables, please attach them here.



## 💥 Checklist before requesting a review

- [ ] I have linked my PR to an issue ([instruction](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue))
- [ ] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
- [ ] I have performed a self-review of my code
- [ ] I have added thorough tests.
- [ ] I have added docstrings for all the functions/methods I implemented

## ⭐️ Do you enjoy contributing to Colossal-AI?

- [ ] 🌝 Yes, I do.
- [ ] 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.
157 changes: 157 additions & 0 deletions .github/workflows/README.md
@@ -0,0 +1,157 @@
# CI/CD

## Table of Contents

- [CI/CD](#cicd)
- [Table of Contents](#table-of-contents)
- [Overview](#overview)
- [Workflows](#workflows)
- [Code Style Check](#code-style-check)
- [Unit Test](#unit-test)
- [Example Test](#example-test)
- [Example Test on Dispatch](#example-test-on-dispatch)
- [Compatibility Test](#compatibility-test)
- [Compatibility Test on Dispatch](#compatibility-test-on-dispatch)
- [Release](#release)
- [User Friendliness](#user-friendliness)
  - [Community](#community)
- [Configuration](#configuration)
- [Progress Log](#progress-log)

## Overview

Automation makes our development more efficient as the machine automatically runs the pre-defined tasks for the contributors.
This saves a lot of manual work and allows the developers to focus fully on features and bug fixes.
In Colossal-AI, we use [GitHub Actions](https://github.com/features/actions) to automate a wide range of workflows to ensure the robustness of the software.
In the section below, we will dive into the details of different workflows available.

## Workflows

Refer to this [documentation](https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow) on how to manually trigger a workflow.
The details of each workflow are provided below.

**A PR which changes `version.txt` is considered a release PR in the following context.**
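For illustration only, the release-PR rule can be sketched as a tiny predicate over a PR's changed-file list. The real check lives in the workflows' GitHub Actions conditions, and `is_release_pr` is a hypothetical name:

```python
def is_release_pr(changed_files):
    """A PR counts as a release PR when it modifies version.txt."""
    return "version.txt" in changed_files


print(is_release_pr(["docs/README.md"]))               # False
print(is_release_pr(["version.txt", "CHANGELOG.md"]))  # True
```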


### Code Style Check

| Workflow Name | File name | Description |
| ------------- | ----------------- | -------------------------------------------------------------------------------------------------------------- |
| `post-commit` | `post_commit.yml` | This workflow runs pre-commit checks for changed files to achieve code style consistency after a PR is merged. |

### Unit Test

| Workflow Name | File name | Description |
| ---------------------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Build on PR` | `build_on_pr.yml` | This workflow is triggered when the label `Run build and Test` is assigned to a PR. It will run all the unit tests in the repository with 4 GPUs. |
| `Build on Schedule`    | `build_on_schedule.yml`    | This workflow will run the unit tests every day with 8 GPUs. The result is sent to Lark.                                                            |
| `Report test coverage` | `report_test_coverage.yml` | This workflow will post a comment to report the test coverage results when `Build` is done.                                                         |

### Example Test

| Workflow Name | File name | Description |
| -------------------------- | ------------------------------- | ------------------------------------------------------------------------------ |
| `Test example on PR` | `example_check_on_pr.yml` | The example will be automatically tested if its files are changed in the PR |
| `Test example on Schedule` | `example_check_on_schedule.yml` | This workflow will test all examples every Sunday. The result is sent to Lark. |
| `Example Test on Dispatch` | `example_check_on_dispatch.yml` | Manually test a specified example. |

#### Example Test on Dispatch

This workflow is triggered by manually dispatching the workflow. It has the following input parameters:
- `example_directory`: the example directory to test. Multiple directories are supported and must be separated by a comma, e.g. `language/gpt, images/vit`. Entering only `language` or only `gpt` does not work.
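The expected input format can be mirrored in a short validation sketch. `parse_example_dirs` is a hypothetical helper for illustration, not part of the workflow itself:

```python
def parse_example_dirs(raw):
    """Split the comma-separated `example_directory` input into
    category/example paths, stripping stray whitespace."""
    dirs = [d.strip() for d in raw.split(",") if d.strip()]
    for d in dirs:
        # Each entry must be a two-level path like language/gpt;
        # a bare category or example name is rejected.
        if d.count("/") != 1:
            raise ValueError(f"expected <category>/<example>, got {d!r}")
    return dirs


print(parse_example_dirs("language/gpt, images/vit"))
# ['language/gpt', 'images/vit']
```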

### Compatibility Test

| Workflow Name | File name | Description |
| -------------------------------- | ------------------------------------ | -------------------------------------------------------------------------------------------------------------------- |
| `Compatibility Test on PR`       | `compatibility_test_on_pr.yml`       | Check Colossal-AI's compatibility when `version.txt` is changed in a PR.                                              |
| `Compatibility Test on Schedule` | `compatibility_test_on_schedule.yml` | This workflow will check the compatibility of Colossal-AI against the PyTorch versions specified in `.compatibility` every Sunday. |
| `Compatiblity Test on Dispatch` | `compatibility_test_on_dispatch.yml` | Test PyTorch Compatibility manually. |


#### Compatibility Test on Dispatch
This workflow is triggered by manually dispatching the workflow. It has the following input parameters:
- `torch version`: torch versions to test against. Multiple versions are supported but must be separated by a comma. The default value is `all`, which will test all available torch versions listed in this [repository](https://github.com/hpcaitech/public_assets/tree/main/colossalai/torch_build/torch_wheels).
- `cuda version`: CUDA versions to test against. Multiple versions are supported but must be separated by a comma. The CUDA versions must be present in our [DockerHub repository](https://hub.docker.com/r/hpcaitech/cuda-conda).

> It only tests the compatibility of the main branch.
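As a rough illustration of how the two inputs resolve, the sketch below expands `all` into every known version. The version list here is a hypothetical stand-in; the real resolution happens inside the workflow:

```python
# Hypothetical stand-in for the torch wheels available in the
# hpcaitech/public_assets repository.
AVAILABLE_TORCH_VERSIONS = ["1.10.1", "1.11.0", "1.12.0"]


def resolve_versions(raw, available):
    """Expand a dispatch input: 'all' means every known version,
    otherwise split the comma-separated list."""
    if raw.strip() == "all":
        return list(available)
    return [v.strip() for v in raw.split(",") if v.strip()]


print(resolve_versions("all", AVAILABLE_TORCH_VERSIONS))
# ['1.10.1', '1.11.0', '1.12.0']
print(resolve_versions("1.11.0, 1.12.0", AVAILABLE_TORCH_VERSIONS))
# ['1.11.0', '1.12.0']
```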


### Release

| Workflow Name | File name | Description |
| ----------------------------------------------- | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| `Draft GitHub Release Post` | `draft_github_release_post_after_merge.yml` | Compose a GitHub release post draft based on the commit history when a release PR is merged. |
| `Publish to PyPI` | `release_pypi_after_merge.yml` | Build and release the wheel to PyPI when a release PR is merged. The result is sent to Lark. |
| `Publish Nightly Version to PyPI` | `release_nightly_on_schedule.yml` | Build and release the nightly wheel to PyPI as `colossalai-nightly` every Sunday. The result is sent to Lark. |
| `Publish Docker Image to DockerHub after Merge` | `release_docker_after_merge.yml` | Build and release the Docker image to DockerHub when a release PR is merged. The result is sent to Lark. |
| `Check CUDA Extension Build Before Merge` | `cuda_ext_check_before_merge.yml` | Build CUDA extensions with different CUDA versions when a release PR is created. |
| `Publish to Test-PyPI Before Merge` | `release_test_pypi_before_merge.yml` | Release to test-pypi to simulate user installation when a release PR is created. |


### User Friendliness

| Workflow Name | File name | Description |
| ----------------------- | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `issue-translate` | `translate_comment.yml` | This workflow is triggered when a new issue comment is created. The comment will be translated into English if not written in English. |
| `Synchronize submodule` | `submodule.yml` | This workflow will check if any git submodule is updated. If so, it will create a PR to update the submodule pointers. |
| `Close inactive issues` | `close_inactive.yml` | This workflow will close issues which are stale for 14 days. |

### Community

| Workflow Name | File name | Description |
| -------------------------------------------- | -------------------------------- | -------------------------------------------------------------------------------- |
| `Generate Community Report and Send to Lark` | `report_leaderboard_to_lark.yml` | Collect contribution and user engagement stats and share with Lark every Friday. |

## Configuration

This section lists the files used to configure the workflow.

1. `.compatibility`

This `.compatibility` file tells GitHub Actions which PyTorch and CUDA versions to test against. Each line in the file is in the format `${torch-version}-${cuda-version}`, which is a tag for a Docker image. Thus, this tag must be present in the [docker registry](https://hub.docker.com/r/pytorch/conda-cuda) in order to perform the test.
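The line format can be parsed with a few lines of Python — a sketch under the assumption that neither version string contains a hyphen, as in the `.compatibility` entries shown in this PR:

```python
def parse_compatibility(text):
    """Split each non-empty `<torch-version>-<cuda-version>` line into a pair.

    The whole line doubles as the Docker image tag, e.g. `1.12.0-11.3.0`.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        torch_ver, cuda_ver = line.split("-", 1)
        pairs.append((torch_ver, cuda_ver))
    return pairs


sample = "1.12.0-11.3.0\n1.11.0-11.3.0\n1.10.1-11.3.0\n"
print(parse_compatibility(sample))
# [('1.12.0', '11.3.0'), ('1.11.0', '11.3.0'), ('1.10.1', '11.3.0')]
```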

2. `.cuda_ext.json`

This file controls which CUDA versions the CUDA extension build will be checked against. You can add a new entry according to the JSON schema below to check the AOT build of PyTorch extensions before release.

```json
{
    "build": [
        {
            "torch_command": "",
            "cuda_image": ""
        }
    ]
}
```
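As a sanity check, the schema can be exercised with a short Python sketch — illustrative only, since the actual workflow consumes this file inside GitHub Actions, not via this code. The entry below reuses one of the real build commands from the `.cuda_ext.json` added in this PR:

```python
import json

raw = """
{
    "build": [
        {
            "torch_command": "pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113",
            "cuda_image": "hpcaitech/cuda-conda:11.3"
        }
    ]
}
"""

config = json.loads(raw)
for entry in config["build"]:
    # Each entry pairs a torch install command with the Docker image it runs in.
    print(entry["cuda_image"], "->", entry["torch_command"].split()[2])
# prints: hpcaitech/cuda-conda:11.3 -> torch==1.12.1+cu113
```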

## Progress Log

- [x] Code style check
- [x] post-commit check
- [x] unit testing
- [x] test on PR
- [x] report test coverage
- [x] regular test
- [x] release
- [x] pypi release
- [x] test-pypi simulation
- [x] nightly build
- [x] docker build
- [x] draft release post
- [x] example check
- [x] check on PR
- [x] regular check
- [x] manual dispatch
- [x] compatibility check
- [x] check on PR
- [x] manual dispatch
- [x] auto test when release
- [x] community
- [x] contribution report
- [x] user engagement report
- [x] helpers
- [x] comment translation
- [x] submodule update
- [x] close inactive issue
76 changes: 0 additions & 76 deletions .github/workflows/build.yml

This file was deleted.
