Overview
Hybrid parallelism need at lease 8 processes to be fully tested. Our CI runner is run on a 4-gpu node and a 8-gpu node.
As the 8-gpu node is not always available, we run the most tests on 4-gpu node. And for those need 8-gpu, we run them regularly on 8-gpu node.
In short, we should add a marker to indicate which test should be run on 8-gpu node.
Overview
Hybrid parallelism need at lease 8 processes to be fully tested. Our CI runner is run on a 4-gpu node and a 8-gpu node.
As the 8-gpu node is not always available, we run the most tests on 4-gpu node. And for those need 8-gpu, we run them regularly on 8-gpu node.
In short, we should add a marker to indicate which test should be run on 8-gpu node.