From 488046c65b7bb743a9ac1042ff06e31a41b11f31 Mon Sep 17 00:00:00 2001
From: Yanjia0 <42895286+Yanjia0@users.noreply.github.com>
Date: Fri, 26 Sep 2025 11:24:37 +0800
Subject: [PATCH 1/4] Update README.md

text update
---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 790168d68a47..c40e819b832d 100644
--- a/README.md
+++ b/README.md
@@ -25,16 +25,16 @@

-## Get Started with Colossal-AI Without Setup
+## Instantly Run Colossal-AI on Enterprise-Grade GPUs

-Access high-end, on-demand compute for your research instantly—no setup needed.
+Skip the setup. Access a powerful, pre-configured Colossal-AI environment on [**HPC-AI Cloud**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai).

-Sign up now and get $10 in credits!
+Train your models and scale your AI workload in one click!

-Limited Academic Bonuses:
+* **NVIDIA Blackwell B200s**: Experience the next generation of AI performance ([See Benchmarks](https://hpc-ai.com/blog/b200)). Now available on cloud from **$2.47/hr**.
+* **Cost-Effective H200 Cluster**: Get premier performance with on-demand rental from just **$1.99/hr**.

-* Top up $1,000 and receive 300 credits
-* Top up $500 and receive 100 credits
+[**Get Started Now & Claim Your Free Credits →**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai)
+### Colossal-AI Benchmark
+
+To see how these performance gains translate to real-world applications, we conducted a large language model training benchmark using Colossal-AI on Llama-like models. The tests were run on 8-card and 16-card configurations for the 7B and 70B models, respectively.
+
+| GPU  | GPUs | Model Size | Parallelism        | Batch Size per DP | Seqlen | Throughput   | TFLOPS/GPU | Peak Mem (MiB) |
+| :--: | :--: | :--------: | :----------------: | :---------------: | :----: | :----------: | :--------: | :------------: |
+| H200 | 8    | 7B         | zero2(dp8)         | 36                | 4096   | 17.13 samp/s | 534.18     | 119040.02      |
+| H200 | 16   | 70B        | zero2              | 48                | 4096   | 3.27 samp/s  | 469.1      | 150032.23      |
+| B200 | 8    | 7B         | zero1(dp2)+tp2+pp4 | 128               | 4096   | 25.83 samp/s | 805.69     | 100119.77      |
+| B200 | 16   | 70B        | zero1(dp2)+tp2+pp4 | 128               | 4096   | 5.66 samp/s  | 811.79     | 100072.02      |
+
+The results from the Colossal-AI benchmark provide the most practical insight. For the 7B model on 8 cards, the **B200 achieved about 50% higher throughput**, with a matching gain in TFLOPS per GPU. For the 70B model on 16 cards, the B200 again demonstrated a clear advantage, with **over 70% higher throughput and TFLOPS per GPU**. These numbers show that the B200's performance gains translate directly into faster training times for large-scale models.

 ## Latest News
 * [2025/02] [DeepSeek 671B Fine-Tuning Guide Revealed—Unlock the Upgraded DeepSeek Suite with One Click, AI Players Ecstatic!](https://company.hpc-ai.com/blog/shocking-release-deepseek-671b-fine-tuning-guide-revealed-unlock-the-upgraded-deepseek-suite-with-one-click-ai-players-ecstatic)
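As a quick check on the percentages quoted in the new paragraph above, the gains follow directly from the table's throughput column. A minimal Python sketch, with the figures copied from the table:

```python
# Derive the quoted B200-vs-H200 speedups from the table's throughput column.
throughput_samples_per_s = {
    "7B, 8 GPUs": {"H200": 17.13, "B200": 25.83},
    "70B, 16 GPUs": {"H200": 3.27, "B200": 5.66},
}
for setup, t in throughput_samples_per_s.items():
    gain_pct = (t["B200"] / t["H200"] - 1) * 100
    print(f"{setup}: B200 throughput is {gain_pct:.0f}% higher")
# Output:
# 7B, 8 GPUs: B200 throughput is 51% higher
# 70B, 16 GPUs: B200 throughput is 73% higher
```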
From 0d1305b0e02bdfca000e547e3db1666e31986fbc Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 26 Sep 2025 03:58:35 +0000
Subject: [PATCH 4/4] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 21c0c45b3546..3977cdd25792 100644
--- a/README.md
+++ b/README.md
@@ -27,12 +27,12 @@

 ## Instantly Run Colossal-AI on Enterprise-Grade GPUs

-Skip the setup. Access a powerful, pre-configured Colossal-AI environment on [**HPC-AI Cloud**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai). 
+Skip the setup. Access a powerful, pre-configured Colossal-AI environment on [**HPC-AI Cloud**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai).

-Train your models and scale your AI workload in one click! 
+Train your models and scale your AI workload in one click!

-* **NVIDIA Blackwell B200s**: Experience the next generation of AI performance ([See Benchmarks](https://hpc-ai.com/blog/b200)). Now available on cloud from **$2.47/hr**. 
-* **Cost-Effective H200 Cluster**: Get premier performance with on-demand rental from just **$1.99/hr**. 
+* **NVIDIA Blackwell B200s**: Experience the next generation of AI performance ([See Benchmarks](https://hpc-ai.com/blog/b200)). Now available on cloud from **$2.47/hr**.
+* **Cost-Effective H200 Cluster**: Get premier performance with on-demand rental from just **$1.99/hr**.

 [**Get Started Now & Claim Your Free Credits →**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai)

@@ -44,7 +44,7 @@ Train your models and scale your AI workload in one click!

 ### Colossal-AI Benchmark

-To see how these performance gains translate to real-world applications, we conducted a large language model training benchmark using Colossal-AI on Llama-like models. The tests were run on 8-card and 16-card configurations for the 7B and 70B models, respectively. 
+To see how these performance gains translate to real-world applications, we conducted a large language model training benchmark using Colossal-AI on Llama-like models. The tests were run on 8-card and 16-card configurations for the 7B and 70B models, respectively.

 | GPU  | GPUs | Model Size | Parallelism        | Batch Size per DP | Seqlen | Throughput   | TFLOPS/GPU | Peak Mem (MiB) |
 | :--: | :--: | :--------: | :----------------: | :---------------: | :----: | :----------: | :--------: | :------------: |
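For readers who want to see what the `zero1(dp2)+tp2+pp4` entry in the table corresponds to in code, below is a minimal sketch using Colossal-AI's `HybridParallelPlugin`. Argument names follow recent Colossal-AI releases and may differ in your version; the model, optimizer, and dataloader setup is elided.

```python
# Minimal sketch of the zero1(dp2)+tp2+pp4 layout from the benchmark table,
# using Colossal-AI's HybridParallelPlugin. Launch with e.g.:
#   torchrun --nproc_per_node=8 --nnodes=2 train.py
# so that 16 ranks are available for dp2 x tp2 x pp4.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()  # older releases expect launch_from_torch(config={})

plugin = HybridParallelPlugin(
    tp_size=2,        # tensor parallelism degree
    pp_size=4,        # pipeline parallelism degree (4 stages)
    zero_stage=1,     # ZeRO-1 over the remaining data-parallel group (dp2)
    precision="bf16",
)
booster = Booster(plugin=plugin)

# model, optimizer, criterion, and dataloader are assumed to be defined elsewhere:
# model, optimizer, criterion, dataloader, _ = booster.boost(
#     model, optimizer, criterion, dataloader
# )
```

ZeRO stage 1 shards only the optimizer states across the data-parallel replicas, which is why it composes cheaply with tensor and pipeline parallelism in this layout.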