From 488046c65b7bb743a9ac1042ff06e31a41b11f31 Mon Sep 17 00:00:00 2001
From: Yanjia0 <42895286+Yanjia0@users.noreply.github.com>
Date: Fri, 26 Sep 2025 11:24:37 +0800
Subject: [PATCH 1/4] Update README.md

text update
---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 790168d68a47..c40e819b832d 100644
--- a/README.md
+++ b/README.md
@@ -25,16 +25,16 @@

-## Get Started with Colossal-AI Without Setup
+## Instantly Run Colossal-AI on Enterprise-Grade GPUs

-Access high-end, on-demand compute for your research instantly—no setup needed.
+Skip the setup. Access a powerful, pre-configured Colossal-AI environment on [**HPC-AI Cloud**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai).

-Sign up now and get $10 in credits!
+Train your models and scale your AI workload in one click!

-Limited Academic Bonuses:
+* **NVIDIA Blackwell B200s**: Experience the next generation of AI performance ([See Benchmarks](https://hpc-ai.com/blog/b200)). Now available on cloud from **$2.47/hr**.
+* **Cost-Effective H200 Cluster**: Get premier performance with on-demand rental from just **$1.99/hr**.

-* Top up $1,000 and receive 300 credits
-* Top up $500 and receive 100 credits
+[**Get Started Now & Claim Your Free Credits →**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai)
+### Colossal-AI Benchmark
+
+To see how these performance gains translate to real-world applications, we conducted a large language model training benchmark using Colossal-AI on Llama-like models. The tests were run on 8-card and 16-card configurations for the 7B and 70B models, respectively.
+
+| GPU  | GPUs | Model Size | Parallelism        | Batch Size per DP | Seqlen | Throughput   | TFLOPS/GPU | Peak Mem (MiB) |
+| :--: | :--: | :--------: | :----------------: | :---------------: | :----: | :----------: | :--------: | :------------: |
+| H200 | 8    | 7B         | zero2(dp8)         | 36                | 4096   | 17.13 samp/s | 534.18     | 119040.02      |
+| H200 | 16   | 70B        | zero2              | 48                | 4096   | 3.27 samp/s  | 469.1      | 150032.23      |
+| B200 | 8    | 7B         | zero1(dp2)+tp2+pp4 | 128               | 4096   | 25.83 samp/s | 805.69     | 100119.77      |
+| B200 | 16   | 70B        | zero1(dp2)+tp2+pp4 | 128               | 4096   | 5.66 samp/s  | 811.79     | 100072.02      |
+
+The results from the Colossal-AI benchmark provide the most practical insight. For the 7B model on 8 cards, the **B200 achieved about 50% higher throughput**, with a matching gain in TFLOPS per GPU. For the 70B model on 16 cards, the B200 again demonstrated a clear advantage, with **over 70% higher throughput and TFLOPS per GPU**. These numbers show that the B200's performance gains translate directly into faster training times for large-scale models.

 ## Latest News
 * [2025/02] [DeepSeek 671B Fine-Tuning Guide Revealed—Unlock the Upgraded DeepSeek Suite with One Click, AI Players Ecstatic!](https://company.hpc-ai.com/blog/shocking-release-deepseek-671b-fine-tuning-guide-revealed-unlock-the-upgraded-deepseek-suite-with-one-click-ai-players-ecstatic)
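As a quick check on the percentages quoted in the new paragraph above, the gains follow directly from the table's throughput column. A minimal Python sketch, with the figures copied from the table:

```python
# Derive the quoted B200-vs-H200 speedups from the table's throughput column.
throughput_samples_per_s = {
    "7B, 8 GPUs": {"H200": 17.13, "B200": 25.83},
    "70B, 16 GPUs": {"H200": 3.27, "B200": 5.66},
}
for setup, t in throughput_samples_per_s.items():
    gain_pct = (t["B200"] / t["H200"] - 1) * 100
    print(f"{setup}: B200 throughput is {gain_pct:.0f}% higher")
# Output:
# 7B, 8 GPUs: B200 throughput is 51% higher
# 70B, 16 GPUs: B200 throughput is 73% higher
```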
From 0d1305b0e02bdfca000e547e3db1666e31986fbc Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 26 Sep 2025 03:58:35 +0000
Subject: [PATCH 4/4] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 21c0c45b3546..3977cdd25792 100644
--- a/README.md
+++ b/README.md
@@ -27,12 +27,12 @@

 ## Instantly Run Colossal-AI on Enterprise-Grade GPUs

-Skip the setup. Access a powerful, pre-configured Colossal-AI environment on [**HPC-AI Cloud**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai). 
+Skip the setup. Access a powerful, pre-configured Colossal-AI environment on [**HPC-AI Cloud**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai).

-Train your models and scale your AI workload in one click! 
+Train your models and scale your AI workload in one click!

-* **NVIDIA Blackwell B200s**: Experience the next generation of AI performance ([See Benchmarks](https://hpc-ai.com/blog/b200)). Now available on cloud from **$2.47/hr**. 
-* **Cost-Effective H200 Cluster**: Get premier performance with on-demand rental from just **$1.99/hr**. 
+* **NVIDIA Blackwell B200s**: Experience the next generation of AI performance ([See Benchmarks](https://hpc-ai.com/blog/b200)). Now available on cloud from **$2.47/hr**.
+* **Cost-Effective H200 Cluster**: Get premier performance with on-demand rental from just **$1.99/hr**.

 [**Get Started Now & Claim Your Free Credits →**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai)

@@ -44,7 +44,7 @@ Train your models and scale your AI workload in one click!

 ### Colossal-AI Benchmark

-To see how these performance gains translate to real-world applications, we conducted a large language model training benchmark using Colossal-AI on Llama-like models. The tests were run on 8-card and 16-card configurations for the 7B and 70B models, respectively. 
+To see how these performance gains translate to real-world applications, we conducted a large language model training benchmark using Colossal-AI on Llama-like models. The tests were run on 8-card and 16-card configurations for the 7B and 70B models, respectively.

 | GPU  | GPUs | Model Size | Parallelism        | Batch Size per DP | Seqlen | Throughput   | TFLOPS/GPU | Peak Mem (MiB) |
 | :--: | :--: | :--------: | :----------------: | :---------------: | :----: | :----------: | :--------: | :------------: |
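For readers who want to see what the `zero1(dp2)+tp2+pp4` entry in the table corresponds to in code, below is a minimal sketch using Colossal-AI's `HybridParallelPlugin`. Argument names follow recent Colossal-AI releases and may differ in your version; the model, optimizer, and dataloader setup is elided.

```python
# Minimal sketch of the zero1(dp2)+tp2+pp4 layout from the benchmark table,
# using Colossal-AI's HybridParallelPlugin. Launch with e.g.:
#   torchrun --nproc_per_node=8 --nnodes=2 train.py
# so that 16 ranks are available for dp2 x tp2 x pp4.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()  # older releases expect launch_from_torch(config={})

plugin = HybridParallelPlugin(
    tp_size=2,        # tensor parallelism degree
    pp_size=4,        # pipeline parallelism degree (4 stages)
    zero_stage=1,     # ZeRO-1 over the remaining data-parallel group (dp2)
    precision="bf16",
)
booster = Booster(plugin=plugin)

# model, optimizer, criterion, and dataloader are assumed to be defined elsewhere:
# model, optimizer, criterion, dataloader, _ = booster.boost(
#     model, optimizer, criterion, dataloader
# )
```

ZeRO stage 1 shards only the optimizer states across the data-parallel replicas, which is why it composes cheaply with tensor and pipeline parallelism in this layout.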