From aeeb71a69e98d7f0e3e7622e720e29436f2794db Mon Sep 17 00:00:00 2001
From: CjhHa1
Date: Fri, 1 Sep 2023 15:56:10 +0800
Subject: [PATCH 1/2] complete fig

---
 colossalai/inference/README.md | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/colossalai/inference/README.md b/colossalai/inference/README.md
index 7228c51aa484..bdc1006a1819 100644
--- a/colossalai/inference/README.md
+++ b/colossalai/inference/README.md
@@ -55,7 +55,7 @@ dependencies
 ```bash
 pytorch= 1.13.1 (gpu)
-cuda>= 11.6
+cuda>= 11.6
 transformers= 4.30.2
 triton==2.0.0.dev20221202
 # for install vllm, please use this branch to install https://github.com/tiandiao123/vllm/tree/setup_branch
@@ -66,11 +66,11 @@ flash-attention

 ### Docker

-You can use docker run to use docker container to set-up environment
+You can use docker run to use docker container to set-up environment

 ```
-# env: python==3.8, cuda 11.6, pytorch == 1.13.1 triton==2.0.0.dev20221202, vllm kernels support, flash-attention-2 kernels support
-docker pull hpcaitech/colossalai-inference:v2
+# env: python==3.8, cuda 11.6, pytorch == 1.13.1 triton==2.0.0.dev20221202, vllm kernels support, flash-attention-2 kernels support
+docker pull hpcaitech/colossalai-inference:v2
 docker run -it --gpus all --name ANY_NAME -v $PWD:/workspace -w /workspace hpcaitech/colossalai-inference:v2 /bin/bash
 ```

@@ -88,10 +88,28 @@ python xx

 ### environment:

-We conducted [benchmark tests](https://github.com/hpcaitech/ColossalAI/blob/main/colossalai/shardformer/examples/performance_benchmark.py) to evaluate the performance. We compared the inference `latency` and `throughputs` between `colossal-inference` and `torch`.
+We conducted multiple benchmark tests to evaluate the performance. We compared the inference `latency` and `throughput` between `colossal-inference` and the original `hugging-face torch fp16` implementation.

-We set the batch size to 4, the number of attention heads to 8, and the head dimension to 64. `N_CTX` refers to the sequence length.
+For various models, experiments were conducted using multiple batch sizes under the consistent model configuration of `7 billion (7b)` parameters, an input length of `1024`, and an output length of `128`. The results are as follows (due to time constraints, the evaluation has currently been performed only on a single `A100` GPU; multi-GPU performance will be addressed in the future):

-In the case of using 2 GPUs, the results are as follows.
+### Single GPU Performance:
+
+#### Llama
+
+| batch_size              |   8    |   16   |   32   |
+| :---------------------: | :----: | :----: | :----: |
+| hugging-face torch fp16 | 199.12 | 246.56 | 246.56 |
+| colossal-inference      | 241.12 | 451.84 | 643.52 |
+
+![llama](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/Infer-llama.png)

 ###

+| batch_size              |   4    |   4    |
+| :---------------------: | :----: | :----: |
+| hugging-face torch fp16 | 145.28 | 189.68 |
+| colossal-inference      | 187.48 | 323.28 |
+
+![bloom](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/Infer-bloom.png)
+
+Results for more models are coming soon!
From 611f3589270a9e601a318be539cfdaef3017c1f9 Mon Sep 17 00:00:00 2001
From: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Date: Fri, 1 Sep 2023 16:02:37 +0800
Subject: [PATCH 2/2] Update README.md

---
 colossalai/inference/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/colossalai/inference/README.md b/colossalai/inference/README.md
index bdc1006a1819..591d3c93a220 100644
--- a/colossalai/inference/README.md
+++ b/colossalai/inference/README.md
@@ -105,7 +105,7 @@ For various models, experiments were conducted using multiple batch sizes under
 ###

-| batch_size              |   4    |   4    |
+| batch_size              |   4    |   8    |
 | :---------------------: | :----: | :----: |
 | hugging-face torch fp16 | 145.28 | 189.68 |
 | colossal-inference      | 187.48 | 323.28 |
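For context on how a `hugging-face torch fp16` baseline like the one in the tables above could be timed, the snippet below measures batched fp16 generation with `transformers` at the same shape described in the README (`1024` input tokens, `128` generated tokens). It is a minimal illustrative sketch, not the benchmark script used for these patches; the checkpoint path `path/to/llama-7b`, the batch size of 8, and greedy decoding are assumptions, and it reports throughput simply as generated tokens per second.

```python
# Minimal sketch of an fp16 HuggingFace baseline timing run.
# Assumed settings: placeholder checkpoint path, batch size 8,
# 1024 input tokens, 128 new tokens.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b"  # hypothetical checkpoint location
batch_size, input_len, output_len = 8, 1024, 128

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
model = model.cuda().eval()

# Random token ids stand in for real prompts; only the tensor shape matters for timing.
input_ids = torch.randint(0, tokenizer.vocab_size, (batch_size, input_len), device="cuda")

with torch.no_grad():
    torch.cuda.synchronize()
    start = time.time()
    model.generate(input_ids, max_new_tokens=output_len, do_sample=False)
    torch.cuda.synchronize()
    elapsed = time.time() - start

print(f"latency: {elapsed:.2f} s | throughput: {batch_size * output_len / elapsed:.2f} tokens/s")
```

Synchronizing the GPU before and after `generate` keeps the CPU-side timer honest, and adding a warm-up generation before the timed run would further stabilize the numbers.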