For various models, experiments were conducted using multiple batch sizes under

### Single GPU Performance:

Currently, the stats below are measured on a single A100 GPU. Token latency is calculated from the average of the context (prefill) forward pass and the decoding forward passes, meaning both stages are combined when computing token generation time. We are actively developing new features and methods to further optimize the performance of LLM models. Please stay tuned.
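As a rough illustration of this metric, the sketch below combines the prefill time and the per-token decoding times into an average token latency. This is a hypothetical helper for clarity only, not part of the Colossal-AI inference API:

```python
def combined_token_latency(prefill_time_s, decode_times_s):
    """Average per-token latency combining the context (prefill) forward
    pass and the subsequent decoding forward passes.

    prefill_time_s:  wall-clock time of the single context forward pass
    decode_times_s:  list of wall-clock times, one per decoding step
    """
    total_time = prefill_time_s + sum(decode_times_s)
    # The prefill step also yields the first generated token, so the
    # total token count is the number of decoding steps plus one.
    num_tokens = len(decode_times_s) + 1
    return total_time / num_tokens


# Example: one 120 ms prefill followed by 31 decoding steps of 20 ms each
latency = combined_token_latency(0.12, [0.02] * 31)
print(f"{latency * 1000:.2f} ms/token")
```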

#### Llama

| batch_size | 8 | 16 | 32 |

![llama](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/Infer-llama.png)

### Bloom

| batch_size | 4 | 8 |
| :---------------------: | :----: | :----: |