The previously reported results (taken from here):
The results from the website for the 11/04/2025, 03:41:04 UTC run:
The end-to-end latency results are generally lower (much lower for the GB200 NVL72), while the token throughput per GPU appears to be much diminished for anything other than the GB200 NVL72. Is this an expected change? If so, what is the reason?
If this is an expected change (e.g. due to a bug fix or similar), it would be very helpful to have a high-level changelog on the results website noting major changes - e.g. a change of inference engine that produces very different results, or a bug fix with the same impact - i.e. something much higher level than what you would get trawling through the commit history (which has plenty of changes with no expected impact on the reported results).