The previously reported results (taken from here):
The results from the website for the 11/04/2025, 03:41:04 UTC run:
The end-to-end latency results are generally lower (much lower for the GB200 NVL72), while the token throughput per GPU appears to be much diminished for anything other than the GB200 NVL72. Is this an expected change? If so, what is the reason?
If this is an expected change (e.g. due to a bug fix or similar), it would be very helpful to have a high-level changelog on the results website noting major changes - e.g. a change of inference engine that produces very different results, or a bug fix with the same impact - i.e. something much higher level than what you would get trawling through the commit history (which has plenty of changes with no expected impact on the reported results).