Add ReadMe about MFU #2031

Merged
copybara-service[bot] merged 1 commit into main from mattdavidow-mfu-readme
Jul 31, 2025

Collaborator

@gobbleturk commented Jul 27, 2025

Description

Add a ReadMe section discussing model FLOPs utilization (MFU): its definition and how we report it.

We may want to add more sections to this in the future (e.g. hardware utilization or memory usage).

This is meant to help clarify the recent change to our attention FLOP calculation (accounting for causality) in #1988.
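
For readers unfamiliar with the metric, here is a minimal Python sketch of how MFU and a causality-adjusted attention FLOP count could be computed. It is not the code from this PR or from #1988. The function names, parameters, and the choice to model causality as a flat halving of the attention FLOPs are all assumptions made for illustration, and the peak FLOPs value must be supplied by the caller.

```python
def attention_flops(batch, seq_len, num_heads, head_dim, num_layers, causal=True):
    """Forward-pass FLOPs of the two attention matmuls (QK^T and weights @ V).

    Hypothetical helper: each matmul costs 2 * seq_len^2 * head_dim FLOPs
    per head and per sequence, so the two together cost 4 * seq_len^2 * head_dim.
    """
    flops = 4 * batch * seq_len**2 * num_heads * head_dim * num_layers
    # Assumption: a causal mask skips roughly half of the score matrix,
    # so the attention FLOPs are divided by 2.
    return flops // 2 if causal else flops


def mfu(model_flops_per_step, step_time_s, num_devices, peak_flops_per_device):
    """Model FLOPs utilization: achieved model FLOP/s over peak hardware FLOP/s."""
    achieved_flops_per_s = model_flops_per_step / step_time_s
    peak_flops_per_s = num_devices * peak_flops_per_device
    return achieved_flops_per_s / peak_flops_per_s
```

Because the attention term appears in the numerator, counting only the causal half lowers the reported MFU for the same step time, which is why a change to the FLOP calculation changes reported numbers without any change in speed.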

Note for reviewers: click "Display the rich diff" to see the rendered markdown: https://screenshot.googleplex.com/9WxhjW8EV6PWJ9B

Tests

N/A (README-only change)

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

README.md Outdated
Collaborator

Not only causal, also chunked and local

Collaborator Author

Done.

Collaborator

It's an upper bound; that doesn't mean it's actually achievable, right?

Collaborator

Yes, should we add something like this: "While achieving 100% is not practical due to many factors, the MFU score effectively shows how much room is left for optimization."

Collaborator Author

Done!

Note we've gotten 70% MFU before on v5p, and I've heard of 80%+ MFU (even bf16, probably also v5p), so it's theoretically possible to get pretty close.

Collaborator

@shralex left a comment

Thanks Matt!

@gobbleturk force-pushed the mattdavidow-mfu-readme branch from 000af45 to 853d8b0 on July 28, 2025 18:32

Collaborator

Do we want to say anything more about local or chunked attention?

Collaborator Author

Done.

@gobbleturk force-pushed the mattdavidow-mfu-readme branch from 853d8b0 to 51317f8 on July 29, 2025 17:30

README.md Outdated
Collaborator

"dividing the attention the flops" -> "dividing the attention flops" (no "the")

Collaborator Author

Done!

@gobbleturk force-pushed the mattdavidow-mfu-readme branch from 51317f8 to 12141a3 on July 31, 2025 16:36
@gobbleturk requested a review from NuojCheng as a code owner on July 31, 2025 16:36
@gobbleturk force-pushed the mattdavidow-mfu-readme branch 2 times, most recently from 4e2d5c0 to 924f4e0, on July 31, 2025 16:42
@gobbleturk force-pushed the mattdavidow-mfu-readme branch from 924f4e0 to df562c9 on July 31, 2025 16:43
@copybara-service copybara-service bot merged commit 816b876 into main Jul 31, 2025
20 checks passed
@copybara-service copybara-service bot deleted the mattdavidow-mfu-readme branch July 31, 2025 18:18