Support bias correction in Adam and AdamW optimizers #1640
Conversation
Yes, thanks for adding this option; this is indeed impactful.
angeloskath
left a comment
Thanks for adding this. Several people have requested it, so I think it is time to add it.
I left some comments regarding the implementation. I think it will be both cleaner and have the benefit of adding no extra cost when mx.compile is not used. Let me know what you think.
python/mlx/optimizers/optimizers.py
Outdated
-        return parameter - lr * m / (mx.sqrt(v) + eps)
+        return parameter - step_size * m / (
+            mx.sqrt(v) / bias_correction2_sqrt + eps
+        ).astype(gradient.dtype)
Could you instead write this with a simple if? It would be faster in case someone is not compiling their step function. Namely something like the following:
if bias_correction:
    numerator = lr / (1 - b1**step) * m
    denominator = mx.sqrt(v) / mx.sqrt(1 - b2**step) + eps
    return parameter - numerator / denominator
else:
    return parameter - lr * m / (mx.sqrt(v) + eps)
That makes sense; done.
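For context, a minimal sketch of what the update could look like after folding in the suggestion. This is not the merged code: the names step_size and bias_correction2_sqrt mirror the diff above, while the standalone function and scalar hyperparameters are simplifications for illustration.

import mlx.core as mx


def adam_step(parameter, m, v, lr, b1, b2, eps, step, bias_correction):
    """One Adam parameter update, given already-updated moments m and v.

    `step` is the 1-based iteration count; hyperparameters are Python scalars
    here to keep the sketch self-contained.
    """
    if bias_correction:
        # Fold the corrections into the step size and the denominator so the
        # uncorrected branch below pays no extra cost.
        step_size = lr / (1 - b1**step)
        bias_correction2_sqrt = (1 - b2**step) ** 0.5
        return parameter - step_size * m / (mx.sqrt(v) / bias_correction2_sqrt + eps)
    return parameter - lr * m / (mx.sqrt(v) + eps)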
python/mlx/optimizers/optimizers.py
Outdated
     Our Adam implementation follows the original paper and omits the bias
-    correction in the first and second moment estimates. In detail,
+    correction in the first and second moment estimates by default. In detail,
I would simply remove that comment and document the bias_correction argument below in the args.
I've adjusted the comment accordingly. Does this look reasonable to you?
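For example, the Args entry could read something like the following (a hypothetical docstring fragment; the exact wording and default in the merged PR may differ):

    bias_correction (bool, optional): If set to ``True``, bias correction is
        applied to the first and second moment estimates. Default: ``False``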
python/mlx/optimizers/optimizers.py
Outdated
-    correction in the first and second moments for AdamW. We update the weights
-    with a weight_decay (:math:`\lambda`) value:
+    correction in the first and second moments for AdamW by default. We update
+    the weights with a weight_decay (:math:`\lambda`) value:
Same as for Adam.
angeloskath
left a comment
Looks good, thanks!
@mt-caret Could you skip the test if there is no PyTorch available? Then I can merge, thanks!
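One common way to do this, sketched here under the assumption of a unittest-based test file; the class, test name, and guard variable are made up for illustration and are not the code from the PR.

import unittest

try:
    import torch  # noqa: F401

    has_torch = True
except ImportError:
    has_torch = False


class TestAdamWBiasCorrection(unittest.TestCase):
    @unittest.skipIf(not has_torch, "Requires PyTorch to be installed")
    def test_matches_torch(self):
        # The PyTorch comparison from the PR would go here.
        ...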
I should've looked more carefully at the other tests that use torch, my bad. Fixed!
...and also applied the pre-commit thing 😅
Proposed changes
The original implementation of AdamW doesn't include bias correction (#72). I found that this causes problems when using it to learn a trivial task, such as memorization with a GPT2-like architecture, whereas the equivalent PyTorch implementation doesn't exhibit this behavior; the changes in this PR resolve the issue.
I've manually confirmed that this matches PyTorch behavior up to some small floating-point differences, and also replicated a simpler version of it in the mlx tests, which breaks on main but passes on my branch.
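Roughly, the kind of parity check described above looks like the sketch below. This is not the test added in the PR: the bias_correction keyword is the argument introduced here, the shapes, step count, and tolerance are illustrative, and it assumes the default hyperparameters of mlx.optimizers.AdamW and torch.optim.AdamW line up.

import mlx.core as mx
import mlx.optimizers as optim
import numpy as np
import torch

np.random.seed(0)
w0 = np.random.normal(size=(8, 8)).astype(np.float32)
g = np.random.normal(size=(8, 8)).astype(np.float32)

# MLX side: AdamW with the new bias_correction flag enabled.
mlx_params = {"w": mx.array(w0)}
mlx_grads = {"w": mx.array(g)}
mlx_opt = optim.AdamW(learning_rate=1e-3, bias_correction=True)

# PyTorch side: torch.optim.AdamW always applies bias correction.
torch_param = torch.nn.Parameter(torch.tensor(w0))
torch_opt = torch.optim.AdamW([torch_param], lr=1e-3)

for _ in range(10):
    mlx_params = mlx_opt.apply_gradients(mlx_grads, mlx_params)
    torch_param.grad = torch.tensor(g)
    torch_opt.step()

np.testing.assert_allclose(
    np.array(mlx_params["w"]), torch_param.detach().numpy(), atol=1e-5
)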
Checklist
Put an x in the boxes that apply.
- [x] I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes