
Conversation

@jbarrow (Contributor) commented on Dec 8, 2023

Implementation of AdamW, but without bias correction of the first and second moments, following the convention of the Adam implementation in the repo.

If you want the paper's convention with bias correction, that implementation is here:

https://github.com/jbarrow/mlxllama/blob/cb2a808eefafc340db91e7d8f97ca334f6afff2a/mlxllama/optim.py
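
For reference, a minimal sketch of the uncorrected update described above (the function and argument names are illustrative, not the repo's actual code):

```python
import mlx.core as mx

def adamw_step_no_bias_correction(parameter, gradient, m, v,
                                  lr=1e-3, wd=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Running moments, kept uncorrected (no division by 1 - b1**t or 1 - b2**t).
    m = b1 * m + (1 - b1) * gradient
    v = b2 * v + (1 - b2) * mx.square(gradient)
    # Decoupled weight decay, then the Adam-style step.
    parameter = parameter * (1 - lr * wd)
    parameter = parameter - lr * m / (mx.sqrt(v) + eps)
    return parameter, m, v
```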

@angeloskath (Member) left a comment:

That looks great.

Instead of repeating the Adam implementation, do you mind extending Adam and doing something like calling

return super().apply_single(gradient, parameter * (1 - lr * wd))

This way, if we want to change something down the line, we only need to change it in one place.
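
A minimal sketch of what that subclassing could look like, assuming Adam's constructor takes (learning_rate, betas, eps), stores `self.learning_rate`, and that `apply_single` takes `(gradient, parameter, state)`; the `weight_decay` default here is an assumption, not necessarily what was merged:

```python
import mlx.core as mx
from mlx.optimizers import Adam

class AdamW(Adam):
    """AdamW as Adam plus decoupled weight decay, reusing Adam's moment update."""

    def __init__(self, learning_rate: float, betas=(0.9, 0.999), eps: float = 1e-8,
                 weight_decay: float = 0.01):
        super().__init__(learning_rate, list(betas), eps)
        self.weight_decay = weight_decay

    def apply_single(self, gradient, parameter, state):
        # Apply the decoupled weight decay, then delegate the rest to Adam.
        decayed = parameter * (1 - self.learning_rate * self.weight_decay)
        return super().apply_single(gradient, decayed, state)
```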

@jbarrow (Contributor, Author) commented on Dec 8, 2023

Alright, made that change and tested it locally in mlxllama.

@angeloskath (Member) left a comment:

Looking good!

@angeloskath merged commit 69a24e6 into ml-explore:main on Dec 8, 2023
@awni mentioned this pull request on Dec 17, 2023