
Update: Fast GeLU Approximation#744

Merged
awni merged 5 commits intoml-explore:mainfrom
nkasmanoff:main
Feb 27, 2024

Conversation

@nkasmanoff
Contributor

Proposed changes

Update the fast approximation for GeLU activation.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • [x] I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • [x] I have added tests that prove my fix is effective or that my feature works (the same tests already in place for GeLU should apply)
  • [x] I have updated the necessary documentation (if needed)

Comment thread: python/mlx/nn/layers/activations.py (Outdated)

```diff
      - https://arxiv.org/pdf/1606.08415.pdf
      """
-     return x * mx.sigmoid(1.773 * x)
+     return x * mx.sigmoid(1.702 * x)
```
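For context, the suggested change swaps the trained coefficient 1.773 for the more widely used 1.702 in the sigmoid-based fast approximation. A minimal sketch (plain Python with `math` rather than mlx, so it runs standalone) of how the sigmoid form compares to the exact erf-based GeLU:

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GeLU: x * Phi(x), where Phi is the standard normal CDF.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_fast(x: float, alpha: float = 1.702) -> float:
    # Sigmoid-based fast approximation: x * sigmoid(alpha * x).
    return x / (1.0 + math.exp(-alpha * x))

# The two agree closely for moderate inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(gelu_exact(x) - gelu_fast(x)) < 0.03
```

This is only an illustration of the functional forms discussed in the thread, not the mlx implementation itself.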
Member


IIRC @angeloskath added this, maybe you had another implementation in mind?

@awni
Member

awni commented Feb 26, 2024

@nkasmanoff could you check the tests that failed?

Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
@nkasmanoff
Contributor Author

> @nkasmanoff could you check the tests that failed?

@awni looks like they passed now? I committed your suggestion and the tests re-ran, so I couldn't see what the last failure was.

@awni
Member

awni commented Feb 26, 2024

@nkasmanoff can you see this: https://app.circleci.com/pipelines/github/ml-explore/mlx/1325/workflows/d7cbb23f-363b-4867-b1e2-9e9880b28bc9/jobs/3510

@angeloskath
Member

So I can add some context to this and then we can choose what to do. The x * mx.sigmoid(1.773 * x) approximation is simply one I trained with SGD (in the range (-6, 6), IIRC). It is better than 1.702 in every way: absolute error, squared error, etc.

Now, having said that, and given that with mx.compile the exact GeLU is as fast as the approximations, it might make sense to change these to match others in the literature. For instance, this one could be changed to 1.702 and the other one could be changed to PyTorch's tanh approximation.

Wdyt?
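For reference, the tanh approximation mentioned above (the one PyTorch uses for `gelu(approximate="tanh")`) is the Hendrycks & Gimpel form. A plain-Python sketch (not the mlx implementation) showing it tracks the exact GeLU very tightly:

```python
import math

def gelu_tanh(x: float) -> float:
    # PyTorch-style tanh approximation (Hendrycks & Gimpel, 2016):
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

def gelu_exact(x: float) -> float:
    # Exact GeLU for comparison: x * Phi(x).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# The tanh form stays within about 1e-3 of the exact GeLU here.
assert all(abs(gelu_tanh(x) - gelu_exact(x)) < 1e-3
           for x in (-3.0, -1.0, 0.0, 1.0, 3.0))
```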

@nkasmanoff
Contributor Author

My impression is that we want 1.702 for the fast approximation, if only to ensure consistency between the MLX adaptations and the models implemented in transformers.

My only concern is that if we keep 1.773, the vision encoder in LLaVA has seemingly worse performance when asked about images, and it fails the tests @mzbac set up against the Transformers implementation of LLaVA.

https://github.com/ml-explore/mlx-examples/pull/461/files#diff-62e0686eab4fe2568b5497693e3094c7c1ba582d48215096b674141dbfee3474R95

@angeloskath
Member

Yeah, I agree, I think we should change it. Just to be clear, though, one could always write a simple one-line activation function; there is no need to use gelu_fast_approx.

@awni
Member

awni commented Feb 27, 2024

I think in its current form it's a bit of a trap, since it sounds like the fast GeLU that has become somewhat standard. I would probably change it and encourage people to use the regular gelu in most cases.

I've been under the impression that the sigmoid version is the same as the tanh one, just implemented with a sigmoid, but I don't think I ever verified it. Where did that one come from?

@angeloskath
Member

That one came from me training y = x σ(α x (1 + β x²)). I guess I was quite bored when writing those back then :-). Here is the error profile in comparison to the tanh approximation:

[Figure: gelu_error — error profile of the sigmoid approximation vs. the tanh approximation]
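The figure itself is not reproduced here, but a rough numeric version of the comparison can be sketched. The trained α, β coefficients of the sigmoid variant are not given in the thread, so this compares only the 1.702 sigmoid form and the tanh form against the exact GeLU over (-6, 6):

```python
import math

def gelu_exact(x):
    # Exact GeLU: x * Phi(x).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_sigmoid(x):
    # Fast sigmoid approximation with the conventional 1.702 coefficient.
    return x / (1.0 + math.exp(-1.702 * x))

def gelu_tanh(x):
    # PyTorch-style tanh approximation.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

# Maximum absolute error on a grid over (-6, 6).
xs = [i / 100.0 for i in range(-600, 601)]
err_sigmoid = max(abs(gelu_sigmoid(x) - gelu_exact(x)) for x in xs)
err_tanh = max(abs(gelu_tanh(x) - gelu_exact(x)) for x in xs)

# The tanh form is noticeably more accurate on this range.
assert err_tanh < err_sigmoid
```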

Member

@awni awni left a comment


Thank you!!

@awni awni merged commit de3d246 into ml-explore:main Feb 27, 2024