Update: Fast GeLU Approximation #744
Conversation
```diff
         - https://arxiv.org/pdf/1606.08415.pdf
     """
-    return x * mx.sigmoid(1.773 * x)
+    return x * mx.sigmoid(1.702 * x)
```
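For reference, here is a minimal sketch of the two forms in question, the exact GELU and the sigmoid-based fast approximation from the linked paper (arXiv 1606.08415). The function names are illustrative, not part of the MLX API:

```python
import math
import mlx.core as mx

def gelu_exact(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF written via erf.
    return x * 0.5 * (1 + mx.erf(x / math.sqrt(2.0)))

def gelu_fast(x):
    # Sigmoid approximation from Hendrycks & Gimpel (arXiv:1606.08415):
    # GELU(x) ~ x * sigmoid(1.702 * x)
    return x * mx.sigmoid(1.702 * x)
```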
IIRC @angeloskath added this, maybe you had another implementation in mind?
@nkasmanoff could you check the tests that failed?
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
@awni looks like they passed now? I committed your suggestion and the tests re-ran, so I couldn't see what the last failure was.
So I can add some context to this and then we can choose what to do. The […]

Now having said that, and given the fact that using […], wdyt?
My impression is we want the 1.702 for the fast approximation, only to ensure consistency with the MLX adaptations of models made in transformers.

My only concern is that if we keep 1.773, the vision encoder in LLaVA has seemingly worse performance when asked about images, and it also fails the tests @mzbac set up against the Transformers implementation of LLaVA.
Yeah I agree, I think we should change it. Just to be clear though, one could always write a simple one-line activation function; there is no need to use the built-in one.
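For example, something along these lines (name hypothetical) keeps the old behaviour for anyone who still wants the 1.773 constant:

```python
import mlx.core as mx

# Hypothetical one-line activation for models tuned against the old constant.
def gelu_sigmoid_1773(x):
    return x * mx.sigmoid(1.773 * x)
```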
I think in its current form it's a bit of a trap, since it sounds like the fast GELU that has become somewhat standard. I would probably change it and encourage people to use the regular one.

I've been under the impression the sigmoid approximation is the same as the tanh one, just implemented with a sigmoid, but I don't think I ever verified it. Where did that one come from?
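One way to check is to compare the two forms numerically. A rough sketch (grid and range chosen arbitrarily, helper names are not MLX API):

```python
import math
import mlx.core as mx

def gelu_tanh(x):
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))
    return 0.5 * x * (1 + mx.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x**3)))

def gelu_sigmoid(x, c):
    # Sigmoid form with a configurable constant (1.702 or 1.773).
    return x * mx.sigmoid(c * x)

x = mx.linspace(-6, 6, 2001)
for c in (1.702, 1.773):
    err = mx.max(mx.abs(gelu_sigmoid(x, c) - gelu_tanh(x))).item()
    print(f"c = {c}: max |sigmoid - tanh| on [-6, 6] = {err:.4f}")
```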

Proposed changes
Update the fast approximation for GeLU activation.
Checklist

Put an `x` in the boxes that apply.

- [ ] I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes