Add llama model to examples (#473) #559

Closed

iseeyuan wants to merge 1 commit into pytorch:main from iseeyuan:export-D49734019

Contributor

iseeyuan commented Oct 2, 2023

Summary:

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: cccclai

Differential Revision: D49734019

Pulled By: iseeyuan

netlify bot commented Oct 2, 2023

✅ Deploy Preview for resplendent-gnome-14e531 ready!

Name	Link
🔨 Latest commit	`9a03603`
🔍 Latest deploy log	https://app.netlify.com/sites/resplendent-gnome-14e531/deploys/651f0a452d80d90008a47098
😎 Deploy Preview	https://deploy-preview-559--resplendent-gnome-14e531.netlify.app

facebook-github-bot added the CLA Signed label

Contributor

facebook-github-bot commented Oct 2, 2023

This pull request was exported from Phabricator. Differential Revision: D49734019

facebook-github-bot added the fb-exported label

Add llama model to examples (#559)

9a03603

Summary: Pull Request resolved: #559

Test Plan: Imported from OSS

Reviewed By: guangy10

Differential Revision: D49734019

Pulled By: iseeyuan

fbshipit-source-id: 293b08e8ae7d0a3823ae1485f2bf635f02123951

facebook-github-bot closed this in 07aaad4

facebook-github-bot added the Merged label

Contributor

facebook-github-bot commented Oct 5, 2023

@iseeyuan merged this pull request in 07aaad4.

Gasoonjia pushed a commit that referenced this pull request


Removing duplicate HF issue message from README (#559)

ac46947

Co-authored-by: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>

Gasoonjia pushed a commit that referenced this pull request


make --device fast the default (#515)

* make --device fast the default

* Update iOS.md (#517)

* Update iOS.md

* Update iOS.md

* Pip to pip3 (#504)

* remove macos-12 test

* pip to pip3

* break aoti CI jobs separately (#500)

* init

* fixes

* more fixes

* fixes

* fix

* fix

* bug fix

* add objcopy update

* suppress int8

* undefined variable

---------

Co-authored-by: Michael Gschwind <mikekg@meta.com>

* Support llama3 in chat in run.cpp  (#486)

* refactor chat runner in preparation for llama3

* add sketch for llama3 prompt template and move to returning tokens

* fix tiktoken

* fixes to chat

* add default llama_ver

* Add tests for quantize json, add cuda device specification and precision to cuda.json (#519)

* remove code for no KV Cache path (#527)

* Update ADVANCED-USERS.md (#529)

Update Advanced Users description to reflect changes in the repo since the description was initially created.

* runner-aoti on cuda (#531)

* runner-aoti on cuda

* transfer results back to CPU

* transfer results back to CPU

* runner-aoti on cuda

* Update runner_build.md (#530)

Update description of runner and build process in runner_build.md

* clean up runner code a little (#532)

* clean up runner code a little

* update

* update

* pull out generate loop in chat

* updates

* edit docs

* typo

* move int8 linear class and function into qops.py (#534)

* add dtype tests for runner-aoti + runner-et (#539)

* add dtype tests for runner-aoti + runner-et

* typo

* Quantized embedding (#536)

* move int8 linear class and function into qops.py

* move Quantized Embedding to qops.py

* Move Linear int4 to qops (#537)

* move int8 linear class and function into qops.py

* move Quantized Embedding to qops.py

* move int4 linear to qops

* Revert "add dtype tests for runner-aoti + runner-et (#539)" (#548)

This reverts commit a7a24577a65be67ac9ae4dc05452f35d9c49e5d1.

* fix generate for llama3 (#538)

* fix generate for llama3

* switch more things to C

* remove C++ header

* add delegation visualization instructions (#551)

* Add dtype runner aoti (#552)

* add dtype tests for runner-aoti + runner-et

* typo

* add dtype test runner-aoti

* test sdpa with fp16 (#553)

* test sdpa with fp16

* kv cache fp32

* typo

* update (#560)

* Only support newest versions of lm-eval (#556)

Summary:
remove support for lm-eval 0.3 to reduce the options we have

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

* split cpu eval CI by dtype (#554)

* split cpu eval CI by dtype

* fix

* differentiate names with checks

* keep one name the same as old

* fix

* Removing duplicate HF issue message from README (#559)

Co-authored-by: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>

* doc updates (#567)

* Add VM-safe MPS check

---------

Co-authored-by: Anthony Shoumikhin <anthony@shoumikh.in>
Co-authored-by: metascroy <161522778+metascroy@users.noreply.github.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: lucylq <lfq@meta.com>
Co-authored-by: Jerry Zhang <jerryzh168@gmail.com>
Co-authored-by: Jack-Khuu <jack.khuu.7@gmail.com>

Gasoonjia pushed a commit that referenced this pull request


Quantization, fp acceleration, and testing (#572)

c980472

* code beautification

* code beautification, move functions together

* make --device fast the default (#515)

* make --device fast the default

* Update iOS.md (#517)

* Update iOS.md

* Update iOS.md

* Pip to pip3 (#504)

* remove macos-12 test

* pip to pip3

* break aoti CI jobs separately (#500)

* init

* fixes

* more fixes

* fixes

* fix

* fix

* bug fix

* add objcopy update

* suppress int8

* undefined variable

---------

Co-authored-by: Michael Gschwind <mikekg@meta.com>

* Support llama3 in chat in run.cpp  (#486)

* refactor chat runner in preparation for llama3

* add sketch for llama3 prompt template and move to returning tokens

* fix tiktoken

* fixes to chat

* add default llama_ver

* Add tests for quantize json, add cuda device specification and precision to cuda.json (#519)

* remove code for no KV Cache path (#527)

* Update ADVANCED-USERS.md (#529)

Update Advanced Users description to reflect changes in the repo since the description was initially created.

* runner-aoti on cuda (#531)

* runner-aoti on cuda

* transfer results back to CPU

* transfer results back to CPU

* runner-aoti on cuda

* Update runner_build.md (#530)

Update description of runner and build process in runner_build.md

* clean up runner code a little (#532)

* clean up runner code a little

* update

* update

* pull out generate loop in chat

* updates

* edit docs

* typo

* move int8 linear class and function into qops.py (#534)

* add dtype tests for runner-aoti + runner-et (#539)

* add dtype tests for runner-aoti + runner-et

* typo

* Quantized embedding (#536)

* move int8 linear class and function into qops.py

* move Quantized Embedding to qops.py

* Move Linear int4 to qops (#537)

* move int8 linear class and function into qops.py

* move Quantized Embedding to qops.py

* move int4 linear to qops

* Revert "add dtype tests for runner-aoti + runner-et (#539)" (#548)

This reverts commit a7a24577a65be67ac9ae4dc05452f35d9c49e5d1.

* fix generate for llama3 (#538)

* fix generate for llama3

* switch more things to C

* remove C++ header

* add delegation visualization instructions (#551)

* Add dtype runner aoti (#552)

* add dtype tests for runner-aoti + runner-et

* typo

* add dtype test runner-aoti

* test sdpa with fp16 (#553)

* test sdpa with fp16

* kv cache fp32

* typo

* update (#560)

* Only support newest versions of lm-eval (#556)

Summary:
remove support for lm-eval 0.3 to reduce the options we have

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

* split cpu eval CI by dtype (#554)

* split cpu eval CI by dtype

* fix

* differentiate names with checks

* keep one name the same as old

* fix

* Removing duplicate HF issue message from README (#559)

Co-authored-by: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>

* doc updates (#567)

* Add VM-safe MPS check

---------

Co-authored-by: Anthony Shoumikhin <anthony@shoumikh.in>
Co-authored-by: metascroy <161522778+metascroy@users.noreply.github.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: lucylq <lfq@meta.com>
Co-authored-by: Jerry Zhang <jerryzh168@gmail.com>
Co-authored-by: Jack-Khuu <jack.khuu.7@gmail.com>

* add unpacking support (#525)

* add unpacking support

* fix typos and linter

* perform parallel prefill when possible (#568)

* perform parallel prefill when possible

* typo

* disable hack

* remove print

* remove debug messages which prevent export

* fixes

* stream results in generate.py (#571)

* remove logging interfering with export

---------

Co-authored-by: Anthony Shoumikhin <anthony@shoumikh.in>
Co-authored-by: metascroy <161522778+metascroy@users.noreply.github.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: lucylq <lfq@meta.com>
Co-authored-by: Jerry Zhang <jerryzh168@gmail.com>
Co-authored-by: Jack-Khuu <jack.khuu.7@gmail.com>

Labels: CLA Signed, fb-exported, Merged