
Added more generic monkey patch function#42

Merged
lancerts merged 3 commits into main from sshimizu/monkey-patch-refactor
Aug 19, 2024
Conversation

@shimizust (Collaborator) commented Aug 17, 2024

Summary

  • Added a more generic monkey patch function, to be used primarily in the transformers integration. It maps the specified model_type to the corresponding monkey patch function.
  • Using model_type (e.g. llama) covers cases more broadly than specifying a model architecture (e.g. LlamaForCausalLM, LlamaForQuestionAnswering, etc.)
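The dispatch described above can be sketched as follows. This is an illustrative sketch only: the function and dictionary names here are hypothetical, not the exact identifiers introduced in this PR.

```python
# Illustrative sketch: map each transformers model_type string to the
# corresponding Liger monkey patch function. Names are hypothetical.

def apply_liger_kernel_to_llama(**kwargs):
    # In the real integration this would swap in Triton kernels for the
    # llama modules; here we just return a marker string.
    return "llama patched"

def apply_liger_kernel_to_mistral(**kwargs):
    return "mistral patched"

# model_type -> patch function
MODEL_TYPE_TO_APPLY_FN = {
    "llama": apply_liger_kernel_to_llama,
    "mistral": apply_liger_kernel_to_mistral,
}

def apply_liger_kernel(model_type: str, **kwargs):
    """Apply Liger kernels for the given model_type, if supported."""
    if model_type not in MODEL_TYPE_TO_APPLY_FN:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    return MODEL_TYPE_TO_APPLY_FN[model_type](**kwargs)
```

Because the key is the model_type string rather than a concrete architecture class, one entry covers every task-specific variant of that model family.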

Testing Done

  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence
jobuser [ ~/Liger-Kernel ]$ make checkstyle
flake8 .; flake8_status=$?; \
isort .; isort_status=$?; \
black .; black_status=$?; \
if [ $flake8_status -ne 0 ] || [ $isort_status -ne 0 ] || [ $black_status -ne 0 ]; then \
        exit 1; \
fi
Skipped 1 files
All done! ✨ 🍰 ✨
45 files left unchanged.
jobuser [ ~/Liger-Kernel ]$ make test
pytest --disable-warnings test/ --ignore=test/convergence
===================================================================================================================== test session starts ======================================================================================================================
platform linux -- Python 3.10.14, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/jobuser/Liger-Kernel
plugins: lipy-config-base-30.6.1, lipy-fabric-35.2.3, lipy-test-8.0.52, datadir-1.3.1, lipy-mp-34.4.191
collected 114 items                                                                                                                                                                                                                                            

test/transformers/test_cross_entropy.py ..........................................................                                                                                                                                                       [ 50%]
test/transformers/test_fused_linear_cross_entropy.py ......                                                                                                                                                                                              [ 56%]
test/transformers/test_geglu.py ........                                                                                                                                                                                                                 [ 63%]
test/transformers/test_rms_norm.py ................                                                                                                                                                                                                      [ 77%]
test/transformers/test_rope.py ............                                                                                                                                                                                                              [ 87%]
test/transformers/test_swiglu.py ........                                                                                                                                                                                                                [ 94%]
test/transformers/test_trainer_integration.py ...                                                                                                                                                                                                        [ 97%]
test/transformers/test_transformers_monkey_patch.py .                                                                                                                                                                                                    [ 98%]
test/triton/test_triton_monkey_patch.py ..                                                                                                                                                                                                               [100%]

================================================================================================================ 114 passed in 63.54s (0:01:03) ================================================================================================================
jobuser [ ~/Liger-Kernel ]$ make test-convergence
HF_DATASETS_OFFLINE=1 pytest --disable-warnings test/convergence
===================================================================================================================== test session starts ======================================================================================================================
platform linux -- Python 3.10.14, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/jobuser/Liger-Kernel
plugins: lipy-config-base-30.6.1, lipy-fabric-35.2.3, lipy-test-8.0.52, datadir-1.3.1, lipy-mp-34.4.191
collected 8 items                                                                                                                                                                                                                                              

test/convergence/test_mini_models.py ......                                                                                                                                                                                                              [ 75%]
test/convergence/test_mini_models_no_logits.py ..                                                                                                                                                                                                        [100%]

================================================================================================================= 8 passed in 92.32s (0:01:32) =================================================================================================================

@shimizust shimizust marked this pull request as ready for review August 17, 2024 08:12
Comment on lines 3 to 5
apply_liger_kernel_to_gemma,
apply_liger_kernel_to_llama,
apply_liger_kernel_to_mistral,
Collaborator
I'm thinking we should expose only one generic patch function instead of all the individual models' functions.

Collaborator Author

One reason to keep these apply_liger_kernel_to_{model_type} functions is to provide a more well-defined interface for each model type. Users can see documentation/type hints on exactly which kernels are supported vs. the generic method.
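A hypothetical sketch of such a per-model entry point: explicit keyword arguments document exactly which kernels the model type supports. The kernel names below are inferred from the test files in this PR (rope, rms_norm, swiglu, cross_entropy); the real signature may differ.

```python
# Hypothetical per-model entry point with a self-documenting signature.
# Users see exactly which kernels llama supports via type hints/docs,
# unlike a generic function keyed on a free-form model_type string.

def apply_liger_kernel_to_llama(
    rope: bool = True,
    rms_norm: bool = True,
    swiglu: bool = True,
    cross_entropy: bool = True,
) -> list:
    """Return the names of the kernels that would be patched in."""
    flags = {
        "rope": rope,
        "rms_norm": rms_norm,
        "swiglu": swiglu,
        "cross_entropy": cross_entropy,
    }
    return [name for name, enabled in flags.items() if enabled]
```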

Collaborator

Makes sense.

@ByronHsu (Contributor)

Could you elaborate?

> Using model_type (e.g. llama) covers cases more broadly than specifying a model architecture (e.g. LlamaForCausalLM, LlamaForQuestionAnswering, etc.)

How is this related to CausalLM, QA, etc.?

Also, let's put this PR on hold at least until the first public release; we want to keep the public APIs intact.

@shimizust (Collaborator Author) commented Aug 19, 2024

> Could you elaborate?
>
> Using model_type (e.g. llama) covers cases more broadly than specifying a model architecture (e.g. LlamaForCausalLM, LlamaForQuestionAnswering, etc.)
>
> How is this related to CausalLM, QA, etc.?
>
> Also, let's put this PR on hold at least until the first public release; we want to keep the public APIs intact.

From my understanding, when a new model is added to transformers, there is a base model (e.g. LlamaModel) that contains all the core nn.Modules. Then there are task-specific variants like LlamaForCausalLM and LlamaForTokenClassification that reference the base model but swap out the head layer to accomplish a specific task.

The kernels are generally applicable to the core model layers defined in the base model. If someone wanted to train a LlamaForTokenClassification model, they would do something like:

```python
model = LlamaForTokenClassification.from_pretrained("some_model_path", num_labels=...)
apply_liger_kernel_to_llama()

# Do training on the model
```

So by mapping liger kernel application to the model type (e.g. llama), this would cover all potential task-specific model arch variants (e.g. LlamaForCausalLM, LlamaForTokenClassification, LlamaForQuestionAnswering, etc.)
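A toy mirror of the transformers pattern makes the point concrete: every task-specific architecture shares the base config's model_type, so dispatching on model_type covers all of them. The class names below mimic transformers for illustration; this is not the real library code.

```python
# Toy mirror of the transformers convention: the config class carries a
# model_type string, and every task-specific architecture for a model
# family points at the same config class.

class LlamaConfig:
    model_type = "llama"

class LlamaPreTrainedModel:
    config_class = LlamaConfig

class LlamaForCausalLM(LlamaPreTrainedModel):
    pass

class LlamaForTokenClassification(LlamaPreTrainedModel):
    pass

class LlamaForQuestionAnswering(LlamaPreTrainedModel):
    pass

variants = [LlamaForCausalLM, LlamaForTokenClassification, LlamaForQuestionAnswering]
# Every variant resolves to the single model_type "llama".
model_types = {cls.config_class.model_type for cls in variants}
```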

@shimizust (Collaborator Author)

> Could you elaborate?
>
> Using model_type (e.g. llama) covers cases more broadly than specifying a model architecture (e.g. LlamaForCausalLM, LlamaForQuestionAnswering, etc.)
>
> How is this related to CausalLM, QA, etc.?
>
> Also, let's put this PR on hold at least until the first public release; we want to keep the public APIs intact.

Sounds good. Also, this would still keep the existing APIs going forward (see the other comment).

@JasonZhu1313 (Collaborator)

LGTM. We haven't tested convergence for the other classes; we can add a few more convergence tests later on, though functionality-wise it should work for the other classes.

@lancerts lancerts merged commit 9109842 into main Aug 19, 2024
@ByronHsu ByronHsu deleted the sshimizu/monkey-patch-refactor branch August 23, 2024 06:20
