Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks, I'll review asap
cc @winglian |
@bot /style |
Style fix bot fixed some files and pushed the changes. |
…ormers into minor-changes-trainer
ArthurZucker
left a comment
I am down to keep aliases for now + deprecate (but it's up to you, if you can add lots of comments about it)
Trying to refrain from too many breaks from v5 already
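The alias-plus-deprecate approach suggested above could be sketched roughly as follows. This is an illustrative stub only: the body of `update_fsdp_plugin_peft` and the warning wording are assumptions for the sketch, not the actual transformers code.

```python
import warnings


# Stub standing in for the standalone function this PR introduces in
# `integrations.fsdp`; the real logic would update the FSDP plugin for
# QLoRA/PEFT. Here it just returns the plugin unchanged.
def update_fsdp_plugin_peft(fsdp_plugin, model):
    return fsdp_plugin


class Trainer:
    # One way to avoid a hard break: keep the old method as a thin,
    # deprecated alias that delegates to the standalone function.
    def _fsdp_qlora_plugin_updates(self, fsdp_plugin, model):
        warnings.warn(
            "`Trainer._fsdp_qlora_plugin_updates` is deprecated; use "
            "`update_fsdp_plugin_peft` from `integrations.fsdp` instead.",
            FutureWarning,
        )
        return update_fsdp_plugin_peft(fsdp_plugin, model)
```

Existing subclasses and callers keep working through the alias, while the `FutureWarning` gives a deprecation cycle before removal.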
Can you run the TRL CI on this PR @qgallouedec? Also @winglian, is there a way to run the axolotl CI just like we do for TRL? One of my next steps would also be to create nice tests like TRL's, so that we don't need to rely on TRL to catch regressions.

I think for these, it shouldn't be too breaking. There are other more sensitive args that will indeed require a deprecation cycle if we want to ensure a smooth transition in a future PR. If any of those are too breaking, I'm happy to put them back and do a proper deprecation cycle.
/trl-ci |
This resulted in a regression in Sentence Transformers due to the removal of
What does this PR do?
This PR updates a few minor things in the `Trainer`. Some of them are breaking, but I think it should be safe as I don't think anyone is subclassing these methods or using them separately.
Breaking:
- `propagate_args_to_deepspeed` to standalone function in `integrations.deepspeed`
- `_fsdp_qlora_plugin_updates` to standalone function in `integrations.fsdp` + rename to `update_fsdp_plugin_peft`
- `is_attention_mask_causal` to standalone function in `trainer_pt_utils.py`
- `_nested_gather` to standalone function in `trainer_pt_utils.py`
- `_add_sm_patterns_to_gitignore` -> not used at all
- `_align_special_tokens` to standalone function in `trainer_pt_utils.py`
- `deepspeed_sp_compute_loss` to `integrations.deepspeed`
- `_save_tpu` to `save_tpu_checkpoint` in `integrations.tpu` file
- `wrap_model_xla_fsdp` to move logic in `integrations.fsdp` file

Not breaking:
- `get_fsdp_ckpt_kwargs` to `integrations.fsdp`
- `safe_globals` to `trainer_utils.py`
- `_get_learning_rate` kept in `Trainer` as this method is actually used and not only in examples.
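As a rough illustration of the method-to-standalone-function moves listed above: the real `is_attention_mask_causal` in `trainer_pt_utils.py` operates on torch tensors, and its actual logic is not reproduced here; this plain-list stand-in with a simplified causality check (an assumption for the sketch) just shows the pattern, where everything the old `Trainer` method needed is now passed explicitly.

```python
def is_attention_mask_causal(mask):
    """True if a square 0/1 mask is exactly lower-triangular (causal).

    Toy stand-in: takes a list of lists instead of a torch tensor, and
    lives at module level rather than on `Trainer`, so any caller can
    use it without a trainer instance.
    """
    n = len(mask)
    return all(
        mask[i][j] == (1 if j <= i else 0)
        for i in range(n)
        for j in range(n)
    )


# Callers that previously went through a Trainer method would now call
# the module-level function directly on the mask they already have.
causal = [[1, 0], [1, 1]]          # lower-triangular: causal
bidirectional = [[1, 1], [1, 1]]   # full attention: not causal
```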