
Mistral-related models for QnA#34045

Merged
ArthurZucker merged 8 commits into huggingface:main from vasqu:mistral-for-qna
Oct 14, 2024

Conversation

@vasqu
Contributor

@vasqu vasqu commented Oct 9, 2024

What does this PR do?

Adds question-answering heads to Mistral, Mixtral, Qwen2, and Qwen2MoE. Because of the `# Copied from` statements, we either take every one of these models or we need to ignore them in the copy checks. Based on #29168, but using `# Copied from` instead.

Motivation: our benchmark paper at https://github.com/LSX-UniWue/SuperGLEBer uses the transformers QnA models for simplicity, but since these heads are not available on main, they are manually patched in. It would be great to see this land on main!

Fixes #28908

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@LysandreJik @ArthurZucker

Comment on lines +1296 to +1314
    # Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.__init__ with Llama->Mistral,transformer->model
    def __init__(self, config):
        super().__init__(config)
        self.model = MistralModel(config)
        self.qa_outputs = nn.Linear(config.hidden_size, 2)

        # Initialize weights and apply final processing
        self.post_init()

    # Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.get_input_embeddings with transformer->model
    def get_input_embeddings(self):
        return self.model.embed_tokens

    # Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.set_input_embeddings with transformer->model
    def set_input_embeddings(self, value):
        self.model.embed_tokens = value

    @add_start_docstrings_to_model_forward(MISTRAL_INPUTS_DOCSTRING)
    # Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.forward with Llama->Mistral, transformer->model
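The `forward` copied from Llama (truncated in the diff above) boils down to the standard extractive-QA head: project each hidden state to two logits, split them into start/end scores, and average the two cross-entropy losses. A minimal torch sketch of that logic (the function and variable names here are mine, not the exact transformers code):

```python
import torch
from torch import nn

def qa_head_forward(sequence_output, qa_outputs, start_positions=None, end_positions=None):
    # sequence_output: (batch, seq_len, hidden_size)
    logits = qa_outputs(sequence_output)                # (batch, seq_len, 2)
    start_logits, end_logits = logits.split(1, dim=-1)
    start_logits = start_logits.squeeze(-1).contiguous()  # (batch, seq_len)
    end_logits = end_logits.squeeze(-1).contiguous()

    loss = None
    if start_positions is not None and end_positions is not None:
        # Clamp labels outside the sequence onto an ignored index.
        ignored_index = start_logits.size(1)
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)
        loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
        loss = (loss_fct(start_logits, start_positions) + loss_fct(end_logits, end_positions)) / 2
    return loss, start_logits, end_logits

# Toy usage with a random "backbone output".
hidden_size = 16
qa = nn.Linear(hidden_size, 2)
seq_out = torch.randn(2, 8, hidden_size)
loss, start, end = qa_head_forward(seq_out, qa, torch.tensor([1, 3]), torch.tensor([4, 6]))
```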
Contributor Author

Copying each method individually because Llama has a wrong base model prefix, which has already led (and still leads) to issues: #30381. Currently, a top-level `# Copied from` would force the base model prefix to be copied along as well (ref. here).

Contributor Author

So it's more of a stylistic choice: individual copies vs. including an (unnecessary) base model prefix.

Contributor Author

If #34061 gets merged, we can top-level copy from llama without any problems.

Collaborator

You can also use `# Ignore copy` on the single place where the copy does not match!

Contributor Author

Ah ok, perfect. I'll change it later and ping you when ready ;)

Collaborator

@ArthurZucker left a comment

LGTM in general! It would be nice to have a single `# Copied from` at the top of the class (either use `# Ignore copy` or just don't copy from Llama for one of them!)


@vasqu
Contributor Author

vasqu commented Oct 10, 2024

@ArthurZucker Changed it to top-level copied from now. Lmk if I should change something else.

)
# Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering with Llama->Mistral,LLAMA->MISTRAL,transformer->model
class MistralForQuestionAnswering(MistralPreTrainedModel):
    base_model_prefix = "model"
Contributor Author

@vasqu Oct 10, 2024

`base_model_prefix = "model"` is due to the Llama issue I mentioned; otherwise the classes have different structures and `# Copied from` will fail with an error.
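For context on why the prefix matters: `base_model_prefix` names the attribute the backbone lives under, and `from_pretrained` adds or strips that prefix when moving weights between the bare backbone and a task model. A hypothetical, dependency-free illustration of the key mapping (the two weight keys are examples, not an exhaustive list):

```python
# base_model_prefix must match the backbone attribute name (`self.model` here);
# state-dict keys of the head model carry it as a prefix.
base_model_prefix = "model"
backbone_keys = ["embed_tokens.weight", "layers.0.self_attn.q_proj.weight"]

# Keys as they appear in a MistralForQuestionAnswering state dict:
head_model_keys = [f"{base_model_prefix}.{k}" for k in backbone_keys]
print(head_model_keys[0])  # model.embed_tokens.weight

# With Llama's old prefix ("transformer") but an attribute actually named
# `model`, the expected and actual prefixes would not line up, so backbone
# weights could be silently skipped when loading.
wrong_prefix_keys = [f"transformer.{k}" for k in backbone_keys]
```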

@ArthurZucker
Collaborator

That's it! Merging 🤗

@ArthurZucker ArthurZucker merged commit 7434c0e into huggingface:main Oct 14, 2024
@vasqu vasqu deleted the mistral-for-qna branch October 14, 2024 09:46
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* mistral qna start

* mixtral qna

* oops

* qwen2 qna

* qwen2moe qna

* add missing input embed methods

* add copied to all methods, can't directly from llama due to the prefix

* make top level copied from

Development

Successfully merging this pull request may close these issues.

Add MistralForQuestionAnswering
