System Info
- Transformers 4.47.0.dev0 (latest commit 33868a0)
Who can help?
@ArthurZucker @muellerzr
Information
Tasks
Reproduction
#33932 may break the logic behind the trainer's `model_accepts_loss_kwargs`: the llama model no longer receives the `num_items_in_batch` argument, which makes the fix from #34283 ineffective again.
transformers/src/transformers/trainer.py, line 605 at commit 33868a0:

```python
self.model_accepts_loss_kwargs = "loss_kwargs" in inspect.signature(model_forward).parameters
```
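A minimal sketch of why this check now fails for llama (the two dummy `forward` functions below only mimic the signatures shown further down; they are not the real model code):

```python
import inspect


def llama_forward(input_ids=None, labels=None, **kwargs):  # llama after #33932
    ...


def gemma_forward(input_ids=None, labels=None, **loss_kwargs):  # gemma, unchanged
    ...


# The check looks for a parameter literally named "loss_kwargs", so renaming
# the catch-all to **kwargs flips it to False for llama:
print("loss_kwargs" in inspect.signature(llama_forward).parameters)  # False
print("loss_kwargs" in inspect.signature(gemma_forward).parameters)  # True
```

With `model_accepts_loss_kwargs` set to `False`, the trainer no longer passes `num_items_in_batch` to the model, which is exactly what #34283 relied on.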
Moreover, the keyword-argument names now differ between llama and other models; we would expect different models to accept the same keyword arguments.
transformers/src/transformers/models/llama/modeling_llama.py, lines 1146 to 1161 at commit 33868a0:

```python
    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        cache_position: Optional[torch.LongTensor] = None,
        num_logits_to_keep: int = 0,
        **kwargs: Unpack[KwargsForCausalLM],
    ) -> Union[Tuple, CausalLMOutputWithPast]:
```
transformers/src/transformers/models/gemma/modeling_gemma.py, lines 1015 to 1030 at commit 33868a0:

```python
    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        cache_position: Optional[torch.LongTensor] = None,
        num_logits_to_keep: int = 0,
        **loss_kwargs,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
```
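Until the signatures converge, one conceivable trainer-side mitigation (just an illustration of the idea; `accepts_loss_kwargs` is a hypothetical helper, not an existing API) would be to detect any `**kwargs`-style catch-all instead of relying on the literal parameter name `loss_kwargs`:

```python
import inspect


def accepts_loss_kwargs(model_forward) -> bool:
    # Hypothetical replacement for the name-based check: treat the model as
    # accepting loss kwargs if forward has a parameter named "loss_kwargs"
    # or any generic **kwargs catch-all.
    params = inspect.signature(model_forward).parameters
    return "loss_kwargs" in params or any(
        p.kind == inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
```

This would treat both the llama and the gemma signatures above as loss-kwargs-aware, though it cannot tell whether the model actually consumes `num_items_in_batch`.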
Expected behavior
The models' forward functions should have a consistent keyword argument list.
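As a self-contained sketch of what "consistent" could look like (the `LossKwargs` TypedDict below is redefined locally for the example and only stands in for the real transformers classes), both models could expose the same typed `**kwargs` catch-all so the trainer can pass `num_items_in_batch` uniformly:

```python
from typing_extensions import TypedDict, Unpack


class LossKwargs(TypedDict, total=False):
    # Stand-in for the loss-related kwargs the trainer may forward.
    num_items_in_batch: int


def llama_style_forward(input_ids=None, labels=None, **kwargs: Unpack[LossKwargs]):
    return kwargs.get("num_items_in_batch")


def gemma_style_forward(input_ids=None, labels=None, **kwargs: Unpack[LossKwargs]):
    return kwargs.get("num_items_in_batch")


# With identical catch-alls, the caller does not need per-model special cases:
print(llama_style_forward(num_items_in_batch=8))  # 8
print(gemma_style_forward(num_items_in_batch=8))  # 8
```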