[Feature]: Last token pooling for causal embedding models #529

### What feature would you like to request?

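The Qwen3 embedding models are causal, so they need last-token pooling: the embedding is the hidden state of the last non-padding token in each sequence. Something like the following is needed (taken from the Qwen3 example):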

```python
import torch
from torch import Tensor


def last_token_pool(last_hidden_states: Tensor,
                 attention_mask: Tensor) -> Tensor:
    # Left padding: the final position of every sequence is a real token.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        # Right padding: the last real token of each row sits at index (length - 1).
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]
```
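To show which position the snippet above selects, here is a minimal pure-Python sketch of the same index logic, operating on nested lists instead of tensors. The helper name `last_token_indices` and the sample masks are hypothetical and used only for illustration. They are not part of the Qwen3 example or of any library.

```python
from typing import List


def last_token_indices(attention_mask: List[List[int]]) -> List[int]:
    """Return, per row, the index of the token that last-token pooling would pick.

    Hypothetical helper mirroring the branch logic of last_token_pool.
    """
    seq_len = len(attention_mask[0])
    # Left padding: every row ends with a real token, so take the final position.
    if all(row[-1] == 1 for row in attention_mask):
        return [seq_len - 1] * len(attention_mask)
    # Otherwise assume right padding: real tokens occupy positions 0 .. sum(mask) - 1.
    return [sum(row) - 1 for row in attention_mask]


right_padded = [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
left_padded = [[0, 0, 1, 1, 1], [1, 1, 1, 1, 1]]

print(last_token_indices(right_padded))  # [2, 4]
print(last_token_indices(left_padded))   # [4, 4]
```

Note that the fallback branch assumes right padding. A batch that mixes padding sides would not be handled correctly by either version.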

### Is there any additional information you would like to provide?

_No response_
