Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 0 additions & 22 deletions docs/source/en/generation_strategies.md
Original file line number Diff line number Diff line change
Expand Up @@ -225,28 +225,6 @@ outputs = model.generate(**inputs, assistant_model=assistant_model, tokenizer=to
tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a']
```
### Diverse beam search

[Diverse beam search](https://hf.co/papers/1610.02424) is a variant of beam search that produces more diverse output candidates to choose from. This strategy measures the dissimilarity of sequences and a penalty is applied if sequences are too similar. To avoid high computation costs, the number of beams is divided into groups.

Enable diverse beam search with the `num_beams`, `num_beam_groups` and `diversity_penalty` parameters (the `num_beams` parameter should be divisible by `num_beam_groups`).

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, infer_device

device = infer_device()

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Hugging Face is an open-source company", return_tensors="pt").to(device)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", dtype=torch.float16).to(device)
# explicitly set to 100 because Llama2 generation length is 4096
outputs = model.generate(**inputs, max_new_tokens=50, num_beams=6, num_beam_groups=3, diversity_penalty=1.0, do_sample=False)
tokenizer.batch_decode(outputs, skip_special_tokens=True)
'Hugging Face is an open-source company 🤗\nWe are an open-source company. Our mission is to democratize AI and make it accessible to everyone. We believe that AI should be used for the benefit of humanity, not for the benefit of a'
```


## Custom generation methods

Expand Down
7 changes: 0 additions & 7 deletions docs/source/en/internal/generation_utils.md
Original file line number Diff line number Diff line change
Expand Up @@ -108,9 +108,6 @@ generation.
[[autodoc]] ForcedEOSTokenLogitsProcessor
- __call__

[[autodoc]] HammingDiversityLogitsProcessor
- __call__

[[autodoc]] InfNanRemoveLogitsProcessor
- __call__

Expand Down Expand Up @@ -219,10 +216,6 @@ A [`Constraint`] can be used to force the generation to include specific tokens
- process
- finalize

[[autodoc]] BeamSearchScorer
- process
- finalize

[[autodoc]] ConstrainedBeamSearchScorer
- process
- finalize
Expand Down
2 changes: 1 addition & 1 deletion docs/source/en/kv_cache.md
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,7 @@ tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, dtype=torch.float16, device_map="auto")
prompt = ["okay "*1000 + "Fun fact: The most"]
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
beams = { "num_beams": 40, "num_beam_groups": 40, "num_return_sequences": 40, "diversity_penalty": 1.0, "max_new_tokens": 23, "early_stopping": True, }
beams = { "num_beams": 40, "num_return_sequences": 20, "max_new_tokens": 23, "early_stopping": True, }
out = resilient_generate(model, **inputs, **beams)
responses = tokenizer.batch_decode(out[:,-28:], skip_special_tokens=True)
```
Expand Down
37 changes: 0 additions & 37 deletions docs/source/ja/generation_strategies.md
Original file line number Diff line number Diff line change
Expand Up @@ -241,43 +241,6 @@ time."\n\nHe added: "I am very proud of the work I have been able to do in the l
'Das Haus ist wunderbar.'
```

### Diverse beam search decoding

多様なビームサーチデコーディング戦略は、ビームサーチ戦略の拡張であり、選択肢からより多様なビームシーケンスを生成できるようにします。この仕組みの詳細については、[Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models](https://huggingface.co/papers/1610.02424) をご参照ください。このアプローチには、`num_beams`、`num_beam_groups`、および `diversity_penalty` という3つの主要なパラメータがあります。多様性ペナルティは、出力がグループごとに異なることを保証し、ビームサーチは各グループ内で使用されます。


```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> checkpoint = "google/pegasus-xsum"
>>> prompt = (
... "The Permaculture Design Principles are a set of universal design principles "
... "that can be applied to any location, climate and culture, and they allow us to design "
... "the most efficient and sustainable human habitation and food production systems. "
... "Permaculture is a design system that encompasses a wide variety of disciplines, such "
... "as ecology, landscape design, environmental science and energy conservation, and the "
... "Permaculture design principles are drawn from these various disciplines. Each individual "
... "design principle itself embodies a complete conceptual framework based on sound "
... "scientific principles. When we bring all these separate principles together, we can "
... "create a design system that both looks at whole systems, the parts that these systems "
... "consist of, and how those parts interact with each other to create a complex, dynamic, "
... "living system. Each design principle serves as a tool that allows us to integrate all "
... "the separate parts of a design, referred to as elements, into a functional, synergistic, "
... "whole system, where the elements harmoniously interact and work together in the most "
... "efficient way possible."
... )

>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> inputs = tokenizer(prompt, return_tensors="pt")

>>> model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

>>> outputs = model.generate(**inputs, num_beams=5, num_beam_groups=5, max_new_tokens=30, diversity_penalty=1.0)
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'The Design Principles are a set of universal design principles that can be applied to any location, climate and
culture, and they allow us to design the'
```

### Assisted Decoding

アシストデコーディングは、上記のデコーディング戦略を変更したもので、同じトークナイザー(理想的にははるかに小さなモデル)を使用して、いくつかの候補トークンを貪欲に生成するアシスタントモデルを使用します。その後、主要なモデルは候補トークンを1つの前向きパスで検証し、デコーディングプロセスを高速化します。現在、アシストデコーディングでは貪欲検索とサンプリングのみがサポートされており、バッチ入力はサポートされていません。アシストデコーディングの詳細については、[このブログ記事](https://huggingface.co/blog/assisted-generation) をご覧ください。
Expand Down
7 changes: 0 additions & 7 deletions docs/source/ja/internal/generation_utils.md
Original file line number Diff line number Diff line change
Expand Up @@ -139,9 +139,6 @@ generation_output[:2]
[[autodoc]] ForcedEOSTokenLogitsProcessor
- __call__

[[autodoc]] HammingDiversityLogitsProcessor
- __call__

[[autodoc]] InfNanRemoveLogitsProcessor
- __call__

Expand Down Expand Up @@ -321,10 +318,6 @@ generation_output[:2]
- process
- finalize

[[autodoc]] BeamSearchScorer
- process
- finalize

[[autodoc]] ConstrainedBeamSearchScorer
- process
- finalize
Expand Down
38 changes: 0 additions & 38 deletions docs/source/ko/generation_strategies.md
Original file line number Diff line number Diff line change
Expand Up @@ -232,44 +232,6 @@ time."\n\nHe added: "I am very proud of the work I have been able to do in the l
'Das Haus ist wunderbar.'
```

### 다양한 빔 탐색 디코딩(Diverse beam search decoding)[[diverse-beam-search-decoding]]

다양한 빔 탐색(Decoding) 전략은 선택할 수 있는 더 다양한 빔 시퀀스 집합을 생성할 수 있게 해주는 빔 탐색 전략의 확장입니다. 이 방법은 어떻게 작동하는지 알아보려면, [다양한 빔 탐색: 신경 시퀀스 모델에서 다양한 솔루션 디코딩하기](https://huggingface.co/papers/1610.02424)를 참조하세요. 이 접근 방식은 세 가지 주요 매개변수를 가지고 있습니다: `num_beams`, `num_beam_groups`, 그리고 `diversity_penalty`. 다양성 패널티는 그룹 간에 출력이 서로 다르게 하기 위한 것이며, 각 그룹 내에서 빔 탐색이 사용됩니다.

```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> checkpoint = "google/pegasus-xsum"
>>> prompt = (
... "The Permaculture Design Principles are a set of universal design principles "
... "that can be applied to any location, climate and culture, and they allow us to design "
... "the most efficient and sustainable human habitation and food production systems. "
... "Permaculture is a design system that encompasses a wide variety of disciplines, such "
... "as ecology, landscape design, environmental science and energy conservation, and the "
... "Permaculture design principles are drawn from these various disciplines. Each individual "
... "design principle itself embodies a complete conceptual framework based on sound "
... "scientific principles. When we bring all these separate principles together, we can "
... "create a design system that both looks at whole systems, the parts that these systems "
... "consist of, and how those parts interact with each other to create a complex, dynamic, "
... "living system. Each design principle serves as a tool that allows us to integrate all "
... "the separate parts of a design, referred to as elements, into a functional, synergistic, "
... "whole system, where the elements harmoniously interact and work together in the most "
... "efficient way possible."
... )

>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> inputs = tokenizer(prompt, return_tensors="pt")

>>> model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

>>> outputs = model.generate(**inputs, num_beams=5, num_beam_groups=5, max_new_tokens=30, diversity_penalty=1.0)
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'The Design Principles are a set of universal design principles that can be applied to any location, climate and
culture, and they allow us to design the'
```

이 가이드에서는 다양한 디코딩 전략을 가능하게 하는 주요 매개변수를 보여줍니다. [`generate`] 메서드에 대한 고급 매개변수가 존재하므로 [`generate`] 메서드의 동작을 더욱 세부적으로 제어할 수 있습니다. 사용 가능한 매개변수의 전체 목록은 [API 문서](./main_classes/text_generation)를 참조하세요.

### 추론 디코딩(Speculative Decoding)[[speculative-decoding]]

추론 디코딩(보조 디코딩(assisted decoding)으로도 알려짐)은 동일한 토크나이저를 사용하는 훨씬 작은 보조 모델을 활용하여 몇 가지 후보 토큰을 생성하는 상위 모델의 디코딩 전략을 수정한 것입니다. 주 모델은 단일 전방 통과로 후보 토큰을 검증함으로써 디코딩 과정을 가속화합니다. `do_sample=True`일 경우, [추론 디코딩 논문](https://huggingface.co/papers/2211.17192)에 소개된 토큰 검증과 재샘플링 방식이 사용됩니다.
Expand Down
7 changes: 0 additions & 7 deletions docs/source/ko/internal/generation_utils.md
Original file line number Diff line number Diff line change
Expand Up @@ -131,9 +131,6 @@ generation_output[:2]
[[autodoc]] ForcedEOSTokenLogitsProcessor
- __call__

[[autodoc]] HammingDiversityLogitsProcessor
- __call__

[[autodoc]] InfNanRemoveLogitsProcessor
- __call__

Expand Down Expand Up @@ -326,10 +323,6 @@ generation_output[:2]
- process
- finalize

[[autodoc]] BeamSearchScorer
- process
- finalize

[[autodoc]] ConstrainedBeamSearchScorer
- process
- finalize
Expand Down
7 changes: 0 additions & 7 deletions docs/source/zh/internal/generation_utils.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,9 +133,6 @@ generation_output[:2]
[[autodoc]] ForcedEOSTokenLogitsProcessor
- __call__

[[autodoc]] HammingDiversityLogitsProcessor
- __call__

[[autodoc]] InfNanRemoveLogitsProcessor
- __call__

Expand Down Expand Up @@ -316,10 +313,6 @@ generation_output[:2]
- process
- finalize

[[autodoc]] BeamSearchScorer
- process
- finalize

[[autodoc]] ConstrainedBeamSearchScorer
- process
- finalize
Expand Down
4 changes: 0 additions & 4 deletions src/transformers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -411,7 +411,6 @@
"BayesianDetectorConfig",
"BayesianDetectorModel",
"BeamScorer",
"BeamSearchScorer",
"ClassifierFreeGuidanceLogitsProcessor",
"ConstrainedBeamSearchScorer",
"Constraint",
Expand All @@ -426,7 +425,6 @@
"ForcedBOSTokenLogitsProcessor",
"ForcedEOSTokenLogitsProcessor",
"GenerationMixin",
"HammingDiversityLogitsProcessor",
"InfNanRemoveLogitsProcessor",
"LogitNormalization",
"LogitsProcessor",
Expand Down Expand Up @@ -656,7 +654,6 @@
from .generation import BayesianDetectorConfig as BayesianDetectorConfig
from .generation import BayesianDetectorModel as BayesianDetectorModel
from .generation import BeamScorer as BeamScorer
from .generation import BeamSearchScorer as BeamSearchScorer
from .generation import ClassifierFreeGuidanceLogitsProcessor as ClassifierFreeGuidanceLogitsProcessor
from .generation import CompileConfig as CompileConfig
from .generation import ConstrainedBeamSearchScorer as ConstrainedBeamSearchScorer
Expand Down Expand Up @@ -687,7 +684,6 @@
from .generation import ForcedEOSTokenLogitsProcessor as ForcedEOSTokenLogitsProcessor
from .generation import GenerationConfig as GenerationConfig
from .generation import GenerationMixin as GenerationMixin
from .generation import HammingDiversityLogitsProcessor as HammingDiversityLogitsProcessor
from .generation import InfNanRemoveLogitsProcessor as InfNanRemoveLogitsProcessor
from .generation import LogitNormalization as LogitNormalization
from .generation import LogitsProcessor as LogitsProcessor
Expand Down
5 changes: 3 additions & 2 deletions src/transformers/configuration_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -1121,8 +1121,6 @@ def _get_global_generation_defaults() -> dict[str, Any]:
"do_sample": False,
"early_stopping": False,
"num_beams": 1,
"num_beam_groups": 1,
"diversity_penalty": 0.0,
"temperature": 1.0,
"top_k": 50,
"top_p": 1.0,
Expand All @@ -1141,6 +1139,9 @@ def _get_global_generation_defaults() -> dict[str, Any]:
"exponential_decay_length_penalty": None,
"suppress_tokens": None,
"begin_suppress_tokens": None,
# Deprecated arguments (moved to the Hub). TODO joao, manuel: remove in v4.62.0
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if we remove this, initializing config arguments on the model will stop working, as in

model = BartForConditionalGeneration.from_pretrained(
"hf-internal-testing/tiny-random-bart",
max_length=10,
num_beams=2,
num_beam_groups=2,
num_return_sequences=2,
diversity_penalty=1.0,
eos_token_id=None,
return_dict_in_generate=True,
output_scores=True,
length_penalty=0.0,
)

"num_beam_groups": 1,
"diversity_penalty": 0.0,
}

def _get_non_default_generation_parameters(self) -> dict[str, Any]:
Expand Down
4 changes: 2 additions & 2 deletions src/transformers/dynamic_module_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -428,10 +428,10 @@ def get_cached_module_file(
importlib.invalidate_caches()
# Make sure we also have every file with relative
for module_needed in modules_needed:
if not (submodule_path / f"{module_needed}.py").exists():
if not ((submodule_path / module_file).parent / f"{module_needed}.py").exists():
get_cached_module_file(
pretrained_model_name_or_path,
f"{module_needed}.py",
f"{Path(module_file).parent / module_needed}.py",
cache_dir=cache_dir,
force_download=force_download,
resume_download=resume_download,
Expand Down
5 changes: 1 addition & 4 deletions src/transformers/generation/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,6 @@
_import_structure["beam_search"] = [
"BeamHypotheses",
"BeamScorer",
"BeamSearchScorer",
"ConstrainedBeamSearchScorer",
]
_import_structure["candidate_generator"] = [
Expand All @@ -63,7 +62,6 @@
"ExponentialDecayLengthPenalty",
"ForcedBOSTokenLogitsProcessor",
"ForcedEOSTokenLogitsProcessor",
"HammingDiversityLogitsProcessor",
"InfNanRemoveLogitsProcessor",
"LogitNormalization",
"LogitsProcessor",
Expand Down Expand Up @@ -209,7 +207,7 @@
pass
else:
from .beam_constraints import Constraint, ConstraintListState, DisjunctiveConstraint, PhrasalConstraint
from .beam_search import BeamHypotheses, BeamScorer, BeamSearchScorer, ConstrainedBeamSearchScorer
from .beam_search import BeamHypotheses, BeamScorer, ConstrainedBeamSearchScorer
from .candidate_generator import (
AssistedCandidateGenerator,
CandidateGenerator,
Expand All @@ -227,7 +225,6 @@
ExponentialDecayLengthPenalty,
ForcedBOSTokenLogitsProcessor,
ForcedEOSTokenLogitsProcessor,
HammingDiversityLogitsProcessor,
InfNanRemoveLogitsProcessor,
LogitNormalization,
LogitsProcessor,
Expand Down
Loading