
[auto_docstring] needs to be only run on __doc__ #45056

Open
ArthurZucker wants to merge 3 commits into main from fix-auto-doc

Conversation

@ArthurZucker
Collaborator

@ArthurZucker ArthurZucker commented Mar 27, 2026

What does this PR do?

This one took a while because I wanted to check benchmarks.
The win is not huge, but a win is a win.

@ArthurZucker ArthurZucker marked this pull request as ready for review March 27, 2026 11:36
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker marked this pull request as draft March 27, 2026 13:09
@ArthurZucker
Collaborator Author

Benchmark Update 4 — Decoration speedup (warm process, without PyTorch)

Setup: same Python process, all imports and caches already warm (inspect signature cache, regex, auto-module). Both branches measured in the same process using explicit sys.path injection to bypass the editable install. 50 rounds × 3 real config classes.


Decoration cost per class

| `@auto_docstring` call | cost | what it does |
|---|---|---|
| branch | ~0.35 µs / class | stores a `_LazyDocClass` closure |
| main | ~1 106 µs / class | generates the full docstring eagerly |
| ratio | ~3 160× | |

branch: 0.001 ms / 3 classes  =  0.35 µs/class   ← just stores a closure
main:   3.317 ms / 3 classes  = 1106 µs/class    ← full generation happens here

Cached cls.__doc__ access after generation: ~60 ns/class on both (identical).


What this means for inference / training

| operation | main | branch |
|---|---|---|
| `from transformers import LlamaConfig` | pays ~1 ms to generate the doc immediately | pays ~0.35 µs to store a closure |
| `model.forward(inputs)` | `__doc__` never touched | `__doc__` never touched |
| `LlamaConfig.__doc__` (explicit access) | ~0 ns (already generated) | ~1 ms (generated once, then cached) |
| `LlamaConfig.__doc__` again | ~60 ns | ~60 ns |

Inference and training never read __doc__. On main, each from transformers import Xxx pays ~1 ms to generate the docstring whether or not it is ever used. On branch, that cost is deferred and only paid if .__doc__ is explicitly accessed.
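The deferred-generation behaviour described above can be sketched as follows. This is a minimal stand-in, not the actual transformers implementation — `LazyDoc`, `make_lazy`, and `expensive_generator` are hypothetical names. It relies on the fact (also noted in the PR's `_LazyDocClass` docstring) that CPython's `type.__doc__` getter dispatches to a descriptor stored in `cls.__dict__['__doc__']`:

```python
# Minimal sketch of the lazy-__doc__ mechanism (hypothetical names: LazyDoc,
# make_lazy, expensive_generator — not the real transformers internals).
class LazyDoc:
    def __init__(self, generator):
        self._generator = generator

    def __get__(self, obj, objtype=None):
        # CPython's type.__doc__ getter calls __get__(None, cls) on whatever
        # descriptor is stored in cls.__dict__['__doc__'].
        doc = self._generator()
        objtype.__doc__ = doc  # cache: later lookups return the plain string
        return doc


def make_lazy(cls, generator):
    cls.__doc__ = LazyDoc(generator)  # decoration just stores a closure
    return cls


calls = []

def expensive_generator():
    calls.append(1)  # stands in for the ~1 ms docstring build on main
    return "Generated docstring."


class Config:
    pass


make_lazy(Config, expensive_generator)
assert calls == []                                # import-time cost: none
assert Config.__doc__ == "Generated docstring."   # first access pays generation
assert Config.__doc__ == "Generated docstring."   # cached afterwards
assert calls == [1]                               # generated exactly once
```

After the first access the descriptor has replaced itself with the plain string, so subsequent `cls.__doc__` lookups are ordinary attribute reads.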


Why this does not show up in cold-process import benchmarks

The ~1 ms generation cost is negligible compared to Python startup (~200 ms) + transformers package init (~600 ms) + optional PyTorch import (~1 500 ms). The cold-process noise floor is ~50 ms, so a ~1–5 ms per-class saving is invisible there. The benefit accumulates across all decorated classes but is swamped by startup variance in single-class measurements.

@ArthurZucker ArthurZucker marked this pull request as ready for review March 27, 2026 13:47
Contributor

Copilot AI left a comment


Pull request overview

This PR updates auto_docstring to defer class docstring generation until cls.__doc__ is first accessed (while keeping method/function docstrings generated eagerly), and adds benchmark coverage to measure import/doc-access impact.

Changes:

  • Introduces a lazy class-docstring descriptor and refactors docstring builders into “generate” helper functions.
  • Keeps method docstrings eager and updates generation to prefer the unwrapped (__wrapped__) function for source docstrings/signatures.
  • Adds a new tests/benchmarks suite (with a stub benchmark fixture fallback) to measure import/doc-access/from_pretrained timing.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| src/transformers/utils/auto_docstring.py | Implements lazy class docstrings via a descriptor; refactors generation into helper functions and updates decorator docs. |
| tests/benchmarks/test_lazy_docstring_benchmarks.py | Adds informational benchmarks for import time and docstring access paths, plus an optional slow from_pretrained benchmark. |
| tests/benchmarks/conftest.py | Adds a stub benchmark fixture to gracefully skip benchmarks when pytest-benchmark isn't installed. |

Comment on lines +4438 to 4452
# Capture the raw source-code docstring **before** any lazy machinery is attached so
# that the generator closure can use it safely without risking re-entry.
original_doc = cls.__dict__.get("__doc__")

def _generator():
    return _generate_class_docstring(
        cls,
        custom_intro=custom_intro,
        custom_args=custom_args,
        checkpoint=checkpoint,
        _original_doc=original_doc,
    )

_apply_lazy_doc(cls, _generator)
return cls

Copilot AI Mar 29, 2026


original_doc = cls.__dict__.get("__doc__") can capture the lazy descriptor itself if auto_class_docstring() is called more than once on the same class (or if the doc was already made lazy elsewhere). In that case _generate_class_docstring(..., _original_doc=original_doc) will later treat a non-str as the raw docstring and can break parsing/formatting. Consider normalizing here (only keep str/None, or if the existing value is _LazyDocClass, reuse its cached value / generator result safely) to make auto_class_docstring idempotent.
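One way to realize the suggested normalization is sketched below. `capture_original_doc` is a hypothetical helper name, not code from the PR: only a plain `str` is accepted as the raw source docstring, so re-decorating a class whose `__doc__` is already a lazy descriptor stays safe and idempotent.

```python
# Hypothetical normalization helper sketching the suggested fix: only a plain
# str (or None) is treated as the raw source docstring; anything else — e.g.
# an already-installed lazy descriptor — is discarded.
def capture_original_doc(cls):
    doc = cls.__dict__.get("__doc__")
    return doc if isinstance(doc, str) else None


class WithDoc:
    """Real docstring."""

class NoDoc:
    pass

class AlreadyLazy:
    pass

AlreadyLazy.__doc__ = object()  # simulate a previously installed descriptor

assert capture_original_doc(WithDoc) == "Real docstring."
assert capture_original_doc(NoDoc) is None        # classes store __doc__=None
assert capture_original_doc(AlreadyLazy) is None  # non-str is discarded
```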

from transformers.utils.auto_docstring import auto_method_docstring

def _dummy(x: int, y: int = 0) -> int:
    r"""x (`int`): First number.\ny (`int`, *optional*): Second number."""

Copilot AI Mar 29, 2026


The _dummy docstring is declared as a raw string containing a literal \n, so it will not contain an actual newline. If the goal is to simulate a typical multi-line docstring format for auto_method_docstring, use a real newline (multi-line triple-quoted string) so the benchmark reflects realistic parsing/formatting behavior.

Suggested change
r"""x (`int`): First number.\ny (`int`, *optional*): Second number."""
"""x (`int`): First number.
y (`int`, *optional*): Second number."""

Comment on lines +40 to +42
pytestmark = pytest.mark.skipif(
    not HAS_BENCHMARK, reason="pytest-benchmark not installed (pip install pytest-benchmark)"
)

Copilot AI Mar 29, 2026


These are benchmarks that assert nothing, but if pytest-benchmark is installed they will run as normal tests in any full pytest invocation. To avoid accidental slow/side-effectful runs, consider adding an additional opt-in guard (e.g., skip unless an env var like RUN_BENCHMARKS=1 is set), or place/rename the file so it isn’t collected by default.

Comment on lines +84 to +91
# Reset the lazy state so every round re-generates.
from transformers.utils.auto_docstring import auto_class_docstring

def setup():
auto_class_docstring(BaseImageProcessor)

def access():
return BaseImageProcessor.__doc__

Copilot AI Mar 29, 2026


This benchmark mutates BaseImageProcessor in-place by reapplying auto_class_docstring, which can permanently replace __doc__ for the rest of the process (and affect any subsequent tests in the same session). Consider snapshotting/restoring the original BaseImageProcessor.__dict__.get("__doc__") around the benchmark, or using an isolated throwaway class/module for the benchmark to avoid leaking global state.

Suggested change

# Reset the lazy state so every round re-generates.
from transformers.utils.auto_docstring import auto_class_docstring

def setup():
    auto_class_docstring(BaseImageProcessor)

def access():
    return BaseImageProcessor.__doc__

# Use a throwaway subclass so we don't mutate the global BaseImageProcessor.
class _BenchBaseImageProcessor(BaseImageProcessor):
    pass

# Reset the lazy state so every round re-generates.
from transformers.utils.auto_docstring import auto_class_docstring

def setup():
    auto_class_docstring(_BenchBaseImageProcessor)

def access():
    return _BenchBaseImageProcessor.__doc__

Comment on lines +4091 to +4102
class _LazyDocClass:
    """
    Descriptor stored directly in ``cls.__dict__['__doc__']`` to defer class docstring
    generation until the first ``cls.__doc__`` access.

    Python's ``type.__doc__`` C-level getter checks whether the stored value has a
    ``__get__`` method and, if so, calls it — exactly like normal descriptor dispatch.
    This lets us intercept ``cls.__doc__`` without changing the class's metaclass.

    On the first access the generator is invoked, the result is cached, and the descriptor
    replaces itself with the plain string so that all subsequent lookups are zero-overhead.
    """

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new lazy class-docstring mechanism is a behavior change with subtle interactions (e.g. inspect.getdoc, repeated decoration, and ensuring no generation happens until __doc__ is accessed). There are existing tests/utils/test_auto_docstring.py end-to-end tests, but none that assert the laziness property itself; adding a focused unit test would help prevent regressions.
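A self-contained sketch of such a focused test, using a stand-in `lazy_doc` decorator rather than the real `auto_class_docstring` (both `lazy_doc` and `_LazyDoc` are hypothetical names), so the assertions only illustrate the intended laziness contract:

```python
class _LazyDoc:
    # Stand-in descriptor mimicking the PR's _LazyDocClass behaviour.
    def __init__(self, gen):
        self.gen = gen

    def __get__(self, obj, objtype=None):
        doc = self.gen()
        objtype.__doc__ = doc  # replace the descriptor with the plain string
        return doc


def lazy_doc(generator):
    # Stand-in for auto_class_docstring: defer generation to first access.
    def decorate(cls):
        cls.__doc__ = _LazyDoc(generator)
        return cls
    return decorate


def test_doc_generation_is_lazy_and_cached():
    calls = []

    def gen():
        calls.append(1)
        return "lazy doc"

    @lazy_doc(gen)
    class Dummy:
        pass

    assert calls == []                  # decoration alone generates nothing
    assert Dummy.__doc__ == "lazy doc"  # first access triggers generation
    assert calls == [1]
    assert Dummy.__doc__ == "lazy doc"  # later accesses hit the cached string
    assert calls == [1]                 # the generator ran exactly once


test_doc_generation_is_lazy_and_cached()
```

The real test would import the decorator from `transformers.utils.auto_docstring` and could additionally check `inspect.getdoc` and repeated decoration, as the comment notes.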

Copilot generated this review using guidance from repository custom instructions.
@Cyrilvallez
Member

cc @yonigozlan!

