chore: speedup modular converter (~30%) #45046

Merged

tarekziade merged 7 commits into main from tarekziade-modular-speedup on Apr 27, 2026

Conversation

@tarekziade (Collaborator) commented Mar 27, 2026

What does this PR do?

First optimization

Adding a module-source cache in utils/modular_model_converter.py. When the converter needs the same imported modeling file more than once, it now reuses the already-read source and parsed LibCST tree, invalidated by the file’s mtime, instead of reopening and reparsing the file.
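
A minimal sketch of the idea (the helper name get_module_source is illustrative, not the PR's exact code; the cached triple matches the (mtime_ns, source, tree) layout visible in the review snippet further down):

  import os

  import libcst as cst

  # Per-process cache: path -> (st_mtime_ns, source text, parsed LibCST tree)
  _MODULE_SOURCE_CACHE: dict[str, tuple[int, str, cst.Module]] = {}

  def get_module_source(file_path: str) -> tuple[str, cst.Module]:
      mtime_ns = os.stat(file_path).st_mtime_ns  # invalidation key
      cached = _MODULE_SOURCE_CACHE.get(file_path)
      if cached is not None and cached[0] == mtime_ns:
          return cached[1], cached[2]  # reuse already-read source and parsed tree
      with open(file_path, "r", encoding="utf-8") as f:
          source = f.read()
      tree = cst.parse_module(source)
      _MODULE_SOURCE_CACHE[file_path] = (mtime_ns, source, tree)
      return source, tree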

Second optimization

Removing the metadata-heavy visit path and making the lightweight traversal the default.

Instead of asking LibCST metadata whether a node is module-level and what its source position is, we track suite depth and node order directly during traversal, which avoids the MetadataWrapper overhead.
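
A minimal sketch of that traversal (illustrative, built on LibCST's plain CSTVisitor; TopLevelTracker is a made-up name): the visitor counts IndentedBlock entries and exits, so depth == 0 means module level, and a running counter records source order without any metadata resolution:

  import libcst as cst

  class TopLevelTracker(cst.CSTVisitor):
      def __init__(self):
          self.depth = 0       # how many indented suites we are inside
          self.order = 0       # running index of module-level statements
          self.top_level = []  # (order, node) pairs, in source order

      def on_visit(self, node):
          if isinstance(node, cst.IndentedBlock):
              self.depth += 1  # entering a suite: no longer at module level
          elif self.depth == 0 and isinstance(
              node, (cst.SimpleStatementLine, cst.BaseCompoundStatement)
          ):
              self.top_level.append((self.order, node))
              self.order += 1
          return True

      def on_leave(self, original_node):
          if isinstance(original_node, cst.IndentedBlock):
              self.depth -= 1

  tracker = TopLevelTracker()
  cst.parse_module("x = 1\ndef f():\n    y = 2\n").visit(tracker)
  # tracker.top_level now holds the two module-level statements, in order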

Together, these changes cut runtime by about 30% on the sampled batch (3 runs each on qwen3, olmo2, and starcoder2).

@tarekziade tarekziade self-assigned this Mar 27, 2026
@tarekziade tarekziade marked this pull request as draft March 27, 2026 08:32
@tarekziade (Collaborator, Author) commented:

  python utils/profile_modular_conversion.py \
    src/transformers/models/conditional_detr/modular_conditional_detr.py \
    --repeat 1 \
    --compare-old-wrapper \
    --verify-equal \
    --snakeviz \
    --output-dir /tmp/codex-profiles/conditional_detr_modular_profile

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tarekziade (Collaborator, Author) commented:

So far on a 7-file sample:

before: 13.863s
after: 7.032s

49.3% faster.

@Cyrilvallez (Member) commented Apr 23, 2026

Hey! Very nice to start exploring conversion speed!
I did not profile myself or run your profiler scripts, so I would just like to confirm what the worst points to tackle first are. I see some very nice and easy wins here, but others bring harder-to-follow code that may not prove so maintainable.

The main points I see from the PR, and the tradeoffs:

  • initial caching of the modules for reuse -> nice and easy in theory, but we start many processes to run conversion in parallel, so the different processes will NOT share their cache -> did you check that it actually makes a difference? I.e. once a process in the pool starts, does it keep its cache when moving on to another input? (see the sketch after this comment)
  • Always keeping track of the node depth and order in the file instead of calling self.get_metadata(cst.metadata.ParentNodeProvider, node) -> nice and easy win, should improve quite a bit
  • imports -> hard to read/understand -> does it really provide big speedups?

The most obvious win I see here is keeping track of depth/order directly instead of calling get_metadata (and thus removing the MetadataWrapper), which is quite slow as cst needs to keep track of much more stuff. Is it the biggest source of improvement?
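
(A standalone demo of the pool question above, not code from the PR: module-level state in a worker process persists across the tasks that worker handles, so a per-process cache is reused whenever a worker converts more than one file.)

  import os
  from multiprocessing import Pool

  _CACHE = {}  # module-level, hence private to each worker process

  def convert(task_id):
      reused = len(_CACHE)  # entries left behind by earlier tasks in this worker
      _CACHE[task_id] = True
      return os.getpid(), reused

  if __name__ == "__main__":
      with Pool(processes=2) as pool:
          for pid, reused in pool.map(convert, range(8)):
              print(pid, reused)  # reused > 0 means the worker kept its cache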

@tarekziade (Collaborator, Author) commented:

Of the three optimisations, here is the breakdown from the benchmark:

  • Module cache. Yes, it helps even in the worst case where each pool worker gets a single file (~3–6% from intra-file dep-graph re-parsing). Across multiple files sharing a base, the benefit grows (~9% here on a random 12-file batch). It's the smallest of the three wins, though.
  • Fast mapper visit (depth/order tracking, no ParentNodeProvider/MetadataWrapper). That's the biggest win by far, ~55% of the total speedup.
  • Fast import analysis (symtable + AST instead of CST ScopeProvider). It's ~16% of the total, ~2s on this sample; a sketch of the idea follows this list.
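
For the third bullet, a rough sketch of the symtable + AST approach (an illustration, not the PR's actual implementation): ast enumerates the names bound by import statements, symtable reports which symbols each scope references, and the difference approximates the unused imports.

  import ast
  import symtable

  def unused_import_names(source: str) -> set[str]:
      # Names bound by `import ...` / `from ... import ...` statements.
      imported = set()
      for node in ast.walk(ast.parse(source)):
          if isinstance(node, (ast.Import, ast.ImportFrom)):
              for alias in node.names:
                  imported.add((alias.asname or alias.name).split(".")[0])

      # Collect symbols that are actually referenced, across all scopes.
      referenced = set()

      def collect(table):
          for symbol in table.get_symbols():
              if symbol.is_referenced():
                  referenced.add(symbol.get_name())
          for child in table.get_children():
              collect(child)

      collect(symtable.symtable(source, "<module>", "exec"))
      return imported - referenced

  # e.g. unused_import_names("import os, sys\nprint(os.getcwd())\n") -> {'sys'}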

So yeah, keeping the fast mappers would reap most of the benefits. Agreed that the fast import analysis adds complexity.

For the module cache, it's a small win but I don't think it adds complexity. I've experimented with dumping the CST tree to disk and reloading it, but it wasn't a huge gain so I dropped the idea. A cross-process cache is also not useful since the cst work is pretty isolated per process. I also tried raising the files-per-worker ratio, but that doesn't help.

I've spotted another place to cache: cache the dependency analysis per base module (not just the parsed tree).
Will try that

@tarekziade (Collaborator, Author) commented:

I've spotted another place to cache: cache the dependency analysis per base module (not just the parsed tree).
Will try that

nope... dead end.

the current version is 44.7% faster

@Cyrilvallez (Member) left a comment:

Nice! I like this version better, we keep the code "simple" while keeping most of the benefits!

Comment on lines +68 to +71:

  mtime_ns = os.stat(file_path).st_mtime_ns
  cached = _MODULE_SOURCE_CACHE.get(cache_key)
  if ENABLE_MODULE_SOURCE_CACHE and cached is not None and cached[0] == mtime_ns:
      return cached[1], cached[2]
@Cyrilvallez (Member):

Why do we cache the time here?

@tarekziade (Collaborator, Author):

It's not the time, it's the file's st_mtime_ns, which we use to determine whether the file was modified. It's used for a small performance gain and also by the profiler.

The gains on the execution side are minimal; if we remove the profiler we can probably rip out this part as well.

@tarekziade (Collaborator, Author):

15% gain on reads

Comment thread on utils/modular_model_converter.py (Outdated)
Comment on lines +52 to +54
ENABLE_MODULE_SOURCE_CACHE = True
ENABLE_FAST_MAPPER_VISIT = True
_MODULE_SOURCE_CACHE = {}
@Cyrilvallez (Member):

Let's make it always True, no? Wdyt?

@tarekziade (Collaborator, Author):

sure, these were here to enable side-by-side comparisons with the benchmark script.

@tarekziade (Collaborator, Author) commented:

Nice! I like this version better, we keep the code "simple" while keeping most of the benefits!

ok, I'll remove the profiler script and make this a landable PR. Thanks for your reviews!

@tarekziade tarekziade force-pushed the tarekziade-modular-speedup branch from 1fabbfd to fa786d8 on April 24, 2026 07:12
@tarekziade tarekziade changed the title investigate modular conversion speedups chore: speedup modular converter (~30%) Apr 24, 2026
@tarekziade tarekziade marked this pull request as ready for review April 24, 2026 07:17
@Cyrilvallez (Member) left a comment:

Thanks a lot! Let's simply make sure we don't see any diffs anywhere by running
python utils/modular_model_converter.py all and python utils/modular_model_converter.py examples (the examples behave as tests for some edge cases) before merging, then let's go!

Comment thread on utils/modular_model_converter.py (Outdated)
Comment on lines +1219 to +1238:

  def get_needed_imports(body: dict[str, dict], all_imports: list[cst.CSTNode]) -> list[cst.CSTNode]:
      """Get all the imports needed in the `body`, from the list of `all_imports`.
      `body` is a dict with the following structure `{str: {"insert_idx": int, "node": cst.CSTNode}}`.
      """
      new_body = [k[1]["node"] for k in sorted(body.items(), key=lambda x: x[1]["insert_idx"])]
      wrapper = MetadataWrapper(cst.Module(body=all_imports + new_body), unsafe_skip_copy=True)
      scopes = set(wrapper.resolve(ScopeProvider).values())
      import_ref_count = defaultdict(lambda: 0)
      for scope in scopes:
          for assignment in scope.assignments:
              node = assignment.node
              if isinstance(assignment, cst.metadata.Assignment) and isinstance(node, (cst.Import, cst.ImportFrom)):
                  ref_count = len(assignment.references)
                  name = assignment.name
                  import_ref_count[name] = max(ref_count, import_ref_count[name])
      # Similar imports may be redefined, and only used between their 1st and 2nd definition so if we already have
      # a ref count > 0 at any point, the imports is actually used
      unused_imports = {name for name, count in import_ref_count.items() if count <= 0 or name in body}
      return _build_needed_imports(all_imports, unused_imports)

@Cyrilvallez (Member):

nit: if it's 100% the same, I would prefer to quickly revert this part as well, to simplify the history of the file!

@tarekziade tarekziade force-pushed the tarekziade-modular-speedup branch from fa786d8 to 82ba8df on April 27, 2026 06:37
@tarekziade tarekziade enabled auto-merge April 27, 2026 06:39
@tarekziade tarekziade added this pull request to the merge queue Apr 27, 2026
Merged via the queue into main with commit 99cdd43 Apr 27, 2026
19 checks passed
@tarekziade tarekziade deleted the tarekziade-modular-speedup branch April 27, 2026 06:54
ArthurZucker pushed a commit that referenced this pull request Apr 28, 2026
* investigate modular conversion speedups

* second optim / cache

* Python-only fast path based on symtable plus a lightweight AST pass

* replaced the imported-module analysis path in convert_modular_file() with a metadata-free mapper visit that tracks top-level context and source order directly, eliminating expensive LibCST ParentNodeProvider/PositionProvider passes while preserving byte-identical output

* revert some speedups

* removed the profiler and cleaned up the converter to use both changes

* make the change closer to main