chore: speedup modular converter (~30%) #45046
Conversation
So far on a 7-file sample: before 13.863s; the new version is 49.3% faster.
Hey! Very nice to start exploring conversion speed! The main points I see from the PR, and the tradeoffs:
The most obvious point I see here is keeping track of depth/order directly instead of calling the LibCST metadata providers.
Of the three optimisations, this is the benchmark:
So yeah, keeping the fast mappers would reap most of the benefits. Agreed that the fast import analysis adds complexity. For the module cache, it's a small win, but I don't think it adds complexity. I've experimented with dumping the CST tree to disk and reloading it, but it's not a huge gain, so I dropped the idea. A cross-process cache is also not useful, since the CST work is pretty isolated per process. I also tried raising the file-per-worker ratio, but that does not help. I've spotted another place to cache: the dependency analysis per base module (not just the parsed tree).
Nope... dead end. The current version is 44.7% faster.
Cyrilvallez
left a comment
Nice! I like this version better: we keep the code "simple" while keeping most of the benefits!
mtime_ns = os.stat(file_path).st_mtime_ns
cached = _MODULE_SOURCE_CACHE.get(cache_key)
if ENABLE_MODULE_SOURCE_CACHE and cached is not None and cached[0] == mtime_ns:
    return cached[1], cached[2]
Why do we cache the time here?
It's not the time; it's the file's st_mtime_ns, which we use to determine whether the file was modified. It's used for a small performance gain and also by the profiler.
The gains on the execution side are minimal; if we remove the profiler, we can probably strip this part out as well.
15% gain on reads
ENABLE_MODULE_SOURCE_CACHE = True
ENABLE_FAST_MAPPER_VISIT = True
_MODULE_SOURCE_CACHE = {}
Let's make it always True, no? Wdyt?
Sure, these were here to enable side-by-side comparisons with the benchmark script.
Ok, I'll remove the profiler script and make this a landable PR. Thanks for your reviews!
1fabbfd to fa786d8
Cyrilvallez
left a comment
Thanks a lot! Let's simply make sure we don't see any diffs anywhere by running python utils/modular_model_converter.py all and python utils/modular_model_converter.py examples (the examples behave as tests for some edge cases) before merging, then let's go!
def get_needed_imports(body: dict[str, dict], all_imports: list[cst.CSTNode]) -> list[cst.CSTNode]:
    """Get all the imports needed in the `body`, from the list of `all_imports`.
    `body` is a dict with the following structure `{str: {"insert_idx": int, "node": cst.CSTNode}}`.
    """
    new_body = [k[1]["node"] for k in sorted(body.items(), key=lambda x: x[1]["insert_idx"])]
    wrapper = MetadataWrapper(cst.Module(body=all_imports + new_body), unsafe_skip_copy=True)
    scopes = set(wrapper.resolve(ScopeProvider).values())
    import_ref_count = defaultdict(lambda: 0)
    for scope in scopes:
        for assignment in scope.assignments:
            node = assignment.node
            if isinstance(assignment, cst.metadata.Assignment) and isinstance(node, (cst.Import, cst.ImportFrom)):
                ref_count = len(assignment.references)
                name = assignment.name
                import_ref_count[name] = max(ref_count, import_ref_count[name])
    # Similar imports may be redefined, and only used between their 1st and 2nd definition, so if we already have
    # a ref count > 0 at any point, the import is actually used
    unused_imports = {name for name, count in import_ref_count.items() if count <= 0 or name in body}
    return _build_needed_imports(all_imports, unused_imports)
nit: if it's 100% the same, I would prefer to quickly revert this as well, to simplify the history of the file!
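As a rough intuition for what this reference counting computes, the stdlib `symtable` module (the route behind the "Python-only fast path based on symtable" mentioned in the commit log) can approximate unused-import detection. The function below is a made-up sketch, not the PR's code, and it only sees module-level references: names used solely inside nested scopes are missed, which is one reason the ScopeProvider analysis above is the accurate one.

```python
import symtable

def find_unused_top_level_imports(source: str) -> set[str]:
    """Hypothetical sketch: flag module-level imported names that are never
    referenced at module level. is_imported() marks names bound by an import
    statement; is_referenced() marks names actually loaded in this block."""
    table = symtable.symtable(source, "<module>", "exec")
    return {
        sym.get_name()
        for sym in table.get_symbols()
        if sym.is_imported() and not sym.is_referenced()
    }
```

For a module containing `import os`, `import sys`, and `print(sys.path)`, this flags `os` as unused and keeps `sys`.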
…with a metadata-free mapper visit that tracks top-level context and source order directly, eliminating expensive LibCST ParentNodeProvider/PositionProvider passes while preserving byte-identical output
fa786d8 to 82ba8df
* investigate modular conversion speedups
* second optim / cache
* Python-only fast path based on symtable plus a lightweight AST pass
* replaced the imported-module analysis path in convert_modular_file() with a metadata-free mapper visit that tracks top-level context and source order directly, eliminating expensive LibCST ParentNodeProvider/PositionProvider passes while preserving byte-identical output
* revert some speedups
* removed the profiler and cleaned up the converter to use both changes
* make the change closer to main
What does this PR do?
First optimization
Adding a module-source cache in utils/modular_model_converter.py. When the converter needs the same imported modeling file more than once, it now reuses the already-read source and parsed LibCST tree, invalidated by the file's mtime, instead of reopening and reparsing the file.
Second optimization
Removing the metadata-heavy visit path and making the lightweight traversal the default. Instead of asking LibCST metadata whether a node is module-level and what its source position is, we track suite depth and node order directly during traversal, which avoids the MetadataWrapper overhead.
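The "track suite depth and node order directly" idea can be illustrated with a plain `ast` visitor. This is a sketch of the technique, not the PR's code (the converter works on LibCST trees, and `TopLevelCollector` is a made-up name): instead of resolving ParentNodeProvider/PositionProvider metadata per node, the visitor carries a nesting-depth counter and appends top-level definitions as it reaches them, so source order falls out of the traversal for free.

```python
import ast

class TopLevelCollector(ast.NodeVisitor):
    """Illustrative sketch: record top-level class/function names in source
    order by tracking nesting depth during the walk, instead of asking a
    metadata pass for each node's parent and position."""
    def __init__(self):
        self.depth = 0        # 0 == module level
        self.top_level = []   # top-level definition names, in source order

    def _visit_def(self, node):
        if self.depth == 0:
            self.top_level.append(node.name)
        self.depth += 1
        self.generic_visit(node)  # nested defs see depth > 0 and are skipped
        self.depth -= 1

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _visit_def
```

Visiting a module with a nested function and a class collects only the top-level names, in the order they appear in the source.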
Together, those changes cut runtime by about 30% on the sampled batch (3 runs each on qwen3, olmo2, and starcoder2).