Fix quadratic command-line parsing #2724
Merged
Julow merged 4 commits into ocaml-ppx:main on Sep 22, 2025
There is only a single instance anyway.
Otherwise the config parser would run for every argument, resulting in quadratic parsing.
Julow (Collaborator) reviewed on Sep 22, 2025 and left a comment:
Awesome!
Just out of curiosity, what is your use case for having 500+ input files?
Author (Contributor) replied:
I am working on a fuzzer for OCaml syntax, a use case with as many input files as one wants :). However, 80 files is enough to feel a slowdown when running multiple ocamlformat processes in parallel (with 80 files, there are already ~50k calls to stat, which significantly increases kernel load when many processes are hammering the FS simultaneously).
Julow (Collaborator) approved these changes on Sep 22, 2025 and left a comment:
Awesome :) Thanks a lot !
It seems that you will be extensively testing OCamlformat as well. Don't hesitate to reach out to me if you need onboarding into the codebase.
Julow added a commit to Julow/opam-repository that referenced this pull request on Oct 24, 2025:
….28.1)

CHANGES:

### Highlight

- \* Support for OCaml 5.4 (ocaml-ppx/ocamlformat#2717, ocaml-ppx/ocamlformat#2720, ocaml-ppx/ocamlformat#2732, ocaml-ppx/ocamlformat#2733, ocaml-ppx/ocamlformat#2735, @Julow, @Octachron, @cod1r, @EmileTrotignon)
  OCamlformat now supports OCaml 5.4 syntax. Module packings of the form `((module M) : (module S))` are no longer rewritten to `(module M : S)` because these are now two different syntaxes.

- \* Reduce indentation after `|> map (fun` (ocaml-ppx/ocamlformat#2694, @EmileTrotignon)
  Notably, the indentation no longer depends on the length of the infix operator, for example:
  ```ocaml
  (* before *)
  v
  |>>>>>> map (fun x ->
              x )
  (* after *)
  v
  |>>>>>> map (fun x ->
      x )
  ```
  `@@ match` can now also be on one line.

### Added

- Added the `module-indent` option (ocaml-ppx/ocamlformat#2711, @HPRIOR) to control the indentation of items within modules. This affects modules and signatures. For example, module-indent=4:
  ```ocaml
  module type M = sig
      type t

      val f : (string * int) list -> int
  end
  ```

- `exp-grouping=preserve` is now the default in the `default` and `ocamlformat` profiles. This means that it's now possible to use `begin ... end` without tweaking ocamlformat. (ocaml-ppx/ocamlformat#2716, @EmileTrotignon)

### Deprecated

- Starting in this release, ocamlformat can use cmdliner >= 2.0.0. When that is the case, the tool no longer accepts unambiguous option name prefixes. For example, `--max-iter` is not accepted anymore, you have to pass the full option `--max-iters`. This does not apply to the keys in the `.ocamlformat` configuration files, which have always required the full name. See dbuenzli/cmdliner#200. (ocaml-ppx/ocamlformat#2680, @emillon)

### Changed

- \* The formatting of infix extensions is now consistent with regular formatting by construction. This reduces indentation in `f @@ match%e` expressions to the level of indentation in `f @@ match`. Other unknown inconsistencies might also be fixed. (ocaml-ppx/ocamlformat#2676, @EmileTrotignon)

- \* The spacing of infix attributes is now consistent across keywords. Every keyword but `begin`, `function`, and `fun` had attributes stuck to the keyword: `match[@A]`, but `fun [@A]`. Now it's also `fun[@A]`. (ocaml-ppx/ocamlformat#2676, @EmileTrotignon)

- \* The formatting of `let a = b in fun ...` is now consistent with other contexts like `a ; fun ...`. A check for the syntax `let a = fun ... in ...` was made more precise. (ocaml-ppx/ocamlformat#2705, @EmileTrotignon)

- \* `|> begin`, `~arg:begin`, `begin if`, `lazy begin`, `begin match`, `begin fun` and `map li begin fun` can now be printed on the same line, with one less indentation level for the body of the inner expression. (ocaml-ppx/ocamlformat#2664, ocaml-ppx/ocamlformat#2666, ocaml-ppx/ocamlformat#2671, ocaml-ppx/ocamlformat#2672, ocaml-ppx/ocamlformat#2681, ocaml-ppx/ocamlformat#2685, ocaml-ppx/ocamlformat#2693, @EmileTrotignon)
  For example:
  ```ocaml
  (* before *)
  begin
    fun x ->
      some code
  end
  (* after *)
  begin fun x ->
    some code
  end
  ```

- \* `break-struct=natural` now also applies to `sig ... end`. (ocaml-ppx/ocamlformat#2682, @EmileTrotignon)

### Fixed

- Fixed `wrap-comments=true` not working with the janestreet profile (ocaml-ppx/ocamlformat#2645, @Julow)
  Asterisk-prefixed comments are also now formatted the same way as with the default profile.

- Fixed `nested-match=align` not working with `match%ext` (ocaml-ppx/ocamlformat#2648, @EmileTrotignon)

- Fixed the AST generated for bindings of the form `let pattern : type = function ...` (ocaml-ppx/ocamlformat#2651, @v-gb)

- Print valid syntax for the corner case (1).a (ocaml-ppx/ocamlformat#2653, @v-gb)

- `Ast_mapper.default_mapper` now iterates on the location of `in` in `let+ .. in ..` (ocaml-ppx/ocamlformat#2658, @v-gb)

- Fix missing parentheses in `let+ (Cstr _) : _ = _` (ocaml-ppx/ocamlformat#2661, @Julow)
  This caused a crash as the generated code wasn't valid syntax.

- Fix bad indentation of `let%ext { ...` (ocaml-ppx/ocamlformat#2663, @EmileTrotignon) with `dock-collection-brackets` enabled.

- ocamlformat is now more robust when used as a library to print modified ASTs (ocaml-ppx/ocamlformat#2659, @v-gb)

- Fix crash due to edge case with asterisk-prefixed comments (ocaml-ppx/ocamlformat#2674, @Julow)

- Fix crash when formatting `mld` files that cannot be lexed as ocaml (e.g. containing LaTeX or C code) (ocaml-ppx/ocamlformat#2684, @emillon)

- \* Fix double parens around module constraint in functor application: `module M = F ((A : T))` becomes `module M = F (A : T)`. (ocaml-ppx/ocamlformat#2678, @EmileTrotignon)

- Fix misplaced `;;` due to interaction with floating doc comments. (ocaml-ppx/ocamlformat#2691, @EmileTrotignon)

- The formatting of attributes of expressions is now aware of the attributes' infix or postfix positions: `((fun [@A] x -> y) [@b])` is formatted without moving attributes. (ocaml-ppx/ocamlformat#2676, @EmileTrotignon)

- `begin%e ... end` and `begin [@A] ... end` nodes are always preserved. (ocaml-ppx/ocamlformat#2676, @EmileTrotignon)

- `begin end` syntax for `()` is now preserved. (ocaml-ppx/ocamlformat#2676, @EmileTrotignon)

- Fix a crash on `type 'a t = A : 'a. {a: 'a} -> 'a t`. (ocaml-ppx/ocamlformat#2710, @EmileTrotignon)

- Fix a crash where `type%e nonrec t = t` was formatted as `type nonrec%e t = t`, which is invalid syntax. (ocaml-ppx/ocamlformat#2712, @EmileTrotignon)

- Fix commandline parsing being quadratic in the number of arguments (ocaml-ppx/ocamlformat#2724, @let-def)

- \* Fix `;;` being added after a documentation comment (ocaml-ppx/ocamlformat#2683, @EmileTrotignon)
  This results in more `;;` being inserted, for example:
  ```ocaml
  (* before *)
  print_endline "foo"
  let a = 3
  (* after *)
  print_endline "foo" ;;
  let a = 3
  ```

- Fix dropped comment in `if then (* comment *) begin .. end` (ocaml-ppx/ocamlformat#2734, @Julow)
Command‑line parsing occurs in two passes: one pass parses the actions, and a second pass parses the configuration.
The configuration parser was invoked once per input file, and each invocation processes the whole command line, so the overall parsing time grew quadratically with the number of files.
This patch memoizes the configuration so that the parser runs only once. The configuration changes are still applied to each input file, preserving the original ocamlformat behaviour while making parsing time grow linearly, rather than quadratically, with the number of inputs.
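The effect of memoizing the configuration can be illustrated with a small standalone sketch. This is not the code from the patch: `parse_config`, `inputs`, `format_all_naive` and `format_all_memoized` are hypothetical names, the `--key=value` option syntax is a simplification, and using `lazy` is only one assumed way to make the parser run once.

```ocaml
(* Hypothetical stand-in for the configuration pass. It walks every
   command-line argument, so a single call costs O(n) for n arguments.
   [parse_calls] counts how often it runs. *)
let parse_calls = ref 0

let is_option arg = String.length arg >= 2 && String.sub arg 0 2 = "--"

let parse_config (args : string list) : (string * string) list =
  incr parse_calls ;
  List.filter_map
    (fun arg ->
      match String.index_opt arg '=' with
      | Some i when is_option arg ->
          (* "--key=value" becomes ("key", "value") *)
          let key = String.sub arg 2 (i - 2) in
          let value = String.sub arg (i + 1) (String.length arg - i - 1) in
          Some (key, value)
      | _ -> None )
    args

(* Everything that is not an option is treated as an input file. *)
let inputs args = List.filter (fun arg -> not (is_option arg)) args

(* Before: the parser is re-run for each input file,
   i.e. n files * O(n) arguments = O(n^2) work overall. *)
let format_all_naive args =
  List.map (fun file -> (file, parse_config args)) (inputs args)

(* After: the parse is suspended in a lazy value, forced at most once,
   and the resulting configuration is still paired with every file. *)
let format_all_memoized args =
  let config = lazy (parse_config args) in
  List.map (fun file -> (file, Lazy.force config)) (inputs args)

let () =
  let args = ["--margin=80"; "a.ml"; "b.ml"; "c.ml"] in
  parse_calls := 0 ;
  ignore (format_all_naive args) ;
  Printf.printf "naive: %d parses\n" !parse_calls ;
  parse_calls := 0 ;
  ignore (format_all_memoized args) ;
  Printf.printf "memoized: %d parses\n" !parse_calls
```

With three input files, the naive version runs the parser three times and the memoized version once, while both return the same per-file configuration.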
Below are two plots that illustrate the performance before and after the change. They show the time spent and the number of stat calls as functions of the number of input files, clearly demonstrating the quadratic behaviour that the patch removes.