precompute tag to_string variants #19
Thanks for this! I like where this is going. On the question of how to build the list of tags, I guess Wikipedia pages are as good as any as a data source, but it might also be worthwhile to get a sense of what tags are commonly used in larger pages, and in pages that are more like web apps. (Maybe the Phoenix or Elixir websites would be good candidates?) I get the impression that this macro won't add very much overhead to the runtime (all things considered), so we could err on the side of including more tags in the list. One other thing: it looks like the CI build is failing on the
I ran a quick script to compute a bit more data, and this is what it came up with:

I tweaked the set of tags that get generated functions in 5427145 a bit in response to this data, as well as this survey of what tags pages tend to contain.

The script source used to generate the data:

```elixir
#!/usr/bin/env elixir
Mix.install([:req, :easyhtml, {:util, git: "https://github.com/ckampfe/util.git"}])

links = [
  "https://elixir-lang.org/",
  "https://www.phoenixframework.org/",
  "https://www.nytimes.com/"
]

links
# Fetch every page concurrently.
|> Task.async_stream(fn link ->
  IO.puts("fetching #{link}")
  {link, Req.get!(link)}
end)
|> Enum.map(fn {:ok, {link, res}} ->
  IO.puts("fetched #{link}")
  Map.fetch!(res, :body)
end)
# Parse each response body into an HTML tree.
|> Enum.map(fn body ->
  EasyHTML.parse!(body)
end)
# Walk each tree, descending only into element nodes that have children.
|> Enum.flat_map(fn parsed ->
  Util.traverse(
    List.first(parsed.nodes),
    fn
      {_tag, _attrs, children} when is_list(children) -> true
      _ -> false
    end,
    fn {_tag, _attrs, children} -> children end
  )
end)
# Group nodes by tag name; non-element nodes (e.g. text) land under nil.
|> Enum.group_by(fn
  {tag, _, _} -> tag
  _ -> nil
end)
|> Enum.map(fn {tag, values} -> {tag, Enum.count(values)} end)
|> Enum.filter(fn {tag, _count} -> !is_nil(tag) end)
# Most frequent tags first, top 25 only.
|> Enum.sort_by(fn {_tag, count} -> count end)
|> Enum.reverse()
|> Enum.take(25)
|> IO.inspect()
```
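For anyone who wants to see the counting step in isolation, here is a minimal sketch that needs no network access or dependencies. It assumes the same `{tag, attrs, children}` tuple shape that the script above pattern-matches on; the `TagCount` module and the sample `tree` are hypothetical and exist only for illustration.

```elixir
defmodule TagCount do
  # Count element tags in a list of nodes, where an element is a
  # {tag, attrs, children} tuple and anything else (text, comments)
  # is ignored. Returns {tag, count} pairs, most frequent first.
  def count(nodes) when is_list(nodes) do
    nodes
    |> Enum.flat_map(&tags/1)
    |> Enum.frequencies()
    |> Enum.sort_by(fn {_tag, n} -> n end, :desc)
  end

  # An element contributes its own tag plus the tags of its descendants.
  defp tags({tag, _attrs, children}) when is_list(children) do
    [tag | Enum.flat_map(children, &tags/1)]
  end

  # Text nodes and any other node shapes contribute nothing.
  defp tags(_other), do: []
end

# Hypothetical sample tree: one list with two linked items.
tree = [
  {"ul", [],
   [
     {"li", [], [{"a", [{"href", "/one"}], ["one"]}]},
     {"li", [], [{"a", [{"href", "/two"}], ["two"]}]}
   ]}
]

IO.inspect(TagCount.count(tree))
```

`Enum.frequencies/1` does the same job as the `group_by` plus `count` pair in the script, so this is a simplification of the approach, not a copy of it.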
This all looks great, thanks! I'm going to merge it today and release it in 2.0.0.
I'm one of those people who likes to tinker with performance, so I got to tinkering a bit and this is what I came up with. If this type of thing is more complication than you feel is right for the project, no worries. I had fun doing it, and I just wanted to offer it up for discussion/ideas in case it was something you wanted.
Anyway, the idea is this: in `sneeze`, users are going to be serializing the same set of tags over and over again. Especially in large templates there can be hundreds or even thousands of the same tags. I did some crude evidence gathering by pulling down 20 Wikipedia pages at random, parsing them, and aggregating the counts of the tags they contained, and sure enough there is a ton of tag repetition, especially on the heaviest hitters, like `a`, `li`, `span`, `div`, `td`, and `ul`. (Is Wikipedia a representative sample? I'll let you be the judge!)

Rather than serialize the same atoms over and over again via `to_string`, we can precompute the string representation of those atoms, store them, and return that precomputed string via a pattern match, like so:

Returning the literal `"a"` has the effect of `"a"` becoming a compiled constant, with subsequent code that calls `tag_to_string(:a)` referencing that constant rather than having to stringify `:a` with each call (as per https://elixirforum.com/t/beam-optimization-for-functions-with-static-return-type/1868/2).

So I set up some `benchee` benchmarks, with the benchmarking script looking like this:

And the benchmarked template looking like this:
I ran each branch a few times on OTP 26 and Elixir 1.15 with `mix deps.clean --all && mix clean && mix deps.get && MIX_ENV=bench mix run bench.exs`.

Full disclosure: these benchmarks were run on my laptop on battery power. I can try to run them on mains power or on a Linux machine later if that's of interest.
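As a rough, dependency-free way to reproduce the shape of this comparison, the sketch below times plain `to_string` against hand-written pattern-match clauses using `:timer.tc/1` from the Erlang standard library. It is a swapped-in stand-in, not the `benchee` script from this branch: the `TimingSketch` module, its three-tag subset, and the iteration count are all assumptions, and a single unwarmed timing like this is far noisier than what `benchee` reports.

```elixir
defmodule TimingSketch do
  # Hand-written clauses for a hypothetical subset of common tags.
  # Each clause returns a string literal.
  def tag_to_string(:a), do: "a"
  def tag_to_string(:div), do: "div"
  def tag_to_string(:span), do: "span"
  # Any other tag falls back to runtime conversion.
  def tag_to_string(tag), do: to_string(tag)

  # Apply `fun` to every tag in `tags`, `iterations` times over, and
  # return the elapsed time in microseconds as measured by :timer.tc/1.
  def time(fun, tags, iterations) do
    {micros, :ok} =
      :timer.tc(fn ->
        Enum.each(1..iterations, fn _ -> Enum.each(tags, fun) end)
      end)

    micros
  end
end

tags = [:a, :div, :span]
baseline = TimingSketch.time(fn tag -> to_string(tag) end, tags, 100_000)
precomputed = TimingSketch.time(&TimingSketch.tag_to_string/1, tags, 100_000)

IO.puts("to_string/1:     #{baseline} microseconds")
IO.puts("tag_to_string/1: #{precomputed} microseconds")
```

The absolute numbers will vary by machine and run, so treat any single result as a sanity check only.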
Results on branch `main`:

Results on branch `precompute-tag-strings` (this branch):

resulting in these aggregate speedups:
Further/open questions:
In any case thanks again for this library, I just wanted to share some tinkering I did in case you thought it was of use.
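To make the precompute idea from the description concrete, here is a minimal sketch of generating one `tag_to_string/1` clause per tag at compile time. It is not the code from this branch: the `PrecomputedTags` module name and the `@common_tags` list are hypothetical, chosen from the heavy hitters named above.

```elixir
defmodule PrecomputedTags do
  # Hypothetical tag list for illustration; not the list used in the PR.
  @common_tags [:a, :div, :li, :span, :td, :ul]

  # Emit one function clause per tag at compile time. Each generated
  # clause has the form `def tag_to_string(:a), do: "a"`, so the return
  # value is a string literal and no conversion happens at runtime.
  for tag <- @common_tags do
    def tag_to_string(unquote(tag)), do: unquote(Atom.to_string(tag))
  end

  # Tags outside the list fall back to runtime conversion.
  def tag_to_string(tag), do: to_string(tag)
end

IO.inspect(PrecomputedTags.tag_to_string(:li))
IO.inspect(PrecomputedTags.tag_to_string(:article))
```

Generating the clauses from a list keeps the set of precomputed tags in one place, which makes it cheap to add more tags later; the fallback clause means tags missing from the list still serialize correctly, just without the precomputed string.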