precompute tag to_string variants#19

Merged
JuneKelly merged 3 commits into JuneKelly:main from ckampfe:precompute-tag-strings on Jul 10, 2023

Contributor

ckampfe commented Jun 28, 2023

I'm one of those people who likes to tinker with performance, so I got to tinkering a bit and this is what I came up with. If this type of thing is more complication than you feel is right for the project, no worries. I had fun doing it, and I just wanted to offer it up for discussion/ideas in case it was something you wanted.

Anyway, the idea is this: in sneeze, users are going to be serializing the same set of tags over and over again. Especially in large templates, there can be hundreds or even thousands of the same tags. I did some crude evidence gathering by pulling down 20 Wikipedia pages at random, parsing them, and aggregating the counts of the tags they contained, and sure enough there is a ton of tag repetition, especially among the heaviest hitters, like a, li, span, div, td, and ul. (Is Wikipedia a representative sample? I'll let you be the judge!)

Rather than serialize the same atoms over and over again via to_string, we can precompute the string representation of those atoms, store them, and return that precomputed string via a pattern match, like so:

defp tag_to_string(:a) do
  "a"
end

Returning the literal "a" makes "a" a compiled constant, so subsequent code that calls tag_to_string(:a) references that constant rather than having to stringify :a with each call (as per https://elixirforum.com/t/beam-optimization-for-functions-with-static-return-type/1868/2).
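A minimal sketch of how clauses like the one above could be generated at compile time instead of written by hand. The module name, the tag list, the use of def rather than defp, and the Atom.to_string fallback are assumptions for illustration; this is not necessarily how the code in this PR is written.

```elixir
defmodule TagStringsSketch do
  # Hypothetical tag list for illustration; not the list used in the PR.
  @common_tags [:a, :div, :li, :span, :td, :ul]

  # Emit one function clause per tag at compile time,
  # equivalent to writing `def tag_to_string(:a), do: "a"` by hand.
  for tag <- @common_tags do
    string = Atom.to_string(tag)
    def tag_to_string(unquote(tag)), do: unquote(string)
  end

  # Fallback for tags outside the precomputed list: convert at runtime.
  def tag_to_string(tag) when is_atom(tag), do: Atom.to_string(tag)
end

IO.inspect(TagStringsSketch.tag_to_string(:a))
IO.inspect(TagStringsSketch.tag_to_string(:blink))
```

Because the fallback clause comes last, tags in the list hit a literal-returning clause first and everything else still works, just without the precomputed string.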

So I set up some benchee benchmarks, with the benchmarking script looking like this:

layout = SneezeBench.layout("some title", "some content")

Benchee.run(
  %{
    "render/1" => fn -> Sneeze.render(layout) end,
    "render_iodata/1" => fn -> Sneeze.render_iodata(layout) end
  },
  time: 10,
  memory_time: 2
)

And the benchmarked template looking like this:

defmodule SneezeBench do
  def layout(title, content) do
    [
      [:__@raw_html, "<!DOCTYPE html>"],
      [
        :head,
        [:meta, %{charset: "utf-8"}],
        [:meta, %{content: "IE=edge,chrome=1", "http-equiv": "X-UA-Compatible"}],
        [:title, [:__@raw_html, title]],
        [:meta, %{content: "width=device-width", name: "viewport"}],
        [:link, %{rel: "icon", href: "favicon-min.png", type: "image.png"}]
      ],
      [
        :body,
        [
          :div,
          %{class: "container"},
          [
            :div,
            %{class: "site"},
            [
              :div,
              %{class: "header"},
              [:h1, %{class: "title"}, [:a, %{href: "index.html"}, "Clark Kampfe"]],
              [:a, %{class: "extra", href: "about.html"}, "about"],
              " ",
              [:a, %{class: "extra", href: "resume.html"}, "resumé"]
            ],
            content,
            [
              :div,
              %{class: "footer"},
              [
                :div,
                %{class: "contact"},
                [
                  :p,
                  [:a, %{href: "https://github.com/ckampfe"}, "github"],
                  " ",
                  [:a, %{href: "https://twitter.com/ckampfe"}, "twitter"],
                  " ",
                  [:a, %{href: "/feed"}, "rss"]
                ]
              ]
            ]
          ]
        ]
      ]
    ]
  end
end

I ran each branch a few times on OTP 26 and Elixir 1.15 with: mix deps.clean --all && mix clean && mix deps.get && MIX_ENV=bench mix run bench.exs.

Full disclosure: these benchmarks were run on my laptop on battery power. I can try to run them on mains power or a Linux machine later if that's of interest.

Results on branch main:

Generated sneeze_bench app
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.15.0
Erlang 26.0.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 28 s

Benchmarking render/1 ...
Benchmarking render_iodata/1 ...

Name                      ips        average  deviation         median         99th %
render_iodata/1      204.69 K        4.89 μs   ±294.69%        4.13 μs        7.88 μs
render/1             140.47 K        7.12 μs   ±137.40%        6.54 μs       12.54 μs

Comparison:
render_iodata/1      204.69 K
render/1             140.47 K - 1.46x slower +2.23 μs

Memory usage statistics:

Name               Memory usage
render_iodata/1        11.60 KB
render/1               11.67 KB - 1.01x memory usage +0.0703 KB

**All measurements for memory usage were the same**

Results on branch precompute-tag-strings (this branch):

Generated sneeze_bench app
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.15.0
Erlang 26.0.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 28 s

Benchmarking render/1 ...
Benchmarking render_iodata/1 ...

Name                      ips        average  deviation         median         99th %
render_iodata/1      248.28 K        4.03 μs   ±441.95%        3.29 μs        6.17 μs
render/1             155.87 K        6.42 μs   ±229.69%        5.63 μs        9.58 μs

Comparison:
render_iodata/1      248.28 K
render/1             155.87 K - 1.59x slower +2.39 μs

Memory usage statistics:

Name               Memory usage
render_iodata/1        10.78 KB
render/1               10.80 KB - 1.00x memory usage +0.0156 KB

**All measurements for memory usage were the same**

resulting in these aggregate speedups:

| branch | render/1 throughput increase | render_iodata/1 throughput increase |
| --- | --- | --- |
| main | 1.0x | 1.0x |
| this branch | ~1.11x | ~1.21x |
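
For reference, the throughput ratios follow directly from the ips columns of the two Benchee runs. A quick check, with the numbers copied from the output above (the variable names are just for illustration):

```elixir
# Iterations per second (in thousands), copied from the two benchmark outputs.
main_ips = %{"render/1" => 140.47, "render_iodata/1" => 204.69}
branch_ips = %{"render/1" => 155.87, "render_iodata/1" => 248.28}

# Throughput increase = branch ips / main ips, rounded to two decimals.
speedups =
  Map.new(main_ips, fn {name, ips} ->
    {name, Float.round(branch_ips[name] / ips, 2)}
  end)

IO.inspect(speedups)
```

This gives roughly 1.11 for render/1 and 1.21 for render_iodata/1, matching the table.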

Further/open questions:

  • The sample template is very small and simple, so I am curious about the effect of this work on (much) larger templates. I think it's reasonable to guess that the speedup would be even more pronounced for larger templates, but that's just a guess; I've only run with the above template.
  • I based the tag_to_string function generation order on randomly gathered Wikipedia pages. Is this tag order reasonable? Would more memoized tags result in a speedup? Are there too many as it is? I have no idea.
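
One way the first question could be explored is to generate templates of increasing size and feed them to the benchmark. The sketch below only builds the templates; the module name, row shape, and sizes are hypothetical, and the Benchee call in the trailing comment is untested here and assumes Benchee's inputs option.

```elixir
defmodule LargeTemplateSketch do
  # Build a sneeze-style nested list: a :ul holding `n` :li rows,
  # each containing a single link. Purely illustrative markup.
  def layout(n) when is_integer(n) and n > 0 do
    rows =
      for i <- 1..n do
        [:li, %{class: "row"}, [:a, %{href: "/items/#{i}"}, "item #{i}"]]
      end

    [:div, %{class: "container"}, [:ul | rows]]
  end
end

# Candidate sizes, keyed by a label, in the map shape Benchee's `inputs:` accepts.
inputs =
  Map.new([10, 100, 1_000], fn n ->
    {"#{n} rows", LargeTemplateSketch.layout(n)}
  end)

# Possible use in bench.exs (requires the benchee and sneeze deps):
#
#   Benchee.run(
#     %{
#       "render/1" => fn layout -> Sneeze.render(layout) end,
#       "render_iodata/1" => fn layout -> Sneeze.render_iodata(layout) end
#     },
#     inputs: inputs
#   )
```

Running both branches against the same inputs would show whether the gap widens as the number of repeated tags grows.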

In any case thanks again for this library, I just wanted to share some tinkering I did in case you thought it was of use.

Owner

JuneKelly commented Jun 28, 2023

Thanks for this! I like where this is going.

On the question of how to build the list of tags, I guess Wikipedia pages are as good as any as a data source, but it might also be worthwhile to get a sense of what tags are commonly used in larger pages, and in pages that are more like web apps. (Maybe the Phoenix or Elixir websites would be good candidates?)

I get the impression that this macro won't add very much overhead to the runtime (all things considered) so we could err on the side of including more tags in the list.

One other thing: it looks like the CI build is failing on the format stage. Would you mind running make format and pushing the changes? Thanks!

Contributor Author

ckampfe commented Jun 28, 2023

I ran a quick script to compute a bit more data, and this is what it came up with:

at [ 17:32:41 ] ➜ ./aggregate_tags.exs
fetching https://elixir-lang.org/
fetching https://www.phoenixframework.org/
fetching https://www.nytimes.com/
fetched https://elixir-lang.org/
fetched https://www.phoenixframework.org/
fetched https://www.nytimes.com/
[
  {"div", 1278},
  {"span", 332},
  {"a", 252},
  {"li", 182},
  {"section", 123},
  {"p", 116},
  {"h3", 100},
  {"figure", 94},
  {"img", 82},
  {"svg", 75},
  {"picture", 51},
  {"script", 49},
  {"meta", 44},
  {"link", 43},
  {"button", 41},
  {"figcaption", 36},
  {"source", 29},
  {"noscript", 29},
  {"path", 21},
  {"polygon", 20},
  {"h4", 19},
  {"ul", 17},
  {"i", 14},
  {"code", 14},
  {"nav", 12}
]

In 5427145 I tweaked the set of tags that get generated functions a bit, in response to this data as well as this survey of which tags pages tend to contain.

The script source used to generate the data:

#!/usr/bin/env elixir

Mix.install([:req, :easyhtml, {:util, git: "https://github.com/ckampfe/util.git"}])

links = [
  "https://elixir-lang.org/",
  "https://www.phoenixframework.org/",
  "https://www.nytimes.com/"
]

links
# Fetch all pages concurrently.
|> Task.async_stream(fn link ->
  IO.puts("fetching #{link}")
  {link, Req.get!(link)}
end)
|> Enum.map(fn {:ok, {link, res}} ->
  IO.puts("fetched #{link}")
  Map.fetch!(res, :body)
end)
|> Enum.map(fn body ->
  EasyHTML.parse!(body)
end)
# Collect the nodes of each parsed tree. The two functions tell
# Util.traverse which nodes are branches and how to get their children.
|> Enum.flat_map(fn parsed ->
  Util.traverse(
    List.first(parsed.nodes),
    fn
      {_tag, _attrs, children} when is_list(children) -> true
      _ -> false
    end,
    fn {_tag, _attrs, children} -> children end
  )
end)
# Count nodes per tag name; non-element nodes group under nil and are dropped.
|> Enum.group_by(fn
  {tag, _, _} -> tag
  _ -> nil
end)
|> Enum.map(fn {tag, values} -> {tag, Enum.count(values)} end)
|> Enum.filter(fn {tag, _count} -> !is_nil(tag) end)
# Keep the 25 most frequent tags, most common first.
|> Enum.sort_by(fn {_tag, count} -> count end)
|> Enum.reverse()
|> Enum.take(25)
|> IO.inspect()
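
For readers without the util dependency, here is a dependency-free sketch of the same counting idea, run on a small hand-written tree of {tag, attrs, children} tuples. The module name and sample tree are made up for illustration, and it uses plain recursion in place of Util.traverse, so it is an approximation of the script above rather than a drop-in replacement.

```elixir
defmodule TagCountSketch do
  # Collect tag names from {tag, attrs, children} tuples, depth first.
  # Anything else (text nodes, for example) contributes no tags.
  def tags({tag, _attrs, children}) when is_list(children) do
    [tag | Enum.flat_map(children, &tags/1)]
  end

  def tags(_other), do: []

  # Return the `n` most frequent tags as {tag, count}, most common first.
  def top(tree, n) do
    tree
    |> tags()
    |> Enum.frequencies()
    |> Enum.sort_by(fn {_tag, count} -> count end, :desc)
    |> Enum.take(n)
  end
end

# Hypothetical sample document: one list with three items, two of them links.
tree =
  {"ul", [],
   [
     {"li", [], [{"a", [{"href", "/"}], ["home"]}]},
     {"li", [], [{"a", [{"href", "/about"}], ["about"]}]},
     {"li", [], ["plain text"]}
   ]}

IO.inspect(TagCountSketch.top(tree, 2))
```

On this sample the two most frequent tags are li (3) and a (2).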

Owner

This all looks great, thanks!

I'm going to merge it today and release it in 2.0.0.

@JuneKelly JuneKelly merged commit a5bb0a1 into JuneKelly:main Jul 10, 2023
@ckampfe ckampfe deleted the precompute-tag-strings branch July 10, 2023 15:31