Skip to content

Significant mmap overhead in Lua plugin #7423

@ema

Description

@ema

While investigating suspiciously high system CPU usage on a few servers running ATS, we observed that the traffic_server process was causing more than 50 thousand mmap calls per second, all of which with length 65536.

# pid 31316 is traffic_server
$ sudo bpftrace -e 'kprobe:do_mmap /pid == 31316/ { @[arg2]=count(); } interval:s:1 { exit(); }'
Attaching 2 probes...
@[65536]: 64988

By looking at LuaJIT's source code we found that the function mcode_alloc_at is responsible for calling mmap, so we obtained a few backtraces for that function with the following SystemTap script:

probe process("/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0").function("mcode_alloc_at") {
  print_ubacktrace()
  exit()
}

The script must be executed as follows to properly include all symbols:

$ sudo stap -v -d /usr/bin/traffic_server -d /usr/lib/trafficserver/modules/tslua.so mcode_alloc_at.stp

Most backtraces looked either like this:

 0x7fb44df01338 : mcode_allocarea+0xb8/0x160 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44df19d54 : lj_asm_trace+0x25b4/0x3460 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44df381d1 : trace_state+0x201/0x1340 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44def2146 : lj_vm_cpcall+0x4a/0x5e [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44def864b : lj_trace_ins+0x5b/0x70 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44df410dc : lj_dispatch_call+0x21c/0x250 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44def37ba : lj_vm_hotcall+0x1e/0x43 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44df3f22c : lua_pcall+0x5c/0x80 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb52800d9a3 : globalHookHandler+0x313/0x450 [/usr/lib/trafficserver/modules/tslua.so]

Or like this:

 0x7fb44df01338 : mcode_allocarea+0xb8/0x160 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44df19d54 : lj_asm_trace+0x25b4/0x3460 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44df381d1 : trace_state+0x201/0x1340 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44def2146 : lj_vm_cpcall+0x4a/0x5e [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44def864b : lj_trace_ins+0x5b/0x70 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44df34e17 : lj_dispatch_ins+0x177/0x1e0 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44def3741 : lj_vm_inshook+0x2d/0x44 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44df48ec1 : lj_cf_string_gsub+0x2e1/0x440 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb44def1dd7 : lj_BC_FUNCC+0x34/0x59 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
 0x7fb52801a0fe : ts_lua_http_cont_handler+0x15e/0x410 [/usr/lib/trafficserver/modules/tslua.so]

I think the former traces are from tslua used as a global plugin, the latter for remap rules.

Almost all our servers were affected by this, with the exception of a handful were traffic_server almost never called mmap. We do not have any valid theory explaining the difference, given that the servers run the same software stack and configuration.
However, the discrepancy allowed us to observe that on systems not calling mmap, the various Lua JIT trace calls (eg: lj_trace_ins) were also not called. We hence tried to disable the JIT compiler and only rely on the interpreter by adding jit.off(true, true) to all our Lua scripts. The result was that indeed mmap stopped being called, system CPU usage went down significantly, and the performance of ATS increased visibly.

ats-sys-cpu

We think that what might be happening is that the Lua JIT compiler keeps on identifying hot code and compiling it, instead of retaining and reusing the built code, or something along those lines. It would be great to:

  • better understand why the compiler is kicking in so often
  • have a configuration setting to turn off the compiler, similarly to what luajit's -joff does: https://luajit.org/running.html

In plugin.config we have:

tslua.so /etc/trafficserver/lua/default.lua --enable-reload

remap.config: https://phabricator.wikimedia.org/P13769

All lua scripts here.

Our bug tracking system has further details on the investigation.

Metadata

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions