While investigating suspiciously high system CPU usage on a few servers running ATS, we observed that the traffic_server process was making more than 50 thousand mmap calls per second, all with length 65536.
# pid 31316 is traffic_server
$ sudo bpftrace -e 'kprobe:do_mmap /pid == 31316/ { @[arg2]=count(); } interval:s:1 { exit(); }'
Attaching 2 probes...
@[65536]: 64988
By looking at LuaJIT's source code we found that the function mcode_alloc_at is responsible for calling mmap, so we obtained a few backtraces for that function with the following SystemTap script:
probe process("/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0").function("mcode_alloc_at") {
    print_ubacktrace()
    exit()
}
The script must be executed as follows to properly include all symbols:
$ sudo stap -v -d /usr/bin/traffic_server -d /usr/lib/trafficserver/modules/tslua.so mcode_alloc_at.stp
Most backtraces looked either like this:
0x7fb44df01338 : mcode_allocarea+0xb8/0x160 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df19d54 : lj_asm_trace+0x25b4/0x3460 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df381d1 : trace_state+0x201/0x1340 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def2146 : lj_vm_cpcall+0x4a/0x5e [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def864b : lj_trace_ins+0x5b/0x70 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df410dc : lj_dispatch_call+0x21c/0x250 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def37ba : lj_vm_hotcall+0x1e/0x43 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df3f22c : lua_pcall+0x5c/0x80 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb52800d9a3 : globalHookHandler+0x313/0x450 [/usr/lib/trafficserver/modules/tslua.so]
Or like this:
0x7fb44df01338 : mcode_allocarea+0xb8/0x160 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df19d54 : lj_asm_trace+0x25b4/0x3460 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df381d1 : trace_state+0x201/0x1340 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def2146 : lj_vm_cpcall+0x4a/0x5e [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def864b : lj_trace_ins+0x5b/0x70 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df34e17 : lj_dispatch_ins+0x177/0x1e0 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def3741 : lj_vm_inshook+0x2d/0x44 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df48ec1 : lj_cf_string_gsub+0x2e1/0x440 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def1dd7 : lj_BC_FUNCC+0x34/0x59 [/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb52801a0fe : ts_lua_http_cont_handler+0x15e/0x410 [/usr/lib/trafficserver/modules/tslua.so]
I think the former traces come from tslua used as a global plugin, and the latter from tslua used in remap rules.
Almost all our servers were affected by this, with the exception of a handful where traffic_server almost never called mmap. We do not have any valid theory explaining the difference, given that the servers run the same software stack and configuration.
However, the discrepancy allowed us to observe that on systems not calling mmap, the various LuaJIT trace functions (e.g. lj_trace_ins) were also not called. We hence tried to disable the JIT compiler and rely only on the interpreter by adding jit.off(true, true) to all our Lua scripts. The result was that mmap indeed stopped being called, system CPU usage went down significantly, and the performance of ATS increased visibly.
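For anyone wanting to try the same workaround, here is a minimal sketch of one way to add the line to every script in a directory. It is not the procedure we actually used. The function name disable_jit_in_dir is hypothetical, and the sketch assumes every script is a *.lua file directly inside the given directory. According to the LuaJIT documentation, jit.off(true, true) disables JIT compilation for the current function and, recursively, for its sub-functions, so placing it on the first line of a script covers the whole script.

```shell
#!/bin/sh
# Hypothetical helper: prepend jit.off(true, true) to each *.lua file in a
# directory, skipping files whose first line already contains it.
disable_jit_in_dir() {
    dir=$1
    for f in "$dir"/*.lua; do
        # If the glob matched nothing, "$f" is the literal pattern; skip it.
        [ -f "$f" ] || continue
        # Idempotence: do nothing if the first line already disables the JIT.
        if head -n 1 "$f" | grep -q '^jit\.off(true, true)'; then
            continue
        fi
        tmp=$(mktemp) || return 1
        # Build the new content in a temporary file, then write it back in
        # place so the original file keeps its owner and permissions.
        { printf '%s\n' 'jit.off(true, true)'; cat "$f"; } > "$tmp" \
            && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

With the plugin.config shown in this report, the call would be something like disable_jit_in_dir /etc/trafficserver/lua, run with sufficient privileges and followed by a reload of the scripts.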

We suspect that the LuaJIT compiler keeps identifying hot code and compiling it, instead of retaining and reusing the already compiled code, or something along those lines. It would be great to:
- better understand why the compiler is kicking in so often
- have a configuration setting to turn off the compiler, similarly to what luajit's -joff does: https://luajit.org/running.html
In plugin.config we have:
tslua.so /etc/trafficserver/lua/default.lua --enable-reload
remap.config: https://phabricator.wikimedia.org/P13769
All lua scripts here.
Our bug tracking system has further details on the investigation.