This repository was archived by the owner on Jun 2, 2025. It is now read-only.

Add feature NOT to open ("/proc/../overcommit_memory", "/sys/../transparent_hugepage/enabled") on jemalloc initialization #1426

@altmind


Some third-party libraries hook into open() and syscall_open(). One example is openonload, which does this so it can accelerate the TCP stack using special user-space drivers. Unfortunately, inside its open() handler, onload allocates memory with malloc.

This happens while a statically linked jemalloc 4.5 is initializing, usually very early in program startup, before main() is reached:

malloc_init() -> syscall_open/open("/proc/sys/vm/overcommit_memory") -> onload_open -> citp_do_init -> citp_syscall_init -> citp_find_all_sys_calls -> dlsym -> _dlerror_run -> calloc -> ialloc_body -> malloc_init
Partial stack trace (stack overflowed): https://gist.github.com/altmind/406a26e06ca7b2bacf9eff27af002a0a

If jemalloc is built with --enable-lazy-lock, this leads to a stack overflow and a program crash. If it is built without this flag, it leads to a deadlock:

(gdb) bt
#0  0x00007f19763014ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f19762fce01 in _L_lock_1093 () from /lib64/libpthread.so.0
#2  0x00007f19762fcda2 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000008e1544 in je_malloc_mutex_lock (tsdn=0x0, mutex=0xcab000 <init_lock>) at include/jemalloc/internal/mutex.h:101
#4  malloc_init_hard () at src/jemalloc.c:1486
#5  0x00000000008e5d75 in malloc_init () at src/jemalloc.c:317
#6  ialloc_body (slow_path=true, usize=<synthetic pointer>, tsdn=<synthetic pointer>, zero=true, size=<optimized out>) at src/jemalloc.c:1583
#7  calloc (num=<optimized out>, size=<optimized out>) at src/jemalloc.c:1824
#8  0x00007f197691f550 in _dlerror_run () from /lib64/libdl.so.2
#9  0x00007f197691f058 in dlsym () from /lib64/libdl.so.2
#10 0x00007f1976d33f17 in citp_find_all_sys_calls () at ../../../../../src/include/onload/declare_syscalls.h.tmpl:40
#11 citp_syscall_init () at ../../../../../src/lib/transport/unix/sys.c:103
#12 0x00007f1976d338b6 in citp_do_init (max_init_level=max_init_level@entry=2) at ../../../../../src/lib/transport/unix/startup.c:692
#13 0x00007f1976d396db in onload_open (pathname=pathname@entry=0x9d9270 "/proc/sys/vm/overcommit_memory", flags=0) at ../../../../../src/lib/transport/unix/sockcall_intercept.c:2126
#14 0x00007f1976d3b241 in onload_syscall (nr=nr@entry=2) at ../../../../../src/lib/transport/unix/sockcall_intercept.c:2730
#15 0x0000000000919075 in os_overcommits_proc () at src/pages.c:252
#16 je_pages_boot () at src/pages.c:294
#17 0x00000000008e1643 in malloc_init_hard_a0_locked () at src/jemalloc.c:1366
#18 malloc_init_hard () at src/jemalloc.c:1493
#19 0x00000000008e3795 in malloc_init () at src/jemalloc.c:317
#20 ialloc_body (slow_path=true, usize=<synthetic pointer>, tsdn=<synthetic pointer>, zero=false, size=<optimized out>) at src/jemalloc.c:1583
#21 malloc (size=<optimized out>) at src/jemalloc.c:1647
#22 0x00007f1973bd20ca in strdup () from /lib64/libc.so.6
#23 0x00007f1972371b69 in ?? () from /lib64/libselinux.so.1
#24 0x00007f1972371c0f in ?? () from /lib64/libselinux.so.1
#25 0x00007f1976fdb903 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#26 0x00007f1976fcd15a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#27 0x0000000000000001 in ?? ()
#28 0x00007ffcd2fcce1a in ?? ()
#29 0x0000000000000000 in ?? ()
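
The backtrace shows the same thread reaching malloc_init_hard() twice (frames #18 and #4) and blocking on init_lock the second time. A minimal sketch of that pattern follows, assuming an error-checking pthread mutex so the relock is reported instead of hanging. The names demo_init_lock and demo_relock_result are hypothetical and are not jemalloc code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for jemalloc's init_lock; not jemalloc code. */
static pthread_mutex_t demo_init_lock;

/*
 * Lock the same mutex twice from one thread, as the re-entered
 * malloc_init() in the backtrace does. PTHREAD_MUTEX_ERRORCHECK makes the
 * second lock fail with EDEADLK; a PTHREAD_MUTEX_NORMAL mutex would block
 * forever at this point, which matches frame #0 (__lll_lock_wait).
 */
int demo_relock_result(void)
{
    pthread_mutexattr_t attr;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&demo_init_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&demo_init_lock);       /* outer malloc_init() */
    rc = pthread_mutex_lock(&demo_init_lock);  /* nested malloc_init() */

    pthread_mutex_unlock(&demo_init_lock);
    pthread_mutex_destroy(&demo_init_lock);
    return rc;
}
```

Calling demo_relock_result() should return EDEADLK on a POSIX-conforming pthread implementation.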

I understand that this may not be a jemalloc problem as such, but jemalloc simply does not coexist well with onload when both are built and used according to the openonload recommendations.

At the end of the day, it is a bold assumption that the stdlib open() or syscall_open() never calls malloc internally. Here, that assumption does not hold.

This problem could be alleviated by adding a way to disable the open() call, either with an environment option or, as a last resort, with a compile-time define.
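
As a rough illustration of the environment-option idea, the sketch below checks a variable before touching /proc. DEMO_SKIP_OVERCOMMIT_PROBE and demo_overcommit_probe are hypothetical names, not existing jemalloc options or functions. It also assumes getenv() only scans the environment and does not allocate, which should be verified for the target libc.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical variable name; this is not an existing jemalloc option. */
#define DEMO_SKIP_ENV "DEMO_SKIP_OVERCOMMIT_PROBE"

/*
 * Returns the first byte of /proc/sys/vm/overcommit_memory (normally '0',
 * '1' or '2'), or -1 if the probe was skipped or could not be read.
 */
int demo_overcommit_probe(void)
{
    const char *skip = getenv(DEMO_SKIP_ENV);
    if (skip != NULL && skip[0] != '\0' && skip[0] != '0')
        return -1;  /* opted out: the (possibly interposed) open() is never called */

    int fd = open("/proc/sys/vm/overcommit_memory", O_RDONLY);
    if (fd < 0)
        return -1;

    char c;
    ssize_t n = read(fd, &c, 1);
    close(fd);
    return n == 1 ? (unsigned char)c : -1;
}
```

With the variable set to a non-empty value other than "0", the function returns -1 without calling open(), so a caller would fall back to a default overcommit assumption.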
Alternatively, if malloc_init detects that it is being called recursively, it should immediately return something reasonable, so that the outer malloc_init call can finish the real jemalloc initialization later (bootstrap).
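
The recursion-detection idea could look roughly like the sketch below. It is single-threaded and entirely hypothetical: every demo_* name is invented for illustration, libc malloc stands in for the real allocation path, and a real implementation would also have to track which thread is the initializer. Nested allocations made while initialization is running are served from a small static bootstrap buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names below are hypothetical; none of this is jemalloc code. */
typedef enum { DEMO_INIT_NONE, DEMO_INIT_RUNNING, DEMO_INIT_DONE } demo_init_state_t;

demo_init_state_t demo_state = DEMO_INIT_NONE;
int demo_nested_calls = 0;

/* Small static arena for allocations made while init is still running. */
static _Alignas(16) unsigned char demo_bootstrap_buf[4096];
static size_t demo_bootstrap_used = 0;

static void *demo_bootstrap_alloc(size_t size)
{
    size_t aligned = (size + 15) & ~(size_t)15;  /* round up to 16 bytes */
    if (aligned < size || aligned > sizeof(demo_bootstrap_buf) - demo_bootstrap_used)
        return NULL;
    void *p = demo_bootstrap_buf + demo_bootstrap_used;
    demo_bootstrap_used += aligned;
    return p;
}

bool demo_is_bootstrap_ptr(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)demo_bootstrap_buf;
    return a >= lo && a < lo + sizeof(demo_bootstrap_buf);
}

/* Stand-in for the system probe, e.g. an interposed open() on /proc. */
void (*demo_probe_hook)(void) = NULL;

/* Returns false when re-entered during init instead of locking or recursing. */
bool demo_malloc_init(void)
{
    if (demo_state == DEMO_INIT_DONE)
        return true;
    if (demo_state == DEMO_INIT_RUNNING) {
        demo_nested_calls++;
        return false;
    }
    demo_state = DEMO_INIT_RUNNING;
    if (demo_probe_hook != NULL)
        demo_probe_hook();  /* may call demo_malloc() again */
    demo_state = DEMO_INIT_DONE;
    return true;
}

void *demo_malloc(size_t size)
{
    if (!demo_malloc_init())
        return demo_bootstrap_alloc(size);  /* nested call during init */
    return malloc(size);  /* libc malloc stands in for the real path */
}

void demo_free(void *p)
{
    /* Bootstrap memory is never returned to the real allocator. */
    if (p != NULL && !demo_is_bootstrap_ptr(p))
        free(p);
}

/* Mimics an interposed open() handler that allocates, like onload_open. */
void *demo_nested_ptr = NULL;
void demo_interposed_open(void)
{
    demo_nested_ptr = demo_malloc(64);
}
```

Setting demo_probe_hook to demo_interposed_open and calling demo_malloc(32) exercises the nested path once: the inner allocation comes from the bootstrap buffer and the outer call completes initialization normally. The trade-off is that bootstrap memory is limited and never reclaimed.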

There seem to be quite a few problems with malloc_init deadlocking under unusual conditions, for example #916, #329, and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=918742
