Description
Hi, I hit an intermittent abort inside pthread_mutex_lock() during a research-oriented multi-threaded stress test I run as part of my concurrency testing work. The stress test links against HTTrack's own implementation of hts_mutexlock() (from src/htsthread.c). I was not able to get a clean, deterministic black-box reproducer in the full webhttrack app, so this report focuses on a minimal reproducer and a source-level analysis of hts_mutexlock() in src/htsthread.c.
In short: hts_mutexlock() does lazy initialization when *mutex == HTSMUTEX_INIT (NULL), but that initialization is not synchronized. If two threads try to lock the same htsmutex for the first time at the same time, the init path can race, leading to undefined behavior (on my system: an abort inside glibc).
Environment
- OS: Linux (glibc/pthreads)
- Compiler: gcc
- Repro: minimal PoC (re-implements HTTrack's hts_mutexlock / hts_mutexinit logic)
Observed behavior
Under concurrency, I can trigger an abort like:
pthread_mutex_lock.c: Assertion `mutex->__data.__owner == 0' failed.
This looks like the underlying pthread_mutex_t state is getting corrupted or the program is not consistently locking the same mutex object across threads.
Minimal reproduction (PoC)
To avoid relying on a full project build, I wrote a tiny PoC that re-implements the core logic used by HTTrack:
- if (*mutex == NULL) hts_mutexinit(mutex);
- then immediately pthread_mutex_lock(&(*mutex)->handle);
The PoC starts many threads which all attempt to lock the same htsmutex for the first time (starting from NULL), to maximize the race window.
PoC code (inline)
/*
* Minimal reproducer for an unsynchronized lazy-init mutex wrapper.
*
* Build:
* gcc -O2 -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc
*/
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
typedef struct htsmutex_s { pthread_mutex_t handle; } htsmutex_s, *htsmutex;
#define HTSMUTEX_INIT NULL
static void hts_mutexinit(htsmutex *m) {
htsmutex_s *p = (htsmutex_s*)malloc(sizeof(*p));
if (!p) abort();
pthread_mutex_init(&p->handle, 0);
*m = p;
}
static void hts_mutexlock(htsmutex *m) {
if (*m == HTSMUTEX_INIT) hts_mutexinit(m); /* <-- unsynchronized lazy init */
pthread_mutex_lock(&(*m)->handle);
}
static void hts_mutexrelease(htsmutex *m) {
pthread_mutex_unlock(&(*m)->handle);
}
static htsmutex g_lock = HTSMUTEX_INIT;
static pthread_barrier_t g_bar;
static void *worker(void *arg) {
(void)arg;
pthread_barrier_wait(&g_bar);
for (int i = 0; i < 50000; i++) {
hts_mutexlock(&g_lock);
if ((i & 0xff) == 0) sched_yield();
hts_mutexrelease(&g_lock);
}
return NULL;
}
int main(void) {
const int n = 64;
pthread_t th[n];
pthread_barrier_init(&g_bar, NULL, (unsigned)n);
for (int i = 0; i < n; i++) pthread_create(&th[i], NULL, worker, NULL);
for (int i = 0; i < n; i++) pthread_join(th[i], NULL);
return 0;
}
Build
gcc -O2 -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc
Run
/tmp/httrack_mutex_race_poc
On my Linux machine, this is fairly reliable at triggering the pthread_mutex_lock assertion abort shown above.
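As a side note, ThreadSanitizer should also be able to flag the unsynchronized read/write of the g_lock pointer directly; a possible build and run (exact flags may vary by toolchain):
gcc -O1 -g -fsanitize=thread -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc_tsan
/tmp/httrack_mutex_race_poc_tsan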
Suspected root cause (source location)
hts_mutexlock() on Linux does:
HTSEXT_API void hts_mutexlock(htsmutex * mutex) {
assertf(mutex != NULL);
if (*mutex == HTSMUTEX_INIT) { /* must be initialized */
hts_mutexinit(mutex);
}
assertf(*mutex != NULL);
#ifndef _WIN32
pthread_mutex_lock(&(*mutex)->handle);
#endif
}
The key issue is:
- when *mutex == NULL, it allocates/initializes a new htsmutex_s
- but this path has no synchronization (no global lock / pthread_once / atomic CAS) to ensure the initialization is performed exactly once
So if two threads call hts_mutexlock(&m) concurrently when m == NULL, it can go like:
- T1 sees m == NULL and starts init
- T2 also sees m == NULL and starts init
- both allocate different htsmutex_s objects and race to store to m
This makes mutual exclusion unreliable. In my case it manifests as a crash/assertion abort; in general it may also lead to two threads "locking" different underlying mutexes and entering a critical section concurrently.
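For illustration, here is a minimal sketch (reusing the PoC's types and functions above) of one way the init path could be serialized. The guard mutex and the hts_mutexlock_safe name are my own, not the project's code:
/* Sketch only: serialize first-use initialization with a statically
 * initialized guard mutex so hts_mutexinit() runs exactly once. */
static pthread_mutex_t init_guard = PTHREAD_MUTEX_INITIALIZER;
static void hts_mutexlock_safe(htsmutex *mutex) {
  pthread_mutex_lock(&init_guard);
  if (*mutex == HTSMUTEX_INIT)
    hts_mutexinit(mutex);          /* now guaranteed to happen at most once */
  pthread_mutex_unlock(&init_guard);
  pthread_mutex_lock(&(*mutex)->handle);
}
This takes the global guard on every lock; a double-checked fast path or an atomic CAS on *mutex could avoid that overhead, but needs more care to stay correct under the C memory model.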
Why this seems plausible in normal usage
The API itself explicitly supports *mutex == NULL by performing initialization inside hts_mutexlock(), so "locking without explicit init" seems to be part of the intended semantics.
webhttrack also uses this pattern (static mutex set to HTSMUTEX_INIT and then first-use init via hts_mutexlock()):
- src/htsweb.c: static htsmutex pingMutex = HTSMUTEX_INIT;
- background thread client_ping() locks pingMutex
- main thread locks it as well after each accept()
- note: in the current client_ping() implementation it mostly sleeps while the parent process is alive; it starts calling getPingId() (and thus locking pingMutex) after it detects the parent is gone, so the overlap is more likely later in the run (after parent death) than during early startup
So this can happen in normal operation (probability depends on scheduling/load). More generally, any htsmutex that relies on first-use initialization is affected if its first lock can occur concurrently.
Expected behavior
hts_mutexlock() should be safe under concurrent first use:
- the lazy-init path should be thread-safe (initialize exactly once)
- should not crash or break mutual exclusion
Questions
- Is hts_mutexlock()'s lazy-init intended to be safe under multi-threading? If yes, then it looks like the init path needs synchronization.
- Is there a recommended way to initialize htsmutex objects (e.g. require explicit init in a single-threaded phase, as in the sketch below)?
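On the second question, a sketch of the explicit-init alternative (again using the PoC's types; purely illustrative, not a claim about how HTTrack is structured):
/* Sketch: initialize every htsmutex during a single-threaded startup phase,
 * before any thread can reach hts_mutexlock(), so the lazy-init path is
 * never taken concurrently. */
static htsmutex g_lock = HTSMUTEX_INIT;
int main(void) {
  hts_mutexinit(&g_lock);   /* no other threads exist yet: no race */
  /* ... only now start worker threads that call hts_mutexlock(&g_lock) ... */
  return 0;
}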