Intermittent abort: hts_mutexlock() lazy-initialization of htsmutex is not thread-safe (race on first lock) #297

@yxscc

Description

Hi, I hit an intermittent abort inside pthread_mutex_lock() during a multi-threaded stress test that is part of my concurrency testing work. The stress test links against HTTrack's own implementation of hts_mutexlock() (from src/htsthread.c). I was not able to get a clean, deterministic black-box reproducer in the full webhttrack app, so this report focuses on a minimal reproducer plus a source-level analysis pointing at hts_mutexlock() in src/htsthread.c.

In short: hts_mutexlock() performs lazy initialization when *mutex == HTSMUTEX_INIT (NULL), but that initialization is not synchronized. If two threads concurrently perform the first lock of the same htsmutex, the init path can race, leading to undefined behavior (on my system: an abort inside glibc).


Environment

  • OS: Linux (glibc/pthreads)
  • Compiler: gcc
  • Repro: minimal PoC (re-implements HTTrack's hts_mutexlock/hts_mutexinit logic)

Observed behavior

Under concurrency, I can trigger an abort like:

pthread_mutex_lock.c: Assertion `mutex->__data.__owner == 0' failed.

This looks like the underlying pthread_mutex_t state is getting corrupted or the program is not consistently locking the same mutex object across threads.


Minimal reproduction (PoC)

To avoid relying on a full project build, I wrote a tiny PoC that re-implements the core logic used by HTTrack:

  • if (*mutex == NULL) hts_mutexinit(mutex);
  • then immediately pthread_mutex_lock(&(*mutex)->handle);

The PoC starts many threads which all attempt to lock the same htsmutex for the first time (starting from NULL), to maximize the race window.

PoC code (inline)

/*
 * Minimal reproducer for an unsynchronized lazy-init mutex wrapper.
 *
 * Build:
 *   gcc -O2 -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc
 */
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

typedef struct htsmutex_s { pthread_mutex_t handle; } htsmutex_s, *htsmutex;
#define HTSMUTEX_INIT NULL

static void hts_mutexinit(htsmutex *m) {
  htsmutex_s *p = (htsmutex_s*)malloc(sizeof(*p));
  if (!p) abort();
  pthread_mutex_init(&p->handle, 0);
  *m = p;
}

static void hts_mutexlock(htsmutex *m) {
  if (*m == HTSMUTEX_INIT) hts_mutexinit(m); /* <-- unsynchronized lazy init */
  pthread_mutex_lock(&(*m)->handle);
}

static void hts_mutexrelease(htsmutex *m) {
  pthread_mutex_unlock(&(*m)->handle);
}

static htsmutex g_lock = HTSMUTEX_INIT;  /* starts NULL: the first lock runs the lazy init */
static pthread_barrier_t g_bar;

static void *worker(void *arg) {
  (void)arg;
  pthread_barrier_wait(&g_bar);  /* release all threads at once to widen the race window */
  for (int i = 0; i < 50000; i++) {
    hts_mutexlock(&g_lock);
    if ((i & 0xff) == 0) sched_yield();  /* perturb scheduling occasionally */
    hts_mutexrelease(&g_lock);
  }
  return NULL;
}

int main(void) {
  const int n = 64;
  pthread_t th[n];
  pthread_barrier_init(&g_bar, NULL, (unsigned)n);
  for (int i = 0; i < n; i++) pthread_create(&th[i], NULL, worker, NULL);
  for (int i = 0; i < n; i++) pthread_join(th[i], NULL);
  return 0;
}

Build

gcc -O2 -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc

Run

/tmp/httrack_mutex_race_poc

On my Linux machine, this is fairly reliable at triggering the pthread_mutex_lock assertion abort shown above.
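
If it helps triage: building the PoC with ThreadSanitizer (assuming a TSan-capable gcc or clang) should report the unsynchronized read/write of the htsmutex pointer deterministically, without needing to hit the abort:

gcc -O1 -g -fsanitize=thread -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc_tsan
/tmp/httrack_mutex_race_poc_tsan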


Suspected root cause (source location)

hts_mutexlock() on Linux does:

HTSEXT_API void hts_mutexlock(htsmutex * mutex) {
  assertf(mutex != NULL);
  if (*mutex == HTSMUTEX_INIT) {        /* must be initialized */
    hts_mutexinit(mutex);
  }
  assertf(*mutex != NULL);
#ifndef _WIN32
  pthread_mutex_lock(&(*mutex)->handle);
#endif
}

The key issue is:

  • when *mutex == NULL, it allocates/initializes a new htsmutex_s
  • but this path has no synchronization (no global lock / pthread_once / atomic CAS) to ensure the initialization is performed exactly once

So if two threads call hts_mutexlock(&m) concurrently while m == NULL, the interleaving can go like this:

  1. T1 sees m == NULL and starts init
  2. T2 also sees m == NULL and starts init
  3. both allocate different htsmutex_s objects and race to store to m

This makes mutual exclusion unreliable. In my case it manifests as a crash/assertion abort; in general it may also lead to two threads "locking" different underlying mutexes and entering a critical section concurrently.
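
For illustration, here is a minimal sketch of one possible fix, serializing the init path with a statically-initialized guard mutex (names follow the PoC above; this is not a patch against htsthread.c):

static pthread_mutex_t init_guard = PTHREAD_MUTEX_INITIALIZER;

static void hts_mutexlock(htsmutex *m) {
  /* Take the guard before inspecting *m, so at most one thread ever
   * runs hts_mutexinit() for a given htsmutex. */
  pthread_mutex_lock(&init_guard);
  if (*m == HTSMUTEX_INIT)
    hts_mutexinit(m);
  pthread_mutex_unlock(&init_guard);
  pthread_mutex_lock(&(*m)->handle);
}

A double-checked fast path (testing *m before taking the guard) would avoid the guard on every lock, but the unguarded read would then need to be an atomic load to stay data-race-free.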


Why this seems plausible in normal usage

The API itself explicitly supports *mutex == NULL by performing initialization inside hts_mutexlock(), so "locking without explicit init" seems to be part of the intended semantics.

webhttrack also uses this pattern (static mutex set to HTSMUTEX_INIT and then first-use init via hts_mutexlock()):

  • src/htsweb.c: static htsmutex pingMutex = HTSMUTEX_INIT;
  • background thread client_ping() locks pingMutex
  • main thread locks it as well after each accept()
  • note: in the current client_ping() implementation it mostly sleeps while the parent process is alive; it only starts calling getPingId() (and thus locking pingMutex) once it detects that the parent is gone, so the overlap is more likely late in the runtime (after parent death) than during early startup

So this can happen in normal operation (probability depends on scheduling/load). More generally, any htsmutex that relies on first-use initialization is affected if its first lock can occur concurrently.
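
To make the scenario concrete, here is a schematic two-thread first-use sketch built on the PoC definitions above (the thread roles mirror the htsweb.c description; ping_side and the overall shape are illustrative, not verbatim htsweb.c code):

static htsmutex pingMutex = HTSMUTEX_INIT;

static void *ping_side(void *arg) {   /* stands in for client_ping()/getPingId() */
  (void)arg;
  hts_mutexlock(&pingMutex);          /* first lock may run the lazy init */
  hts_mutexrelease(&pingMutex);
  return NULL;
}

int main(void) {
  pthread_t t;
  pthread_create(&t, NULL, ping_side, NULL);
  hts_mutexlock(&pingMutex);          /* main thread may run the init concurrently */
  hts_mutexrelease(&pingMutex);
  pthread_join(t, NULL);
  return 0;
}

If those two first locks overlap, the unsynchronized init path above can fire.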


Expected behavior

hts_mutexlock() should be safe under concurrent first use:

  • the lazy-init path should be thread-safe (initialize exactly once)
  • should not crash or break mutual exclusion
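
For completeness, a lock-free alternative sketch of the "atomic CAS" option mentioned above, using C11 atomics. This redeclares the pointer as _Atomic (reusing the PoC's htsmutex_s type plus <stdatomic.h>), so it is a redesign of the PoC types rather than a drop-in patch for htsthread.c, and hts_mutexlock_cas is a hypothetical name:

#include <stdatomic.h>

typedef _Atomic(htsmutex_s *) htsmutex_atomic;

static void hts_mutexlock_cas(htsmutex_atomic *m) {
  htsmutex_s *cur = atomic_load_explicit(m, memory_order_acquire);
  if (cur == NULL) {
    /* Optimistically create a candidate, then try to publish it. */
    htsmutex_s *fresh = malloc(sizeof *fresh);
    if (!fresh) abort();
    pthread_mutex_init(&fresh->handle, NULL);
    htsmutex_s *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(
            m, &expected, fresh,
            memory_order_acq_rel, memory_order_acquire)) {
      cur = fresh;                            /* we won: our object is published */
    } else {
      pthread_mutex_destroy(&fresh->handle);  /* lost the race: discard our copy */
      free(fresh);
      cur = expected;                         /* CAS stored the winner in 'expected' */
    }
  }
  pthread_mutex_lock(&cur->handle);
}

Exactly one object ever becomes visible through *m; the losing thread frees its private copy before anyone else can observe it.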

Questions

  • Is hts_mutexlock()'s lazy-init intended to be safe under multi-threading? If yes, then it looks like the init path needs synchronization.
  • Is there a recommended way to initialize htsmutex objects (e.g. require explicit init in a single-threaded phase)?
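
For reference, the "explicit init in a single-threaded phase" option would look roughly like this with the PoC definitions (a sketch, assuming the caller can guarantee init happens before any concurrent first lock):

static htsmutex g_lock = HTSMUTEX_INIT;

static void *locker(void *arg) {
  (void)arg;
  hts_mutexlock(&g_lock);      /* never hits the lazy-init path */
  hts_mutexrelease(&g_lock);
  return NULL;
}

int main(void) {
  hts_mutexinit(&g_lock);      /* done while still single-threaded */
  pthread_t t;
  pthread_create(&t, NULL, locker, NULL);
  hts_mutexlock(&g_lock);
  hts_mutexrelease(&g_lock);
  pthread_join(t, NULL);
  return 0;
}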
