Intermittent abort: hts_mutexlock() lazy-initialization of htsmutex is not thread-safe (race on first lock) #297

@yxscc

Description

Hi, I hit an intermittent abort inside pthread_mutex_lock() during a multi-threaded stress test that is part of my concurrency testing work. The stress test links against HTTrack's own implementation of hts_mutexlock() (from src/htsthread.c). I was not able to get a clean, deterministic black-box reproducer in the full webhttrack app, so this report focuses on a minimal reproducer plus a source-level analysis pointing at hts_mutexlock() in src/htsthread.c.

In short: hts_mutexlock() performs lazy initialization when *mutex == HTSMUTEX_INIT (NULL), but that initialization is not synchronized. If two threads concurrently perform the first lock of the same htsmutex, the init path can race, leading to undefined behavior (on my system: an abort inside glibc).


Environment

  • OS: Linux (glibc/pthreads)
  • Compiler: gcc
  • Repro: minimal PoC (re-implements HTTrack's hts_mutexlock/hts_mutexinit logic)

Observed behavior

Under concurrency, I can trigger an abort like:

pthread_mutex_lock.c: Assertion `mutex->__data.__owner == 0' failed.

This looks like the underlying pthread_mutex_t state is getting corrupted or the program is not consistently locking the same mutex object across threads.


Minimal reproduction (PoC)

To avoid relying on a full project build, I wrote a tiny PoC that re-implements the core logic used by HTTrack:

  • if (*mutex == NULL) hts_mutexinit(mutex);
  • then immediately pthread_mutex_lock(&(*mutex)->handle);

The PoC starts many threads which all attempt to lock the same htsmutex for the first time (starting from NULL), to maximize the race window.

PoC code (inline)

/*
 * Minimal reproducer for an unsynchronized lazy-init mutex wrapper.
 *
 * Build:
 *   gcc -O2 -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc
 */
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

typedef struct htsmutex_s { pthread_mutex_t handle; } htsmutex_s, *htsmutex;
#define HTSMUTEX_INIT NULL

static void hts_mutexinit(htsmutex *m) {
  htsmutex_s *p = (htsmutex_s*)malloc(sizeof(*p));
  if (!p) abort();
  pthread_mutex_init(&p->handle, 0);
  *m = p;
}

static void hts_mutexlock(htsmutex *m) {
  if (*m == HTSMUTEX_INIT) hts_mutexinit(m); /* <-- unsynchronized lazy init */
  pthread_mutex_lock(&(*m)->handle);
}

static void hts_mutexrelease(htsmutex *m) {
  pthread_mutex_unlock(&(*m)->handle);
}

static htsmutex g_lock = HTSMUTEX_INIT;  /* starts NULL: the first lock runs the lazy init */
static pthread_barrier_t g_bar;

static void *worker(void *arg) {
  (void)arg;
  pthread_barrier_wait(&g_bar);  /* release all threads at once to widen the race window */
  for (int i = 0; i < 50000; i++) {
    hts_mutexlock(&g_lock);
    if ((i & 0xff) == 0) sched_yield();  /* perturb scheduling occasionally */
    hts_mutexrelease(&g_lock);
  }
  return NULL;
}

int main(void) {
  const int n = 64;
  pthread_t th[n];
  pthread_barrier_init(&g_bar, NULL, (unsigned)n);
  for (int i = 0; i < n; i++) pthread_create(&th[i], NULL, worker, NULL);
  for (int i = 0; i < n; i++) pthread_join(th[i], NULL);
  return 0;
}

Build

gcc -O2 -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc

Run

/tmp/httrack_mutex_race_poc

On my Linux machine, this is fairly reliable at triggering the pthread_mutex_lock assertion abort shown above.
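
If it helps triage: building the PoC with ThreadSanitizer (assuming a TSan-capable gcc or clang) should report the unsynchronized read/write of the htsmutex pointer deterministically, without needing to hit the abort:

gcc -O1 -g -fsanitize=thread -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc_tsan
/tmp/httrack_mutex_race_poc_tsan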


Suspected root cause (source location)

hts_mutexlock() on Linux does:

HTSEXT_API void hts_mutexlock(htsmutex * mutex) {
  assertf(mutex != NULL);
  if (*mutex == HTSMUTEX_INIT) {        /* must be initialized */
    hts_mutexinit(mutex);
  }
  assertf(*mutex != NULL);
#ifndef _WIN32
  pthread_mutex_lock(&(*mutex)->handle);
#endif
}

The key issue is:

  • when *mutex == NULL, it allocates/initializes a new htsmutex_s
  • but this path has no synchronization (no global lock / pthread_once / atomic CAS) to ensure the initialization is performed exactly once

So if two threads call hts_mutexlock(&m) concurrently while m == NULL, the interleaving can go like this:

  1. T1 sees m == NULL and starts init
  2. T2 also sees m == NULL and starts init
  3. both allocate different htsmutex_s objects and race to store to m

This makes mutual exclusion unreliable. In my case it manifests as a crash/assertion abort; in general it may also lead to two threads "locking" different underlying mutexes and entering a critical section concurrently.
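
For illustration, here is a minimal sketch of one possible fix, serializing the init path with a statically-initialized guard mutex (names follow the PoC above; this is not a patch against htsthread.c):

static pthread_mutex_t init_guard = PTHREAD_MUTEX_INITIALIZER;

static void hts_mutexlock(htsmutex *m) {
  /* Take the guard before inspecting *m, so at most one thread ever
   * runs hts_mutexinit() for a given htsmutex. */
  pthread_mutex_lock(&init_guard);
  if (*m == HTSMUTEX_INIT)
    hts_mutexinit(m);
  pthread_mutex_unlock(&init_guard);
  pthread_mutex_lock(&(*m)->handle);
}

A double-checked fast path (testing *m before taking the guard) would avoid the guard on every lock, but the unguarded read would then need to be an atomic load to stay data-race-free.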


Why this seems plausible in normal usage

The API itself explicitly supports *mutex == NULL by performing initialization inside hts_mutexlock(), so "locking without explicit init" seems to be part of the intended semantics.

webhttrack also uses this pattern (static mutex set to HTSMUTEX_INIT and then first-use init via hts_mutexlock()):

  • src/htsweb.c: static htsmutex pingMutex = HTSMUTEX_INIT;
  • background thread client_ping() locks pingMutex
  • main thread locks it as well after each accept()
  • note: in the current client_ping() implementation it mostly sleeps while the parent process is alive; it only starts calling getPingId() (and thus locking pingMutex) once it detects that the parent is gone, so the overlap is more likely late in the runtime (after parent death) than during early startup

So this can happen in normal operation (probability depends on scheduling/load). More generally, any htsmutex that relies on first-use initialization is affected if its first lock can occur concurrently.
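
To make the scenario concrete, here is a schematic two-thread first-use sketch built on the PoC definitions above (the thread roles mirror the htsweb.c description; ping_side and the overall shape are illustrative, not verbatim htsweb.c code):

static htsmutex pingMutex = HTSMUTEX_INIT;

static void *ping_side(void *arg) {   /* stands in for client_ping()/getPingId() */
  (void)arg;
  hts_mutexlock(&pingMutex);          /* first lock may run the lazy init */
  hts_mutexrelease(&pingMutex);
  return NULL;
}

int main(void) {
  pthread_t t;
  pthread_create(&t, NULL, ping_side, NULL);
  hts_mutexlock(&pingMutex);          /* main thread may run the init concurrently */
  hts_mutexrelease(&pingMutex);
  pthread_join(t, NULL);
  return 0;
}

If those two first locks overlap, the unsynchronized init path above can fire.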


Expected behavior

hts_mutexlock() should be safe under concurrent first use:

  • the lazy-init path should be thread-safe (initialize exactly once)
  • should not crash or break mutual exclusion
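
For completeness, a lock-free alternative sketch of the "atomic CAS" option mentioned above, using C11 atomics. This redeclares the pointer as _Atomic (reusing the PoC's htsmutex_s type plus <stdatomic.h>), so it is a redesign of the PoC types rather than a drop-in patch for htsthread.c, and hts_mutexlock_cas is a hypothetical name:

#include <stdatomic.h>

typedef _Atomic(htsmutex_s *) htsmutex_atomic;

static void hts_mutexlock_cas(htsmutex_atomic *m) {
  htsmutex_s *cur = atomic_load_explicit(m, memory_order_acquire);
  if (cur == NULL) {
    /* Optimistically create a candidate, then try to publish it. */
    htsmutex_s *fresh = malloc(sizeof *fresh);
    if (!fresh) abort();
    pthread_mutex_init(&fresh->handle, NULL);
    htsmutex_s *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(
            m, &expected, fresh,
            memory_order_acq_rel, memory_order_acquire)) {
      cur = fresh;                            /* we won: our object is published */
    } else {
      pthread_mutex_destroy(&fresh->handle);  /* lost the race: discard our copy */
      free(fresh);
      cur = expected;                         /* CAS stored the winner in 'expected' */
    }
  }
  pthread_mutex_lock(&cur->handle);
}

Exactly one object ever becomes visible through *m; the losing thread frees its private copy before anyone else can observe it.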

Questions

  • Is hts_mutexlock()'s lazy-init intended to be safe under multi-threading? If yes, then it looks like the init path needs synchronization.
  • Is there a recommended way to initialize htsmutex objects (e.g. require explicit init in a single-threaded phase)?
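
For reference, the "explicit init in a single-threaded phase" option would look roughly like this with the PoC definitions (a sketch, assuming the caller can guarantee init happens before any concurrent first lock):

static htsmutex g_lock = HTSMUTEX_INIT;

static void *locker(void *arg) {
  (void)arg;
  hts_mutexlock(&g_lock);      /* never hits the lazy-init path */
  hts_mutexrelease(&g_lock);
  return NULL;
}

int main(void) {
  hts_mutexinit(&g_lock);      /* done while still single-threaded */
  pthread_t t;
  pthread_create(&t, NULL, locker, NULL);
  hts_mutexlock(&g_lock);
  hts_mutexrelease(&g_lock);
  pthread_join(t, NULL);
  return 0;
}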
