Lock contention when re2::RE2 object is shared between threads #569

@michalsieron

Is it expected that absl::*::Mutex::(un)lock_shared operations will dominate runtime when a re2::RE2 object is shared among threads?

I noticed this while analyzing why switching to RE2 causes a significant slowdown in falconindy/pkgfile#72. It's so bad that doing the work in a single thread was faster than splitting it among threads. perf shows that 80-90% of the time is spent in the lock operations (falconindy/pkgfile#72 (comment)).

Here is a small reproducer: https://gist.github.com/michalsieron/a076eacb5abe7acc826105844022e270

#include <algorithm>
#include <cstdlib>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

#ifdef USE_PCRE
#include <pcre.h>
std::pair<pcre *, pcre_extra *> prepare_regex(const std::string& pattern) {
    const char *err;
    int offset;

    auto re = pcre_compile(pattern.c_str(), 0, &err, &offset, nullptr);
    if (re == nullptr) std::abort();
    auto re_extra = pcre_study(re, PCRE_STUDY_JIT_COMPILE, &err);
    if (err) std::abort();
    return std::pair<pcre *, pcre_extra *>(re, re_extra);
}
#else
#include <re2/re2.h>
std::unique_ptr<re2::RE2> prepare_regex(const std::string& pattern) {
    auto re = std::make_unique<re2::RE2>(pattern);
    if (!re->ok()) std::abort();
    return re;
}
#endif

int main() {
    std::string line = "example string";
    std::string pattern = "i won't match";

#ifdef SHARED
    auto re = prepare_regex(pattern);
#endif

    const auto MAX_REPS = 1'000'000;
    const auto num_workers = std::min<int>(std::thread::hardware_concurrency(), 64);
    std::vector<std::thread> workers;

    for (int i = 0; i < num_workers; i++) {
        workers.push_back(std::thread([&] {
#ifndef SHARED
            auto re = prepare_regex(pattern);
#endif
            for (int rep = 0; rep < MAX_REPS; rep++)
#ifdef USE_PCRE
                pcre_exec(re.first, re.second, line.c_str(), line.size(), 0, PCRE_NO_UTF16_CHECK, nullptr, 0);
#else
                re2::RE2::PartialMatch(line, *re);
#endif
        }));
    }

    for (auto& worker : workers)
        worker.join();
}

It supports either RE2 (the default) or PCRE (pass -DUSE_PCRE), and either shares one regex object across all threads (-DSHARED) or compiles a separate one in each thread (the default).
Compile with: g++ repro.cpp -O2 [-DSHARED] {-DUSE_PCRE -lpcre|-lre2}
Here are my results from testing with hyperfine:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| build/re2-separate | 173.2 ± 12.2 | 163.8 | 217.5 | 1.63 ± 0.15 |
| build/re2-shared | 966.8 ± 116.4 | 861.5 | 1170.4 | 9.10 ± 1.21 |
| build/pcre-shared | 123.6 ± 18.1 | 104.2 | 170.8 | 1.16 ± 0.18 |
| build/pcre-separate | 106.3 ± 6.0 | 96.1 | 119.2 | 1.00 |
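
To illustrate the kind of effect I suspect behind the re2-shared vs. re2-separate gap, here is a minimal sketch that uses only the standard library. It is not RE2's code: `run_readers` and `time_readers_ms` are hypothetical helpers, and `std::shared_mutex` stands in for the absl mutex seen in the perf output. The idea is that even a reader lock has to update the lock's shared state on every acquire and release, so many threads taking the same reader lock in a tight loop may end up contending on it, while threads that each use their own lock do not.

```cpp
#include <chrono>
#include <shared_mutex>
#include <thread>
#include <vector>

// Illustrative only, not RE2's implementation.
// Each worker acquires a reader lock `reps` times, either on one lock shared
// by all workers (shared == true) or on a lock private to that worker.
// Returns the total number of reader-lock acquisitions performed.
long run_readers(int num_workers, int reps, bool shared) {
    std::shared_mutex global_lock;
    std::vector<long> counts(num_workers, 0);
    std::vector<std::thread> workers;

    for (int i = 0; i < num_workers; i++) {
        workers.emplace_back([&, i] {
            std::shared_mutex local_lock;
            std::shared_mutex& lock = shared ? global_lock : local_lock;
            long done = 0;
            for (int rep = 0; rep < reps; rep++) {
                // Reader lock: taken and released on every iteration.
                std::shared_lock<std::shared_mutex> guard(lock);
                done++;
            }
            counts[i] = done;
        });
    }

    for (auto& worker : workers)
        worker.join();

    long total = 0;
    for (long c : counts)
        total += c;
    return total;
}

// Wall-clock time of one run in milliseconds.
double time_readers_ms(int num_workers, int reps, bool shared) {
    auto start = std::chrono::steady_clock::now();
    run_readers(num_workers, reps, shared);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Calling `time_readers_ms(n, 1'000'000, true)` and `time_readers_ms(n, 1'000'000, false)` from a `main` (built with `-std=c++17 -pthread`) should give a rough feel for the difference on a given machine. The numbers will depend on the core count and the mutex implementation, so I would not read them as a measurement of RE2 itself.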
