Is it expected that absl::*::Mutex::(un)lock_shared operations will dominate runtime when a re2::RE2 object is shared among threads?
I noticed this while analyzing why switching to RE2 causes a significant slowdown in falconindy/pkgfile#72. It's so bad that doing the work in a single thread was faster than splitting it among threads. perf shows 80-90% of the time is spent in the lock operations (falconindy/pkgfile#72 (comment)).
Here is a small reproducer: https://gist.github.com/michalsieron/a076eacb5abe7acc826105844022e270
```cpp
#include <algorithm>  // std::min
#include <cstdlib>    // std::abort
#include <memory>     // std::unique_ptr, std::make_unique
#include <string>
#include <thread>
#include <utility>    // std::pair
#include <vector>

#ifdef USE_PCRE
#include <pcre.h>

// Compile and JIT-study the pattern; abort on any error.
std::pair<pcre *, pcre_extra *> prepare_regex(const std::string& pattern) {
    const char *err;
    int offset;
    auto re = pcre_compile(pattern.c_str(), 0, &err, &offset, nullptr);
    if (re == nullptr) std::abort();
    auto re_extra = pcre_study(re, PCRE_STUDY_JIT_COMPILE, &err);
    if (err) std::abort();
    return std::pair<pcre *, pcre_extra *>(re, re_extra);
}
#else
#include <re2/re2.h>

// Compile the pattern; abort if RE2 rejects it.
std::unique_ptr<re2::RE2> prepare_regex(const std::string& pattern) {
    auto re = std::make_unique<re2::RE2>(pattern);
    if (!re->ok()) std::abort();
    return re;
}
#endif

int main() {
    std::string line = "example string";
    std::string pattern = "i won't match";
#ifdef SHARED
    // -DSHARED: one regex object used by every worker thread.
    auto re = prepare_regex(pattern);
#endif
    const auto MAX_REPS = 1'000'000;
    const auto num_workers = std::min<int>(std::thread::hardware_concurrency(), 64);
    std::vector<std::thread> workers;
    for (int i = 0; i < num_workers; i++) {
        workers.push_back(std::thread([&] {
#ifndef SHARED
            // Default: each worker compiles its own regex object.
            auto re = prepare_regex(pattern);
#endif
            for (int rep = 0; rep < MAX_REPS; rep++)
#ifdef USE_PCRE
                pcre_exec(re.first, re.second, line.c_str(), line.size(), 0, PCRE_NO_UTF16_CHECK, nullptr, 0);
#else
                re2::RE2::PartialMatch(line, *re);
#endif
        }));
    }
    for (auto& worker : workers)
        worker.join();
}
```

It supports either RE2 or PCRE (pass `-DUSE_PCRE`) and either shares the regex object among threads (`-DSHARED`) or gives each thread its own.
Compile with `g++ repro.cpp -O2 [-DSHARED] {-DUSE_PCRE -lpcre|-lre2}`
Here are my results from testing with hyperfine:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `build/re2-separate` | 173.2 ± 12.2 | 163.8 | 217.5 | 1.63 ± 0.15 |
| `build/re2-shared` | 966.8 ± 116.4 | 861.5 | 1170.4 | 9.10 ± 1.21 |
| `build/pcre-shared` | 123.6 ± 18.1 | 104.2 | 170.8 | 1.16 ± 0.18 |
| `build/pcre-separate` | 106.3 ± 6.0 | 96.1 | 119.2 | 1.00 |
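
Going by the means above, the shared RE2 build is roughly 5.6 times slower than the per-thread RE2 build (966.8 ms vs 173.2 ms). To check whether reader-lock traffic on a single lock can by itself produce a gap like this, independent of RE2, something like the following sketch might help. It is only a stand-in under stated assumptions: it uses `std::shared_mutex` (C++17) instead of `absl::Mutex`, the critical section is empty, and `time_reader_locks` is a hypothetical helper, not part of RE2 or Abseil.

```cpp
#include <chrono>
#include <memory>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not an RE2 or Abseil API).
// Starts `num_threads` workers that each acquire and release a reader lock
// `reps` times. With `shared == true` every worker uses the same
// std::shared_mutex; otherwise each worker gets its own.
// Returns the elapsed wall-clock time in milliseconds.
double time_reader_locks(int num_threads, int reps, bool shared) {
    std::vector<std::unique_ptr<std::shared_mutex>> locks;
    const int num_locks = shared ? 1 : num_threads;
    for (int i = 0; i < num_locks; i++)
        locks.push_back(std::make_unique<std::shared_mutex>());

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; i++) {
        std::shared_mutex& mu = *locks[shared ? 0 : i];
        workers.emplace_back([&mu, reps] {
            for (int rep = 0; rep < reps; rep++) {
                // Reader lock only; released at the end of each iteration.
                std::shared_lock<std::shared_mutex> guard(mu);
            }
        });
    }
    for (auto& worker : workers)
        worker.join();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_reader_locks(n, 1'000'000, true)` and `time_reader_locks(n, 1'000'000, false)` with `n` set to the worker count from the reproducer gives two timings to compare. Even uncontended readers have to update the lock's shared state, so a large difference here would suggest the slowdown is a property of many threads hitting one reader-writer lock rather than of regex matching itself. Build with `-std=c++17 -pthread`.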