Improve channel lookup and logging#2048
Conversation
Fetch the channel count at the time it changes, and propagate the value to the logging, instead of having the log process fetch the value. This improves the log accuracy when multiple channels connect at the same time.
| quint32 otherAddr = other.InetAddr.toIPv4Address(); | ||
|
|
||
| return (int) thisAddr - (int) otherAddr; | ||
| return thisAddr < otherAddr ? -1 : thisAddr > otherAddr ? 1 : 0; |
There was a problem hiding this comment.
I saw that in IPv6 branch uses memcmp, I suppose that can be used as well for consistency?
There was a problem hiding this comment.
That would probably make more logical sense.
There was a problem hiding this comment.
IPv6 uses memcmp() because the addresses are 16 bytes long. IPv4 addresses can be compared whole as they fit a 32-bit integer, so that seems more efficient than memcmp() which would be comparing a byte at a time (I assume).
There was a problem hiding this comment.
I'd expect memcmp to be optimised for the platform its running on and the data sizes being compared.
There was a problem hiding this comment.
Here’s some quick and dirty benchmark. I am using Google Benchmark library (apt install libbenchmark-dev):
#include <benchmark/benchmark.h>
#include <string.h>
#include <stdlib.h>
static void BM_ManualComparison(benchmark::State &state)
{
srand(1);
for (auto _ : state)
{
state.PauseTiming();
unsigned int a = rand();
unsigned int b = rand();
state.ResumeTiming();
benchmark::DoNotOptimize(a < b ? -1 : a > b ? 1 : 0);
}
}
BENCHMARK(BM_ManualComparison);
static void BM_Memcmp(benchmark::State &state)
{
srand(1);
for (auto _ : state)
{
state.PauseTiming();
unsigned int a = rand();
unsigned int b = rand();
state.ResumeTiming();
benchmark::DoNotOptimize(memcmp(&a, &b, sizeof(unsigned int)));
}
}
BENCHMARK(BM_Memcmp);
BENCHMARK_MAIN();Running benchmark with -O3 resulted in roughly the same amount of CPU time per iteration. Your results may vary:
$ g++ -lbenchmark bench.cpp -O3 && ./a.out
Running ./a.out
Run on (1 X 2900 MHz CPU )
CPU Caches:
L1 Data 64K (x1)
L1 Instruction 64K (x1)
L2 Unified 512K (x1)
L3 Unified 16384K (x1)
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_ManualComparison 329 ns 325 ns 2138302
BM_Memcmp 326 ns 325 ns 2130022There was a problem hiding this comment.
That's certainly an interesting result! :)
On a little-endian architecture such as Intel, the comparison by memcmp() must necessarily give different results than a direct comparison of a pair of int or unsigned int variables. In this particular application that doesn't actually matter, as the order is immaterial, provided it is consistent.
Personally, in this instance, I prefer the comparison of unsigned int directly (as in the current PR) rather than memcmp(), but not strongly enough to argue about it. If the consensus is to prefer memcmp(), then fair enough.
I repeated the test on my RPi4:
tony@pi:~/benchmark $ g++ -lbenchmark bench.cpp -O3 && ./a.out
2021-09-28 17:54:48
Running ./a.out
Run on (4 X 1500 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_ManualComparison 2458 ns 2432 ns 285846
BM_Memcmp 2443 ns 2425 ns 289761
tony@pi:~/benchmark $ ./a.out
2021-09-28 17:55:26
Running ./a.out
Run on (4 X 1500 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_ManualComparison 2464 ns 2434 ns 288050
BM_Memcmp 2487 ns 2463 ns 286765So again there is not much to choose, and the two runs were different in the opposite way!
There was a problem hiding this comment.
Personally, I am fine with either way; I don't have a strong opinion — just pointing out that using memcmp will make both IPv4 and IPv6 code more consistent with negligible performance penalty, although consistency in this specific case may not be so important to warrant a change.
(By the way I am already deploying the code with patches from this PR, as it is right now, to my servers for testing tomorrow.)
Short description of changes
In
CHostAddress::Compare(), does an explicit comparison of IPv4 addresses instead of just subtracting them. This avoids a potential signed overflow inconsistency.Also improves the logging of connections by fetching the channel count at the time of connection and passing it through to the logging, instead of fetching it later in the logging thread.
Context: Fixes an issue?
May fix #2046
Does this change need documentation? What needs to be documented and how?
No, code improvement only
Status of this Pull Request
Working, but needs testing.
What is missing until this pull request can be merged?
Checklist