-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-15062: [C++] Add memory information to current spans #12702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-15062: [C++] Add memory information to current spans #12702
Conversation
|
|
de15410 to
c0431e0
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please return int64_t instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unfortunately Linux-specific, we'll need a portable implementation (is there a third-party library that we can use to make our life easier?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a portable implementation as part of a PR here: https://github.com/apache/arrow/pull/11426/files#diff-5d7d9a549780da2ec4baba08a43db3e1d524b70d6e8ae4f29cc8bfe831357f9cR178-R199
We didn't end up pursuing the PR but you might be able to borrow some from that particular function.
That being said, I wonder if it would be more valuable to report default_memory_pool()->bytes_allocated(). sysinfo is going to give us statistics on the entire server and freeram is sometimes misleading (e.g. if there are a lot of dirty pages then freeram could be high while at the same time we might be thrashing swap).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonpace I get the bytes from the memory pool in the first span, but in the next ones I extracted the statistics from the sysinfo for report the RSS, I looked the PR which you mentioned, and there the method return just the FreeRam and for calculate the Current Ram in used I still need to use sysinfo.total_memory, but after that PR get merged I can add another function for get that, and don't use system includes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, I see now that you are grabbing all three numbers. GetMemoryUsed is less interesting than GetMemoryUsedByProcess in my opinion but I don't think it would hurt to report it.
but after that PR get merged I can add another function for get that, and don't use system includes here.
That PR that I linked to will not be merged. We ended up pursuing a different approach. So we don't want to wait on it. However, you're welcome to copy the approach that was being taken for this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure it's not 1024, rather?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You'll need to check for errors here; 2) would it be easier to use
std::fstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this is probably Linux-specific...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to write C++ code not C code, so this function should take a util::string_view IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which one? The one returned by the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value obtained from the system. This function was re-implemented and documented in the io_utils.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions should go into arrow/util/io_util.{h,cc} instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These calls are potentially costly since they will be issueing system calls (and perhaps even read a file). I'm not sure this is desirable. @lidavidm What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this seems it will be quite expensive. Do we need information that's this fine-grained?
We could cache the values, and only update them if they're out of date. Or we could actually record the memory pool's statistics instead. Also, this information isn't really thread- or span- specific. Maybe it could be done by a background thread on a fixed schedule, especially if it's process memory and not memory pool allocations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Background thread on a fixed schedule was the original ask but I think there was some concern around having two streams of telemetry information. As a start I think only using memory_pool_bytes would be sufficient. Longer term it could be handy to have some idea of the RSS of the process just to have some sense for how we are doing on fragmentation and non-pool allocations. Since a system call is expensive I wonder if there would be a way to debounce this information and only report it at most every 10ms or on some configurable threshold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonpace I can create the background thread for report the system RAM, but where can I place it? inside the memory pool constructor? and using the same exporter or creating a new one just for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I am not asking for a background thread in that comment. It is ok either way. I have no preference either way. I only mentioned the background thread in the comment to explain the context.
In both cases (background thread or span processor) I now think we want to move the initialization into the query testing tool (which unfortunately, is not yet merged). This means the feature will not be enabled for library users but I think that is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should certainly avoid repeating this everywhere, and instead find a way to factor out those common span attributes (perhaps using a dedicated macro?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be done by a span processor, though that has a couple caveats:
- Processors are configured as part of the tracing setup, by the end user application (generally not by a library),
- IIRC, processors may not run immediately (one of the processors batches spans and processes them on a background thread, I think)
I think some span recorders can also be configured to attach this sort of information as well? (Though again, the timestamps won't line up anymore.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that these memory statistics are process-wide anyways maybe it makes sense to only enable collection of memory statistics in our own profiling tools. For example, when running the query tester we can add a span processor that collects memory statistics.
Then the actual library code would only instrument things specific to the library.
That only leaves the batching concern but, glancing at the description, and knowing nothing of the OT spec, I think the span processor itself might batch its output but its input (the span start/stop calls) won't be batched.
c0431e0 to
2d5e1fd
Compare
7632150 to
002d369
Compare
002d369 to
4cbb7fe
Compare
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for writing detailed methods for fetching the RSS. That will be very useful. It doesn't look like we are using that yet. Perhaps the plan is to wait and integrate it in later into the query tester? I think it is ok but can you create a follow-up PR for the feature that would use this?
Also, we will need test cases for GetCurrentRSS before this merges. I don't think it needs to be very complicated (e.g. we don't need to verify, in a unit test, that the RSS is exactly correct, only that it is non-zero and doesn't crash).
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can we merge these two ifdef sections?
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use a dummy implementation in this case that returns 0 rather than prevent compilation? Actually, it looks like that might be what you do already so can you just remove this error?
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this branch actually get used in the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was unused and was removed.
cpp/src/arrow/util/io_util.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// \brief Get the current memory used by current process in bytes | |
| /// | |
| /// This function support Windows and Linux | |
| /// \brief Get the current memory used by the current process in bytes | |
| /// | |
| /// This function supports Windows, Linux, and Mac and will return 0 otherwise |
r/src/arrowExports.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure these changes are related. Maybe a rebase will get them to go away?
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we try to avoid directly copy/pasting chunks of code from SO for licensing reasons but I don't know that we ever got an authoritative answer for this. Either way, I think things are paraphrased enough by this point it isn't a concern.
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use #if __APPLE__ quite a bit in the code base. However, we never use __MACH__. Do you know what this is adding that __APPLE__ doesn't already specify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://sourceforge.net/p/predef/wiki/OperatingSystems/ here it uses both conditions, but it seems to be ok if I just let
__APPLE__
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this include is used anywhere.
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fclose returns a return code. If we want to ignore it (might be ok here, not sure in what situations that close would fail and what we would even do about it) then it might be best to wrap in ARROW_UNUSED so it is explicit or add a comment.
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return (int64_t)info.WorkingSetSize; | |
| return static_cast<int64_t>(info.WorkingSetSize); |
For consistency with the rest of the codebase here (and elsewhere in this file) we should prefer C++ style casts:
7d06d7b to
b6f507a
Compare
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we log the error using e.g. ARROW_LOG(WARNING)?
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: log the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things: 1) can we use the C++ includes where available (e.g. <cstdio> instead of <stdio.h>)? 2) can we keep the system includes alphabetically ordered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why were these includes added to this particular file anyways? Is this a case where we weren't following IWYU and you are making it more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding this by hand everywhere, can we define for example a START_COMPUTE_SPAN macro that does it automatically?
5c35cd1 to
e85c765
Compare
e85c765 to
433202a
Compare
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, __linux__ is sufficient.
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify this code a bit, also the ifstream destructor should close the file for you:
| if (fp.is_open()) { | |
| fp >> rss; | |
| } else { | |
| ARROW_LOG(WARNING) << "Can't resolve RSS value from /proc/self/statm"; | |
| return static_cast<int64_t>(0L); | |
| } | |
| if (fp.is_open()) fp.close(); | |
| return static_cast<int64_t>(rss) * static_cast<int64_t>(sysconf(_SC_PAGESIZE)); | |
| if (fp) { | |
| fp >> rss; | |
| return rss * sysconf(_SC_PAGESIZE); | |
| } else { | |
| ARROW_LOG(WARNING) << "Can't resolve RSS value from /proc/self/statm"; | |
| return 0; | |
| } |
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: no need for an explicit cast.
| return static_cast<int64_t>(0L); // Unsupported. | |
| return 0; // Unsupported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlvinJ15 there's still unaddressed feedback, can you make sure everything is addressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lidavidm now was solved.
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: no need for an explicit cast
| return static_cast<int64_t>(0L); | |
| return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how those work in OpenTelemetry, but this seems to close the scope at the end of the macro, is this right? @lidavidm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we shouldn't create the OT scope in a C++ scope here, because the OT scope object controls how long a span is active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbrobbel i put the scopes because auto opentelemetry_scope##__LINE__ when i use two START_SPAN it gives and conflict error in the variable name declaration, the opentelemetry_scope seems to be unused and when I put the scope for avoid the variable name declaration conflict seems to work fine https://paste.ofcode.org/gGUZ3Kk9H9HuyeLdJCpCX7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scope is an RAII guard, you wouldn't generally actually use it. But effectively it makes no sense to put a scope inside a C++ block by itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand why this starting two spans (or starting the same span twice)? @lidavidm Does this look right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this seems to create two spans, and immediately closes the first one. I don't think this is what we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see now. START_SPAN is just activating the span. I think what you want is a macro to attach a new attribute to the span, not repeating the START_SPAN macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, to attach the new attribute to the current _VA_ARGS but there's no solution on how to remove the curly brackets {} and insert the new attribute and then pass it to the START_SPAN, I tried even create a std::vector<> insert the new attribute and convert it to std::initializer_list or create the OT::nostd::span<> manually but nothing worked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just call span->SetAttribute. Look at the header file or docs. https://github.com/open-telemetry/opentelemetry-cpp/blob/main/api/include/opentelemetry/trace/span.h#L46-L50
|
@AlvinJ15 can you demonstrate the kind of output this gives? |
@lidavidm this is the output https://paste.ofcode.org/gGUZ3Kk9H9HuyeLdJCpCX7 |
433202a to
dd7a099
Compare
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Mostly I'm just wondering if we can limit the number of macros we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this define for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to rearrange all these macros, right? You can keep the original START_SPAN macros and just have START_COMPUTE_SPAN be START_SPAN + a call to SetAttribute. Spans are mutable, there's no requirement that you fully configure them before using them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly I don't want to have a whole bunch of macros for little things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or actually, maybe just have START_SPAN and CONFIGURE_COMPUTE_SPAN or something. That way we have composable macros instead of having to define duplicate macros for every scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created those macros just for don't repeat the same code twice, but yes I can delete them and copy those lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow, these macros have repeated code right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, but should we perhaps namespace this attribute name? "arrow.memory_pool_bytes" or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note how the official ones are semi-namespaced: https://github.com/open-telemetry/opentelemetry-cpp/blob/4f5505f9cb45326299067cfbd53db21198970aa8/sdk/include/opentelemetry/sdk/resource/experimental_semantic_conventions.h
69ba070 to
5852ec8
Compare
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking with this. The XYZ_COMPUTE_SPAN macros will help a lot.
cpp/src/arrow/util/io_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #elif defined(__linux__) || defined(__linux) || defined(linux) || defined(__gnu_linux__) | |
| #elif defined(__linux__) |
cpp/src/arrow/util/io_util_test.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #elif defined(__linux__) || defined(__linux) || defined(linux) || defined(__gnu_linux__) | |
| #elif defined(__linux__) |
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me, though if we really cared about duplicating code, we can just have
#define SETUP_COMPUTE_SPAN(span) span->SetAttribute(...);
...
START_SPAN(span);
SETUP_COMPUTE_SPAN(span);at the very least, we should have
#define START_COMPUTE_SPAN(span, ...) \
START_SPAN(span, __VA_ARGS__); \
span->SetAttribute(...);5852ec8 to
a88df9b
Compare
…memory_pool and RSS
… will be implemented
a88df9b to
0ffe46c
Compare
r/src/arrowExports.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to commit this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebased.
ae06417 to
05148fa
Compare
05148fa to
968716a
Compare
|
@pitrou this could be merged? |
|
@AlvinJ15 Yes, definitely. Thank you for being persistent! |
|
Benchmark runs are scheduled for baseline = 1cd8a85 and contender = 6a7ee37. 6a7ee37 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Add memory information to the current spans, bytes in memory_pool and RSS