Improve mmap retry logic. #3714
ff3a507 to 7a761e0
@sjamesr Sorry that I didn't read the code carefully the first time. I've read it again and am a little confused:
can this mmap fail with EAGAIN or EINTR? on which platform? are you trying to fix a real problem? or something theoretical?
honestly speaking, i don't understand why we bother to retry on mmap failure here at all.
As I recall, we retry 5 times here just to give mmap more chances to succeed. But retrying unconditionally, without checking errno, may be slow, so this PR tries to improve that: it calls mmap again only when errno is EINTR or EAGAIN.
did it actually improve the chance of success?
when mmap fails, i expect it fails immediately.
does mmap return EINTR/EAGAIN? on which platforms?
yes, the memory may be temporarily occupied by other processes or threads and freed by the time we retry, e.g., when MAP_FAILED is returned and the errno is:
EAGAIN The file has been locked, or too much memory has been locked (see setrlimit(2)).
ENOMEM No memory is available.
ENOMEM The process's maximum number of mappings would have been exceeded.
@sjamesr also pointed out that the errno can be EINTR, which happens in bursts.
Here the PR tries to reduce the retry count so as to return failure more quickly.
I can see in the Linux mmap manual that errno may be EAGAIN:
well, other threads free the resource while we are making several system calls?
also, with that logic, we should retry many other operations, including open(), malloc(), etc, shouldn't we?
also, if we want to give other threads a chance to free resources, i guess it makes more sense to sleep a bit before making a retry.
also, with that logic, isn't this PR a regression because we won't retry on ENOMEM anymore?
is it worth saving a few system calls here?
EAGAIN is about locked memory, right? |
As you know, mmap may come with extra requirements or flags, e.g., MAP_32BIT, PROT_EXEC, MADV_HUGEPAGE, and an 8GB mapping for linear memory when hw bound check is enabled. Per my understanding, it is more likely to fail than other operations, so we let it retry several times. BTW, for memory64, the size requested from mmap may be much larger.
Yes, sleeping a bit before retry seems better.
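A minimal sketch of what "sleeping a bit before retry" could look like. The helper names `sleep_ms` and `backoff_ms` are hypothetical, and the exponential backoff schedule is an assumption for illustration; the thread itself only says to sleep a bit.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* glibc: expose nanosleep under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for `ms` milliseconds, resuming the sleep
 * if a signal interrupts it. Returns 0 on success, -1 on error. */
static int
sleep_ms(long ms)
{
    struct timespec req;

    assert(ms >= 0);
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    /* On EINTR, nanosleep stores the remaining time back into req. */
    while (nanosleep(&req, &req) != 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Hypothetical schedule: delay before retry number `attempt` (0-based).
 * 1ms, 2ms, 4ms, ... capped at 64ms. The numbers are arbitrary. */
static long
backoff_ms(int attempt)
{
    assert(attempt >= 0);
    return attempt < 6 ? (1L << attempt) : 64L;
}
```

A retry loop would call `sleep_ms(backoff_ms(i))` between attempt `i` and attempt `i + 1`, which gives other threads or processes some time to release memory.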
Yes, I also asked about this in #3714 (comment)
I am not sure, I guess it can save some time.
I am not familiar with mlock. I guess wamr's mmap isn't used in that situation, but I'm not sure whether other processes are using it. Here is the answer to "when will mmap return EAGAIN" that I got from ChatGPT:
yes, mmap can fail. MADV_HUGEPAGE might have a system-level limit. i dunno if it ends up with EAGAIN or some other errors.
typically there are system-level and process-level limits for locked memory.
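For the process-level limit, the setrlimit(2) reference quoted earlier points at `RLIMIT_MEMLOCK`. A rough sketch of how to inspect it; the helper names are hypothetical, and this does not cover any system-level limit:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: read the process-level locked-memory limit.
 * Returns 0 on success, -1 on failure (errno set by getrlimit). */
static int
get_memlock_limit(struct rlimit *out)
{
    assert(out != NULL);
    return getrlimit(RLIMIT_MEMLOCK, out);
}

/* Print the soft limit, i.e. the value currently enforced. */
static void
print_memlock_limit(void)
{
    struct rlimit lim;

    if (get_memlock_limit(&lim) != 0) {
        perror("getrlimit");
        return;
    }
    if (lim.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_MEMLOCK soft limit: unlimited\n");
    else
        printf("RLIMIT_MEMLOCK soft limit: %llu bytes\n",
               (unsigned long long)lim.rlim_cur);
}
```

If the soft limit is far above anything the process locks, EAGAIN from this cause is unlikely to be what a retry is recovering from.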
sorry, i don't trust chatgpt in general. my points are:
Yes, thanks for the feedback. I am not sure how much the retrying helps, but I think we had better keep it: it doesn't hurt much, and it would be good if it helps. As you can see, this PR still retries when the errno is EINTR. And maybe we can apply some enhancements to it.
@yamt The motivation for this change was to work around a very rare test failure we (Google) were observing with wamr on Google's production Linux. I'm not the original author of the change, and I've been unable to reproduce the test failure without this patch. The original author said EINTR can "occur in bursts"; I'm still trying to track down whether and how mmap can return EINTR in our version of Linux.
7a761e0 to 2c3ec3a
does it involve oom killing?
            break;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN) {
Could you also retry on ENOMEM and update the comment? According to the discussion, we had better retry on it as well: if (errno != EAGAIN && errno != ENOMEM).
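One way to read this suggestion is as a three-way classification of the errno after a failed mmap. The sketch below is an illustration only: the enum and function names are hypothetical, and treating EINTR as budget-free follows the PR description rather than this review comment.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification of a failed mmap call. */
typedef enum {
    MMAP_GIVE_UP,       /* fail immediately */
    MMAP_RETRY_COUNTED, /* retry and spend one unit of the retry budget */
    MMAP_RETRY_FREE     /* retry without touching the budget */
} mmap_retry_action;

static mmap_retry_action
classify_mmap_errno(int err)
{
    assert(err != 0); /* only meaningful after a failed call */
    switch (err) {
        case EINTR:
            return MMAP_RETRY_FREE;
        case EAGAIN:
        case ENOMEM:
            return MMAP_RETRY_COUNTED;
        default:
            return MMAP_GIVE_UP;
    }
}
```

With this shape, adding or dropping ENOMEM from the retryable set is a one-line change that is easy to see in review.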
2c3ec3a to 835f4f8
* Only retry on EAGAIN or EINTR.
* On EINTR, don't count it against the retry budget. EINTR can happen in bursts.
* Log the errno on failure, and don't conditionalize that logging on BH_ENABLE_TRACE_MMAP. In other parts of the code, error logging is not conditional on that define, while turning on that tracing define makes things overly verbose.
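The policy in this commit message can be sketched roughly as follows. This is not the PR's actual code: `map_with_retry` and its `budget` parameter are hypothetical, the mapping flags are chosen only to keep the example self-contained, and a real implementation might also want to cap the EINTR retries.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* glibc: expose MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper. Maps `size` bytes of anonymous memory.
 * `budget` is the number of retries allowed after EAGAIN.
 * Returns NULL on failure, with errno describing the last error.
 */
static void *
map_with_retry(size_t size, int budget)
{
    assert(budget >= 0);
    for (;;) {
        void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        int err;

        if (addr != MAP_FAILED)
            return addr;
        err = errno;
        if (err == EINTR)
            continue; /* not counted against the budget */
        if (err == EAGAIN && budget > 0) {
            budget--;
            continue;
        }
        /* Always log the errno, independent of any tracing macro. */
        fprintf(stderr, "mmap failed, errno: %d (%s)\n", err, strerror(err));
        errno = err; /* fprintf may have changed it */
        return NULL;
    }
}
```

Any other errno, such as EINVAL for a zero-length request, fails on the first attempt without consuming retries.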
835f4f8 to 1db8b55
Update the comment