-
Notifications
You must be signed in to change notification settings - Fork 1.5k
arch/arm64: Fix clock drift from tick timer #13984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@jlaitine could you review and try this patch? |
jlaitine
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this patch, it is just putting back the issues which were just fixed. Is there some problem which this is trying to solve?
| int nxsem_tickwait_uninterruptible(FAR sem_t *sem, uint32_t delay) | ||
| { | ||
| clock_t end = clock_systime_ticks() + delay + 1; | ||
| clock_t end = clock_systime_ticks() + delay; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has nothing to do with timer drift; this change adds back the bug where tickwait_uninterruptible may wake up one tick too early in case it receives a signal.
|
Maybe post details of the issue and some test to re-produce it? |
jlaitine
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this just breaks things more
|
I am using this test case to reproduce the bug. #include <nuttx/nuttx.h>
#include <nuttx/config.h>
#include <stdio.h>
#define read_sysreg(reg) \
({ \
uint64_t __val; \
__asm__ volatile ("mrs %0, " STRINGIFY(reg) \
: "=r" (__val) :: "memory"); \
__val; \
})
static inline uint64_t arm64_arch_timer_count(void)
{
return read_sysreg(cntvct_el0);
}
static inline uint64_t arm64_arch_timer_get_cntfrq(void)
{
return read_sysreg(cntfrq_el0);
}
static inline uint64_t time_us(void)
{
return arm64_arch_timer_count() * 1000000 / arm64_arch_timer_get_cntfrq();
}
inline uint64_t minutes_in_ticks(uint64_t mins)
{
return mins * 60 * 1000000 / CONFIG_USEC_PER_TICK;
}
int main(int argc, FAR char *argv[])
{
sem_t sem;
uint64_t t1;
uint64_t t2;
sem_init(&sem, 0, 0);
for (int64_t i = 0; i < minutes_in_ticks(10); i++)
{
t1 = time_us();
nxsem_tickwait(&sem, 1);
t2 = time_us();
if (i % 100 == 0)
{
printf("%lu %lu %lu\n",t1,t2,t2-t1);
}
if (t2 - t1 < CONFIG_USEC_PER_TICK)
{
printf("ERROR: slept %lu us instead of minimum %u\n", t2-t1, CONFIG_USEC_PER_TICK);
}
assert(t2-t1 > CONFIG_USEC_PER_TICK);
}
return 0;
}
|
|
What is the +1 bug you are referring to? Are you getting an error print and assertion with the test case above? Please also post the output of the above test case (the time) when it fails, to help me understand the issue better. I'll also try to reproduce it again. I'll have a closer look later today! |
|
Now I am able to re-produce the issue you are reporting in qemu, but not in real HW. The issue in qemu is, that sometimes servicing the timer interrupt is delayed until close to the end of the tick, or past it. This issue must be related to real-time behaviour of qemu. The same would happen also in a real HW, in case of interrupt storm or in case of very long critical section, delaying the timer interrupt. There is no way around that issue! In case servicing the timer interrupt is late, it just MUST proceed several ticks at once to keep up. The same would happen if you e.g. break the target with a debugger, and the timer is freely running in the background. Sorry to say, but I believe this patch is not right. As I explained before, every time you set the timer comparator, you must set the next value to be exactly the previous_comparator_time + tick_time. So you need to know the previous time, which was set somehow. If you do as suggested in this patch, setting it to current_time() + tick time, you add a delay of EDIT: Removed the comment about the race condition, I believe that is not an issue in arm64 |
|
A summary: I don't think that the tick timer can be reliably tested in qemu at all. The timer counting vs. interrupt latency is nowhere close to real-time or even predictable. You can make it fail fast by just shortening the tick time, and make it more reliable by lengthening it, allowing more time for qemu to handle the timer interrupt. |
In the previous implementation, when the current timer count just exceeds the next tick time, it will still cause clock drift. This commit fixed clock drift by using the abosolute ticks (clock_systime_tick() + ticks). Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
|
Thank you for the comments. I am trying to make another patch to fix this problem. |
Summary
In the previous implementation, when the current timer count just exceeds the next tick time, it will still cause clock drift. This commit fixed clock drift by using the abosolute ticks (clock_systime_tick() + ticks).
Impact
Fixed timing errors on ARMv8A platform.
Testing
Tested on QEMU/armv8a with the test case in
https://github.com/apache/nuttx/pull/13674.