arch/arm64: Fix clock drift from tick timer #13984

Fix-Point · 2024-10-09T09:12:31Z

Summary

In the previous implementation, if the current timer count has just passed the next tick time when the timer is re-armed, clock drift still occurs. This commit fixes the drift by using absolute ticks (clock_systime_ticks() + ticks).

Impact

Fixes timing errors on the ARMv8-A platform.

Testing

Tested on QEMU/armv8a with the test case in https://github.com/apache/nuttx/pull/13674.

xiaoxiang781216 · 2024-10-09T14:11:42Z

@jlaitine could you review and try this patch?

jlaitine

I don't understand this patch; it just puts back the issues that were just fixed. Is there some problem this is trying to solve?

jlaitine · 2024-10-09T15:10:55Z

sched/semaphore/sem_tickwait.c

 int nxsem_tickwait_uninterruptible(FAR sem_t *sem, uint32_t delay)
 {
-  clock_t end = clock_systime_ticks() + delay + 1;
+  clock_t end = clock_systime_ticks() + delay;


This has nothing to do with timer drift; this change reintroduces the bug where tickwait_uninterruptible may wake up one tick too early if it receives a signal.
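To see why the + 1 in the hunk above matters, here is a simplified model. It is not the NuttX implementation: slept_us() and its parameters are hypothetical, and the model assumes the caller runs offset_us microseconds into tick now_tick and is woken exactly when the absolute end tick begins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a tick-based deadline: the caller is offset_us
 * microseconds into tick now_tick, computes an absolute end tick of
 * now_tick + delay + extra, and is woken exactly when that tick starts.
 * Returns how long the caller actually slept, in microseconds.
 */

static uint64_t slept_us(uint64_t now_tick, uint64_t offset_us,
                         uint64_t delay, uint64_t extra,
                         uint64_t usec_per_tick)
{
  uint64_t start;
  uint64_t end;

  assert(offset_us < usec_per_tick);  /* offset lies within one tick */

  start = now_tick * usec_per_tick + offset_us;
  end   = (now_tick + delay + extra) * usec_per_tick;

  return end - start;
}
```

With a 10 ms tick, a one-tick wait started 9.99 ms into a tick lasts only 10 us when extra is 0, but 10.01 ms when extra is 1. In this model, the + 1 guarantees at least delay full tick periods regardless of where in the current tick the deadline was computed.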

arch/arm64/src/common/arm64_arch_timer.c

jlaitine · 2024-10-09T15:29:38Z

Maybe post details of the issue and a test to reproduce it?

jlaitine

I think this just breaks things more

Fix-Point · 2024-10-10T01:41:17Z

I am using this test case to reproduce the bug.

#include <nuttx/config.h>
#include <nuttx/nuttx.h>
#include <nuttx/semaphore.h>

#include <assert.h>
#include <inttypes.h>
#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>

/* Read an AArch64 system register by name. */

#define read_sysreg(reg)                         \
  ({                                             \
    uint64_t __val;                              \
    __asm__ volatile ("mrs %0, " STRINGIFY(reg)  \
                    : "=r" (__val) :: "memory"); \
    __val;                                       \
  })

/* Current value of the virtual counter. */

static inline uint64_t arm64_arch_timer_count(void)
{
  return read_sysreg(cntvct_el0);
}

/* Counter frequency in Hz. */

static inline uint64_t arm64_arch_timer_get_cntfrq(void)
{
  return read_sysreg(cntfrq_el0);
}

/* Counter value converted to microseconds. */

static inline uint64_t time_us(void)
{
  return arm64_arch_timer_count() * 1000000 / arm64_arch_timer_get_cntfrq();
}

static inline uint64_t minutes_in_ticks(uint64_t mins)
{
  return mins * 60 * 1000000 / CONFIG_USEC_PER_TICK;
}

int main(int argc, FAR char *argv[])
{
  sem_t sem;
  uint64_t t1;
  uint64_t t2;

  /* The semaphore is never posted, so every wait ends by timing out. */

  sem_init(&sem, 0, 0);

  for (uint64_t i = 0; i < minutes_in_ticks(10); i++)
    {
      t1 = time_us();
      nxsem_tickwait(&sem, 1);
      t2 = time_us();

      if (i % 100 == 0)
        {
          printf("%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", t1, t2, t2 - t1);
        }

      /* The test expects a one-tick wait to last at least one tick period. */

      if (t2 - t1 < CONFIG_USEC_PER_TICK)
        {
          printf("ERROR: slept %" PRIu64 " us instead of minimum %d\n",
                 t2 - t1, CONFIG_USEC_PER_TICK);
        }

      assert(t2 - t1 >= CONFIG_USEC_PER_TICK);
    }

  return 0;
}

The problem of tickwait_uninterruptible waking up one tick too early is caused by the clock drift. I tried the following scenarios:

Without this patch and delay + 1, bug reproduced.
With this patch, bug fixed.
Without Arm64 tick timer fixes #13674, bug reproduced.
Without Arm64 tick timer fixes #13674 but with this patch, bug fixed.

The previous method of aligning down to ticks has a problem: if the counter value read when setting the timer, once aligned down, is one tick ahead of the current system tick, clock drift still occurs. The key is not to use the counter value read when setting the timer as the basis for the delay, but to use the current system tick instead.
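The two ways of computing the next compare value described above can be sketched as follows. This is a minimal model, not the arm64 driver code: both helpers are hypothetical, counter values are in timer cycles, one tick lasts cycles_per_tick cycles, and systick stands for the tick count the OS has already processed (what clock_systime_ticks() would report).

```c
#include <assert.h>
#include <stdint.h>

/* Previous approach: align the counter value read while re-arming the
 * timer down to a tick boundary, then add the delay in ticks.
 */

static uint64_t next_cmp_from_count(uint64_t count, uint64_t ticks,
                                    uint64_t cycles_per_tick)
{
  assert(cycles_per_tick > 0);
  return (count / cycles_per_tick + ticks) * cycles_per_tick;
}

/* Patched approach: start from the system tick the OS has accounted
 * for, not from the raw counter value.
 */

static uint64_t next_cmp_from_systick(uint64_t systick, uint64_t ticks,
                                      uint64_t cycles_per_tick)
{
  assert(cycles_per_tick > 0);
  return (systick + ticks) * cycles_per_tick;
}
```

Both agree when the interrupt is serviced within its own tick (count = 5010, systick = 5 gives 6000 either way). With a late interrupt (count = 6010 while systick is still 5), aligning down re-arms at 7000, so the OS tick count ends up one behind the counter, whereas starting from systick re-arms at 6000. That value is already in the past; on the ARM generic timer the compare condition should then hold immediately, so the missed tick is processed right away.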

jlaitine · 2024-10-10T04:02:22Z

What is the +1 bug you are referring to? Are you getting an error print and assertion with the test case above?
Restarting the timer after the system tick has already passed means that servicing the tick interrupt took longer than one tick. This would be fatal in many ways, but I agree that the case is not handled in the tick timer. However, the solution is not to read the current time in the ISR and add the tick time to it; that just causes clock drift, as I explained earlier.

Please also post the output of the above test case (the time) when it fails, to help me understand the issue better. I'll also try to reproduce it again.

I'll have a closer look later today!

jlaitine · 2024-10-10T07:01:10Z

Now I am able to reproduce the issue you are reporting in QEMU, but not on real HW.

The issue in QEMU is that servicing the timer interrupt is sometimes delayed until close to the end of the tick, or past it. This must be related to the real-time behaviour of QEMU.

The same could also happen on real HW in case of an interrupt storm or a very long critical section delaying the timer interrupt.

There is no way around that issue! If servicing the timer interrupt is late, it simply MUST advance several ticks at once to catch up. The same would happen if you, e.g., halt the target with a debugger while the timer keeps running in the background.

Sorry to say, but I believe this patch is not right. As I explained before, every time you set the timer comparator, you must set the next value to exactly previous_comparator_time + tick_time. So you need to know the previous compare value that was set.
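The rule above, together with catching up after a late interrupt, might be sketched like this. It is a minimal model under stated assumptions, not the code in arm64_arch_timer.c: struct tick_timer and tick_timer_rearm() are hypothetical, counter values are in timer cycles, and the caller is assumed to process the returned number of ticks at once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tick timer state (not the NuttX arm64 driver's layout). */

struct tick_timer
{
  uint64_t cmp;             /* Compare value that has just fired */
  uint64_t cycles_per_tick; /* Timer cycles per scheduler tick */
};

/* Called from a hypothetical tick ISR with the current counter value.
 * Returns the number of ticks that have elapsed (at least 1) and moves
 * the comparator forward by that many whole periods from the previous
 * compare value, so the tick grid never shifts.
 */

static uint64_t tick_timer_rearm(struct tick_timer *t, uint64_t now)
{
  uint64_t elapsed;

  assert(t->cycles_per_tick > 0);
  assert(now >= t->cmp);  /* The ISR runs only after the compare fired */

  /* One tick for the compare that fired, plus any whole ticks that went
   * by while the interrupt was waiting to be serviced.
   */

  elapsed = (now - t->cmp) / t->cycles_per_tick + 1;

  /* previous_comparator_time + n * tick_time: lands strictly after now */

  t->cmp += elapsed * t->cycles_per_tick;
  return elapsed;
}
```

With a 1000-cycle tick and the previous compare at 2000, an interrupt serviced at 4500 reports 3 ticks and re-arms at 5000, still on the original 1000-cycle grid, so lateness costs latency but not drift.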

If you do as suggested in this patch, setting it to current_time() + tick time, you add a delay of
deltaT = current_time() - previous_time;
to each tick, causing timer drift.
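A small simulation makes the accumulation concrete. It assumes, hypothetically, that every tick interrupt is serviced a constant latency cycles after its compare value fires, so deltaT equals that latency on every tick; nth_tick_time() is an illustration, not NuttX code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the counter value at which the n-th tick fires (n >= 1).
 * If rearm_from_now is true, each new compare value is
 * current_time() + tick_time; otherwise it is
 * previous_comparator_time + tick_time.
 */

static uint64_t nth_tick_time(uint64_t n, uint64_t tick, uint64_t latency,
                              bool rearm_from_now)
{
  uint64_t cmp = tick;  /* Tick 1 fires one period after counter 0 */

  assert(n >= 1);

  for (uint64_t i = 1; i < n; i++)
    {
      uint64_t now = cmp + latency;  /* The ISR reads the counter late */

      cmp = rearm_from_now ? now + tick   /* Adds deltaT every tick */
                           : cmp + tick;  /* Stays on the tick grid */
    }

  return cmp;
}
```

With a 1000-cycle tick and a 5-cycle service latency, the 1000th tick fires at 1004995 when re-arming from the current time, instead of at 1000000: 999 * 5 cycles of accumulated drift, growing without bound.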

EDIT: Removed the comment about the race condition, I believe that is not an issue in arm64

jlaitine · 2024-10-10T07:40:13Z

A summary: I don't think that the tick timer can be reliably tested in qemu at all. The timer counting vs. interrupt latency is nowhere close to real-time or even predictable. You can make it fail fast by just shortening the tick time, and make it more reliable by lengthening it, allowing more time for qemu to handle the timer interrupt.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

Fix-Point · 2024-10-10T09:28:59Z

Thank you for the comments. I am trying to make another patch to fix this problem.

github-actions bot added the Arch: arm64, Area: OS Components, and Size: XS labels Oct 9, 2024

xiaoxiang781216 approved these changes Oct 9, 2024

jlaitine reviewed Oct 9, 2024

jlaitine suggested changes Oct 9, 2024

Fix-Point force-pushed the mybran17 branch from 33f81ad to 99debc1 on October 10, 2024 08:31

github-actions bot added the Size: S label and removed the Size: XS label Oct 10, 2024

Fix-Point force-pushed the mybran17 branch from 99debc1 to 8b9a8b9 on October 10, 2024 08:34

Fix-Point closed this Oct 10, 2024

Fix-Point deleted the mybran17 branch October 15, 2024 02:32