Skip to content

Conversation

@Fix-Point
Copy link
Contributor

Summary

In the previous implementation, when the current timer count just exceeds the next tick time, it will still cause clock drift. This commit fixed clock drift by using the abosolute ticks (clock_systime_tick() + ticks).

Impact

Fixed timing errors on ARMv8A platform.

Testing

Tested on QEMU/armv8a with the test case in https://github.com/apache/nuttx/pull/13674.

@github-actions github-actions bot added Arch: arm64 Issues related to ARM64 (64-bit) architecture Area: OS Components OS Components issues Size: XS The size of the change in this PR is very small labels Oct 9, 2024
@xiaoxiang781216
Copy link
Contributor

@jlaitine could you review and try this patch?

Copy link
Contributor

@jlaitine jlaitine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this patch, it is just putting back the issues which were just fixed. Is there some problem which this is trying to solve?

int nxsem_tickwait_uninterruptible(FAR sem_t *sem, uint32_t delay)
{
clock_t end = clock_systime_ticks() + delay + 1;
clock_t end = clock_systime_ticks() + delay;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has nothing to do with timer drift; this change adds back the bug where tickwait_uninterruptible may wake up one tick too early in case it receives a signal.

@jlaitine
Copy link
Contributor

jlaitine commented Oct 9, 2024

Maybe post details of the issue and some test to re-produce it?

Copy link
Contributor

@jlaitine jlaitine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this just breaks things more

@Fix-Point
Copy link
Contributor Author

I am using this test case to reproduce the bug.

#include <nuttx/nuttx.h>
#include <nuttx/config.h>
#include <stdio.h>

#define read_sysreg(reg)                         \
  ({                                             \
    uint64_t __val;                              \
    __asm__ volatile ("mrs %0, " STRINGIFY(reg)  \
                    : "=r" (__val) :: "memory"); \
    __val;                                       \
  })

static inline uint64_t arm64_arch_timer_count(void)
{
  return read_sysreg(cntvct_el0);
}

static inline uint64_t arm64_arch_timer_get_cntfrq(void)
{
  return read_sysreg(cntfrq_el0);
}

static inline uint64_t time_us(void)
{
  return arm64_arch_timer_count() * 1000000 / arm64_arch_timer_get_cntfrq();
}

inline uint64_t minutes_in_ticks(uint64_t mins)
{
  return mins * 60 * 1000000 / CONFIG_USEC_PER_TICK;
}

int main(int argc, FAR char *argv[])
{
  sem_t sem;
  uint64_t t1;
  uint64_t t2;
  sem_init(&sem, 0, 0);

  for (int64_t i = 0; i < minutes_in_ticks(10); i++)
    {
      t1 = time_us();
      nxsem_tickwait(&sem, 1);
      t2 = time_us();

      if (i % 100 == 0)
        {
          printf("%lu %lu %lu\n",t1,t2,t2-t1);
        }

      if (t2 - t1 < CONFIG_USEC_PER_TICK)
        {
          printf("ERROR: slept %lu us instead of minimum %u\n", t2-t1, CONFIG_USEC_PER_TICK);
        }

      assert(t2-t1 > CONFIG_USEC_PER_TICK); 
    }

  return 0;
}
  1. The problem that tickwait_uninterruptible may wake up one tick too early is caused by the clock drift. I tried the following scenarios:
  1. The previous method of aligning down to ticks has a problem. If the current count obtained when setting the timer is aligned down and is 1 larger than the current system tick, it will still cause clock drift. The key is not to use the count obtained when setting the timer as a basis for delay, instead we should use current system tick.

@jlaitine
Copy link
Contributor

jlaitine commented Oct 10, 2024

What is the +1 bug you are referring to? Are you getting an error print and assertion with the test case above?
Re-starting the timer after systick has already passed means that servicing the tick interrupt took longer than the tick. This would be fatal in many ways, but I agree that the case is not handled in tick timer. However, the solution is not to add drift by reading the current time in the isr and adding the tick time to that. This would just cause clock drift; as I explained earlier.

Please also post the output of the above test case (the time) when it fails, to help me understand the issue better. I'll also try to reproduce it again.

I'll have a closer look later today!

@jlaitine
Copy link
Contributor

jlaitine commented Oct 10, 2024

Now I am able to re-produce the issue you are reporting in qemu, but not in real HW.

The issue in qemu is, that sometimes servicing the timer interrupt is delayed until close to the end of the tick, or past it. This issue must be related to real-time behaviour of qemu.

The same would happen also in a real HW, in case of interrupt storm or in case of very long critical section, delaying the timer interrupt.

There is no way around that issue! In case servicing the timer interrupt is late, it just MUST proceed several ticks at once to keep up. The same would happen if you e.g. break the target with a debugger, and the timer is freely running in the background.

Sorry to say, but I believe this patch is not right. As I explained before, every time you set the timer comparator, you must set the next value to be exactly the previous_comparator_time + tick_time. So you need to know the previous time, which was set somehow.

If you do as suggested in this patch, setting it to current_time() + tick time, you add a delay of
deltaT = current_time() - previous_time;
to each tick, causing timer drift.

EDIT: Removed the comment about the race condition, I believe that is not an issue in arm64

@jlaitine
Copy link
Contributor

jlaitine commented Oct 10, 2024

A summary: I don't think that the tick timer can be reliably tested in qemu at all. The timer counting vs. interrupt latency is nowhere close to real-time or even predictable. You can make it fail fast by just shortening the tick time, and make it more reliable by lengthening it, allowing more time for qemu to handle the timer interrupt.

@github-actions github-actions bot added Size: S The size of the change in this PR is small and removed Size: XS The size of the change in this PR is very small labels Oct 10, 2024
In the previous implementation, when the current timer count just exceeds the next tick time, it will still cause clock drift. This commit fixed clock drift by using the abosolute ticks (clock_systime_tick() + ticks).

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
@Fix-Point
Copy link
Contributor Author

Thank you for the comments. I am trying to make another patch to fix this problem.

@Fix-Point Fix-Point closed this Oct 10, 2024
@Fix-Point Fix-Point deleted the mybran17 branch October 15, 2024 02:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Arch: arm64 Issues related to ARM64 (64-bit) architecture Area: OS Components OS Components issues Size: S The size of the change in this PR is small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants