Accelerate access to interrupt status #13486
@hujun260, Can you enrich the |
The benefits are very significant. 1 Obtain the CPU index However, now it only requires a single CPU instruction. |
https://developer.arm.com/documentation/100026/0104/smr1465219161191 Do we have any relevant performance tests? For example, how many cycles does it take to call up_set_current_regs()/up_current_regs() 10,000 times with and without this PR? |
First, IRQ masking cannot be removed here, because we must ensure that no scheduling occurs for the current task. The current implementation needs at least 3 msr/mrs instructions plus 4 ordinary instructions, so the benefit of this optimization is evident. After the optimization, only a single mrs instruction is needed, with no additional overhead. Unfortunately, we have not tested this single optimization point in isolation. |
Have you checked the assembly code? The previous implementation only reads the Affinity ID through MRC (1 cycle) and does not call MCR. Your implementation now uses an MCR instruction to save the current regs, with higher overhead than before.
Could you provide a performance diagram from before and after this commit, or an API-level test? How can you conclude that the performance is better than before without any test? My concern is that MCR may perform worse than the current code, but I'm not sure; could you help confirm this? |
Yes, you are right. I forgot SMP mode, which does require disabling interrupts. I am currently using AMP/BMP mode, where the performance is much higher than in SMP. |
I did a test, in armv7-a arch, 200 million cycles before: after: |
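A comparison of the kind requested in this thread could be structured roughly like the host-side sketch below. It is only an illustration, not the test that was actually run: `bench_run`, `sample_accessor`, `g_sink`, and `g_calls` are hypothetical names, and `clock()` stands in for a hardware cycle counter, which is what one would more likely read on a real armv7-a target.

```c
#include <stdint.h>
#include <time.h>

typedef void (*bench_fn_t)(void);

/* Hypothetical stand-ins: the volatile sink keeps the compiler from
 * deleting the calls, and the counter lets a caller check the loop ran.
 */

static volatile uintptr_t g_sink;
static unsigned long g_calls;

/* Placeholder for the accessor under test, for example a wrapper
 * around up_current_regs() on a real target.
 */

void sample_accessor(void)
{
  g_sink = g_sink + 1;
  g_calls++;
}

/* Call fn() the given number of times and return the elapsed
 * processor time in clock() ticks.
 */

clock_t bench_run(bench_fn_t fn, unsigned long iterations)
{
  clock_t start = clock();

  for (unsigned long i = 0; i < iterations; i++)
    {
      fn();
    }

  return clock() - start;
}
```

Running `bench_run()` once against the old accessor and once against the new one, with the same iteration count, gives two tick counts that can be compared directly. The loop overhead is included in both, so only the difference between the two results is meaningful.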
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason: using per-CPU storage for g_current_regs, or using interrupt status registers to determine whether code is running in an interrupt context, can enhance performance. Signed-off-by: hujun5 <hujun5@xiaomi.com>
This is a continuation of the work in apache#13486
Discussion here: apache#13486 (comment)
1. move cp15.h to arch public
2. replace cp15 instructions with macros to align operations
3. add memory barrier to prevent unwanted compiler optimization
Signed-off-by: chao an <anchao@lixiang.com>


Summary
Using per-CPU storage for g_current_regs, or using interrupt status
registers to determine whether code is running in an interrupt context, can improve performance.
The benefit is significant.
Before the modification, obtaining the interrupt status required three steps:
1 Obtain the CPU index
2 Access the global variable
3 Disable/Enable interrupts
This process involved at least 7 CPU instructions.
After the modification, it requires only a single CPU instruction.
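The difference between the two access paths can be sketched with a small host-side simulation. This is not the NuttX implementation: every `sim_*`, `old_*`, and `new_*` name below is hypothetical, the CPU index, IRQ mask, and per-CPU register are modeled as plain variables, and the array indexing in the "new" path only models the fact that real hardware banks the register per CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_NCPUS 2 /* hypothetical CPU count for this simulation */

/* Simulated hardware state. On a real target these are CPU registers. */

static int sim_cpu = 0;                 /* models the CPU index register */
static bool sim_irq_on = true;          /* models the IRQ mask bit */
static uint32_t *sim_percpu[SIM_NCPUS]; /* models a banked per-CPU register */

/* Old scheme: one global table indexed by CPU. */

static uint32_t *sim_regs_table[SIM_NCPUS];

void sim_switch_cpu(int cpu)
{
  assert(cpu >= 0 && cpu < SIM_NCPUS);
  sim_cpu = cpu;
}

bool sim_irq_enabled(void)
{
  return sim_irq_on;
}

/* Before: mask IRQs, read the CPU index, read the global, restore IRQs. */

void old_set_current_regs(uint32_t *regs)
{
  bool saved = sim_irq_on;
  sim_irq_on = false;
  sim_regs_table[sim_cpu] = regs;
  sim_irq_on = saved;
}

bool old_interrupt_context(void)
{
  bool saved = sim_irq_on;                   /* disable interrupts */
  sim_irq_on = false;
  int cpu = sim_cpu;                         /* obtain the CPU index */
  bool in_irq = sim_regs_table[cpu] != NULL; /* access the global */
  sim_irq_on = saved;                        /* re-enable interrupts */
  return in_irq;
}

/* After: a single read or write of the per-CPU register, no masking. */

void new_set_current_regs(uint32_t *regs)
{
  sim_percpu[sim_cpu] = regs;
}

bool new_interrupt_context(void)
{
  return sim_percpu[sim_cpu] != NULL;
}
```

Both paths give the same answer for the same state. The old path needs the interrupt mask so that the CPU index and the table read stay consistent, while the new path has nothing to keep consistent because the value is read in one access.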
Impact
none
Testing
ostest