-
Notifications
You must be signed in to change notification settings - Fork 59.8k
value swap macro change with xor #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
may I ask what is the swap value for? |
Contributor
|
The compiler will get rid of the temporary variable anyway this is just a different way or writing it. |
|
12 GB
|
|
Thanks |
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Aug 8, 2013
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel:
> My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> came in from mlockall. I'm only seeing this on one box, but that has more
> ram (8gb) than my other machines, which might explain it.
>
> Dave
>
> INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
> sending NMI to all CPUs:
> NMI backtrace for cpu 3
> CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ torvalds#32
> task: ffff88023e743fc0 ti: ffff88022f6f2000 task.ti: ffff88022f6f2000
> RIP: 0010:[<ffffffff810bf7d1>] [<ffffffff810bf7d1>] trace_hardirqs_off_caller+0x21/0xb0
> RSP: 0018:ffff880244e03c30 EFLAGS: 00000046
> RAX: ffff88023e743fc0 RBX: 0000000000000001 RCX: 000000000000003c
> RDX: 000000000000000f RSI: 0000000000000004 RDI: ffffffff81033cab
> RBP: ffff880244e03c38 R08: ffff880243288a80 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff880243288a80
> R13: ffff8802437eda40 R14: 0000000000080000 R15: 000000000000d010
> FS: 00007f50ae33b740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000097f000 CR3: 0000000240fa0000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
> ffffffff810bf86d ffff880244e03c98 ffffffff81033cab 0000000000000096
> 000000000000d008 0000000300000002 0000000000000004 0000000000000003
> 0000000000002710 ffffffff81c50d00 ffffffff81c50d00 ffff880244fcde00
> Call Trace:
> <IRQ>
> [<ffffffff810bf86d>] ? trace_hardirqs_off+0xd/0x10
> [<ffffffff81033cab>] __x2apic_send_IPI_mask+0x1ab/0x1c0
> [<ffffffff81033cdc>] x2apic_send_IPI_all+0x1c/0x20
> [<ffffffff81030115>] arch_trigger_all_cpu_backtrace+0x65/0xa0
> [<ffffffff811144b1>] rcu_check_callbacks+0x331/0x8e0
> [<ffffffff8108bfa0>] ? hrtimer_run_queues+0x20/0x180
> [<ffffffff8109e905>] ? sched_clock_cpu+0xb5/0x100
> [<ffffffff81069557>] update_process_times+0x47/0x80
> [<ffffffff810bd115>] tick_sched_handle.isra.16+0x25/0x60
> [<ffffffff810bd231>] tick_sched_timer+0x41/0x60
> [<ffffffff8108ace1>] __run_hrtimer+0x81/0x4e0
> [<ffffffff810bd1f0>] ? tick_sched_do_timer+0x60/0x60
> [<ffffffff8108b93f>] hrtimer_interrupt+0xff/0x240
> [<ffffffff8102de84>] local_apic_timer_interrupt+0x34/0x60
> [<ffffffff81718c5f>] smp_apic_timer_interrupt+0x3f/0x60
> [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
> [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
> [<ffffffff8105f101>] ? __do_softirq+0xb1/0x440
> [<ffffffff8105f64d>] irq_exit+0xcd/0xe0
> [<ffffffff81718c65>] smp_apic_timer_interrupt+0x45/0x60
> [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
> <EOI>
> [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
> [<ffffffff8170b830>] ? wait_for_completion_killable+0x170/0x170
> [<ffffffff8170c853>] ? preempt_schedule_irq+0x53/0x90
> [<ffffffff8170e9f6>] retint_kernel+0x26/0x30
> [<ffffffff8107a523>] ? queue_work_on+0x43/0x90
> [<ffffffff8107c369>] schedule_on_each_cpu+0xc9/0x1a0
> [<ffffffff81167770>] ? lru_add_drain+0x50/0x50
> [<ffffffff811677c5>] lru_add_drain_all+0x15/0x20
> [<ffffffff81186965>] SyS_mlockall+0xa5/0x1a0
> [<ffffffff81716e94>] tracesys+0xdd/0xe2
This commit addresses this problem by inserting the required preemption
point.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
torvalds
pushed a commit
that referenced
this pull request
Sep 25, 2013
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel. Dave Jones reports:
"My fuzz tester keeps hitting this. Every instance shows the non-irq
stack came in from mlockall. I'm only seeing this on one box, but
that has more ram (8gb) than my other machines, which might explain
it.
INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32
Call Trace:
lru_add_drain_all+0x15/0x20
SyS_mlockall+0xa5/0x1a0
tracesys+0xdd/0xe2"
This commit addresses this problem by inserting the required preemption
point.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
johnweber
pushed a commit
to johnweber/linux
that referenced
this pull request
Oct 1, 2013
torvalds#125 IRQ torvalds#125's status is not constant on different boards, IRQ torvalds#32 is IOMUXC's interrupt which can be triggered manually at anytime, use this irq instead of torvalds#125 to generate interrupt for avoiding CCM enter low power mode by mistake. Signed-off-by: Anson Huang <b20788@freescale.com>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Oct 1, 2013
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel:
> My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> came in from mlockall. I'm only seeing this on one box, but that has more
> ram (8gb) than my other machines, which might explain it.
>
> Dave
>
> INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
> sending NMI to all CPUs:
> NMI backtrace for cpu 3
> CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ torvalds#32
> task: ffff88023e743fc0 ti: ffff88022f6f2000 task.ti: ffff88022f6f2000
> RIP: 0010:[<ffffffff810bf7d1>] [<ffffffff810bf7d1>] trace_hardirqs_off_caller+0x21/0xb0
> RSP: 0018:ffff880244e03c30 EFLAGS: 00000046
> RAX: ffff88023e743fc0 RBX: 0000000000000001 RCX: 000000000000003c
> RDX: 000000000000000f RSI: 0000000000000004 RDI: ffffffff81033cab
> RBP: ffff880244e03c38 R08: ffff880243288a80 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff880243288a80
> R13: ffff8802437eda40 R14: 0000000000080000 R15: 000000000000d010
> FS: 00007f50ae33b740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000097f000 CR3: 0000000240fa0000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
> ffffffff810bf86d ffff880244e03c98 ffffffff81033cab 0000000000000096
> 000000000000d008 0000000300000002 0000000000000004 0000000000000003
> 0000000000002710 ffffffff81c50d00 ffffffff81c50d00 ffff880244fcde00
> Call Trace:
> <IRQ>
> [<ffffffff810bf86d>] ? trace_hardirqs_off+0xd/0x10
> [<ffffffff81033cab>] __x2apic_send_IPI_mask+0x1ab/0x1c0
> [<ffffffff81033cdc>] x2apic_send_IPI_all+0x1c/0x20
> [<ffffffff81030115>] arch_trigger_all_cpu_backtrace+0x65/0xa0
> [<ffffffff811144b1>] rcu_check_callbacks+0x331/0x8e0
> [<ffffffff8108bfa0>] ? hrtimer_run_queues+0x20/0x180
> [<ffffffff8109e905>] ? sched_clock_cpu+0xb5/0x100
> [<ffffffff81069557>] update_process_times+0x47/0x80
> [<ffffffff810bd115>] tick_sched_handle.isra.16+0x25/0x60
> [<ffffffff810bd231>] tick_sched_timer+0x41/0x60
> [<ffffffff8108ace1>] __run_hrtimer+0x81/0x4e0
> [<ffffffff810bd1f0>] ? tick_sched_do_timer+0x60/0x60
> [<ffffffff8108b93f>] hrtimer_interrupt+0xff/0x240
> [<ffffffff8102de84>] local_apic_timer_interrupt+0x34/0x60
> [<ffffffff81718c5f>] smp_apic_timer_interrupt+0x3f/0x60
> [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
> [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
> [<ffffffff8105f101>] ? __do_softirq+0xb1/0x440
> [<ffffffff8105f64d>] irq_exit+0xcd/0xe0
> [<ffffffff81718c65>] smp_apic_timer_interrupt+0x45/0x60
> [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
> <EOI>
> [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
> [<ffffffff8170b830>] ? wait_for_completion_killable+0x170/0x170
> [<ffffffff8170c853>] ? preempt_schedule_irq+0x53/0x90
> [<ffffffff8170e9f6>] retint_kernel+0x26/0x30
> [<ffffffff8107a523>] ? queue_work_on+0x43/0x90
> [<ffffffff8107c369>] schedule_on_each_cpu+0xc9/0x1a0
> [<ffffffff81167770>] ? lru_add_drain+0x50/0x50
> [<ffffffff811677c5>] lru_add_drain_all+0x15/0x20
> [<ffffffff81186965>] SyS_mlockall+0xa5/0x1a0
> [<ffffffff81716e94>] tracesys+0xdd/0xe2
This commit addresses this problem by inserting the required preemption
point.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Oct 14, 2013
As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 torvalds#6 torvalds#7 OK [ 0.644008] smpboot: Booting Node 1, Processors: torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 OK [ 1.245006] smpboot: Booting Node 2, Processors: torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 OK [ 1.864005] smpboot: Booting Node 3, Processors: torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 OK [ 2.489005] smpboot: Booting Node 4, Processors: torvalds#32 torvalds#33 torvalds#34 torvalds#35 torvalds#36 torvalds#37 torvalds#38 torvalds#39 OK [ 3.093005] smpboot: Booting Node 5, Processors: torvalds#40 torvalds#41 torvalds#42 torvalds#43 torvalds#44 torvalds#45 torvalds#46 torvalds#47 OK [ 3.698005] smpboot: Booting Node 6, Processors: torvalds#48 torvalds#49 torvalds#50 torvalds#51 #52 #53 torvalds#54 torvalds#55 OK [ 4.304005] smpboot: Booting Node 7, Processors: torvalds#56 torvalds#57 #58 torvalds#59 torvalds#60 torvalds#61 torvalds#62 torvalds#63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 torvalds#6 torvalds#7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Libin <huawei.libin@huawei.com> Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Oct 14, 2013
Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 torvalds#6 torvalds#7 [ 0.603005] .... node #1, CPUs: torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 [ 1.200005] .... node #2, CPUs: torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 [ 1.796005] .... node #3, CPUs: torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 [ 2.393005] .... node #4, CPUs: torvalds#32 torvalds#33 torvalds#34 torvalds#35 torvalds#36 torvalds#37 torvalds#38 torvalds#39 [ 2.996005] .... node #5, CPUs: torvalds#40 torvalds#41 torvalds#42 torvalds#43 torvalds#44 torvalds#45 torvalds#46 torvalds#47 [ 3.600005] .... node torvalds#6, CPUs: torvalds#48 torvalds#49 torvalds#50 torvalds#51 #52 #53 torvalds#54 torvalds#55 [ 4.202005] .... node torvalds#7, CPUs: torvalds#56 torvalds#57 #58 torvalds#59 torvalds#60 torvalds#61 torvalds#62 torvalds#63 [ 4.811005] .... node torvalds#8, CPUs: torvalds#64 torvalds#65 torvalds#66 torvalds#67 torvalds#68 torvalds#69 #70 torvalds#71 [ 5.421006] .... node torvalds#9, CPUs: torvalds#72 torvalds#73 torvalds#74 torvalds#75 torvalds#76 torvalds#77 torvalds#78 torvalds#79 [ 6.032005] .... node torvalds#10, CPUs: torvalds#80 torvalds#81 torvalds#82 torvalds#83 torvalds#84 torvalds#85 torvalds#86 torvalds#87 [ 6.648006] .... node torvalds#11, CPUs: torvalds#88 torvalds#89 torvalds#90 torvalds#91 torvalds#92 torvalds#93 torvalds#94 torvalds#95 [ 7.262005] .... node torvalds#12, CPUs: torvalds#96 torvalds#97 torvalds#98 torvalds#99 torvalds#100 torvalds#101 torvalds#102 torvalds#103 [ 7.865005] .... node torvalds#13, CPUs: torvalds#104 torvalds#105 torvalds#106 torvalds#107 torvalds#108 torvalds#109 torvalds#110 torvalds#111 [ 8.466005] .... node torvalds#14, CPUs: torvalds#112 torvalds#113 torvalds#114 torvalds#115 torvalds#116 torvalds#117 torvalds#118 torvalds#119 [ 9.073006] .... node torvalds#15, CPUs: torvalds#120 torvalds#121 torvalds#122 torvalds#123 torvalds#124 torvalds#125 torvalds#126 torvalds#127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: huawei.libin@huawei.com Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Oct 14, 2013
The 'driver' field of the i2c_client struct is redundant. The same data can be accessed through to_i2c_driver(client->dev.driver). The generated code for both approaches in more or less the same. E.g. on ARM the expression client->driver->command(...) generates ... ldr r3, [r0, torvalds#28] ldr r3, [r3, torvalds#32] blx r3 ... and the expression to_i2c_driver(client->dev.driver)->command(...) generates ... ldr r3, [r0, torvalds#160] ldr r3, [r3, #-4] blx r3 ... Other architectures will generate similar code. All users of the 'driver' field outside of the I2C core have already been converted. So this only leaves the core itself. This patch converts the remaining few users in the I2C core and then removes the 'driver' field from the i2c_client struct. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
TechNexion
pushed a commit
to TechNexion/linux
that referenced
this pull request
Oct 25, 2013
commit 7ed47b7 upstream. The ghash_update function passes a pointer to gf128mul_4k_lle which will be NULL if ghash_setkey is not called or if the most recent call to ghash_setkey failed to allocate memory. This causes an oops. Fix this up by returning an error code in the null case. This is trivially triggered from unprivileged userspace through the AF_ALG interface by simply writing to the socket without setting a key. The ghash_final function has a similar issue, but triggering it requires a memory allocation failure in ghash_setkey _after_ at least one successful call to ghash_update. BUG: unable to handle kernel NULL pointer dereference at 00000670 IP: [<d88c92d4>] gf128mul_4k_lle+0x23/0x60 [gf128mul] *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: ghash_generic gf128mul algif_hash af_alg nfs lockd nfs_acl sunrpc bridge ipv6 stp llc Pid: 1502, comm: hashatron Tainted: G W 3.1.0-rc9-00085-ge9308cf torvalds#32 Bochs Bochs EIP: 0060:[<d88c92d4>] EFLAGS: 00000202 CPU: 0 EIP is at gf128mul_4k_lle+0x23/0x60 [gf128mul] EAX: d69db1f0 EBX: d6b8ddac ECX: 00000004 EDX: 00000000 ESI: 00000670 EDI: d6b8ddac EBP: d6b8ddc8 ESP: d6b8dda4 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process hashatron (pid: 1502, ti=d6b8c000 task=d6810000 task.ti=d6b8c000) Stack: 00000000 d69db1f0 00000163 00000000 d6b8ddc8 c101a520 d69db1f0 d52aa000 00000ff0 d6b8dde8 d88d310f d6b8a3f8 d52aa000 00001000 d88d502c d6b8ddfc 00001000 d6b8ddf4 c11676e d69db1e8 d6b8de24 c11679ad d52aa000 00000000 Call Trace: [<c101a520>] ? kmap_atomic_prot+0x37/0xa6 [<d88d310f>] ghash_update+0x85/0xbe [ghash_generic] [<c11676ed>] crypto_shash_update+0x18/0x1b [<c11679ad>] shash_ahash_update+0x22/0x36 [<c11679cc>] shash_async_update+0xb/0xd [<d88ce0ba>] hash_sendpage+0xba/0xf2 [algif_hash] [<c121b24c>] kernel_sendpage+0x39/0x4e [<d88ce000>] ? 0xd88cdfff [<c121b298>] sock_sendpage+0x37/0x3e [<c121b261>] ? kernel_sendpage+0x4e/0x4e [<c10b4dbc>] pipe_to_sendpage+0x56/0x61 [<c10b4e1f>] splice_from_pipe_feed+0x58/0xcd [<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10 [<c10b51f5>] __splice_from_pipe+0x36/0x55 [<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10 [<c10b6383>] splice_from_pipe+0x51/0x64 [<c10b63c2>] ? default_file_splice_write+0x2c/0x2c [<c10b63d5>] generic_splice_sendpage+0x13/0x15 [<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10 [<c10b527f>] do_splice_from+0x5d/0x67 [<c10b6865>] sys_splice+0x2bf/0x363 [<c129373b>] ? sysenter_exit+0xf/0x16 [<c104dc1e>] ? trace_hardirqs_on_caller+0x10e/0x13f [<c129370c>] sysenter_do_call+0x12/0x32 Code: 83 c4 0c 5b 5e 5f c9 c3 55 b9 04 00 00 00 89 e5 57 8d 7d e4 56 53 8d 5d e4 83 ec 18 89 45 e0 89 55 dc 0f b6 70 0f c1 e6 04 01 d6 <f3> a5 be 0f 00 00 00 4e 89 d8 e8 48 ff ff ff 8b 45 e0 89 da 0f EIP: [<d88c92d4>] gf128mul_4k_lle+0x23/0x60 [gf128mul] SS:ESP 0068:d6b8dda4 CR2: 0000000000000670 ---[ end trace 4eaa2a86a8e2da24 ]--- note: hashatron[1502] exited with preempt_count 1 BUG: scheduling while atomic: hashatron/1502/0x10000002 INFO: lockdep is turned off. [...] Signed-off-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Oct 29, 2013
There is a defect in imx6 LPM design. When SW tries to enter low power mode with following sequence, the chip will enter low power mode before A9 CPU execute WFI instruction: 1. Set CCM_CLPCR[1:0] to 2'b00; 2. ARM CPU enters WFI; 3. ARM CPU wakeup from an interrupt event, which is masked by GPC or not visible to GPC, such as interrupt from local timer; 4. Set CCM_CLPCR[1:0] to 2'b01 or 2'b10; 5. ARM CPU execute WFI. Before the last step, the chip will enter WAIT mode if CCM_CLPCR[1:0] is set to 2'b01, or enter STOP mode if CCM_CLPCR[1:0] is set to 2'b10. The patch implements a recommended workaround for this issue. 1. SW triggers irq torvalds#32(IOMUX) to be always pending manually by setting IOMUX_GPR1_GINT bit; 2. SW should then unmask it in GPC before setting CCM LPM; 3. SW should mask it right after CCM LPM is set (bit0-1 of CCM_CLPCR). Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
torvalds
pushed a commit
that referenced
this pull request
Nov 22, 2013
Lockdep complains about btrfs's async commit: [ 2372.462171] [ BUG: bad unlock balance detected! ] [ 2372.462191] 3.12.0+ #32 Tainted: G W [ 2372.462209] ------------------------------------- [ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at: [ 2372.462275] [<ffffffffa022cb10>] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462305] but there are no more locks to release! [ 2372.462324] [ 2372.462324] other info that might help us debug this: [ 2372.462349] no locks held by ceph-osd/14048. [ 2372.462367] [ 2372.462367] stack backtrace: [ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: G W 3.12.0+ #32 [ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011 [ 2372.462455] ffffffffa022cb10 ffff88007490fd28 ffffffff816f094a ffff8800378aa320 [ 2372.462491] ffff88007490fd50 ffffffff810adf4c ffff8800378aa320 ffff88009af97650 [ 2372.462526] ffffffffa022cb10 ffff88007490fd88 ffffffff810b01ee ffff8800898c0000 [ 2372.462562] Call Trace: [ 2372.462584] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462619] [<ffffffff816f094a>] dump_stack+0x45/0x56 [ 2372.462642] [<ffffffff810adf4c>] print_unlock_imbalance_bug+0xec/0x100 [ 2372.462677] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462710] [<ffffffff810b01ee>] lock_release+0x18e/0x210 [ 2372.462742] [<ffffffffa022cb36>] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs] [ 2372.462783] [<ffffffffa025a7ce>] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs] [ 2372.462822] [<ffffffffa025f1d3>] btrfs_ioctl+0x4c3/0x1f70 [btrfs] [ 2372.462849] [<ffffffff812c0321>] ? avc_has_perm+0x121/0x1b0 [ 2372.462873] [<ffffffff812c0224>] ? avc_has_perm+0x24/0x1b0 [ 2372.462897] [<ffffffff8107ecc8>] ? sched_clock_cpu+0xa8/0x100 [ 2372.462922] [<ffffffff8117b145>] do_vfs_ioctl+0x2e5/0x4e0 [ 2372.462946] [<ffffffff812c19e6>] ? file_has_perm+0x86/0xa0 [ 2372.462969] [<ffffffff8117b3c1>] SyS_ioctl+0x81/0xa0 [ 2372.462991] [<ffffffff817045a4>] tracesys+0xdd/0xe2 ==================================================== It's because that we don't do the right thing when checking if it's ok to tell lockdep that we're trying to release the rwsem. If the trans handle's type is TRANS_ATTACH, we won't acquire the freeze rwsem, but as TRANS_ATTACH fits the check (trans < TRANS_JOIN_NOLOCK), we'll release the freeze rwsem, which makes lockdep complains a lot. Reported-by: Ma Jianpeng <majianpeng@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
heftig
referenced
this pull request
in zen-kernel/zen-kernel
Dec 21, 2013
commit b1a06a4 upstream. Lockdep complains about btrfs's async commit: [ 2372.462171] [ BUG: bad unlock balance detected! ] [ 2372.462191] 3.12.0+ #32 Tainted: G W [ 2372.462209] ------------------------------------- [ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at: [ 2372.462275] [<ffffffffa022cb10>] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462305] but there are no more locks to release! [ 2372.462324] [ 2372.462324] other info that might help us debug this: [ 2372.462349] no locks held by ceph-osd/14048. [ 2372.462367] [ 2372.462367] stack backtrace: [ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: G W 3.12.0+ #32 [ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011 [ 2372.462455] ffffffffa022cb10 ffff88007490fd28 ffffffff816f094a ffff8800378aa320 [ 2372.462491] ffff88007490fd50 ffffffff810adf4c ffff8800378aa320 ffff88009af97650 [ 2372.462526] ffffffffa022cb10 ffff88007490fd88 ffffffff810b01ee ffff8800898c0000 [ 2372.462562] Call Trace: [ 2372.462584] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462619] [<ffffffff816f094a>] dump_stack+0x45/0x56 [ 2372.462642] [<ffffffff810adf4c>] print_unlock_imbalance_bug+0xec/0x100 [ 2372.462677] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs] [ 2372.462710] [<ffffffff810b01ee>] lock_release+0x18e/0x210 [ 2372.462742] [<ffffffffa022cb36>] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs] [ 2372.462783] [<ffffffffa025a7ce>] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs] [ 2372.462822] [<ffffffffa025f1d3>] btrfs_ioctl+0x4c3/0x1f70 [btrfs] [ 2372.462849] [<ffffffff812c0321>] ? avc_has_perm+0x121/0x1b0 [ 2372.462873] [<ffffffff812c0224>] ? avc_has_perm+0x24/0x1b0 [ 2372.462897] [<ffffffff8107ecc8>] ? sched_clock_cpu+0xa8/0x100 [ 2372.462922] [<ffffffff8117b145>] do_vfs_ioctl+0x2e5/0x4e0 [ 2372.462946] [<ffffffff812c19e6>] ? file_has_perm+0x86/0xa0 [ 2372.462969] [<ffffffff8117b3c1>] SyS_ioctl+0x81/0xa0 [ 2372.462991] [<ffffffff817045a4>] tracesys+0xdd/0xe2 ==================================================== It's because that we don't do the right thing when checking if it's ok to tell lockdep that we're trying to release the rwsem. If the trans handle's type is TRANS_ATTACH, we won't acquire the freeze rwsem, but as TRANS_ATTACH fits the check (trans < TRANS_JOIN_NOLOCK), we'll release the freeze rwsem, which makes lockdep complains a lot. Reported-by: Ma Jianpeng <majianpeng@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
apxii
pushed a commit
to apxii/linux
that referenced
this pull request
Jan 4, 2014
Improve the comment of SW workaround for CCM lpm issue using hardware errata description to avoid confusion. ERR007265: CCM: When improper low-power sequence is used, the SoC enters low power mode before the ARM core executes WFI. Software workaround: 1) Software should trigger IRQ torvalds#32 (IOMUX) to be always pending by setting IOMUX_GPR1_GINT. 2) Software should then unmask IRQ torvalds#32 in GPC before setting CCM Low-Power mode. 3) Software should mask IRQ torvalds#32 right after CCM Low-Power mode is set (set bits 0-1 of CCM_CLPCR). Signed-off-by: Anson Huang <b20788@freescale.com> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
johnweber
pushed a commit
to wandboard-org/linux
that referenced
this pull request
Jan 10, 2014
torvalds#125 IRQ torvalds#125's status is not constant on different boards, IRQ torvalds#32 is IOMUXC's interrupt which can be triggered manually at anytime, use this irq instead of torvalds#125 to generate interrupt for avoiding CCM enter low power mode by mistake. Signed-off-by: Anson Huang <b20788@freescale.com>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Feb 26, 2014
When doing some numa tests on powerpc, I triggered an oops bug. I find it is caused by using page->_last_cpupid. It should be initialized as "-1 & LAST_CPUPID_MASK", but not "-1". Otherwise, in task_numa_fault(), we will miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)). And finally cause an oops bug in task_numa_group(), since the online cpu is less than possible cpu. PPC needs the LAST_CPUPID_NOT_IN_PAGE_FLAGS case because ppc needs to support a large physical address region, up to 2^46 but small section size (2^24). So when NR_CPUS grows up, it is easily to cause not-in-page-flags. Call trace: [ 55.978091] SMP NR_CPUS=64 NUMA PowerNV [ 55.978118] Modules linked in: [ 55.978145] CPU: 24 PID: 804 Comm: systemd-udevd Not tainted 3.13.0-rc1+ torvalds#32 [ 55.978183] task: c000001e2746aa80 ti: c000001e32c50000 task.ti: c000001e32c50000 [ 55.978219] NIP: c0000000000f5ad0 LR: c0000000000f5ac8 CTR: c000000000913cf0 [ 55.978256] REGS: c000001e32c53510 TRAP: 0300 Not tainted (3.13.0-rc1+) [ 55.978286] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 28024424 XER: 20000000 [ 55.978380] CFAR: c000000000009324 DAR: 7265717569726857 DSISR: 40000000 SOFTE: 1 GPR00: c0000000000f5ac8 c000001e32c53790 c000000001f34338 0000000000000021 GPR04: 0000000000000000 0000000000000031 c000000001f74338 0000ffffffffffff GPR08: 0000000000000001 7265717569726573 0000000000000000 0000000000000000 GPR12: 0000000028024422 c00000000ffdd800 00000000296b2e64 0000000000000020 GPR16: 0000000000000002 0000000000000003 c000001e2f8e4658 c000001e25c1c1d8 GPR20: c000001e2f8e4000 c000000001f7a858 0000000000000658 0000000040000392 GPR24: 00000000000000a8 c000001e33c1a400 00000000000001d8 c000001e25c1c000 GPR28: c000001e33c37ff0 0007837840000392 000000000000003f c000001e32c53790 [ 55.978903] NIP [c0000000000f5ad0] .task_numa_fault+0x1470/0x2370 [ 55.978934] LR [c0000000000f5ac8] .task_numa_fault+0x1468/0x2370 [ 55.978964] Call Trace: [ 55.978978] [c000001e32c53790] [c0000000000f5ac8] .task_numa_fault+0x1468/0x2370 (unreliable) [ 55.979036] [c000001e32c539e0] [c00000000020a820] .do_numa_page+0x480/0x4a0 [ 55.979072] [c000001e32c53b10] [c00000000020bfec] .handle_mm_fault+0x4ec/0xc90 [ 55.979123] [c000001e32c53c00] [c000000000e88c98] .do_page_fault+0x3a8/0x890 [ 55.979161] [c000001e32c53e30] [c000000000009568] handle_page_fault+0x10/0x30 [ 55.979197] Instruction dump: [ 55.979216] 3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fef 3863af78 3c82fefb [ 55.979277] 3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b4 5529063e 7d2a07b4 [ 55.979354] ---[ end trace 15f2510da5ae07cf ]--- Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Mar 3, 2014
I can trigger a lockdep warning: # mount -t cgroup -o cpuset xxx /cgroup # mkdir /cgroup/cpuset # mkdir /cgroup/tmp # echo 0 > /cgroup/tmp/cpuset.cpus # echo 0 > /cgroup/tmp/cpuset.mems # echo 1 > /cgroup/tmp/cpuset.memory_migrate # echo $$ > /cgroup/tmp/tasks # echo 1 > /cgruop/tmp/cpuset.mems =============================== [ INFO: suspicious RCU usage. ] 3.14.0-rc1-0.1-default+ torvalds#32 Not tainted ------------------------------- include/linux/cgroup.h:682 suspicious rcu_dereference_check() usage! ... [<ffffffff81582174>] dump_stack+0x72/0x86 [<ffffffff810b8f01>] lockdep_rcu_suspicious+0x101/0x140 [<ffffffff81105ba1>] cpuset_migrate_mm+0xb1/0xe0 ... We used to hold cgroup_mutex when calling cpuset_migrate_mm(), but now we hold cpuset_mutex, which causes task_css() to complain. This is not a false-positive but a real issue. Holding cpuset_mutex won't prevent a task from migrating to another cpuset, and it won't prevent the original task->cgroup from destroying during this change. Fixes: 5d21cc2 (cpuset: replace cgroup_mutex locking with cpuset internal locking) Cc: <stable@vger.kernel.org> # 3.9+ Signed-off-by: Li Zefan <lizefan@huawei.com> Sigend-off-by: Tejun Heo <tj@kernel.org>
torvalds
pushed a commit
that referenced
this pull request
Mar 5, 2014
When doing some numa tests on powerpc, I triggered an oops bug. I find it is caused by using page->_last_cpupid. It should be initialized as "-1 & LAST_CPUPID_MASK", but not "-1". Otherwise, in task_numa_fault(), we will miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)). And finally cause an oops bug in task_numa_group(), since the online cpu is less than possible cpu. This happen with CONFIG_SPARSE_VMEMMAP disabled Call trace: SMP NR_CPUS=64 NUMA PowerNV Modules linked in: CPU: 24 PID: 804 Comm: systemd-udevd Not tainted3.13.0-rc1+ #32 task: c000001e2746aa80 ti: c000001e32c50000 task.ti:c000001e32c50000 REGS: c000001e32c53510 TRAP: 0300 Not tainted(3.13.0-rc1+) MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR:28024424 XER: 20000000 CFAR: c000000000009324 DAR: 7265717569726857 DSISR:40000000 SOFTE: 1 NIP .task_numa_fault+0x1470/0x2370 LR .task_numa_fault+0x1468/0x2370 Call Trace: .task_numa_fault+0x1468/0x2370 (unreliable) .do_numa_page+0x480/0x4a0 .handle_mm_fault+0x4ec/0xc90 .do_page_fault+0x3a8/0x890 handle_page_fault+0x10/0x30 Instruction dump: 3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb3863af78 3c82fefb 3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b45529063e 7d2a07b4 ---[ end trace 15f2510da5ae07cf ]--- Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zeitgeist87
pushed a commit
to zeitgeist87/linux
that referenced
this pull request
Mar 14, 2014
When doing some numa tests on powerpc, I triggered an oops bug. I find it is caused by using page->_last_cpupid. It should be initialized as "-1 & LAST_CPUPID_MASK", but not "-1". Otherwise, in task_numa_fault(), we will miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)). And finally cause an oops bug in task_numa_group(), since the online cpu is less than possible cpu. This happen with CONFIG_SPARSE_VMEMMAP disabled Call trace: [ 55.978091] SMP NR_CPUS=64 NUMA PowerNV [ 55.978118] Modules linked in: [ 55.978145] CPU: 24 PID: 804 Comm: systemd-udevd Not tainted3.13.0-rc1+ torvalds#32 [ 55.978183] task: c000001e2746aa80 ti: c000001e32c50000 task.ti:c000001e32c50000 [ 55.978219] NIP: c0000000000f5ad0 LR: c0000000000f5ac8 CTR:c000000000913cf0 [ 55.978256] REGS: c000001e32c53510 TRAP: 0300 Not tainted(3.13.0-rc1+) [ 55.978286] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR:28024424 XER: 20000000 [ 55.978380] CFAR: c000000000009324 DAR: 7265717569726857 DSISR:40000000 SOFTE: 1 GPR00: c0000000000f5ac8 c000001e32c53790 c000000001f343380000000000000021 GPR04: 0000000000000000 0000000000000031 c000000001f743380000ffffffffffff GPR08: 0000000000000001 7265717569726573 00000000000000000000000000000000 GPR12: 0000000028024422 c00000000ffdd800 00000000296b2e640000000000000020 GPR16: 0000000000000002 0000000000000003 c000001e2f8e4658c000001e25c1c1d8 GPR20: c000001e2f8e4000 c000000001f7a858 00000000000006580000000040000392 GPR24: 00000000000000a8 c000001e33c1a400 00000000000001d8c000001e25c1c000 GPR28: c000001e33c37ff0 0007837840000392 000000000000003fc000001e32c53790 [ 55.978903] NIP [c0000000000f5ad0] .task_numa_fault+0x1470/0x2370 [ 55.978934] LR [c0000000000f5ac8] .task_numa_fault+0x1468/0x2370 [ 55.978964] Call Trace: [ 55.978978] [c000001e32c53790] [c0000000000f5ac8].task_numa_fault+0x1468/0x2370 (unreliable) [ 55.979036] [c000001e32c539e0] [c00000000020a820].do_numa_page+0x480/0x4a0 [ 55.979072] [c000001e32c53b10] [c00000000020bfec].handle_mm_fault+0x4ec/0xc90 [ 55.979123] [c000001e32c53c00] [c000000000e88c98].do_page_fault+0x3a8/0x890 [ 55.979161] [c000001e32c53e30] [c000000000009568]handle_page_fault+0x10/0x30 [ 55.979197] Instruction dump: [ 55.979216] 3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb3863af78 3c82fefb [ 55.979277] 3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b45529063e 7d2a07b4 [ 55.979354] ---[ end trace 15f2510da5ae07cf ]--- Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
brianlilly
pushed a commit
to crystalfontz/cfa_10036_kernel
that referenced
this pull request
Mar 26, 2014
commit 4729583 upstream. I can trigger a lockdep warning: # mount -t cgroup -o cpuset xxx /cgroup # mkdir /cgroup/cpuset # mkdir /cgroup/tmp # echo 0 > /cgroup/tmp/cpuset.cpus # echo 0 > /cgroup/tmp/cpuset.mems # echo 1 > /cgroup/tmp/cpuset.memory_migrate # echo $$ > /cgroup/tmp/tasks # echo 1 > /cgruop/tmp/cpuset.mems =============================== [ INFO: suspicious RCU usage. ] 3.14.0-rc1-0.1-default+ torvalds#32 Not tainted ------------------------------- include/linux/cgroup.h:682 suspicious rcu_dereference_check() usage! ... [<ffffffff81582174>] dump_stack+0x72/0x86 [<ffffffff810b8f01>] lockdep_rcu_suspicious+0x101/0x140 [<ffffffff81105ba1>] cpuset_migrate_mm+0xb1/0xe0 ... We used to hold cgroup_mutex when calling cpuset_migrate_mm(), but now we hold cpuset_mutex, which causes task_css() to complain. This is not a false-positive but a real issue. Holding cpuset_mutex won't prevent a task from migrating to another cpuset, and it won't prevent the original task->cgroup from destroying during this change. Fixes: 5d21cc2 (cpuset: replace cgroup_mutex locking with cpuset internal locking) Signed-off-by: Li Zefan <lizefan@huawei.com> Sigend-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Apr 25, 2014
…ixes WARNING: please, no spaces at the start of a line torvalds#29: FILE: mm/memcontrol.c:689: + int nid = zone_to_nid(zone);$ WARNING: please, no spaces at the start of a line torvalds#30: FILE: mm/memcontrol.c:690: + int zid = zone_idx(zone);$ WARNING: please, no spaces at the start of a line torvalds#32: FILE: mm/memcontrol.c:692: + return mem_cgroup_zoneinfo(memcg, nid, zid);$ total: 0 errors, 3 warnings, 35 lines checked ./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ddstreet
pushed a commit
to ddstreet/linux
that referenced
this pull request
Apr 26, 2014
…ixes WARNING: please, no spaces at the start of a line torvalds#29: FILE: mm/memcontrol.c:689: + int nid = zone_to_nid(zone);$ WARNING: please, no spaces at the start of a line torvalds#30: FILE: mm/memcontrol.c:690: + int zid = zone_idx(zone);$ WARNING: please, no spaces at the start of a line torvalds#32: FILE: mm/memcontrol.c:692: + return mem_cgroup_zoneinfo(memcg, nid, zid);$ total: 0 errors, 3 warnings, 35 lines checked ./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
torvalds
pushed a commit
that referenced
this pull request
Apr 27, 2014
The asm-generic, big-endian version of zero_bytemask creates a mask of bytes preceding the first zero-byte by left shifting ~0ul based on the position of the first zero byte. Unfortunately, if the first (top) byte is zero, the output of prep_zero_mask has only the top bit set, resulting in undefined C behaviour as we shift left by an amount equal to the width of the type. As it happens, GCC doesn't manage to spot this through the call to fls(), but the issue remains if architectures choose to implement their shift instructions differently. An example would be arch/arm/ (AArch32), where LSL Rd, Rn, #32 results in Rd == 0x0, whilst on arch/arm64 (AArch64) LSL Xd, Xn, #64 results in Xd == Xn. Rather than check explicitly for the problematic shift, this patch adds an extra shift by 1, replacing fls with __fls. Since zero_bytemask is never called with a zero argument (has_zero() is used to check the data first), we don't need to worry about calling __fls(0), which is undefined. Cc: <stable@vger.kernel.org> Cc: Victor Kamensky <victor.kamensky@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Apr 28, 2014
…ixes WARNING: please, no spaces at the start of a line torvalds#29: FILE: mm/memcontrol.c:689: + int nid = zone_to_nid(zone);$ WARNING: please, no spaces at the start of a line torvalds#30: FILE: mm/memcontrol.c:690: + int zid = zone_idx(zone);$ WARNING: please, no spaces at the start of a line torvalds#32: FILE: mm/memcontrol.c:692: + return mem_cgroup_zoneinfo(memcg, nid, zid);$ total: 0 errors, 3 warnings, 35 lines checked ./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ddstreet
pushed a commit
to ddstreet/linux
that referenced
this pull request
May 2, 2014
…ixes WARNING: please, no spaces at the start of a line torvalds#29: FILE: mm/memcontrol.c:689: + int nid = zone_to_nid(zone);$ WARNING: please, no spaces at the start of a line torvalds#30: FILE: mm/memcontrol.c:690: + int zid = zone_idx(zone);$ WARNING: please, no spaces at the start of a line torvalds#32: FILE: mm/memcontrol.c:692: + return mem_cgroup_zoneinfo(memcg, nid, zid);$ total: 0 errors, 3 warnings, 35 lines checked ./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 17, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 17, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 17, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kaz205
pushed a commit
to Kaz205/linux
that referenced
this pull request
Dec 18, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 19, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 20, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 21, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 21, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 21, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 23, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 23, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 23, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kaz205
pushed a commit
to Kaz205/linux
that referenced
this pull request
Dec 26, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Kaz205
pushed a commit
to Kaz205/linux
that referenced
this pull request
Dec 28, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Kaz205
pushed a commit
to Kaz205/linux
that referenced
this pull request
Dec 28, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 29, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 30, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 30, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Jan 1, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hellsgod
pushed a commit
to hellsgod/linux
that referenced
this pull request
Jan 2, 2026
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Kaz205
pushed a commit
to Kaz205/linux
that referenced
this pull request
Jan 2, 2026
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Jan 2, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Jan 4, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
morimoto
pushed a commit
to morimoto/linux
that referenced
this pull request
Jan 5, 2026
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 0796ddf
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next)
Bug: 450671466
Change-Id: Icb6faa509da6dc282270d763f763bc943d461119
Signed-off-by: Reka Norman <rekanorman@google.com>
morimoto
pushed a commit
to morimoto/linux
that referenced
this pull request
Jan 5, 2026
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 0796ddf
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next)
Bug: 450671466
Change-Id: Icb6faa509da6dc282270d763f763bc943d461119
Signed-off-by: Reka Norman <rekanorman@google.com>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Jan 6, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Jan 6, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Jan 6, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs
but rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments()
and syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance
benchmarks from perf bench basic syscall on kunpeng920 gives roughly
a 1% performance uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
[1]: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Jan 6, 2026
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, torvalds#272]
f8008444 str x4, [x2], torvalds#8
a9409404 ldp x4, x5, [x0, torvalds#8]
a9009424 stp x4, x5, [x1, torvalds#8]
a9418400 ldp x0, x1, [x0, torvalds#24]
a9010440 stp x0, x1, [x2, torvalds#16]
f9401060 ldr x0, [x3, torvalds#32]
f9001040 str x0, [x2, torvalds#32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, torvalds#8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, torvalds#24]
f9408a81 ldr x1, [x20, torvalds#272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/20251201120633.1193122-3-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ [1]
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: "Dmitry V. Levin" <ldv@strace.io>
Cc: Helge Deller <deller@gmx.de>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kaz205
pushed a commit
to Kaz205/linux
that referenced
this pull request
Jan 7, 2026
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, torvalds#16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, torvalds#32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.