[RFC] net: vertexcom: mse102x: Fix SPI interrupt handling #10

lategoodbye · 2025-01-23T10:29:30Z

This series fixes the handling of the SPI interrupt in the MSE102x driver. The current solution should be considered a request for comments.

Additionally, it improves the related debug capabilities by introducing an optional DT property to use the interrupt line as a GPIO (currently not intended for upstream).

It's hard to debug the SPI interrupt level, so implement the possibility to allocate the SPI interrupt as a GPIO. This allows debugging the current level with the common tools. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

This allows debugging the SPI interrupt level. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

Since the SPI implementation on the MSE102x MCU is in software, it cannot reply to SPI commands while it is busy. So drop the alarming statistics about "invalid" command replies. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

The MSE102x doesn't provide any SPI commands for interrupt handling. So if the interrupt fires before the driver requests the IRQ, it will never fire again. To fix this, always poll for pending packets after opening the interface. Fixes: 2f207cb ("net: vertexcom: Add MSE102x SPI support") Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>
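The effect of polling once in the open path can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct model_dev`, `model_rx_pkt()` and `model_open()` are hypothetical names invented for illustration and are not taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical model of the device side; not the driver's real state. */
struct model_dev {
	unsigned int pending;   /* frames waiting in the device FIFO */
	unsigned int received;  /* frames fetched by the host so far */
	bool irq_requested;     /* whether the host has registered its handler */
};

/* Stand-in for one receive attempt; returns true if a frame was fetched. */
static bool model_rx_pkt(struct model_dev *dev)
{
	if (dev->pending == 0)
		return false;
	dev->pending--;
	dev->received++;
	return true;
}

/*
 * Stand-in for the open path: a frame may already have been signalled
 * before the handler existed, so poll once instead of waiting for an
 * interrupt that may never come.
 */
static void model_open(struct model_dev *dev)
{
	dev->irq_requested = true;  /* a real driver would request the IRQ here */
	model_rx_pkt(dev);          /* poll once for a frame signalled earlier */
}
```

With one frame pending before `model_open()` is called, the frame is fetched during open; with nothing pending, the poll is a no-op.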

According to the MSE102x documentation the trigger type is a high level. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

Since the SPI protocol offers no protection against electrical interference, the driver shouldn't blindly trust the length payload of CMD_RTS. So do at least a lower bounds check for incoming frames. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

According to the MSE102x documentation the trigger type is a high level. Fixes: 2717566 ("dt-bindings: net: add Vertexcom MSE102x support") Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

Since mse102x_rx_pkt_spi is also used for polling and the SPI IRQ type is level based, there is actually no need for a retry mechanism within mse102x_rx_pkt_spi. So drop it and simplify the receive path. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

The example of the initial DT binding of the Vertexcom MSE 102x suggested IRQ_TYPE_EDGE_RISING, which is wrong. So warn everyone to fix their device tree. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>
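A check of this kind could look roughly like the userspace sketch below. It is not the driver's code: the `MODEL_IRQ_TYPE_*` constants only mirror the numeric values of the kernel's `IRQ_TYPE_*` flags, and `model_irq_type_ok()` and the warning text are hypothetical. A real driver would query the configured trigger type from the IRQ core and use a netdev or dev warning helper instead of `fprintf()`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Values mirror the kernel's IRQ_TYPE_* flags; the names are local to this sketch. */
enum model_irq_type {
	MODEL_IRQ_TYPE_NONE         = 0x0,
	MODEL_IRQ_TYPE_EDGE_RISING  = 0x1,
	MODEL_IRQ_TYPE_EDGE_FALLING = 0x2,
	MODEL_IRQ_TYPE_LEVEL_HIGH   = 0x4,
	MODEL_IRQ_TYPE_LEVEL_LOW    = 0x8,
};

/* Returns true for the documented trigger type, warns and returns false otherwise. */
static bool model_irq_type_ok(unsigned int type)
{
	if (type == MODEL_IRQ_TYPE_LEVEL_HIGH)
		return true;

	fprintf(stderr,
		"mse102x (model): IRQ type 0x%x is not level high, please fix the device tree\n",
		type);
	return false;
}
```

The sketch only warns and leaves the decision to the caller, which matches the intent of nudging users to fix their device tree rather than refusing to probe.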

lategoodbye · 2025-01-27T15:12:41Z

range check for CMD_RTS has been adjusted (no upper bounds check)

drivers/net/ethernet/vertexcom/mse102x.c

In case the CMD_RTS got corrupted by interference, the MSE102x doesn't allow a retransmission of the command. Instead the Ethernet frame must be shifted out of the SPI FIFO. Since the actual length is unknown, assume the maximum possible value. Fixes: 2717566 ("dt-bindings: net: add Vertexcom MSE102x support") Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

Introduce an upper bounds check for incoming frames in order to catch invalid CMD_RTS. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>
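The range check and the fall-back to the maximum length described in the two commit messages above can be sketched as follows. Everything here is an assumption made for illustration: the 4-bit command / 12-bit length split of the 16-bit reply, the `MODEL_*` constants, and the limits of 60 and 1518 bytes (the values of the kernel's `ETH_ZLEN` and `VLAN_ETH_FRAME_LEN`) are not confirmed by this thread, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout of the 16-bit reply: upper 4 bits command, lower 12 bits length. */
#define MODEL_CMD_MASK 0xF000u
#define MODEL_LEN_MASK 0x0FFFu
#define MODEL_CMD_RTS  0x1000u

/* Assumed limits: minimum Ethernet frame and maximum VLAN-tagged frame, without FCS. */
#define MODEL_MIN_LEN  60u
#define MODEL_MAX_LEN  1518u

/* Returns the announced frame length, or -1 if the reply is not a plausible CMD_RTS. */
static int model_rts_frame_len(uint16_t resp)
{
	unsigned int len = resp & MODEL_LEN_MASK;

	if ((resp & MODEL_CMD_MASK) != MODEL_CMD_RTS)
		return -1;
	if (len < MODEL_MIN_LEN || len > MODEL_MAX_LEN)
		return -1;
	return (int)len;
}

/*
 * Number of bytes to clock out of the FIFO: the announced length if it is
 * plausible, otherwise the maximum, because the command cannot be repeated
 * and the real length is unknown.
 */
static unsigned int model_rx_read_len(uint16_t resp)
{
	int len = model_rts_frame_len(resp);

	return len < 0 ? MODEL_MAX_LEN : (unsigned int)len;
}
```

For example, a reply of `0x1000 | 59` fails the lower bound, so `model_rx_read_len()` falls back to 1518 bytes and the caller would discard what it read.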

[ Upstream commit 88f7f56 ]

When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush() generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC, which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

crash> bt 2091206
PID: 2091206 TASK: ffff2050df92a300 CPU: 109 COMMAND: "kworker/u260:0"
#0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
#1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
#2 [ffff800084a2f880] schedule at ffff800040bfa4b4
#3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
#4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
#5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
#8 [ffff800084a2fa60] generic_make_request at ffff800040570138
#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
#18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"), the metadata submitted by xlog_write_iclog() should not be throttled. But due to the existence of the dm layer, throttling flush_bio indirectly causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Reviewed-by: Tianxiang Peng <txpeng@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit bed18f0 ]

ACPICA commit 8829e70e1360c81e7a5a901b5d4f48330e021ea5

I'm Seunghun Han, and I work for National Security Research Institute of South Korea. I have been doing a research on ACPI and found an ACPI cache leak in ACPI early abort cases.

Boot log of ACPI cache leak is as follows:

[ 0.352414] ACPI: Added _OSI(Module Device)
[ 0.353182] ACPI: Added _OSI(Processor Device)
[ 0.353182] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.353182] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.356028] ACPI: Unable to start the ACPI Interpreter
[ 0.356799] ACPI Error: Could not remove SCI handler (20170303/evmisc-281)
[ 0.360215] kmem_cache_destroy Acpi-State: Slab cache still has objects
[ 0.360648] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.12.0-rc4-next-20170608+ #10
[ 0.361273] Hardware name: innotek gmb_h virtual_box/virtual_box, BIOS virtual_box 12/01/2006
[ 0.361873] Call Trace:
[ 0.362243] ? dump_stack+0x5c/0x81
[ 0.362591] ? kmem_cache_destroy+0x1aa/0x1c0
[ 0.362944] ? acpi_sleep_proc_init+0x27/0x27
[ 0.363296] ? acpi_os_delete_cache+0xa/0x10
[ 0.363646] ? acpi_ut_delete_caches+0x6d/0x7b
[ 0.364000] ? acpi_terminate+0xa/0x14
[ 0.364000] ? acpi_init+0x2af/0x34f
[ 0.364000] ? __class_create+0x4c/0x80
[ 0.364000] ? video_setup+0x7f/0x7f
[ 0.364000] ? acpi_sleep_proc_init+0x27/0x27
[ 0.364000] ? do_one_initcall+0x4e/0x1a0
[ 0.364000] ? kernel_init_freeable+0x189/0x20a
[ 0.364000] ? rest_init+0xc0/0xc0
[ 0.364000] ? kernel_init+0xa/0x100
[ 0.364000] ? ret_from_fork+0x25/0x30

I analyzed this memory leak in detail. I found that “Acpi-State” cache and “Acpi-Parse” cache were merged because the size of cache objects was same slab cache size.

I finally found “Acpi-Parse” cache and “Acpi-parse_ext” cache were leaked using SLAB_NEVER_MERGE flag in kmem_cache_create() function.

Real ACPI cache leak point is as follows:

[ 0.360101] ACPI: Added _OSI(Module Device)
[ 0.360101] ACPI: Added _OSI(Processor Device)
[ 0.360101] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.361043] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.364016] ACPI: Unable to start the ACPI Interpreter
[ 0.365061] ACPI Error: Could not remove SCI handler (20170303/evmisc-281)
[ 0.368174] kmem_cache_destroy Acpi-Parse: Slab cache still has objects
[ 0.369332] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 4.12.0-rc4-next-20170608+ #8
[ 0.371256] Hardware name: innotek gmb_h virtual_box/virtual_box, BIOS virtual_box 12/01/2006
[ 0.372000] Call Trace:
[ 0.372000] ? dump_stack+0x5c/0x81
[ 0.372000] ? kmem_cache_destroy+0x1aa/0x1c0
[ 0.372000] ? acpi_sleep_proc_init+0x27/0x27
[ 0.372000] ? acpi_os_delete_cache+0xa/0x10
[ 0.372000] ? acpi_ut_delete_caches+0x56/0x7b
[ 0.372000] ? acpi_terminate+0xa/0x14
[ 0.372000] ? acpi_init+0x2af/0x34f
[ 0.372000] ? __class_create+0x4c/0x80
[ 0.372000] ? video_setup+0x7f/0x7f
[ 0.372000] ? acpi_sleep_proc_init+0x27/0x27
[ 0.372000] ? do_one_initcall+0x4e/0x1a0
[ 0.372000] ? kernel_init_freeable+0x189/0x20a
[ 0.372000] ? rest_init+0xc0/0xc0
[ 0.372000] ? kernel_init+0xa/0x100
[ 0.372000] ? ret_from_fork+0x25/0x30
[ 0.388039] kmem_cache_destroy Acpi-parse_ext: Slab cache still has objects
[ 0.389063] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 4.12.0-rc4-next-20170608+ #8
[ 0.390557] Hardware name: innotek gmb_h virtual_box/virtual_box, BIOS virtual_box 12/01/2006
[ 0.392000] Call Trace:
[ 0.392000] ? dump_stack+0x5c/0x81
[ 0.392000] ? kmem_cache_destroy+0x1aa/0x1c0
[ 0.392000] ? acpi_sleep_proc_init+0x27/0x27
[ 0.392000] ? acpi_os_delete_cache+0xa/0x10
[ 0.392000] ? acpi_ut_delete_caches+0x6d/0x7b
[ 0.392000] ? acpi_terminate+0xa/0x14
[ 0.392000] ? acpi_init+0x2af/0x34f
[ 0.392000] ? __class_create+0x4c/0x80
[ 0.392000] ? video_setup+0x7f/0x7f
[ 0.392000] ? acpi_sleep_proc_init+0x27/0x27
[ 0.392000] ? do_one_initcall+0x4e/0x1a0
[ 0.392000] ? kernel_init_freeable+0x189/0x20a
[ 0.392000] ? rest_init+0xc0/0xc0
[ 0.392000] ? kernel_init+0xa/0x100
[ 0.392000] ? ret_from_fork+0x25/0x30

When early abort is occurred due to invalid ACPI information, Linux kernel terminates ACPI by calling acpi_terminate() function. The function calls acpi_ut_delete_caches() function to delete local caches (acpi_gbl_namespace_cache, state_cache, operand_cache, ps_node_cache, ps_node_ext_cache).

But the deletion codes in acpi_ut_delete_caches() function only delete slab caches using kmem_cache_destroy() function, therefore the cache objects should be flushed before acpi_ut_delete_caches() function.

"Acpi-Parse" cache and "Acpi-ParseExt" cache are used in an AML parse function, acpi_ps_parse_loop(). The function should complete all ops using acpi_ps_complete_final_op() when an error occurs due to invalid AML codes. However, the current implementation of acpi_ps_complete_final_op() does not complete all ops when it meets some errors and this cause cache leak.

This cache leak has a security threat because an old kernel (<= 4.9) shows memory locations of kernel functions in stack dump. Some malicious users could use this information to neutralize kernel ASLR.

To fix ACPI cache leak for enhancing security, I made a patch to complete all ops unconditionally for acpi_ps_complete_final_op() function.

I hope that this patch improves the security of Linux kernel.

Thank you.

Link: acpica/acpica@8829e70e Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/2363774.ElGaqSPkdT@rjwysocki.net Signed-off-by: Sasha Levin <sashal@kernel.org>

lategoodbye added 5 commits January 22, 2025 11:33

arm64: dts: imx93-charge-som-dc-evb: Switch MSE102x to GPIO interrupt

19ecd2a

This allows debugging the SPI interrupt level. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

net: vertexcom: mse102x: drop invalid cmd stats

9f8a69e

Since the SPI implementation on the MSE102x MCU is in software, it cannot reply to SPI commands while it is busy. So drop the alarming statistics about "invalid" command replies. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

net: vertexcom: mse102x: Fix SPI IRQ type

33af733

According to the MSE102x documentation the trigger type is a high level. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

lategoodbye requested a review from mhei January 23, 2025 10:29

lategoodbye added 4 commits January 27, 2025 16:06

dt-bindings: vertexcom-mse102x: Fix IRQ type

0801a88

According to the MSE102x documentation the trigger type is a high level. Fixes: 2717566 ("dt-bindings: net: add Vertexcom MSE102x support") Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

net: vertexcom: mse102x: Add warning about IRQ trigger type

f55250a

The example of the initial DT binding of the Vertexcom MSE 102x suggested IRQ_TYPE_EDGE_RISING, which is wrong. So warn everyone to fix their device tree. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

lategoodbye force-pushed the v6.6.23-vertexcom_int branch from 658e45c to f55250a Compare January 27, 2025 15:08

mhei requested changes Feb 20, 2025

lategoodbye added 2 commits February 20, 2025 14:17

net: vertexcom: mse102x: Extend range check for CMD_RTS

9be3cbf

Introduce an upper bounds check for incoming frames in order to catch invalid CMD_RTS. Signed-off-by: Stefan Wahren <stefan.wahren@chargebyte.com>

lategoodbye force-pushed the v6.6.23-vertexcom_int branch from ea55f7e to 9be3cbf Compare February 20, 2025 13:20

mhei approved these changes Feb 20, 2025

lategoodbye merged commit dd76b38 into v6.6.23-2.0.0-phy-cb Feb 20, 2025

lategoodbye deleted the v6.6.23-vertexcom_int branch February 20, 2025 14:15