
Conversation

@lategoodbye lategoodbye commented Oct 23, 2023

This pull request contains some critical fixes and some minor improvements on the qca_spi driver.

@lategoodbye lategoodbye requested review from mhei and mooraby October 23, 2023 09:33
@lategoodbye lategoodbye marked this pull request as draft October 23, 2023 09:37
@lategoodbye lategoodbye changed the title Minor qca_spi improvements Critical qca_spi fixes and some improvements Oct 24, 2023
@lategoodbye lategoodbye marked this pull request as ready for review October 30, 2023 12:05
@lategoodbye lategoodbye requested a review from barsnick November 1, 2023 10:48
lategoodbye and others added 14 commits November 17, 2023 12:25
The qca_spi driver creates/stops the SPI kernel thread on netdev open/close. This is a big issue because it allows userspace to prevent the SPI thread from being restarted after a ring parameter change (e.g. via signals which stop the thread). This can be triggered by terminating a script that changes the ring parameters in a loop.

So fix this by moving creation/stopping of the SPI kernel thread into the init/uninit ops. The open/close ops can then be realized by simply parking/unparking the SPI kernel thread.

Fixes: 291ab06 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
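
A minimal sketch of the park/unpark approach described above (illustrative only; the field and function names follow the driver's conventions but this is not the exact patch):

    /* SPI thread loop: cooperate with parking so open/close no longer
     * need to create or stop the thread */
    static int qcaspi_spi_thread(void *data)
    {
            while (!kthread_should_stop()) {
                    if (kthread_should_park()) {
                            kthread_parkme(); /* sleeps here while the netdev is down */
                            continue;
                    }
                    /* ... handle SPI interrupts and the TX ring ... */
            }
            return 0;
    }

    static int qcaspi_netdev_open(struct net_device *dev)
    {
            struct qcaspi *qca = netdev_priv(dev);

            /* thread was created in ndo_init and is stopped in ndo_uninit */
            kthread_unpark(qca->spi_thread);
            return 0;
    }

    static int qcaspi_netdev_close(struct net_device *dev)
    {
            struct qcaspi *qca = netdev_priv(dev);

            kthread_park(qca->spi_thread);
            return 0;
    }
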
The functions qcaspi_netdev_open/close are responsible for requesting & freeing the SPI interrupt, which wasn't the best choice. Currently it's possible to trigger a double free of the interrupt by calling qcaspi_netdev_close() after qcaspi_netdev_open() has failed.
So split IRQ allocation & enabling, which lets us take advantage of a device-managed IRQ and also fixes the issue.

Fixes: 291ab06 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
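
A sketch of the split between allocation and enabling, assuming a device-managed request in probe and plain enable/disable in open/close (illustrative, not the exact patch):

    /* probe: allocate the IRQ once, but keep it disabled until open */
    irq_set_status_flags(spi->irq, IRQ_NOAUTOEN);
    ret = devm_request_irq(&spi->dev, spi->irq, qcaspi_intr_handler, 0,
                           dev->name, qca);
    if (ret)
            return ret;

    /* open/close only toggle the already-allocated IRQ, so a failed open
     * followed by close can no longer free it twice */
    enable_irq(spi->irq);    /* in qcaspi_netdev_open()  */
    disable_irq(spi->irq);   /* in qcaspi_netdev_close() */
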
After calling ethtool -g it was not possible to adjust the TX ring size again. The reason is that the read-only setting rx_pending gets initialized, and after that the range check in qcaspi_set_ringparam() fails regardless of the provided parameter. Since there is no adjustable RX ring at all, drop it from qcaspi_get_ringparam().

Fixes: 291ab06 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
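
A sketch of what qcaspi_get_ringparam() looks like once the RX fields are left untouched (the 4-argument ethtool op signature is assumed from the v6.1 era, and the ring-limit define name assumes the rename later in this series):

    static void
    qcaspi_get_ringparam(struct net_device *dev, struct ethtool_ringparam *ring,
                         struct kernel_ethtool_ringparam *kernel_ring,
                         struct netlink_ext_ack *extack)
    {
            struct qcaspi *qca = netdev_priv(dev);

            /* report only the TX ring; rx_pending stays 0 so the
             * range check in qcaspi_set_ringparam() no longer trips */
            ring->tx_max_pending = QCASPI_TX_RING_MAX_LEN;
            ring->tx_pending = qca->txr.count;
    }
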
In case of a reset triggered by the QCA7000 itself, the behavior of the qca_spi driver was not quite correct:
- in case of a pending RX frame decoding, the drop counter must be incremented and the decoding state machine reset
- the reset counter must always be incremented, regardless of sync state

Fixes: 291ab06 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
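
A rough sketch of the intended reset handling (helper and counter names are illustrative; the actual driver fields may differ):

    /* on a QCA7000-initiated reset: */
    if (rx_frame_in_progress(qca)) {                /* hypothetical helper */
            dev->stats.rx_dropped++;                /* pending frame is lost */
            qcafrm_fsm_init_spi(&qca->frm_handle);  /* reset the decoder */
    }
    qca->stats.device_reset++;   /* always, regardless of sync state */
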
The skb spare room needs to be expanded for the SPI header, footer and possible padding within the TX path. So announce the necessary space in order to avoid expensive skb_copy_expand() calls.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
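
A sketch of announcing the spare room in the netdev setup path (QCAFRM_HEADER_LEN/QCAFRM_FOOTER_LEN are assumed to be the existing framing defines from qca_7k_common.h; possible padding up to the minimum frame length would come on top):

    /* in qcaspi_netdev_setup(): reserve room for the SPI framing so the
     * TX path rarely needs skb_copy_expand() */
    dev->needed_headroom = QCAFRM_HEADER_LEN;
    dev->needed_tailroom = QCAFRM_FOOTER_LEN;
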
qcafrm_fsm_decode() has almost the same function description in qca_7k_common.c. So drop the comment here.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
This member is never used. So drop it.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
All defines in qca_spi.h except for the two ring limit defines have a QCASPI prefix. Since the names are quite generic, add the QCASPI prefix to avoid possible name conflicts.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
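
For illustration, the kind of rename this refers to (names and values shown are illustrative):

    /* before */
    #define TX_RING_MAX_LEN        10
    #define TX_RING_MIN_LEN        2

    /* after: prefixed to avoid clashes with other headers */
    #define QCASPI_TX_RING_MAX_LEN 10
    #define QCASPI_TX_RING_MIN_LEN 2
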
Currently qca_spi reserves enough space for 4 complete Ethernet-over-SPI frames in the receive buffer. Unfortunately this is hidden behind a magic number. So replace it with a more self-explanatory define.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
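
A possible shape of that define (name illustrative):

    /* the receive buffer holds up to four complete frames from the QCA7000 */
    #define QCASPI_RX_MAX_FRAMES   4
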
There are two points in the calculation of the RX buffer size which are not optimal:
1. mtu is a mutable parameter, but we actually need the maximum possible
   MTU. So better use the define directly.
2. The magic number 4 represents the hardware-generated frame length,
   which is specific to SPI. Better replace it with the suitable
   define.

There is no functional change.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
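
The resulting calculation could look roughly like this (define names are assumed from qca_7k_common.h and the previous patches in this series; this is not the literal diff):

    /* worst case: QCASPI_RX_MAX_FRAMES frames, each carrying the maximum
     * MTU plus the VLAN Ethernet header, the SPI framing, and the
     * hardware-generated frame-length word */
    qca->buffer_size = QCASPI_RX_MAX_FRAMES *
                       (QCAFRM_MAX_MTU + VLAN_ETH_HLEN +
                        QCAFRM_HEADER_LEN + QCAFRM_FOOTER_LEN +
                        QCASPI_HW_PKT_LEN);
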
Most users don't know the expected signature of the QCA700x. So provide
it within the error message.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
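
The error message could then look something like this (assuming the driver's existing QCASPI_GOOD_SIGNATURE define; the error path shown is illustrative):

    if (signature != QCASPI_GOOD_SIGNATURE) {
            dev_err(&spi->dev, "Invalid signature, expected 0x%04x, read 0x%04x\n",
                    QCASPI_GOOD_SIGNATURE, signature);
            return -EFAULT;
    }
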
All known SPI registers of the QCA700x are 16 bits wide. So adjust
the format specifier width accordingly.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
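
I.e. register dumps move from a 32-bit to a 16-bit format width, along the lines of (register name illustrative):

    /* registers are 16 bits wide, so print 0x%04x instead of 0x%08x */
    netdev_dbg(dev, "SPI_REG_SIGNATURE: 0x%04x\n", signature);
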
According to MODULE_LICENSE the driver is dual-licensed. So replace
the BSD license text with the proper SPDX tag.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
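
For a module declaring MODULE_LICENSE("Dual BSD/GPL"), the license boilerplate collapses to a single SPDX line of roughly this form (the exact identifiers depend on the BSD variant in the removed text):

    // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
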
The company I2SE was acquired a long time ago. Switch to
my private mail address before the I2SE account is deactivated.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>

mooraby commented Nov 17, 2023

Tested with the scripts, but did not do a full test, e.g. a charging session.


@mhei mhei left a comment

LGTM

@mhei mhei merged commit f50ffd7 into v6.1-tarragon Nov 21, 2023

mhei commented Nov 21, 2023

I won't delete the branch for now, but @lategoodbye please clean it up once you no longer need it.

@lategoodbye lategoodbye deleted the v6.1-tarragon_qca7k branch December 20, 2023 13:18
lategoodbye pushed a commit that referenced this pull request Jul 7, 2025
[ Upstream commit 5da692e ]

A cache device failing to resume due to mapping errors should not be
retried, as the failure leaves a partially initialized policy object.
Repeating the resume operation risks triggering BUG_ON when reloading
cache mappings into the incomplete policy object.

Reproduce steps:

1. create a cache metadata consisting of 512 or more cache blocks,
   with some mappings stored in the first array block of the mapping
   array. Here we use cache_restore v1.0 to build the metadata.

cat <<EOF >> cmeta.xml
<superblock uuid="" block_size="128" nr_cache_blocks="512" \
policy="smq" hint_width="4">
  <mappings>
    <mapping cache_block="0" origin_block="0" dirty="false"/>
  </mappings>
</superblock>
EOF
dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2
dmsetup remove cmeta

2. wipe the second array block of the mapping array to simulate
   data degradations.

mapping_root=$(dd if=/dev/sdc bs=1c count=8 skip=192 \
2>/dev/null | hexdump -e '1/8 "%u\n"')
ablock=$(dd if=/dev/sdc bs=1c count=8 skip=$((4096*mapping_root+2056)) \
2>/dev/null | hexdump -e '1/8 "%u\n"')
dd if=/dev/zero of=/dev/sdc bs=4k count=1 seek=$ablock

3. try bringing up the cache device. The resume is expected to fail
   due to the broken array block.

dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
dmsetup create cache --notable
dmsetup load cache --table "0 524288 cache /dev/mapper/cmeta \
/dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
dmsetup resume cache

4. try resuming the cache again. An unexpected BUG_ON is triggered
   while loading cache mappings.

dmsetup resume cache

Kernel logs:

(snip)
------------[ cut here ]------------
kernel BUG at drivers/md/dm-cache-policy-smq.c:752!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 332 Comm: dmsetup Not tainted 6.13.4 #3
RIP: 0010:smq_load_mapping+0x3e5/0x570

Fix by disallowing resume operations for devices that failed the
initial attempt.

Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
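
A rough sketch of the shape of the fix described in the last paragraph (flag and helper names here are hypothetical, not the actual dm-cache patch):

    /* in the target's preresume hook: refuse to retry once a resume has
     * already failed, since the policy object is only half built */
    static int cache_preresume(struct dm_target *ti)
    {
            struct cache *cache = ti->private;

            if (cache->resume_failed)       /* hypothetical flag */
                    return -EINVAL;

            if (load_mappings(cache)) {     /* hypothetical helper */
                    cache->resume_failed = true;
                    return -EINVAL;
            }
            return 0;
    }
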
lategoodbye pushed a commit that referenced this pull request Jul 7, 2025
[ Upstream commit 88f7f56 ]

When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4; a similar problem also exists upstream:

    crash> bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    #18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Tianxiang Peng <txpeng@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
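
A sketch of the conditional opf change described above (the exact condition and its placement in __send_empty_flush() may differ from the upstream patch):

    blk_opf_t opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;

    /* assumption: propagate REQ_IDLE from the original flush bio so that
     * wbt_should_throttle() returns false and wbt_wait() is skipped */
    if (bio->bi_opf & REQ_IDLE)
            opf |= REQ_IDLE;
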
lategoodbye pushed a commit that referenced this pull request Jul 7, 2025
[ Upstream commit ee684de ]

As shown in [1], it is possible to corrupt a BPF ELF file such that
arbitrary BPF instructions are loaded by libbpf. This can be done by
setting a symbol (BPF program) section offset to a large (unsigned)
number such that <section start + symbol offset> overflows and points
before the section data in the memory.

Consider the situation below where:
- prog_start = sec_start + symbol_offset    <-- size_t overflow here
- prog_end   = prog_start + prog_size

    prog_start        sec_start        prog_end        sec_end
        |                |                 |              |
        v                v                 v              v
    .....................|################################|............

The report in [1] also provides a corrupted BPF ELF which can be used as
a reproducer:

    $ readelf -S crash
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    ...
      [ 2] uretprobe.mu[...] PROGBITS         0000000000000000  00000040
           0000000000000068  0000000000000000  AX       0     0     8

    $ readelf -s crash
    Symbol table '.symtab' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
         6: ffffffffffffffb8   104 FUNC    GLOBAL DEFAULT    2 handle_tp

Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will
point before the actual memory where section 2 is allocated.

This is also reported by AddressSanitizer:

    =================================================================
    ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
    READ of size 104 at 0x7c7302fe0000 thread T0
        #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
        #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
        #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
        #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
        #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
        #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
        #6 0x000000400c16 in main /poc/poc.c:8
        #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
        #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
        #9 0x000000400b34 in _start (/poc/poc+0x400b34)

    0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
    allocated by thread T0 here:
        #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
        #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
        #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
        #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740

The problem here is that currently, libbpf only checks that the program
end is within the section bounds. There used to be a check
`while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was
removed by commit 6245947 ("libbpf: Allow gaps in BPF program
sections to support overriden weak functions").

Add a check for detecting the overflow of `sec_off + prog_sz` to
bpf_object__init_prog to fix this issue.

[1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md

Fixes: 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Reported-by: lmarch2 <2524158037@qq.com>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
Link: https://lore.kernel.org/bpf/20250415155014.397603-1-vmalik@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
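
The added check boils down to an overflow-aware bounds test in bpf_object__init_prog() along these lines (the error handling shown is illustrative):

    /* reject programs whose start or end falls outside the section,
     * including the case where sec_off + prog_sz wraps around */
    if (sec_off + prog_sz > sec_sz || sec_off + prog_sz < sec_off) {
            pr_warn("sec '%s': program at offset %zu crosses section boundary\n",
                    sec_name, sec_off);
            return -LIBBPF_ERRNO__FORMAT;
    }
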