[6.6-velinux] Intel: Backport to fix EDAC driver for GNR platform#43
Closed
AichunShi wants to merge 3 commits into6.6-velinuxfrom
Closed
[6.6-velinux] Intel: Backport to fix EDAC driver for GNR platform#43AichunShi wants to merge 3 commits into6.6-velinuxfrom
AichunShi wants to merge 3 commits into6.6-velinuxfrom
Conversation
commit 8b93582 upstream. Commit afdb82fd763c ("EDAC, i10nm: make skx_common.o a separate module") made skx_common.o a separate module. With skx_common.o now a separate module, move the common debug code setup_{skx,i10nm}_debug() and teardown_{skx,i10nm}_debug() in {skx,i10nm}_base.c to skx_common.c to reduce code duplication. Additionally, prefix these function names with 'skx' to maintain consistency with other names in the file. Intel-SIG: commit 8b93582 EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common Backport to fix EDAC driver for GNR Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20240829055101.56245-1-qiuxu.zhuo@intel.com [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
commit 2397f795735219caa9c2fe61e7bcdd0652e670d3 upstream. The current skx_common determines whether the memory error source is the near memory of the 2LM system and then retrieves the decoded error results from the ADXL components (near-memory vs. far-memory) accordingly. However, some memory controllers may have limitations in correctly reporting the memory error source, leading to the retrieval of incorrect decoded parts from the ADXL. To address these limitations, instead of simply determining whether the memory error is from the near memory of the 2LM system, it is necessary to distinguish the memory error source details as follows: Memory error from the near memory of the 2LM system. Memory error from the far memory of the 2LM system. Memory error from the 1LM system. Not a memory error. This will enable the i10nm_edac driver to take appropriate actions for those memory controllers that have limitations in reporting the memory error source. Intel-SIG: commit 2397f7957352 EDAC/skx_common: Differentiate memory error sources Backport to fix EDAC driver for GNR Fixes: ba987ea ("EDAC/i10nm: Add Intel Granite Rapids server support") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com> Link: https://lore.kernel.org/r/20241015072236.24543-2-qiuxu.zhuo@intel.com [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <aichun.shi@intel.com>
commit a36667037a0c0e36c59407f8ae636295390239a5 upstream.
The Granite Rapids CPUs with Flat2LM memory configurations may
mistakenly report near-memory errors as far-memory errors, resulting
in the invalid decoded ADXL results:
EDAC skx: Bad imc -1
Fix this incorrect far-memory error source indicator by prefetching the
decoded far-memory controller ID, and adjust the error source indicator
to near-memory if the far-memory controller ID is invalid.
Intel-SIG: commit a36667037a0c EDAC/{skx_common,i10nm}: Fix incorrect far-memory error source indicator
Backport to fix EDAC driver for GNR
Fixes: ba987ea ("EDAC/i10nm: Add Intel Granite Rapids server support")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com>
Link: https://lore.kernel.org/r/20241015072236.24543-3-qiuxu.zhuo@intel.com
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <aichun.shi@intel.com>
Collaborator
|
Acked |
x56Jason
added a commit
to openvelinux/kernel-intel
that referenced
this pull request
Nov 10, 2025
…x-edac-fix-gnr' into intel-6.6-velinux
This PR is to fix EDAC driver for Intel GNR platform on 6.6-velinux kernel.
Upstream commits from v6.13:
a36667037a0c0e36c59407f8ae636295390239a5 EDAC/{skx_common,i10nm}: Fix incorrect far-memory error source indicator
2397f795735219caa9c2fe61e7bcdd0652e670d3 EDAC/skx_common: Differentiate memory error sources
Upstream commit from v6.11:
8b93582 EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common
Upstream commit from v6.11 already merged:
123b158 EDAC, i10nm: make skx_common.o a separate module
Test
Built and run the kernel successfully.
EDAC Test is PASS on GNR platform.
Conflicts:
drivers/edac/skx_common.c
drivers/edac/skx_common.h
[jz: stable already merged part of the commits]
Collaborator
|
Since it has conflicts, merge it and adapt locally. Close it here. |
guojinhui-liam
pushed a commit
that referenced
this pull request
Mar 3, 2026
[ Upstream commit ca70b8baf2bd125b2a4d96e76db79375c07d7ff2 ]
The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging
tosend bytes, which is either msg->sg.size or a smaller value apply_bytes.
Potential problems with this strategy are as follows:
- If the actual sent bytes are smaller than tosend, we need to charge some
bytes back, as in line 487, which is okay but seems not clean.
- When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may
miss uncharging (msg->sg.size - apply_bytes) bytes.
[...]
415 tosend = msg->sg.size;
416 if (psock->apply_bytes && psock->apply_bytes < tosend)
417 tosend = psock->apply_bytes;
[...]
443 sk_msg_return(sk, msg, tosend);
444 release_sock(sk);
446 origsize = msg->sg.size;
447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
448 msg, tosend, flags);
449 sent = origsize - msg->sg.size;
[...]
454 lock_sock(sk);
455 if (unlikely(ret < 0)) {
456 int free = sk_msg_free_nocharge(sk, msg);
458 if (!cork)
459 *copied -= free;
460 }
[...]
487 if (eval == __SK_REDIRECT)
488 sk_mem_charge(sk, tosend - sent);
[...]
When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply,
the following warning will be reported:
------------[ cut here ]------------
WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0
Modules linked in:
CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events sk_psock_destroy
RIP: 0010:inet_sock_destruct+0x190/0x1a0
RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206
RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800
RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900
RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0
R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400
R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100
FS: 0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x89/0x130
? inet_sock_destruct+0x190/0x1a0
? report_bug+0xfc/0x1e0
? handle_bug+0x5c/0xa0
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? inet_sock_destruct+0x190/0x1a0
__sk_destruct+0x25/0x220
sk_psock_destroy+0x2b2/0x310
process_scheduled_works+0xa3/0x3e0
worker_thread+0x117/0x240
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---
In __SK_REDIRECT, a more concise way is delaying the uncharging after sent
bytes are finalized, and uncharge this value. When (ret < 0), we shall
invoke sk_msg_free.
Same thing happens in case __SK_DROP, when tosend is set to apply_bytes,
we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same
warning will be reported in selftest.
[...]
468 case __SK_DROP:
469 default:
470 sk_msg_free_partial(sk, msg, tosend);
471 sk_msg_apply_bytes(psock, tosend);
472 *copied -= (tosend + delta);
473 return -EACCES;
[...]
So instead of sk_msg_free_partial we can do sk_msg_free here.
Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface")
Fixes: 8ec95b9 ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241016234838.3167769-3-zijianzhang@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Ashish Bijlani <ashish.bijlani@bytedance.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR is to fix EDAC driver for Intel GNR platform on 6.6-velinux kernel.
Upstream commits from v6.13:
a36667037a0c0e36c59407f8ae636295390239a5 EDAC/{skx_common,i10nm}: Fix incorrect far-memory error source indicator
2397f795735219caa9c2fe61e7bcdd0652e670d3 EDAC/skx_common: Differentiate memory error sources
Upstream commit from v6.11:
8b93582 EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common
Upstream commit from v6.11 already merged:
123b158 EDAC, i10nm: make skx_common.o a separate module
Test
Built and run the kernel successfully.
EDAC Test is PASS on GNR platform.
Configs
No Change.