ice: switchdev less resource iwl v1 #7

mswiatko · 2024-01-25T14:18:35Z

Using less resources in switchdev (no additional ones)

The function i40e_pf_wait_queues_disabled() iterates all PF's VSIs up to 'pf->hw.func_caps.num_vsis' but this is incorrect because the real number of VSIs can be up to 'pf->num_alloc_vsi' that can be higher. Fix this loop. Fixes: 69129dc ("i40e: Modify Tx disable wait flow in case of DCB reconfiguration") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>

Mask used for clearing PRTDCB_RETSTCC register in function i40e_dcb_hw_rx_ets_bw_config() is incorrect as there is used define I40E_PRTDCB_RETSTCC_ETSTC_SHIFT instead of define I40E_PRTDCB_RETSTCC_ETSTC_MASK. The PRTDCB_RETSTCC register is used to configure whether ETS or strict priority is used as TSA in Rx for particular TC. In practice it means that once the register is set to use ETS as TSA then it is not possible to switch back to strict priority without CoreR reset. Fix the value in the clearing mask. Fixes: 90bc8e0 ("i40e: Add hardware configuration for software based DCB") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>

Use existing i40e_find_vsi_by_type() to find a VSI associated with flow director. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Introduce i40e_for_each_vsi() and i40e_for_each_veb() helper macros and use them to iterate relevant arrays. Replace pattern: for (i = 0; i < pf->num_alloc_vsi; i++) by: i40e_for_each_vsi(pf, i, vsi) and pattern: for (i = 0; i < I40E_MAX_VEB; i++) by i40e_for_each_veb(pf, i, veb) These macros also check if array item pf->vsi[i] or pf->veb[i] are not NULL and skip such items so we can remove redundant checks from loop bodies. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Add two helpers i40e_(veb|vsi)_get_by_seid() to find corresponding VEB or VSI by their SEID value and use these helpers to replace existing open-coded loops. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Although the i40e supports so-called floating VEB (VEB without an uplink connection to external network), this support is broken. This functionality is currently unused (except debugfs) but it will be used by subsequent series for switchdev mode slow-path. Fix this by following: 1) Handle correctly floating VEB (VEB with uplink_seid == 0) in i40e_reconstitute_veb() and look for owner VSI and create it only for non-floating VEBs and also set bridge mode only for such VEBs as the floating ones are using always VEB mode. 2) Handle correctly floating VEB in i40e_veb_release() and disallow its release when there are some VSIs. This is different from regular VEB that have owner VSI that is connected to VEB's uplink after VEB deletion by FW. 3) Fix i40e_add_veb() to handle 'vsi' that is NULL for floating VEBs. For floating VEB use 0 for downlink SEID and 'true' for 'default_port' parameters as per datasheet. 4) Fix 'add relay' command in i40e_dbg_command_write() to allow to create floating VEB by 'add relay 0 0' or 'add relay' Tested using debugfs: 1) Initial state [root@host net-next]# echo dump switch > $CMD [root@host net-next]# dmesg -c [ 173.701286] i40e 0000:02:00.0: header: 3 reported 3 total [ 173.706701] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 173.713241] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 173.719507] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 2) Add floating VEB [root@host net-next]# CMD="/sys/kernel/debug/i40e/0000:02:00.0/command" [root@host net-next]# echo add relay > $CMD [root@host net-next]# dmesg -c [ 245.551720] i40e 0000:02:00.0: added relay 162 [root@host net-next]# echo dump switch > $CMD [root@host net-next]# dmesg -c [ 276.984371] i40e 0000:02:00.0: header: 4 reported 4 total [ 276.989779] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 276.996302] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 277.002569] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 [ 277.009091] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 3) Add VMDQ2 VSI to this new VEB [root@host net-next]# echo add vsi 162 > $CMD [root@host net-next]# dmesg -c [ 332.314030] i40e 0000:02:00.0: added VSI 394 to relay 162 [ 332.337486] enp2s0f0np0v0: NIC Link is Up, 40 Gbps Full Duplex, Flow Control: None [root@host net-next]# echo dump switch > $CMD [root@host net-next]# dmesg -c [ 387.284490] i40e 0000:02:00.0: header: 5 reported 5 total [ 387.289904] i40e 0000:02:00.0: type=19 seid=394 uplink=162 downlink=16 [ 387.296446] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 [ 387.302708] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 387.309234] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 387.315500] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 4) Try to delete the VEB [root@host net-next]# echo del relay 162 > $CMD [root@host net-next]# dmesg -c [ 428.749297] i40e 0000:02:00.0: deleting relay 162 [ 428.754011] i40e 0000:02:00.0: can't remove VEB 162 with 1 VSIs left 5) Do PF reset and check switch status after rebuild [root@host net-next]# echo pfr > $CMD [root@host net-next]# echo dump switch > $CMD [root@host net-next]# dmesg -c [ 738.056172] i40e 0000:02:00.0: header: 5 reported 5 total [ 738.061577] i40e 0000:02:00.0: type=19 seid=394 uplink=162 downlink=16 [ 738.068104] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 [ 738.074367] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 738.080892] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 738.087160] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 6) Delete VSI and delete VEB [root@host net-next]# echo del vsi 394 > $CMD [root@host net-next]# echo del relay 162 > $CMD [root@host net-next]# echo dump switch > $CMD [root@host net-next]# dmesg -c [ 1233.081126] i40e 0000:02:00.0: deleting VSI 394 [ 1239.345139] i40e 0000:02:00.0: deleting relay 162 [ 1244.886920] i40e 0000:02:00.0: header: 3 reported 3 total [ 1244.892328] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 1244.898853] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 1244.905119] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

The VEB (virtual embedded switch) as a switch element can be connected according datasheet though its uplink to: - Physical port - Port Virtualizer (not used directly by i40e driver but can be present in MFP mode where the physical port is shared between PFs) - No uplink (aka floating VEB) But VEB uplink cannot be connected to another VEB and any attempt to do so results in: "i40e 0000:02:00.0: couldn't add VEB, err -EIO aq_err I40E_AQ_RC_ENOENT" that indicates "the uplink SEID does not point to valid element". Remove this logic from the driver code this way: 1) For debugfs only allow to build floating VEB (uplink_seid == 0) or main VEB (uplink_seid == mac_seid) 2) Do not recurse in i40e_veb_link_event() as no VEB cannot have sub-VEBs 3) Ditto for i40e_veb_rebuild() + simplify the function as we know that the VEB for rebuild can be only the main LAN VEB or some of the floating VEBs 4) In i40e_rebuild() there is no need to check veb->uplink_seid as the possible ones are 0 and MAC SEID 5) In i40e_vsi_release() do not take into account VEBs whose uplink is another VEB as this is not possible 6) Remove veb_idx field from i40e_veb as a VEB cannot have sub-VEBs Tested using i40e debugfs interface: 1) Initial state [root@cnb-03 net-next]# CMD="/sys/kernel/debug/i40e/0000:02:00.0/command" [root@cnb-03 net-next]# echo dump switch > $CMD [root@cnb-03 net-next]# dmesg -c [ 98.440641] i40e 0000:02:00.0: header: 3 reported 3 total [ 98.446053] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 98.452593] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 98.458856] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 2) Add floating VEB [root@cnb-03 net-next]# echo add relay > $CMD [root@cnb-03 net-next]# dmesg -c [ 122.745630] i40e 0000:02:00.0: added relay 162 [root@cnb-03 net-next]# echo dump switch > $CMD [root@cnb-03 net-next]# dmesg -c [ 136.650049] i40e 0000:02:00.0: header: 4 reported 4 total [ 136.655466] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 136.661994] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 136.668264] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 [ 136.674787] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 3) Add VMDQ2 VSI to this new VEB [root@cnb-03 net-next]# dmesg -c [ 168.351763] i40e 0000:02:00.0: added VSI 394 to relay 162 [ 168.374652] enp2s0f0np0v0: NIC Link is Up, 40 Gbps Full Duplex, Flow Control: None [root@cnb-03 net-next]# echo dump switch > $CMD [root@cnb-03 net-next]# dmesg -c [ 195.683204] i40e 0000:02:00.0: header: 5 reported 5 total [ 195.688611] i40e 0000:02:00.0: type=19 seid=394 uplink=162 downlink=16 [ 195.695143] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 [ 195.701410] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 195.707935] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 195.714201] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 4) Try to delete the VEB [root@cnb-03 net-next]# echo del relay 162 > $CMD [root@cnb-03 net-next]# dmesg -c [ 239.260901] i40e 0000:02:00.0: deleting relay 162 [ 239.265621] i40e 0000:02:00.0: can't remove VEB 162 with 1 VSIs left 5) Do PF reset and check switch status after rebuild [root@cnb-03 net-next]# echo pfr > $CMD [root@cnb-03 net-next]# echo dump switch > $CMD [root@cnb-03 net-next]# dmesg -c ... [ 272.333655] i40e 0000:02:00.0: header: 5 reported 5 total [ 272.339066] i40e 0000:02:00.0: type=19 seid=394 uplink=162 downlink=16 [ 272.345599] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 [ 272.351862] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 272.358387] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 272.364654] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 6) Delete VSI and delete VEB [ 297.199116] i40e 0000:02:00.0: deleting VSI 394 [ 299.807580] i40e 0000:02:00.0: deleting relay 162 [ 309.767905] i40e 0000:02:00.0: header: 3 reported 3 total [ 309.773318] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 309.779845] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 309.786111] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Introduce new Intel Ethernet E825C family devices. Add new PCI device IDs which are going to be supported by the driver: - 579C: Intel(R) Ethernet Connection E825-C for backplane - 579D: Intel(R) Ethernet Connection E825-C for QSFP - 579E: Intel(R) Ethernet Connection E825-C for SFP - 579F: Intel(R) Ethernet Connection E825-C for SGMII Add helper function ice_is_e825c() to verify if the running device belongs to E825C family. Co-developed-by: Jan Glaza <jan.glaza@intel.com> Signed-off-by: Jan Glaza <jan.glaza@intel.com> Co-developed-by: Michal Michalik <michal.michalik@intel.com> Signed-off-by: Michal Michalik <michal.michalik@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>

E800 series devices have a couple of quirks: 1. Sideband control queues are not supported 2. The registers that the driver needs to program for the "Precision Time Protocol (PTP)" feature are different for E800 series devices compared to other devices supported by this driver. Both these require conditional logic based on the underlying device we are dealing with. The function ice_is_sbq_supported added by commit 8f5ee3c ("ice: add support for sideband messages") addresses (1). The same function can be used to address (2) as well but this just looks weird readability wise in cases that have nothing to do with sideband control queues: if (ice_is_sbq_supported(hw)) /* program register A */ else /* program register B */ For these cases, the function ice_is_generic_mac introduced by this patch communicates the idea/intention better. Also rework ice_is_sbq_supported to use this new function. As side-band queue is supported for E825C devices, it's mac_type is considered as generic mac_type. Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>

E825C devices shall support the new signing type of RSA 3K for new DDP section (SEGMENT_SIGN_TYPE_RSA3K_E825 (5) - already in the code). The driver is responsible to verify the presence of correct signing type. Add 3k signinig support for E825C devices based on mac_type: ICE_MAC_GENERIC_3K_E825; Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>

Add support for driver-specific devlink loopback param. Supported values are "enabled", "disabled" and "prioritized". Default configuration is set to "enabled". Add documentation in networking/devlink/ice.rst. In previous generations of Intel NICs the trasmit scheduler was only limited by PCIe bandwidth when scheduling/assigning hairpin-badwidth between VFs. Changes to E810 HW design introduced scheduler limitation, so that available hairpin-bandwidth is bound to external port speed. In order to address this limitation and enable NFV services such as "service chaining" a knob to adjust the scheduler config was created. Driver can send a configuration message to the FW over admin queue and internal FW logic will reconfigure HW to prioritize and add more BW to VF to VF traffic. As end result for example 10G port will no longer limit hairpin-badwith to 10G and much higher speeds can be achieved. Devlink loopback param set to "prioritized" enables higher hairpin-badwitdh on related PFs. Configuration is applicable only to 8x10G and 4x25G cards. Changing loopback configuration will trigger CORER reset in order to take effect. Example command to change current value: devlink dev param set pci/0000:b2:00.3 name loopback value prioritized \ cmode runtime Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com> Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Pawel Kaminski <pawel.kaminski@intel.com>

The e1000e driver supports hardware with a variety of different clock speeds, and thus a variety of different increment values used for programming its PTP hardware clock. The values currently programmed in e1000e_ptp_init are incorrect. In particular, only two maximum adjustments are used: 24000000 - 1, and 600000000 - 1. These were originally intended to be used with the 96 MHz clock and the 25 MHz clock. Both of these values are actually slightly too high. For the 96 MHz clock, the actual maximum value that can safely be programmed is 23,999,938. For the 25 MHz clock, the maximum value is 599,999,904. Worse, several devices use a 24 MHz clock or a 38.4 MHz clock. These parts are incorrectly assigned one of either the 24million or 600million values. For the 24 MHz clock, this is not a significant issue: its current increment value can support an adjustment up to 7billion in the positive direction. However, the 38.4 KHz clock uses an increment value which can only support up to 230,769,157 before it starts overflowing. To understand where these values come from, consider that frequency adjustments have the form of: new_incval = base_incval + (base_incval * adjustment) / (unit of adjustment) The maximum adjustment is reported in terms of parts per billion: new_incval = base_incval + (base_incval * adjustment) / 1 billion The largest possible adjustment is thus given by the following: max_incval = base_incval + (base_incval * max_adj) / 1 billion Re-arranging to solve for max_adj: max_adj = (max_incval - base_incval) * 1 billion / base_incval We also need to ensure that negative adjustments cannot underflow. This can be achieved simply by ensuring max_adj is always less than 1 billion. Introduce new macros in e1000.h codifying the maximum adjustment in PPB for each frequency given its associated increment values. Also clarify where these values come from by commenting about the above equations. Replace the switch statement in e1000e_ptp_init with one which mirrors the increment value switch statement from e1000e_get_base_timinica. For each device, assign the appropriate maximum adjustment based on its frequency. Some parts can have one of two frequency modes as determined by E1000_TSYNCRXCTL_SYSCFI. Since the new flow directly matches the assignments in e1000e_get_base_timinca, and uses well defined macro names, it is much easier to verify that the resulting maximum adjustments are correct. It also avoids difficult to parse construction such as the "hw->mac.type < e1000_phc_lpt", and the use of fallthrough which was especially confusing when combined with a conditional block. Note that I believe the current increment value configuration used for 24MHz clocks is sub-par, as it leaves at least 3 extra bits available in the INCVALUE register. However, fixing that requires more careful review of the clock rate and associated values. Reported-by: Trey Harrison <harrisondigitalmedia@gmail.com> Fixes: 68fe1d5 ("e1000e: Add Support for 38.4MHZ frequency") Fixes: d89777b ("e1000e: add support for IEEE-1588 PTP") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Forcing SMBUS in enable ULP flow causes sporadically PHY loss on MTL systems. This is a legacy configuration that is no longer required on newer PHYs. Therefore we remove it for MTL systems and above. Fixes: 6607c99 ("e1000e: i219 - fix to enable both ULP and EEE in Sx state") Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Co-developed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Dima Ruinskiy <dima.ruinskiy@intel.com>

On some Meteor Lake systems accessing the PHY via the MDIO interface may result in an MDI error. This issue happens sporadically and in most cases a second access to the PHY via the MDIO interface results in success. As a workaround, introduce a retry counter which is set to 3 on Meteor Lake systems. The driver will only return an error if 3 consecutive PHY access attempts fail. The retry mechanism is disabled in specific flows, where MDI errors are expected. Fixes: cc23f4f ("e1000e: Add support for Meteor Lake") Suggested-by: Nikolay Mushayev <nikolay.mushayev@intel.com> Co-developed-by: Nir Efrati <nir.efrati@intel.com> Signed-off-by: Nir Efrati <nir.efrati@intel.com> Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>

HW incorrectly reports EIPE errors on encapsulated packets with L2 padding inside inner packet. HW shows outer UDP/IPV4 packet checksum errors as part of the EIPE flags of the Rx descriptor. These are reported only if checksum offload is enabled and L3/L4 parsed flag is valid in Rx descriptor. When that error is reported by HW, we don't act on it instead of incrementing main Rx errors statistic as it would normally happen. Add a new statistic to count these errors since we still want to print them. Signed-off-by: Aniruddha Paul <aniruddha.paul@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jan Glaza <jan.glaza@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org>

Add curly braces to avoid entering an if statement where it is not always required in e1000_shutdown function. This improves code readability and might prevent a non-deterministic behaviour in the future. Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>

In the arm random config file, kconfig option 'CONFIG_AEABI' is disabled which results in adding the compiler flag '-mabi=apcs-gnu'. This causes the compiler to add padding in virtchnl2_ptype structure to align it to 8 bytes, resulting in the following size check failure: include/linux/build_bug.h:78:41: error: static assertion failed: "(6) == sizeof(struct virtchnl2_ptype)" 78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg) | ^~~~~~~~~~~~~~ include/linux/build_bug.h:77:34: note: in expansion of macro '__static_assert' 77 | #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr) | ^~~~~~~~~~~~~~~ drivers/net/ethernet/intel/idpf/virtchnl2.h:26:9: note: in expansion of macro 'static_assert' 26 | static_assert((n) == sizeof(struct X)) | ^~~~~~~~~~~~~ drivers/net/ethernet/intel/idpf/virtchnl2.h:982:1: note: in expansion of macro 'VIRTCHNL2_CHECK_STRUCT_LEN' 982 | VIRTCHNL2_CHECK_STRUCT_LEN(6, virtchnl2_ptype); | ^~~~~~~~~~~~~~~~~~~~~~~~~~ Avoid the compiler padding by using "__packed" structure attribute for the virtchnl2_ptype struct. Also align the structure by using "__aligned(2)" for better code optimization. Fixes: 0d7502a ("virtchnl: add virtchnl version 2 ops") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312220250.ufEm8doQ-lkp@intel.com Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>

Commit 1978d3e ("intel: fix string truncation warnings") fixes '-Wformat-truncation=' warnings in igb_main.c by using kasprintf. drivers/net/ethernet/intel/igb/igb_main.c:3092:53: warning：‘%d’ directive output may be truncated writing between 1 and 5 bytes into a region of size between 1 and 13 [-Wformat-truncation=] 3092 | "%d.%d, 0x%08x, %d.%d.%d", | ^~ drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note：directive argument in the range [0, 65535] 3092 | "%d.%d, 0x%08x, %d.%d.%d", | ^~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note：directive argument in the range [0, 65535] drivers/net/ethernet/intel/igb/igb_main.c:3090:25: note：‘snprintf’ output between 23 and 43 bytes into a destination of size 32 kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Fix this warning by using a larger space for adapter->fw_version, and then fall back and continue to use snprintf. Fixes: 1978d3e ("intel: fix string truncation warnings") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Cc: Kunwu Chan <kunwu.chan@hotmail.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org>

To fully support initializing the LAG support code, a DDP package that extracts the logical port from the metadata is required. If such a package is not present, there could be difficulties in supporting some bond types. Add a check into the initialization flow that will bypass the new paths if any of the support pieces are missing. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Fixes: df006dd ("ice: Add initial support framework for LAG") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>

All error handling paths, except this one, go to 'out' where release_swfw_sync() is called. This call balances the acquire_swfw_sync() call done at the beginning of the function. Branch to the error handling path in order to correctly release some resources in case of error. Fixes: ae14a1d ("ixgbe: Fix IOSF SB access issues") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org>

The iavf_reset_task() assumes that the adapter has finished the initialization cycle and is either in __IAVF_DOWN or __IAVF_RUNNING. At the early states, no resources have been allocated. Allow an early reset by simply shutting down the admin queue and reverting to the first state __IAVF_STARTUP. Fixes: 5eae00c ("i40evf: main driver core") Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>

If a reset event is received from the PF early in the init cycle, the iavf state machine could hang for about 25 seconds. For example: # echo 1 > /sys/class/net/enp175s0np0/device/sriov_numvfs && \ ip link set dev enp175s0np0 vf 0 mac <new_mac> the log shows: [532.770534] ice 0000:af:00.0: Enabling 1 VFs [532.880439] iavf 0000:af:01.0: enabling device (0000 -> 0002) [532.880983] ice 0000:af:00.0: Enabling 1 VFs with 17 vectors and 16 queues per VF [532.916547] ice 0000:af:00.0 enp175s0np0: Setting MAC 00:60:2f:20:3f:28 on VF 0. VF driver will be reinitialized [553.464990] iavf 0000:af:01.0: Failed to communicate with PF; waiting before retry [558.903000] iavf 0000:af:01.0: Hardware came out of reset. Attempting reinit. [558.984816] iavf 0000:af:01.0: Multiqueue Enabled: Queue pair count = 16 This happens because reset events are ignored in the early states where the misc irq vector is not initialized yet and communicating with the PF is through polling the AQ buffer. Fix by scanning the received OP codes for a reset event and scheduling the reset task if a reset event is received. Fixes: 5eae00c ("i40evf: main driver core") Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>

Remove a comment that was not correct; this structure has nothing to do with FW alignment. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>

Add PTP state machine so that the driver can correctly identify PTP state around resets. When the driver got information about ungraceful reset, PTP was not prepared for reset and it returned error. When this situation occurs, prepare PTP before rebuilding its structures. Co-authored-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org>

The ice_ptp_prepare_for_reset() and ice_ptp_reset() functions currently check the pf->flags ICE_FLAG_PFR_REQ bit to determine if the current reset is a PF reset or not. This is problematic, because it is possible that a PF reset and a higher level reset (CORE reset, GLOBAL reset, EMP reset) are requested simultaneously. In that case, the driver performs the highest level reset requested. However, the ICE_FLAG_PFR_REQ flag will still be set. The main driver reset functions take an enum ice_reset_req indicating which reset is actually being performed. Pass this data into the PTP functions and rely on this instead of relying on the driver flags. This ensures that the PTP code performs the proper level of reset that the driver is actually undergoing. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org>

The tx->verify_cached flag is used to inform the Tx timestamp tracking code whether it needs to verify the cached Tx timestamp value against a previous captured value. This is necessary on E810 hardware which does not have a Tx timestamp ready bitmap. In addition, we currently rely on the fact that the ice_get_phy_tx_tstamp_ready() function returns all 1s for E810 hardware. Instead of introducing a brand new flag, rename and verify_cached to has_ready_bitmap, inverting the relevant checks. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org>

E810 hardware does not have a Tx timestamp ready bitmap. Don't check has_ready_bitmap in E810-specific functions. Add has_ready_bitmap check in ice_ptp_process_tx_tstamp() to stop relying on the fact that ice_get_phy_tx_tstamp_ready() returns all 1s. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org>

The ice_ptp_tx_cfg_intr() function sends a control queue message to configure the PHY timestamp interrupt block. This is a very similar name to a function which is used to configure the MAC Other Interrupt Cause Enable register. Rename this function to ice_ptp_cfg_phy_interrupt in order to make it more obvious to the reader what action it performs, and distinguish it from other similarly named functions. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org>

The ice_ptp_reset() function uses a goto to skip past clock owner operations if performing a PF reset or if the device is not the clock owner. This is a bit confusing. Factor this out into ice_ptp_rebuild_owner() instead. The ice_ptp_reset() function is called by ice_rebuild() to restore PTP functionality after a device reset. Follow the convention set by the ice_main.c file and rename this function to ice_ptp_rebuild(), in the same way that we have ice_prepare_for_reset() and ice_ptp_prepare_for_reset(). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org>

The ice driver currently attempts to destroy and re-initialize the Tx timestamp tracker during the reset flow. The release of the Tx tracker only happened during CORE reset or GLOBAL reset. The ice_ptp_rebuild() function always calls the ice_ptp_init_tx function which will allocate a new tracker data structure, resulting in memory leaks during PF reset. Certainly the driver should not be allocating a new tracker without removing the old tracker data, as this results in a memory leak. Additionally, there's no reason to remove the tracker memory during a reset. Remove this logic from the reset and rebuild flow. Instead of releasing the Tx tracker, flush outstanding timestamps just before we reset the PHY timestamp block in ice_ptp_cfg_phy_interrupt(). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org>

Currently, XSK control path in ice driver calls directly ice_vsi_cfg_rxq() whereas we have ice_vsi_cfg_single_rxq() for that purpose. Use the latter from XSK side and make ice_vsi_cfg_rxq() static. ice_vsi_cfg_rxq() resides in ice_base.c and is rather big, so to reduce the code churn let us move two callers of it from ice_lib.c to ice_base.c. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org>

Currently, XSK control path in ice driver calls directly ice_vsi_cfg_txq() whereas we have ice_vsi_cfg_single_txq() for that purpose. Use the latter from XSK side and make ice_vsi_cfg_txq() static. ice_vsi_cfg_txq() resides in ice_base.c and is rather big, so to reduce the code churn let us move the callers of it from ice_lib.c to ice_base.c. This change puts ice_qp_ena() on nice diet due to the checks and operations that ice_vsi_cfg_single_{r,t}xq() do internally. add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-182 (-182) Function old new delta ice_xsk_pool_setup 2165 1983 -182 Total: Before=472597, After=472415, chg -0.04% Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org>

Changing queues used by eswitch will be done through PF netdev. There is no need to reserve queues if the number of used queues is known. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Tx can be done using PF netdev. Checks before Tx are unnecessary. Checking if switchdev mode is set seems too defensive (there is no PR netdev in legacy mode). If corresponding VF is disabled or during reset, PR netdev also should be down. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Steer all packets that miss other rules to PF VSI. Previously in switchdev mode, PF VSI received missed packets, but only ones marked as Rx. Now it is receiving all missed packets. To queue rule per PR isn't needed, because we use PF VSI instead of control VSI now, and it's already correctly configured. Add flag to correctly set LAN_EN bit in default Tx rule. It shouldn't allow packet to go outside when there is a match. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Tx rule in switchdev was changed to use PF instead of additional control plane VSI. Because of that during lag we should control it. Control means to add and remove the default Tx rule during lag active/inactive switching. It can be done the same way as default Rx rule. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

For slow-path Rx and Tx PF VSI is used. There is no need to have control plane VSI. Remove all code related to it. Eswitch rebuild can't fail without rebuilding control plane VSI. Return void from ice_eswitch_rebuild(). Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Instead of getting repr::id from xa_alloc() value, set it to the src_vsi::num_vsi value. It is unique for each PR. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Add an ICE_RX_FLAG_MULTIDEV flag to Rx ring. If it is set try to find correct port representor. Do it based on src_vsi value stored in flex descriptor. Ids of representor pointers stored in xarray are equal to corresponding src_vsi value. Thanks to that we can directly get correct representor if we have src_vsi value. Set multidev flag during ring configuration. If the mode is switchdev, change the ring descriptor to the one that contains src_vsi value. PF netdev should be reconfigured, do it by calling ice_down() and ice_up() if the netdev was up before configuring switchdev. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Removing control plane VSI result in no information about slow-path statistic. In current solution statistics need to be counted in driver. Patch is based on similar implementation done by Simon Horman in nfp: commit eadfa4c ("nfp: add stats and xmit helpers for representors") Add const modifier to netdev parameter in ice_netdev_to_repr(). It isn't (and shouldn't be) modified in the function. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ivecera and others added 30 commits January 24, 2024 09:54

i40e: Use existing helper to find flow director VSI

f0c7450

Use existing i40e_find_vsi_by_type() to find a VSI associated with flow director. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

ice: remove duplicate comment

b6e59a3

Remove a comment that was not correct; this structure has nothing to do with FW alignment. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>

mfijalko and others added 10 commits January 24, 2024 10:33

ice: change repr::id values

7b0fdfb

Instead of getting repr::id from xa_alloc() value, set it to the src_vsi::num_vsi value. It is unique for each PR. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

mswiatko force-pushed the dev-queue branch 2 times, most recently from 04fbb9f to 4bd95f6 Compare February 9, 2024 07:16

mswiatko force-pushed the dev-queue branch from 4bd95f6 to 3128acb Compare March 1, 2024 11:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ice: switchdev less resource iwl v1 #7

ice: switchdev less resource iwl v1 #7

Uh oh!

mswiatko commented Jan 25, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

16 participants

ice: switchdev less resource iwl v1 #7

Are you sure you want to change the base?

ice: switchdev less resource iwl v1 #7

Uh oh!

Conversation

mswiatko commented Jan 25, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

16 participants