Conversation
Signed-off-by: Jain Johny <jj@asama.ai>
WalkthroughAdds PCI device name resolution to the pcidevice collector via a pci.ids loader and flags, extends sysfs fixtures with a new PCI device and SR‑IOV entries, introduces new PCI state metrics, and adds a unit test comparing gathered metrics to expected output. Changes
Sequence DiagramsequenceDiagram
participant Test as Test Harness
participant Collector as pcidevice Collector
participant PCIIds as pci.ids loader
participant SysFS as fixture sysfs (sys.ttar)
Note over Test,Collector: Test configures flags to enable names and set idsfile
Test->>Collector: Start collector (Update)
Collector->>PCIIds: loadPCIIds(path(s))
PCIIds-->>Collector: parsed vendors/devices/classes
Collector->>SysFS: read PCI device sysfs entries
SysFS-->>Collector: device attributes, links, SR‑IOV info
Collector->>Collector: resolve names (vendor/device/subsystem/class)
Collector->>Test: emit Prometheus metrics (with name labels)
Test->>Test: compare emitted metrics to `pcidevice-names-output.txt`
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (10)
collector/pcidevice_linux_test.go (3)
40-41: Remove commented code.Line 40 contains commented-out code. If this alternative path is not needed, it should be removed rather than left as a comment.
"--collector.pcidevice", "--collector.pcidevice.names", - // "--collector.pcidevice.idsfile", "/usr/share/misc/pci.ids", "--collector.pcidevice.idsfile", "fixtures/pci.ids",
72-85: Improve error handling in the Collect method.The goroutine panics on error, which could crash the entire test suite. Consider proper error propagation instead.
func (tc *testPCICollector) Collect(ch chan<- prometheus.Metric) { sink := make(chan prometheus.Metric) + errCh := make(chan error, 1) go func() { err := tc.pc.Update(sink) if err != nil { - panic(fmt.Errorf("failed to update collector: %s", err)) + errCh <- fmt.Errorf("failed to update collector: %w", err) + close(sink) + return } + errCh <- nil close(sink) }() for m := range sink { ch <- m } + + if err := <-errCh; err != nil { + // Log error but don't panic - let the test framework handle it + fmt.Printf("Error in collector update: %v\n", err) + } }
32-44: Isolate kingpin flag parsing in tests — don't modify global CommandLine.collector/pcidevice_linux_test.go (lines 32–44) calls kingpin.CommandLine.Parse; the same pattern appears in collector/paths_test.go, collector/processes_linux_test.go, collector/textfile_test.go, collector/ipvs_linux_test.go, and collector/filesystem_linux_test.go. Use a test-local kingpin app (kingpin.New) or reset/isolate CommandLine between tests, or avoid t.Parallel for these tests to prevent global-flag interference.
collector/fixtures/pci.ids (1)
26-26: File is missing a trailing newline.POSIX-compliant text files should end with a newline character.
17aa Lenovo +collector/fixtures/pcidevice-names-output.txt (1)
74-74: File is missing a trailing newline.POSIX-compliant text files should end with a newline character.
node_pcidevice_sriov_vf_total_msix{bus="45",device="00",function="0",segment="0000"} 0 +collector/pcidevice_linux.go (2)
326-487: Consider extracting PCI ID parsing logic to a separate package.The
loadPCIIdsfunction is quite large (160+ lines) and handles multiple concerns. Consider refactoring this into a separate package or at least breaking it down into smaller helper functions for better maintainability and testability.Would you like me to help refactor this function into smaller, more focused functions or create a separate PCI IDs parser package?
551-588: Improve error message for unknown class.The function returns a formatted error string for unknown classes but not for other lookup functions. Consider consistent error handling across all helper functions.
// Return the original class ID if not found - return "Unknown class (" + classID + ")" + return classID // Return ID if name not found, consistent with other functions }collector/fixtures/sys.ttar (3)
5110-5129: Fix sysfs modes for SR‑IOV attributes on PF 0000:01:00.0 (fidelity).Kernel exposes read‑only SR‑IOV attributes as 0444. Here
sriov_totalvfsandsriov_vf_total_msixare marked 0644. Please correct for fixture fidelity. Optionally addvirtfn*symlinks if tests need VF enumeration.Apply within this hunk:
Path: sys/devices/pci0000:00/0000:00:02.1/0000:01:00.0/sriov_totalvfs Lines: 1 8 -Mode: 644 +Mode: 444 @@ Path: sys/devices/pci0000:00/0000:00:02.1/0000:01:00.0/sriov_vf_total_msix Lines: 1 16 -Mode: 644 +Mode: 444
5488-5507: Normalize SR‑IOV attribute modes on upstream port 0000:00:02.1.Same fidelity issue:
sriov_totalvfsandsriov_vf_total_msixshould be read‑only (0444).Path: sys/devices/pci0000:00/0000:00:02.1/sriov_totalvfs Lines: 1 0 -Mode: 644 +Mode: 444 @@ Path: sys/devices/pci0000:00/0000:00:02.1/sriov_vf_total_msix Lines: 1 0 -Mode: 644 +Mode: 444
5936-6078: Great coverage on 0000:45:00.0; consider trimming/splitting the fixture to keep tests fast and focused.The NIC subtree is very complete (AER, hwmon, i2c, PTP/ptp0, wakeup, VPD, power states). To reduce IO and parsing cost in unit tests:
- Split this into smaller overlays (e.g., pci_sriov.ttar, pci_power.ttar, pci_ptp.ttar) and compose in tests that need them.
- Drop large counters/statistics that aren’t asserted.
- Keep at least one example per state variant (power/control=on vs auto, runtime_status=active vs suspended) — you already cover both across nodes.
Would you like a PR to add minimal overlays and update the tests to compose them?
Also applies to: 6111-6147, 6172-6213, 6318-6332, 7617-7666, 7714-7857, 7897-7969
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
collector/fixtures/pci.ids(1 hunks)collector/fixtures/pcidevice-names-output.txt(1 hunks)collector/fixtures/sys.ttar(4 hunks)collector/pcidevice_linux.go(5 hunks)collector/pcidevice_linux_test.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
collector/pcidevice_linux_test.go (2)
collector/pcidevice_linux.go (1)
NewPcideviceCollector(124-172)collector/collector.go (1)
Collector(180-183)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (14)
collector/pcidevice_linux_test.go (2)
1-13: LGTM!The copyright header and build tags are correctly formatted.
87-89: LGTM!The no-op Describe method is appropriate for test collectors.
collector/fixtures/pci.ids (2)
1-3: LGTM!Clear header comments explaining the purpose of this test fixture file.
4-11: LGTM!The PCI class definitions follow the standard pci.ids format correctly with proper indentation for the hierarchy.
collector/fixtures/pcidevice-names-output.txt (2)
1-3: LGTM!Clear documentation of the test fixture's purpose.
24-31: Good test coverage with varied examples.The test fixture provides excellent coverage with three diverse PCI devices (AMD bridge, Micron NVMe, Intel network controller), testing different vendor/device combinations and name resolution scenarios.
collector/pcidevice_linux.go (6)
37-43: LGTM!Good implementation of standard PCI ID file search paths and configuration flags.
68-102: LGTM!Well-structured Prometheus metric descriptors with clear, informative descriptions for the new PCI device metrics.
130-171: LGTM!The initialization logic correctly handles conditional label addition based on name resolution settings and properly initializes all metric descriptors.
193-211: LGTM!Proper formatting of IDs to hex strings and correct conditional name resolution.
215-283: Good defensive programming with nil checks.Excellent implementation of nil-safe access to device fields with appropriate fallback values. The consistent pattern of checking for nil before dereferencing pointers prevents potential panics.
304-324: LGTM!Clean implementation of the power state enum to numeric conversion with proper handling of all enum values.
collector/fixtures/sys.ttar (2)
874-876: Good addition: bus->devices symlink for new PCI function.This ensures 0000:45:00.0 is discoverable via sysfs canonical path and via sys/bus/pci/devices. LGTM.
7901-7934: PF 0000:45:00.0 SR‑IOV set looks good; addvirtfn*symlinks if VF enumeration is scraped.You have
sriov_totalvfs/numvfs/stride/offsetandsriov_vf_device. If the collector exports VF count only, this is sufficient. If it also walks VFs, addvirtfn0..virtfnNsymlinks.
Signed-off-by: Jain Johny <jj@asama.ai>
* add sriov, power info support and pci id name resolution Signed-off-by: Jain Johny <jj@asama.ai> * fix/remove debug lines Signed-off-by: Jain Johny <jj@asama.ai> --------- Signed-off-by: Jain Johny <jj@asama.ai>
Summary by CodeRabbit
New Features
Tests
Chores