@harp-intel

Expanding uncore events to per-device events creates a very long list of event groups. In some cases the list is too large for perf to handle, resulting in an "Argument list too long" error from perf. We have worked around this in the past by abbreviating event names to shorten the argument list.

While investigating this problem, we found that the individual uncore device event counters are not used independently: they are added together, and the sum is used. The original intent of uncore event expansion is unknown.

In this PR, we eliminate the expansion of uncore events.
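
To make the size problem concrete, here is a rough sketch of how per-device expansion inflates the event argument. The `uncore_<unit>_<n>/<event>/` layout, the helper names `expandPerDevice` and `argLen`, and the device count are assumptions for illustration only; they are not taken from this repository.

```go
package main

import (
	"fmt"
	"strings"
)

// expandPerDevice returns one event string per uncore device.
// The "uncore_<unit>_<n>/<event>/" layout is an assumption made for
// illustration; it is not the format this repository generates.
func expandPerDevice(unit, event string, devices int) []string {
	out := make([]string, 0, devices)
	for i := 0; i < devices; i++ {
		out = append(out, fmt.Sprintf("uncore_%s_%d/%s/", unit, i, event))
	}
	return out
}

// argLen returns the byte length of the comma-joined event list,
// i.e. the size this list would contribute to a command line.
func argLen(events []string) int {
	return len(strings.Join(events, ","))
}
```

With this sketch, the argument length grows roughly linearly with the number of devices per event, and that growth is repeated for every uncore event in every group. A single unexpanded entry avoids the multiplication.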

Copilot Description Follows

This pull request refactors the metrics event processing pipeline to simplify event handling and improve aggregation logic, especially for uncore events. It removes legacy abbreviation and group expansion logic, consolidates event bucketing and aggregation steps, and clarifies group assignment for metric variables. These changes streamline the codebase and make the event handling more explicit and maintainable.

Event bucketing and aggregation improvements:

  • Replaced the coalesceEvents function with bucketEvents and added aggregateUncoreEvents to sum and deduplicate uncore events at socket granularity, removing the need for post-processing uncore group collapsing. (cmd/metrics/event_frame.go)
  • Removed the legacy collapseUncoreGroupsInFrame and related helper functions, as uncore event aggregation is now handled earlier in the pipeline. (cmd/metrics/event_frame.go)
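
The aggregation step described above can be sketched as follows. This is a minimal illustration, not the code in cmd/metrics/event_frame.go: the `Event` type, its fields, and the function name `aggregateUncore` are hypothetical, and the sketch assumes every reading already carries its socket number.

```go
package main

import "sort"

// Event is a hypothetical, simplified reading from one perf output line.
type Event struct {
	Name   string  // event name, shared by all devices of one uncore unit
	Socket int     // socket the reading belongs to
	Device string  // reporting uncore device; dropped after aggregation
	Value  float64 // counter value
}

// aggregateUncore sums readings that share a (socket, name) pair and
// returns exactly one event per pair, ordered by socket and then name.
func aggregateUncore(events []Event) []Event {
	type key struct {
		socket int
		name   string
	}
	sums := make(map[key]float64)
	for _, e := range events {
		sums[key{e.Socket, e.Name}] += e.Value
	}
	out := make([]Event, 0, len(sums))
	for k, v := range sums {
		out = append(out, Event{Name: k.name, Socket: k.socket, Value: v})
	}
	// Map iteration order is random, so sort for deterministic output.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Socket != out[j].Socket {
			return out[i].Socket < out[j].Socket
		}
		return out[i].Name < out[j].Name
	})
	return out
}
```

Keying on the (socket, name) pair handles both the summing and the deduplication in one pass, so no later collapsing step is needed.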

Group assignment and parsing changes:

  • Added assignEventsToGroups to explicitly assign events to groups based on event group definitions after aggregation, separating parsing and grouping logic. (cmd/metrics/event_frame.go)
  • Simplified parseEvents to only unmarshal JSON and parse values, removing abbreviation and group assignment logic from this step. (cmd/metrics/event_frame.go)
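
The separation of parsing from group assignment might look roughly like the sketch below. All names here (`rawEvent`, `ParsedEvent`, `parseLine`, `assignGroups`) are hypothetical. The JSON field names are an assumed subset of perf's JSON output, and the sketch assumes events arrive in the same order as the flattened group definitions. The real parseEvents and assignEventsToGroups may differ on all of these points.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// rawEvent mirrors an assumed subset of one JSON line of perf output.
type rawEvent struct {
	Event        string `json:"event"`
	CounterValue string `json:"counter-value"`
}

// ParsedEvent is a hypothetical parsed reading; Group is -1 until assigned.
type ParsedEvent struct {
	Name  string
	Value float64
	Group int
}

// parseLine only unmarshals the JSON and parses the value. It knows
// nothing about groups.
func parseLine(line []byte) (ParsedEvent, error) {
	var raw rawEvent
	if err := json.Unmarshal(line, &raw); err != nil {
		return ParsedEvent{}, fmt.Errorf("unmarshal event: %w", err)
	}
	v, err := strconv.ParseFloat(raw.CounterValue, 64)
	if err != nil {
		return ParsedEvent{}, fmt.Errorf("parse value for %q: %w", raw.Event, err)
	}
	return ParsedEvent{Name: raw.Event, Value: v, Group: -1}, nil
}

// assignGroups walks the group definitions (event names per group) in
// order and stamps each event with its group index. It returns an error
// if the events do not line up with the definitions.
func assignGroups(events []ParsedEvent, groups [][]string) error {
	i := 0
	for g, names := range groups {
		for _, name := range names {
			if i >= len(events) {
				return fmt.Errorf("ran out of events at group %d", g)
			}
			if events[i].Name != name {
				return fmt.Errorf("event %d: got %q, want %q", i, events[i].Name, name)
			}
			events[i].Group = g
			i++
		}
	}
	if i != len(events) {
		return fmt.Errorf("%d events left unassigned", len(events)-i)
	}
	return nil
}
```

Keeping the two steps apart means an aggregation pass can run between them, on parsed values that have not yet been tied to a group.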

Removal of legacy abbreviation and expansion logic:

  • Removed event name abbreviation and uncore group expansion functions, as these are no longer needed with the new aggregation approach. (cmd/metrics/event_defs.go, cmd/metrics/metric_defs.go)

Metric variable assignment clarification:

  • Improved error handling and clarified logic for assigning metric variable values, ensuring correct group and variable name mapping, and adjusting C-state residency only when appropriate. (cmd/metrics/metric.go)

Dependency cleanup:

  • Removed unused imports such as regexp and slices from cmd/metrics/event_defs.go.

Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
@harp-intel requested a review from Copilot on August 17, 2025 19:20

@harp-intel requested a review from Copilot on August 17, 2025 21:29

Copilot AI left a comment

Pull Request Overview

This PR refactors the uncore event collection and handling pipeline to eliminate per-device event expansion, which was causing "Argument list too long" errors with perf. The refactoring simplifies event processing by aggregating uncore events at collection time rather than expanding and later collapsing them.

  • Removes uncore event expansion and abbreviation logic that was previously used to manage long argument lists
  • Introduces event bucketing and aggregation functions to handle uncore events more efficiently
  • Separates event parsing from group assignment for clearer code organization

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| cmd/metrics/event_defs.go | Removes event name abbreviation and uncore group expansion functions, along with unused imports |
| cmd/metrics/event_frame.go | Replaces coalesceEvents with bucketEvents and adds aggregateUncoreEvents function; removes legacy uncore group collapsing logic |
| cmd/metrics/metric.go | Improves error handling for metric variable assignment and clarifies C-state residency adjustment logic |
| cmd/metrics/metric_defs.go | Removes abbreviation of event names in metric expressions |

@harp-intel linked an issue on Aug 17, 2025 that may be closed by this pull request
@harp-intel merged commit 5a49279 into main on Aug 19, 2025
5 checks passed
@harp-intel deleted the smpluncore branch on August 19, 2025 18:51
Development

Successfully merging this pull request may close these issues.

perf: Argument list too long
