
[refactor] Semantic Function Clustering Analysis: Persistent Outliers and New Findings #21830

@github-actions


Analysis of all non-test Go source files in pkg/ (592 files, more than 2,500 functions) confirms six refactoring opportunities that remain unresolved from previous runs and identifies two new outliers.

Analysis Summary

| Package | Files | Primary concern |
|---|---|---|
| pkg/workflow | ~300 | Compiler, engines, validation |
| pkg/cli | ~200 | Commands, codemods, logs |
| pkg/parser | ~40 | Frontmatter, YAML, schema |

Positive progress since the last analysis: package_extraction.go introduced a well-designed PackageExtractor struct, and all extract*FromCommands functions (extractPipFromCommands, extractNpxFromCommands, extractGoFromCommands, extractUvFromCommands) now delegate to it. ✅
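To make the delegation pattern concrete, here is a minimal sketch of thin extract*FromCommands wrappers that configure and call a shared extractor. The PackageExtractor fields (CommandNames, RequiredSubcommand) and the ExtractPackages method are assumptions for illustration, not the actual API in package_extraction.go.

```go
package main

import (
	"fmt"
	"strings"
)

// PackageExtractor is a hypothetical, simplified stand-in for the struct in
// package_extraction.go. Its fields and method are assumptions.
type PackageExtractor struct {
	CommandNames       []string // command tokens that introduce packages, e.g. "pip"
	RequiredSubcommand string   // if set, must follow the command, e.g. "install"
}

// ExtractPackages returns the first non-flag argument after each matching
// command (and its required subcommand, if any) in a shell script.
func (e PackageExtractor) ExtractPackages(commands string) []string {
	var pkgs []string
	for _, line := range strings.Split(commands, "\n") {
		fields := strings.Fields(line)
		for i, f := range fields {
			if !e.isCommand(f) {
				continue
			}
			rest := fields[i+1:]
			if e.RequiredSubcommand != "" {
				if len(rest) == 0 || rest[0] != e.RequiredSubcommand {
					continue
				}
				rest = rest[1:]
			}
			for _, arg := range rest {
				if !strings.HasPrefix(arg, "-") {
					pkgs = append(pkgs, arg)
					break
				}
			}
		}
	}
	return pkgs
}

func (e PackageExtractor) isCommand(token string) bool {
	for _, name := range e.CommandNames {
		if token == name {
			return true
		}
	}
	return false
}

// The extract*FromCommands functions become thin wrappers: configure, then delegate.
func extractPipFromCommands(commands string) []string {
	return PackageExtractor{CommandNames: []string{"pip", "pip3"}, RequiredSubcommand: "install"}.ExtractPackages(commands)
}

func extractNpxFromCommands(commands string) []string {
	return PackageExtractor{CommandNames: []string{"npx"}}.ExtractPackages(commands)
}

func main() {
	script := "pip install -q requests\nnpx -y prettier --check ."
	fmt.Println(extractPipFromCommands(script)) // [requests]
	fmt.Println(extractNpxFromCommands(script)) // [prettier]
}
```

Keeping per-tool differences in struct fields means each wrapper is a one-liner, which is the consolidation the analysis credits.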


Unresolved Issues (Carried from Previous Analyses)


1. Generic Utilities Buried in metrics.go

File: pkg/workflow/metrics.go

Three exported utility functions remain in a metrics-specific file:

  • ConvertToInt(val any) int (line 198)
  • ConvertToFloat(val any) float64 (line 220)
  • PrettifyToolName(toolName string) string (line 237)

These are called from pkg/cli/audit_report.go and pkg/cli/logs_report.go, so callers in the cli package have to know that general-purpose conversion and formatting helpers live in a metrics file.

Recommendation: Move to pkg/workflow/strings.go (which already hosts SanitizeName, SanitizeWorkflowName, etc.) or a new pkg/workflow/type_utils.go.
Effort: ~15 min
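The sketch below suggests why these helpers belong outside metrics.go: written as plain type switches, they depend on nothing metrics-specific. The set of accepted input types is an assumption; the real implementations at metrics.go lines 198 and 220 may handle more or fewer cases. PrettifyToolName is not sketched because its formatting rules are not described here.

```go
package main

import (
	"fmt"
	"strconv"
)

// ConvertToInt converts a loosely typed value (as decoded from JSON or YAML)
// to an int. The accepted types are an assumption; unsupported values yield 0.
func ConvertToInt(val any) int {
	switch v := val.(type) {
	case int:
		return v
	case int64:
		return int(v)
	case float64:
		return int(v)
	case string:
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return 0
}

// ConvertToFloat is the float64 counterpart of ConvertToInt.
func ConvertToFloat(val any) float64 {
	switch v := val.(type) {
	case float64:
		return v
	case int:
		return float64(v)
	case int64:
		return float64(v)
	case string:
		if f, err := strconv.ParseFloat(v, 64); err == nil {
			return f
		}
	}
	return 0
}

func main() {
	// encoding/json decodes numbers into float64 when the target is `any`.
	fmt.Println(ConvertToInt(42.0), ConvertToInt("7"), ConvertToInt(nil)) // 42 7 0
	fmt.Println(ConvertToFloat(3), ConvertToFloat("2.5"))                 // 3 2.5
}
```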


2. mcp_github_config.go Mixed Responsibilities (Growing)

File: pkg/workflow/mcp_github_config.go

This file now has more content than when previously flagged. It contains two distinct concerns:

Pure GitHub config accessors (standalone functions):
hasGitHubTool, hasGitHubApp, getGitHubType, getGitHubToken, getGitHubReadOnly, getGitHubLockdown, hasGitHubLockdownExplicitlySet, getGitHubToolsets, expandDefaultToolset, getGitHubAllowedTools, getGitHubGuardPolicies, deriveSafeOutputsGuardPolicyFromGitHub, transformRepoPattern, deriveWriteSinkGuardPolicyFromWorkflow, getGitHubDockerImageVersion

Compiler step-generation methods (methods on *Compiler):

  • (*Compiler).generateGitHubMCPLockdownDetectionStep
  • (*Compiler).generateGitHubMCPAppTokenMintingStep
  • (*Compiler).generateGitHubMCPAppTokenInvalidationStep

The deriveSafeOutputsGuardPolicyFromGitHub and deriveWriteSinkGuardPolicyFromWorkflow functions are new since the last analysis, further growing the mixed-responsibility surface.

Recommendation: Extract the three (*Compiler) methods into compiler_github_mcp_steps.go (following the compiler_pre_activation_job.go naming pattern), leaving mcp_github_config.go as a clean config-accessor module.
Effort: ~30 min
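A minimal sketch of the proposed boundary, assuming a simplified map-based tool config: accessors stay pure functions of the config, while step generators are *Compiler methods that call them. The "read-only" key, the []string return type, and the step contents are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ---- Stays in mcp_github_config.go: pure accessors, no *Compiler needed ----

// getGitHubReadOnly reads a boolean "read-only" key from the github tool
// config. The key name and map shape are assumptions for illustration.
func getGitHubReadOnly(githubTool any) bool {
	cfg, ok := githubTool.(map[string]any)
	if !ok {
		return false
	}
	readOnly, _ := cfg["read-only"].(bool)
	return readOnly
}

// ---- Moves to compiler_github_mcp_steps.go: step emission on *Compiler ----

// Compiler is an empty stand-in for the real compiler type.
type Compiler struct{}

// generateGitHubMCPLockdownDetectionStep is a hypothetical simplification;
// the real method's signature and YAML differ. It only shows the dependency
// direction: step generators call accessors, never the reverse.
func (c *Compiler) generateGitHubMCPLockdownDetectionStep(githubTool any) []string {
	return []string{
		"- name: Detect GitHub MCP lockdown",
		"  env:",
		"    READ_ONLY: " + strconv.Quote(strconv.FormatBool(getGitHubReadOnly(githubTool))),
	}
}

func main() {
	c := &Compiler{}
	step := c.generateGitHubMCPLockdownDetectionStep(map[string]any{"read-only": true})
	fmt.Println(strings.Join(step, "\n"))
}
```

With this split, mcp_github_config.go can be unit-tested without constructing a Compiler at all.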


3. compiler_yaml_helpers.go Step-Builder Functions

File: pkg/workflow/compiler_yaml_helpers.go

Step-generation free functions are mixed with YAML/path utilities:

Step builders (should move):

  • generateGitHubScriptWithRequire — called from 14 sites across 5 files
  • generateInlineGitHubScriptStep — called from cache.go
  • generatePlaceholderSubstitutionStep

Legitimate YAML/path utilities (keep):
ContainsCheckout, GetWorkflowIDFromPath, SanitizeWorkflowIDForCacheKey, ConvertStepToYAML, unquoteUsesWithComments, getInstallationVersion, versionToGitRef

Compiler methods (keep):
(*Compiler).convertStepToYAML, (*Compiler).generateCheckoutActionsFolder, (*Compiler).generateSetupStep, (*Compiler).generateSetRuntimePathsStep, (*Compiler).renderStepFromMap

Recommendation: Extract generateGitHubScriptWithRequire, generateInlineGitHubScriptStep, and generatePlaceholderSubstitutionStep into compiler_github_actions_steps.go.
Effort: 1–2 hours
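One way to see the split is to compare a pure path utility with a step builder that emits YAML. Both bodies below are hypothetical: GetWorkflowIDFromPath's behavior is inferred from its name, and generateInlineGitHubScriptStep's signature, action reference, and indentation are illustrative only.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// scriptIndent is the indentation for lines inside the `script: |` block.
const scriptIndent = "            "

// GetWorkflowIDFromPath is a pure path utility (would stay in
// compiler_yaml_helpers.go). Assumed behavior: the base file name
// without its ".md" extension.
func GetWorkflowIDFromPath(path string) string {
	return strings.TrimSuffix(filepath.Base(path), ".md")
}

// generateInlineGitHubScriptStep is a step builder (a candidate for
// compiler_github_actions_steps.go). Signature and YAML are hypothetical.
func generateInlineGitHubScriptStep(stepName, script string) []string {
	lines := []string{
		"      - name: " + stepName,
		"        uses: actions/github-script@v7",
		"        with:",
		"          script: |",
	}
	for _, l := range strings.Split(strings.TrimRight(script, "\n"), "\n") {
		lines = append(lines, scriptIndent+l)
	}
	return lines
}

func main() {
	fmt.Println(GetWorkflowIDFromPath(".github/workflows/daily-report.md")) // daily-report
	fmt.Println(strings.Join(generateInlineGitHubScriptStep("Restore cache", "core.info('restored');"), "\n"))
}
```

The first function answers a question about a path; the second produces workflow YAML, which is the distinction the proposed file boundary captures.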


4. CopilotEngine Firewall Methods in Wrong File

Files: pkg/workflow/copilot_logs.go, pkg/workflow/copilot_engine.go

GetFirewallLogsCollectionStep and GetSquidLogsSteps are defined on *CopilotEngine in copilot_logs.go, but the same methods on *ClaudeEngine and *CodexEngine live in their respective main engine files:

| Method | ClaudeEngine | CodexEngine | CopilotEngine |
|---|---|---|---|
| GetFirewallLogsCollectionStep | claude_engine.go | codex_engine.go | copilot_logs.go ⚠️ |
| GetSquidLogsSteps | claude_engine.go | codex_engine.go | copilot_logs.go ⚠️ |

Additionally, (*CopilotEngine).GetCleanupStep is also in copilot_logs.go.

Recommendation: Move GetFirewallLogsCollectionStep, GetSquidLogsSteps, and GetCleanupStep from copilot_logs.go to copilot_engine.go (or a new copilot_engine_firewall.go following the copilot_engine_execution.go / copilot_engine_installation.go pattern).
Effort: ~30 min
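Because Go resolves methods per package rather than per file, the move is behavior-neutral, and a compile-time interface assertion can guard it. The FirewallLogsProvider interface and the stub return values below are hypothetical, standing in for whatever engine interface pkg/workflow actually uses.

```go
package main

import "fmt"

// FirewallLogsProvider is a hypothetical interface for the two methods in
// the table; the actual engine interface in pkg/workflow may differ.
type FirewallLogsProvider interface {
	GetFirewallLogsCollectionStep() []string
	GetSquidLogsSteps() []string
}

type ClaudeEngine struct{}
type CodexEngine struct{}
type CopilotEngine struct{}

// Stub bodies. In the repository each pair would sit in its engine's own
// file; moving a method between files in the same package changes nothing
// about the method set.
func (e *ClaudeEngine) GetFirewallLogsCollectionStep() []string {
	return []string{"claude: collect firewall logs"}
}
func (e *ClaudeEngine) GetSquidLogsSteps() []string { return []string{"claude: squid logs"} }

func (e *CodexEngine) GetFirewallLogsCollectionStep() []string {
	return []string{"codex: collect firewall logs"}
}
func (e *CodexEngine) GetSquidLogsSteps() []string { return []string{"codex: squid logs"} }

func (e *CopilotEngine) GetFirewallLogsCollectionStep() []string {
	return []string{"copilot: collect firewall logs"}
}
func (e *CopilotEngine) GetSquidLogsSteps() []string { return []string{"copilot: squid logs"} }

// Compile-time assertions: the build fails if a move drops or renames a method.
var (
	_ FirewallLogsProvider = (*ClaudeEngine)(nil)
	_ FirewallLogsProvider = (*CodexEngine)(nil)
	_ FirewallLogsProvider = (*CopilotEngine)(nil)
)

func main() {
	for _, e := range []FirewallLogsProvider{&ClaudeEngine{}, &CodexEngine{}, &CopilotEngine{}} {
		fmt.Println(e.GetFirewallLogsCollectionStep()[0], "|", e.GetSquidLogsSteps()[0])
	}
}
```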


5. compile_helpers.go vs compile_orchestration.go Naming Ambiguity

Package: pkg/cli

Three files share compilation responsibilities without a clear naming contract:

| File | Functions | Role |
|---|---|---|
| compile_orchestrator.go | CompileWorkflows only | Public entrypoint |
| compile_helpers.go | compileSingleFile, compileAllWorkflowFiles, compileModifiedFilesWithDependencies, handleFileDeleted, trackWorkflowFailure, printCompilationSummary | Mixed file operations and summary |
| compile_orchestration.go | compileSpecificFiles, compileAllFilesInDirectory, collectPurgeData, runPurgeOperations, displayScheduleWarnings, runPostProcessing, runPostProcessingForDirectory, outputResults | Pipeline stages |

The distinction between "helpers" and "orchestration" is unclear. compileSingleFile and compileSpecificFiles have overlapping purposes.

Recommendation: Rename compile_helpers.go → compile_file_operations.go (low-level file operations) and compile_orchestration.go → compile_pipeline.go (multi-file pipeline stages).
Effort: ~1 hour
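A sketch of the naming contract the rename would express: pipeline stages iterate over files and delegate to a single-file operation, never the other way around. The signatures and the compileResult type are hypothetical, and real compilation is elided.

```go
package main

import (
	"errors"
	"fmt"
)

// ---- compile_file_operations.go (proposed): one file at a time ----

// compileSingleFile is a stub; real compilation is elided and the
// signature is hypothetical.
func compileSingleFile(path string) error {
	if path == "" {
		return errors.New("empty workflow path")
	}
	return nil
}

// ---- compile_pipeline.go (proposed): multi-file stages ----

// compileResult is a hypothetical summary type.
type compileResult struct {
	Compiled int
	Failed   []string
}

// compileSpecificFiles is a pipeline stage that delegates each file to the
// file-operations layer and aggregates the outcome.
func compileSpecificFiles(paths []string) compileResult {
	var res compileResult
	for _, p := range paths {
		if err := compileSingleFile(p); err != nil {
			res.Failed = append(res.Failed, p)
			continue
		}
		res.Compiled++
	}
	return res
}

// outputResults is the final stage: render the summary.
func outputResults(res compileResult) string {
	return fmt.Sprintf("compiled %d, failed %d", res.Compiled, len(res.Failed))
}

func main() {
	res := compileSpecificFiles([]string{"a.md", "", "b.md"})
	fmt.Println(outputResults(res)) // compiled 2, failed 1
}
```

Under this contract, the overlap between compileSingleFile and compileSpecificFiles becomes a clear layering: one handles a file, the other sequences many.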


6. codemod_yaml_utils.go Misleading Name

File: pkg/cli/codemod_yaml_utils.go

Contains 9 YAML frontmatter manipulation utilities used across 31+ files in pkg/cli/:
reconstructContent, parseFrontmatterLines, getIndentation, isTopLevelKey, isNestedUnder, hasExitedBlock, findAndReplaceInLine, applyFrontmatterLineTransform, removeFieldFromBlock

The codemod_ prefix implies these are specific to one codemod, but they are shared YAML frontmatter utilities used across all codemods.

Recommendation: Rename to yaml_frontmatter_utils.go.
Effort: ~5 min (rename + verify)
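To illustrate that these helpers are generic line-level YAML utilities rather than codemod logic, here is a sketch of two of them. The semantics are inferred from the function names and are assumptions, not the repository's code.

```go
package main

import (
	"fmt"
	"strings"
)

// getIndentation returns the leading spaces and tabs of a line.
// Semantics are inferred from the name, not copied from the repository.
func getIndentation(line string) string {
	return line[:len(line)-len(strings.TrimLeft(line, " \t"))]
}

// isTopLevelKey reports whether a frontmatter line is an unindented mapping
// key such as "on:" (comments, blank lines, and list items are excluded).
func isTopLevelKey(line string) bool {
	if getIndentation(line) != "" {
		return false
	}
	trimmed := strings.TrimSpace(line)
	if trimmed == "" || strings.HasPrefix(trimmed, "#") || strings.HasPrefix(trimmed, "-") {
		return false
	}
	return strings.Contains(trimmed, ":")
}

func main() {
	frontmatter := []string{"on:", "  push:", "# note: ignored", "permissions: read-all"}
	for _, line := range frontmatter {
		if isTopLevelKey(line) {
			fmt.Println(line) // prints "on:" and "permissions: read-all"
		}
	}
}
```

Nothing here refers to a specific codemod, which supports the yaml_frontmatter_ prefix.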


New Findings (First Reported This Run)


7. collectPackagesFromWorkflow in runtime_validation.go

File: pkg/workflow/runtime_validation.go (line 292)

// collectPackagesFromWorkflow is a generic helper that collects packages from workflow data
// using the provided extractor function. It deduplicates packages and optionally extracts
// from MCP tool configurations when toolCommand is provided.
func collectPackagesFromWorkflow(workflowData *WorkflowData, extractFn func(string) []string, toolCommand string) []string

This generic package-extraction bridge function is defined in a validation file, but it has no validation logic — it is called exclusively from extraction functions:

  • pip.go:41 → collectPackagesFromWorkflow(workflowData, extractPipFromCommands, "")
  • pip.go:58 → collectPackagesFromWorkflow(workflowData, extractUvFromCommands, "")
  • npm.go:28 → collectPackagesFromWorkflow(workflowData, extractNpxFromCommands, "npx")
  • dependabot.go:602 → collectPackagesFromWorkflow(workflowData, extractGoFromCommands, "")

Meanwhile, package_extraction.go was recently added to house the PackageExtractor struct, which makes it a natural home for this function too.

Recommendation: Move collectPackagesFromWorkflow from runtime_validation.go to package_extraction.go to co-locate all package-extraction infrastructure.
Effort: ~10 min
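A sketch of the documented behavior, keeping the signature quoted above. WorkflowData and MCPServerConfig are simplified stand-ins, rebuilding an MCP command line and feeding it back to extractFn is an assumed mechanism, and toyNpxExtractor is a hypothetical replacement for extractNpxFromCommands. Nothing in it performs validation, which is the argument for the move.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Simplified stand-ins; the real WorkflowData has a different shape.
type MCPServerConfig struct {
	Command string
	Args    []string
}

type WorkflowData struct {
	Commands string                     // concatenated run: scripts (assumption)
	MCPTools map[string]MCPServerConfig // MCP servers keyed by tool name (assumption)
}

// collectPackagesFromWorkflow extracts packages, deduplicates them, and
// optionally scans MCP tools whose command equals toolCommand.
func collectPackagesFromWorkflow(workflowData *WorkflowData, extractFn func(string) []string, toolCommand string) []string {
	seen := make(map[string]bool)
	var pkgs []string
	add := func(names []string) {
		for _, n := range names {
			if !seen[n] {
				seen[n] = true
				pkgs = append(pkgs, n)
			}
		}
	}
	add(extractFn(workflowData.Commands))
	if toolCommand != "" {
		names := make([]string, 0, len(workflowData.MCPTools))
		for name := range workflowData.MCPTools {
			names = append(names, name)
		}
		sort.Strings(names) // deterministic order despite map iteration
		for _, name := range names {
			tool := workflowData.MCPTools[name]
			if tool.Command == toolCommand {
				add(extractFn(toolCommand + " " + strings.Join(tool.Args, " ")))
			}
		}
	}
	return pkgs
}

// toyNpxExtractor is a hypothetical stand-in for extractNpxFromCommands:
// it takes the first non-flag argument after each "npx".
func toyNpxExtractor(commands string) []string {
	var pkgs []string
	fields := strings.Fields(commands)
	for i, f := range fields {
		if f != "npx" {
			continue
		}
		for _, arg := range fields[i+1:] {
			if !strings.HasPrefix(arg, "-") {
				pkgs = append(pkgs, arg)
				break
			}
		}
	}
	return pkgs
}

func main() {
	wd := &WorkflowData{
		Commands: "npx -y prettier --check .\nnpx prettier --write .",
		MCPTools: map[string]MCPServerConfig{
			"playwright": {Command: "npx", Args: []string{"-y", "@playwright/mcp"}},
		},
	}
	fmt.Println(collectPackagesFromWorkflow(wd, toyNpxExtractor, "npx")) // [prettier @playwright/mcp]
}
```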


8. computeEffective* Token Helpers in compiler_safe_outputs_steps.go

File: pkg/workflow/compiler_safe_outputs_steps.go

Three standalone (non-receiver) functions that compute token and URL values appear ahead of the (*Compiler) methods:

func computeEffectivePRCheckoutToken(safeOutputs *SafeOutputsConfig) (token string, isCustom bool)
func computeEffectiveProjectToken(perConfigToken string, safeOutputsToken string) string
func computeProjectURLAndToken(safeOutputs *SafeOutputsConfig) (projectURL, projectToken string)

buildCustomSafeOutputJobsJSON likewise has no receiver. These are config-computation helpers, not step-emission functions, and safe_outputs_config_helpers.go already exists for exactly this purpose: it holds 12 generateMax* and generateAssign* config-generation helpers with consistent naming.

Recommendation: Move these four standalone functions to safe_outputs_config_helpers.go.
Effort: ~20 min
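computeEffectiveProjectToken illustrates why these helpers fit safe_outputs_config_helpers.go: as sketched, it is a pure function of configuration values with no *Compiler dependency. The precedence order and the default token expression below are assumptions, not taken from compiler_safe_outputs_steps.go.

```go
package main

import "fmt"

// defaultProjectToken is a hypothetical placeholder; the fallback actually
// used in compiler_safe_outputs_steps.go is not shown in the issue.
const defaultProjectToken = "${{ secrets.EXAMPLE_PROJECT_TOKEN }}"

// computeEffectiveProjectToken keeps the signature quoted in the issue. The
// precedence (per-config token, then safe-outputs token, then default) is an
// assumption.
func computeEffectiveProjectToken(perConfigToken string, safeOutputsToken string) string {
	if perConfigToken != "" {
		return perConfigToken
	}
	if safeOutputsToken != "" {
		return safeOutputsToken
	}
	return defaultProjectToken
}

func main() {
	fmt.Println(computeEffectiveProjectToken("", "${{ secrets.SO_TOKEN }}")) // ${{ secrets.SO_TOKEN }}
	fmt.Println(computeEffectiveProjectToken("", ""))
}
```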


Refactoring Recommendations

Priority 1 — High Impact, Low Risk

  1. Move collectPackagesFromWorkflow from runtime_validation.go to package_extraction.go

    • Completes the package extraction consolidation started with PackageExtractor
  2. Extract (*Compiler).generateGitHubMCP* methods from mcp_github_config.go to compiler_github_mcp_steps.go

    • mcp_github_config.go becomes a clean config-accessor module
  3. Move ConvertToInt/ConvertToFloat/PrettifyToolName from metrics.go to strings.go or type_utils.go

    • Improves cross-package discoverability

Priority 2 — Medium Impact

  1. Move CopilotEngine firewall methods from copilot_logs.go to copilot_engine.go

    • Restores consistency with ClaudeEngine and CodexEngine placement
  2. Move computeEffective* functions from compiler_safe_outputs_steps.go to safe_outputs_config_helpers.go

    • Better alignment with existing subsystem file structure
  3. Extract step-builder functions from compiler_yaml_helpers.go to compiler_github_actions_steps.go

Priority 3 — Naming/Cosmetic

  1. Rename codemod_yaml_utils.go → yaml_frontmatter_utils.go
  2. Rename compile_helpers.go → compile_file_operations.go and compile_orchestration.go → compile_pipeline.go

Implementation Checklist

  • Move collectPackagesFromWorkflow to package_extraction.go
  • Extract (*Compiler).generateGitHubMCP* methods to compiler_github_mcp_steps.go
  • Move ConvertToInt/ConvertToFloat/PrettifyToolName out of metrics.go
  • Relocate CopilotEngine.GetFirewallLogsCollectionStep etc. to copilot_engine.go
  • Move computeEffective* helpers to safe_outputs_config_helpers.go
  • Extract step-builders from compiler_yaml_helpers.go
  • Rename codemod_yaml_utils.go and compile orchestration files

Analysis Metadata

| Property | Value |
|---|---|
| Total Go files analyzed | 592 (non-test, pkg/) |
| Functions cataloged | 2,500+ |
| Issues confirmed unresolved | 6 |
| New issues found | 2 |
| Detection method | Serena semantic analysis (gopls LSP) + naming pattern analysis |
| Analysis date | 2026-03-19 |
| Workflow run | §23307567041 |


Generated by Semantic Function Refactoring · expires on Mar 21, 2026, 5:26 PM UTC
