[refactor] Semantic Function Clustering Analysis - Refactoring Opportunities #9973

@github-actions

🔧 Semantic Function Clustering Analysis

Automated analysis of repository: githubnext/gh-aw

Overview

This report analyzes Go source code organization across the repository to identify refactoring opportunities through semantic function clustering, outlier detection, and duplicate analysis.

Key Findings:

  • Strong file organization by feature (compiler_, mcp_, runtime_, etc.)
  • ⚠️ 30+ parse*Config functions with similar boilerplate that could be consolidated
  • ⚠️ 14 helper files (2,198 lines) with utility functions - some consolidation opportunities
  • ⚠️ 34 validation files but validation functions scattered across 14+ non-validation files
  • ℹ️ Excellent string utility organization in pkg/stringutil package

Full Report

Analysis Summary

Scope:

  • Total Go Files Analyzed: 422 (excluding tests)
  • Total Functions Cataloged: 3,330+ (pkg/workflow alone)
  • Function Clusters Identified: 21 major clusters
  • Analysis Date: 2026-01-14

Package Distribution:

pkg/workflow:  224 files (largest package)
pkg/cli:       132 files
pkg/parser:     26 files
pkg/campaign:   11 files
pkg/console:    10 files
Other utils:    19 files (stringutil, logger, mathutil, etc.)

Function Clustering Results

Cluster Analysis by Prefix

pkg/workflow shows excellent organization by feature:

Prefix           Count      Status
compiler_*       21         ✅ Well-organized
safe_*           19         ✅ Good grouping
create_*         8          ✅ Creation functions grouped
copilot_*        8          ✅ Engine-specific
update_*         7          ✅ Update operations grouped
runtime_*        6          ✅ Runtime detection/validation
action_*         6          ✅ Action processing
mcp_*            5          ✅ MCP server integration
github_*         5          ✅ GitHub API integration
frontmatter_*    5          ✅ Frontmatter parsing
expression_*     5          ✅ Expression parsing/validation
engine_*         4          ✅ Engine management
claude_*         4          ✅ Claude engine
bundler_*        4          ✅ JS bundling
codex_*          3          ✅ Codex engine
*_helpers        14 files   ⚠️ See helpers analysis below
*_validation     34 files   ⚠️ See validation analysis below

pkg/cli also shows good organization:

Prefix       Count     Status
mcp_*        18        ✅ MCP CLI commands
compile_*    14        ✅ Compilation commands
logs_*       11        ✅ Logging functionality
update_*     9         ✅ Update operations
run_*        7         ✅ Workflow execution
Others       Various   ✅ Generally well-organized
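The prefix counts above can be approximated by grouping file names on the text before the first underscore. The following is a minimal sketch of that idea, not the tool that produced this report; `prefixOf` and `clusterByPrefix` are hypothetical names, and `create_issue_test.go` is an assumed file name included only to show that test files are skipped.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixOf returns the part of a Go file name before the first underscore,
// or the whole base name (without ".go") when there is no underscore.
func prefixOf(fileName string) string {
	base := strings.TrimSuffix(fileName, ".go")
	prefix, _, _ := strings.Cut(base, "_")
	return prefix
}

// clusterByPrefix counts non-test Go files per prefix.
func clusterByPrefix(fileNames []string) map[string]int {
	counts := make(map[string]int)
	for _, name := range fileNames {
		if !strings.HasSuffix(name, ".go") || strings.HasSuffix(name, "_test.go") {
			continue // skip non-Go files and tests, as the report does
		}
		counts[prefixOf(name)]++
	}
	return counts
}

func main() {
	files := []string{
		"compiler.go", "compiler_jobs.go", "compiler_types.go",
		"runtime_detection.go", "runtime_validation.go",
		"create_issue.go", "create_issue_test.go",
	}
	counts := clusterByPrefix(files)

	// Sort prefixes so the output order is stable.
	prefixes := make([]string, 0, len(counts))
	for p := range counts {
		prefixes = append(prefixes, p)
	}
	sort.Strings(prefixes)
	for _, p := range prefixes {
		fmt.Printf("%s_*: %d\n", p, counts[p])
	}
}
```

Note that this simple rule places `compiler.go` in the `compiler_*` cluster and merges `safe_output*` and `safe_outputs_*` under `safe_*`, which matches how the tables above group them.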

Identified Issues

1. Repetitive parse*Config Pattern (Medium Priority)

Issue: 30+ parse*Config functions follow nearly identical patterns with boilerplate code.

Examples:

// Pattern repeated across 30+ files:

// pkg/workflow/add_labels.go
func (c *Compiler) parseAddLabelsConfig(outputMap map[string]any) *AddLabelsConfig {
    if _, exists := outputMap["add-labels"]; !exists {
        return nil
    }
    addLabelsLog.Print("Parsing add-labels configuration")
    var config AddLabelsConfig
    if err := unmarshalConfig(outputMap, "add-labels", &config, addLabelsLog); err != nil {
        addLabelsLog.Printf("Failed to unmarshal config: %v", err)
        return &AddLabelsConfig{}
    }
    addLabelsLog.Printf("Parsed configuration: ...")
    return &config
}

// pkg/workflow/add_reviewer.go
func (c *Compiler) parseAddReviewerConfig(outputMap map[string]any) *AddReviewerConfig {
    if _, exists := outputMap["add-reviewer"]; !exists {
        return nil
    }
    addReviewerLog.Print("Parsing add-reviewer configuration")
    var config AddReviewerConfig
    if err := unmarshalConfig(outputMap, "add-reviewer", &config, addReviewerLog); err != nil {
        addReviewerLog.Printf("Failed to unmarshal config: %v", err)
        config = AddReviewerConfig{}
    }
    if config.Max == 0 {
        config.Max = 3
    }
    // ... more boilerplate
}

// Similar pattern in:
// - parseAssignMilestoneConfig
// - parseAssignToAgentConfig
// - parseCloseIssuesConfig
// - parseCreateProjectsConfig
// - parseDispatchWorkflowConfig
// - parseHideCommentConfig
// - parsePullRequestsConfig
// ... and 20+ more

Impact:

  • Code duplication: Similar error handling and logging across all functions
  • Maintenance burden: Changes to parsing logic require updates in 30+ places
  • Inconsistency risk: Subtle differences in error handling between implementations

Recommendation:
Consider using Go generics (Go 1.18+) to create a unified config parser:

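// parseConfig is a sketch of a shared parser for the parse*Config family.
// It returns nil when key is absent, preserving the "not configured" result
// of the hand-written parsers.
func parseConfig[T any](
    c *Compiler,
    outputMap map[string]any,
    key string,
    log *logger.Logger, // named log so the parameter does not shadow the logger package
    defaults func(*T),
) *T {
    if _, exists := outputMap[key]; !exists {
        return nil
    }
    log.Printf("Parsing %s configuration", key)
    var config T
    if err := unmarshalConfig(outputMap, key, &config, log); err != nil {
        log.Printf("Failed to unmarshal %s config: %v", key, err)
        // Discard partially populated fields, as the existing parsers do,
        // then fall through so defaults still apply.
        var zero T
        config = zero
    }
    if defaults != nil {
        defaults(&config)
    }
    return &config
}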

Estimated Impact:

  • Reduced code by ~500-800 lines
  • Single source of truth for config parsing logic
  • Easier to add new config types
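To illustrate how an existing parser might shrink to a thin wrapper around the recommended generic helper, here is a self-contained sketch. Several parts are assumptions made only so the example runs on its own: `AddReviewerConfig` is reduced to two fields, `unmarshalConfig` is a stand-in that round-trips through JSON, the `*Compiler` receiver is dropped, and a plain `logf` function replaces the repository's logger type. Only the default of 3 for `Max` comes from the excerpt above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// AddReviewerConfig is a simplified stand-in for the real struct.
type AddReviewerConfig struct {
	Reviewers []string `json:"reviewers"`
	Max       int      `json:"max"`
}

// unmarshalConfig is a stand-in for the repository helper of the same name:
// it decodes outputMap[key] into target via a JSON round trip.
func unmarshalConfig(outputMap map[string]any, key string, target any) error {
	raw, err := json.Marshal(outputMap[key])
	if err != nil {
		return err
	}
	return json.Unmarshal(raw, target)
}

// parseConfig returns nil when key is absent. On a decode failure it falls
// back to the zero value and still applies defaults.
func parseConfig[T any](outputMap map[string]any, key string,
	logf func(format string, args ...any), defaults func(*T)) *T {
	if _, exists := outputMap[key]; !exists {
		return nil
	}
	logf("Parsing %s configuration", key)
	config := new(T)
	if err := unmarshalConfig(outputMap, key, config); err != nil {
		logf("Failed to unmarshal %s config: %v", key, err)
		config = new(T) // discard anything partially decoded
	}
	if defaults != nil {
		defaults(config)
	}
	return config
}

// parseAddReviewerConfig keeps its name but delegates the boilerplate.
// T is inferred from the type of the defaults callback.
func parseAddReviewerConfig(outputMap map[string]any,
	logf func(format string, args ...any)) *AddReviewerConfig {
	return parseConfig(outputMap, "add-reviewer", logf, func(c *AddReviewerConfig) {
		if c.Max == 0 {
			c.Max = 3
		}
	})
}

func main() {
	outputMap := map[string]any{
		"add-reviewer": map[string]any{"reviewers": []any{"octocat"}},
	}
	cfg := parseAddReviewerConfig(outputMap, log.Printf)
	fmt.Printf("%+v\n", *cfg)
}
```

Passing defaults as a callback keeps per-type logic next to the wrapper, so a parser without defaults, such as the add-labels one, would simply pass nil.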

2. Helper File Proliferation (Low-Medium Priority)

Issue: 14 separate *_helpers.go files with 2,198 lines of utility functions.

Helper Files Found:

File                                         Functions   Lines   Purpose
config_helpers.go                            14          ~350    Config parsing utilities
error_helpers.go                             15          ~300    Error construction
engine_helpers.go                            8           ~200    Engine setup utilities
safe_outputs_config_generation_helpers.go    10          ~250    Safe output config generation
safe_outputs_config_helpers.go               3           ~100    Safe output utilities
safe_outputs_config_helpers_reflection.go    Various     ~300    Reflection-based helpers
compiler_yaml_helpers.go                     7           ~200    YAML generation utilities
compiler_test_helpers.go                     3           ~150    Test utilities (appropriate location)
close_entity_helpers.go                      4           ~100    Entity closing logic
update_entity_helpers.go                     5           ~150    Entity update logic
validation_helpers.go                        1           ~30     Single validation function
git_helpers.go                               1           ~20     Single git function
map_helpers.go                               2           ~48     Map utilities
CLI: compile_helpers.go                      Various     ~200    Compilation helpers

Analysis:

Good Practices:

  • Most helpers are appropriately scoped to their feature area
  • config_helpers.go centralizes config parsing logic
  • error_helpers.go provides consistent error handling

⚠️ Potential Issues:

  • validation_helpers.go - Only 1 function (30 lines) - could be inlined or moved
  • git_helpers.go - Only 1 function (20 lines) - consider moving to pkg/gitutil
  • map_helpers.go - Only 2 functions (48 lines) - consider moving to a utils package
  • Multiple "config helpers" files - could potentially be consolidated

Recommendation:

Priority 1: Consolidate Single-Function Helpers

  1. Move GetCurrentGitTag() from git_helpers.go to pkg/gitutil/gitutil.go
  2. Move validateIntRange() from validation_helpers.go to validation.go or inline it
  3. Move parseIntValue() and filterMapKeys() from map_helpers.go to config_helpers.go

Priority 2: Consider Consolidation

  • Merge safe_outputs_config_helpers*.go files into a single safe_outputs_config_utilities.go
  • Review if close_entity_helpers.go and update_entity_helpers.go could share common logic

Estimated Impact:

  • Remove 3 tiny helper files
  • Reduce file count by 3-5 files
  • Improve discoverability of utility functions

3. Validation Functions Outside Validation Files (Medium Priority)

Issue: 34 dedicated validation files exist, but validation functions are scattered across 14+ non-validation files.

Validation Files (Well-Organized):

pkg/workflow/agent_validation.go
pkg/workflow/bundler_runtime_validation.go
pkg/workflow/bundler_safety_validation.go
pkg/workflow/bundler_script_validation.go
pkg/workflow/compiler_filters_validation.go
pkg/workflow/dangerous_permissions_validation.go
pkg/workflow/dispatch_workflow_validation.go
pkg/workflow/docker_validation.go
pkg/workflow/engine_validation.go
pkg/workflow/expression_validation.go
pkg/workflow/features_validation.go
pkg/workflow/firewall_validation.go
pkg/workflow/mcp_config_validation.go
pkg/workflow/mcp_gateway_schema_validation.go
pkg/workflow/npm_validation.go
pkg/workflow/pip_validation.go
pkg/workflow/repository_features_validation.go
pkg/workflow/runtime_validation.go
pkg/workflow/safe_output_validation_config.go
pkg/workflow/safe_outputs_domains_validation.go
pkg/workflow/sandbox_validation.go
pkg/workflow/schema_validation.go
pkg/workflow/secrets_validation.go
pkg/workflow/step_order_validation.go
pkg/workflow/strict_mode_validation.go
pkg/workflow/template_validation.go
pkg/workflow/validation.go
pkg/workflow/validation_helpers.go
... and more in cli/parser packages

Files with Validation Functions (Not Dedicated Validation Files):

action_sha_checker.go        - ValidateActionSHAsInLockFile()
agentic_engine.go            - validateHTTPTransportSupport(), validateMaxTurnsSupport(), validateWebSearchSupport()
artifact_manager.go          - ValidateDownload(), ValidateAllDownloads()
compiler_types.go            - (validation in types file)
config_helpers.go            - (validation functions mixed with parsing)
error_helpers.go             - (error validation)
github_tool_to_toolset.go    - (toolset validation)
imports.go                   - (import validation)
jobs.go                      - (job validation)
js.go                        - validateNoRuntimeMixing(), validateNoLocalRequires(), validateNoModuleReferences()
mcp_renderer.go              - (MCP validation)
permissions_validator.go     - (permission validation - good name but not *_validation.go)
repo_memory.go               - (memory validation)
safe_outputs_app.go          - (app validation)

Specific Examples:

// pkg/workflow/agentic_engine.go - validation functions in engine file
func (c *Compiler) validateAgentFile(workflowData *WorkflowData, markdownPath string) error { ... }
func (c *Compiler) validateHTTPTransportSupport(tools map[string]any, engine CodingAgentEngine) error { ... }
func (c *Compiler) validateMaxTurnsSupport(frontmatter map[string]any, engine CodingAgentEngine) error { ... }
func (c *Compiler) validateWebSearchSupport(tools map[string]any, engine CodingAgentEngine) { ... }
func (c *Compiler) validateWorkflowRunBranches(workflowData *WorkflowData, markdownPath string) error { ... }

// Recommendation: Move to agent_validation.go

// pkg/workflow/js.go - validation functions in JavaScript bundler file
func validateNoRuntimeMixing(mainScript string, sources map[string]string, targetMode RuntimeMode) error { ... }
func validateRuntimeModeRecursive(content string, currentPath string, sources map[string]string, targetMode RuntimeMode, checked map[string]bool) error { ... }
func validateNoLocalRequires(bundledContent string) error { ... }
func validateNoModuleReferences(bundledContent string) error { ... }
func ValidateEmbeddedResourceRequires(sources map[string]string) error { ... }

// Recommendation: Move to js_validation.go or bundler_validation.go

// pkg/workflow/artifact_manager.go - validation in manager file
func (am *ArtifactManager) ValidateDownload(download *ArtifactDownload) error { ... }
func (am *ArtifactManager) ValidateAllDownloads() []error { ... }

// These are fine - methods on the manager struct

Recommendation:

Priority 1: Move Validation to Dedicated Files

  1. Create js_validation.go and move all validation functions from js.go
  2. Move validation functions from agentic_engine.go to agent_validation.go
  3. Create github_toolset_validation.go and move validation from github_tool_to_toolset.go

Priority 2: Review and Consolidate

  • Review permissions_validator.go - rename to permissions_validation.go for consistency
  • Consider if validation functions in small utility files should be colocated or moved

Why This Matters:

  • Consistency: Developers expect validation logic in *_validation.go files
  • Discoverability: Easier to find validation functions when they follow naming conventions
  • Testing: Validation tests are easier to organize when validation functions are grouped

Estimated Impact:

  • Create 2-3 new validation files
  • Move 10-15 validation functions to appropriate files
  • Improve code organization consistency
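One way to keep the convention from drifting again is a small check that lists validate* functions declared outside dedicated validation files. The sketch below uses the standard go/parser and go/ast packages; `misplacedValidators` is a hypothetical helper, not part of the repository, and the sample source in `main` is a made-up reduction of js.go.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// misplacedValidators parses one Go source file and returns the names of
// top-level functions and methods that start with "validate" (any case),
// unless the file is a dedicated validation file or a test file.
func misplacedValidators(fileName, src string) ([]string, error) {
	if fileName == "validation.go" ||
		strings.HasSuffix(fileName, "_validation.go") ||
		strings.HasSuffix(fileName, "_test.go") {
		return nil, nil
	}
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, fileName, src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // not a function or method declaration
		}
		if strings.HasPrefix(strings.ToLower(fn.Name.Name), "validate") {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	src := `package workflow

func validateNoLocalRequires(bundledContent string) error { return nil }
func bundle(mainScript string) string { return mainScript }
`
	names, err := misplacedValidators("js.go", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names)
}
```

The result is a list of candidates for review rather than a list of violations: methods such as those on ArtifactManager would also be reported, even though the analysis above considers them fine where they are.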

4. String Utility Organization (Exemplary - No Action Needed)

Status: ✅ Excellent Organization

The pkg/stringutil package demonstrates exemplary organization:

pkg/stringutil/
├── identifiers.go     - Workflow name normalization, file path conversions
├── paths.go           - Path normalization
├── sanitize.go        - String sanitization functions
└── stringutil.go      - General string utilities (Truncate, NormalizeWhitespace)

Well-Documented Pattern:

  • Clear separation between "sanitize" (character validity) and "normalize" (format conversion)
  • Documented in pkg/workflow/strings.go with guidance on when to use each pattern
  • Separate test files for each concern
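The sanitize/normalize distinction can be shown with two small functions. These are hypothetical illustrations of the two categories, not the actual pkg/stringutil API; the character set and the ".md" suffix rule are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeName enforces character validity: it drops every rune that is not
// an ASCII letter, digit, hyphen, or underscore.
func sanitizeName(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z',
			r >= '0' && r <= '9', r == '-', r == '_':
			b.WriteRune(r)
		}
	}
	return b.String()
}

// normalizeName performs a format conversion: it strips a ".md" extension
// and lowercases the result, without judging which characters are valid.
func normalizeName(s string) string {
	return strings.ToLower(strings.TrimSuffix(s, ".md"))
}

func main() {
	fmt.Println(sanitizeName("my workflow!"))
	fmt.Println(normalizeName("Weekly-Research.md"))
}
```

Keeping the two concerns in separate functions lets a caller compose them in whichever order a given identifier requires.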

This is a model for other packages to follow!


Detailed Function Clusters

Cluster 1: Config Parsing Functions ⚠️

Pattern: parse*Config functions
Files: 30+ files in pkg/workflow
Total Functions: ~35-40

Functions:

  • parseAddLabelsConfig, parseAddReviewerConfig, parseAssignMilestoneConfig
  • parseAssignToAgentConfig, parseAssignToUserConfig, parseCloseIssuesConfig
  • parseCloseDiscussionsConfig, parseCommentsConfig, parseCopyProjectsConfig
  • parseCreateProjectsConfig, parseDiscussionsConfig, parseDispatchWorkflowConfig
  • parseHideCommentConfig, parseIssuesConfig, parseLinkSubIssueConfig
  • parseMissingDataConfig, parseMissingToolConfig, parseNoOpConfig
  • parsePullRequestsConfig, parsePushToPullRequestBranchConfig
  • parseUpdateEntityConfig, parseUpdateIssuesConfig, parseUpdateDiscussionsConfig
  • ... and more

Analysis: Could benefit from generic implementation to reduce duplication.


Cluster 2: Engine-Specific Functions ✅

Pattern: {engine}_* (claude_, copilot_, codex_)
Files: Well-organized by engine type
Status: ✅ Good organization

Examples:

  • claude_engine.go, claude_logs.go, claude_mcp.go, claude_tools.go
  • copilot_engine.go, copilot_logs.go, copilot_mcp.go, copilot_engine_execution.go
  • codex_engine.go, codex_logs.go, codex_mcp.go

Analysis: Excellent separation by AI engine with consistent file naming.


Cluster 3: Compiler Functions ✅

Pattern: compiler_*
Files: 21 files in pkg/workflow
Status: ✅ Well-organized

Files:

  • compiler.go - Main compilation entry point
  • compiler_activation_jobs.go - Activation job generation
  • compiler_filters_validation.go - Filter validation
  • compiler_jobs.go - Job compilation
  • compiler_orchestrator.go - Orchestration logic
  • compiler_safe_output*.go (7 files) - Safe output compilation
  • compiler_types.go - Compiler types and structs
  • compiler_yaml*.go (4 files) - YAML generation
  • ... and more

Analysis: Excellent breakdown of compiler functionality into focused files.


Cluster 4: Safe Outputs ✅

Pattern: safe_output* and safe_outputs_*
Files: 19 files
Status: ✅ Good organization with room for minor consolidation

Files:

  • safe_outputs.go - Main safe outputs logic
  • safe_outputs_app.go - Application-specific
  • safe_outputs_config*.go (7 files) - Configuration handling
  • safe_outputs_domains_validation.go - Domain validation
  • safe_outputs_env.go - Environment variables
  • safe_outputs_jobs.go, safe_outputs_steps.go - Job/step generation
  • safe_output_builder.go, safe_output_config.go, safe_output_validation_config.go

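Minor Suggestion: Consider consolidating the 3 safe outputs config helper files (safe_outputs_config_generation_helpers.go, safe_outputs_config_helpers.go, safe_outputs_config_helpers_reflection.go) into one.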


Cluster 5: Runtime Detection ✅

Pattern: runtime_*
Files: 6 files
Status: ✅ Good organization

Files:

  • runtime_definitions.go - Runtime type definitions
  • runtime_detection.go - Auto-detection logic
  • runtime_deduplication.go - Deduplication logic
  • runtime_overrides.go - Manual overrides
  • runtime_step_generator.go - Step generation
  • runtime_validation.go - Validation

Analysis: Clear separation of concerns within runtime handling.


Cluster 6: Create Operations ✅

Pattern: create_*
Files: 8 files
Status: ✅ Well-organized

Files:

  • create_agent_session.go
  • create_code_scanning_alert.go
  • create_discussion.go
  • create_issue.go
  • create_pr_review_comment.go
  • create_project.go
  • create_project_status_update.go
  • create_pull_request.go

Analysis: Each creation operation has its own file - excellent pattern!


Cluster 7: Update Operations ✅

Pattern: update_*
Files: 7 files (including helpers)
Status: ✅ Good organization

Files:

  • update_discussion.go
  • update_entity_helpers.go
  • update_issue.go
  • update_project.go
  • update_project_job.go
  • update_pull_request.go
  • update_release.go

Analysis: Consistent pattern matching create operations.


Summary of Recommendations

🔴 High Priority

None identified - overall code organization is strong.

🟡 Medium Priority

  1. Consolidate parse*Config Pattern (Issue #1)

    • Use generics to reduce 30+ similar functions to a single implementation
    • Estimated effort: 4-6 hours
    • Benefits: 500-800 lines reduced, single source of truth
  2. Move Validation Functions to Dedicated Files (Issue #3)

    • Create js_validation.go for JavaScript validation
    • Move engine validation to agent_validation.go
    • Estimated effort: 2-3 hours
    • Benefits: Improved consistency and discoverability

🟢 Low Priority

  1. Consolidate Single-Function Helper Files (Issue #2)

    • Move the 3 smallest helper files (1-2 functions each) to appropriate locations
    • Estimated effort: 1-2 hours
    • Benefits: Reduced file count, better discoverability
  2. Consider Helper File Consolidation (Issue #2)

    • Review and potentially merge safe_outputs_config_helpers*.go files
    • Estimated effort: 2-3 hours
    • Benefits: Reduced file count

Implementation Checklist

Phase 1: Quick Wins (Low Effort, Clear Benefit)

  • Move GetCurrentGitTag() from pkg/workflow/git_helpers.go to pkg/gitutil/gitutil.go
  • Inline or move validateIntRange() from validation_helpers.go
  • Move map utility functions to config_helpers.go or create pkg/maputil package
  • Remove empty or near-empty helper files

Phase 2: Validation Organization (Medium Effort)

  • Create pkg/workflow/js_validation.go for JavaScript validation functions
  • Move validation functions from agentic_engine.go to agent_validation.go
  • Rename permissions_validator.go to permissions_validation.go for consistency
  • Document validation file naming convention

Phase 3: Config Parser Refactoring (Higher Effort, High Impact)

  • Design generic parseConfig[T]() function using Go generics
  • Migrate 5-10 parse functions to use generic implementation
  • Test and validate approach
  • Migrate remaining parse functions
  • Update tests to cover generic implementation
  • Document new pattern for future config additions

Phase 4: Helper File Review (Optional)

  • Audit all *_helpers.go files for consolidation opportunities
  • Consider merging safe_outputs_config_helpers*.go files
  • Document when to create vs. when to extend helper files

Positive Findings

✅ Excellent Patterns Observed

  1. Feature-Based Organization: The codebase follows a clear pattern of organizing files by feature (compiler_, mcp_, runtime_)
  2. Engine Isolation: AI engine implementations are cleanly separated (claude_, copilot_, codex_)
  3. Operation Grouping: Create/update operations follow consistent naming and file organization
  4. String Utilities: The pkg/stringutil package is exceptionally well-organized with clear separation of concerns
  5. Validation Files: Most validation logic is properly organized into dedicated *_validation.go files
  6. CLI Organization: The pkg/cli package shows good organization with clear command grouping

📊 Code Organization Metrics

  • File Naming Consistency: 95%+ of files follow clear naming conventions
  • Feature Cohesion: High - related functionality is properly grouped
  • Code Duplication: Low-Medium - most duplication is in boilerplate (parse functions)
  • Documentation: Good inline documentation in key files (e.g., strings.go)

Conclusion

The gh-aw codebase demonstrates strong architectural organization with clear patterns and conventions. The primary opportunities for improvement are:

  1. Reducing boilerplate in config parsing functions through generics
  2. Moving outlier validation functions to dedicated validation files for consistency
  3. Minor consolidation of small helper files

These refactoring opportunities are relatively low-risk and would improve maintainability without requiring significant architectural changes.

Overall Assessment: 🟢 Well-Organized Codebase with targeted improvement opportunities.


Next Steps

Please review these findings and prioritize based on:

  • Current development velocity
  • Team familiarity with Go generics
  • Upcoming feature work that might benefit from refactoring

I recommend starting with Phase 1: Quick Wins as these are low-risk changes with immediate benefits.


Analysis Metadata:

  • Repository: githubnext/gh-aw
  • Workflow Run: §20998672341
  • Analysis Date: 2026-01-14
  • Files Analyzed: 422 Go source files
  • Functions Cataloged: 3,330+ functions
  • Primary Packages: pkg/workflow (224 files), pkg/cli (132 files), pkg/parser (26 files)

AI generated by Semantic Function Refactoring
