Skip to content

[refactor] Semantic Function Clustering Analysis - Code Organization Improvements #5506

@github-actions

Description

@github-actions

Analysis Summary

Comprehensive semantic analysis of all Go source files in the repository to identify refactoring opportunities through function clustering, duplicate detection, and code organization assessment.

Scope: 259 Go files (excluding tests) containing 1,568 functions across pkg/cli, pkg/workflow, pkg/parser, and other packages.

Full Analysis Report

Executive Summary

The codebase demonstrates generally good organization with clear patterns:

  • ✅ Well-organized validation files (*_validation.go)
  • ✅ Clear creation operation files (create_*.go)
  • ✅ Feature-specific grouping (MCP, logs, Docker, npm, pip)
  • ⚠️ Some duplicate functions identified (3 cases)
  • ⚠️ Many small files with 1-2 functions in pkg/workflow/
  • ⚠️ Scattered helper functions could benefit from consolidation

Detailed Findings

1. Duplicate Functions Detected

Priority 1: extractErrorMessage() - Near-Duplicate (90% similar)

Occurrence 1: pkg/workflow/metrics.go:462

func extractErrorMessage(line string) string {
    // Uses pre-compiled regex patterns
    cleanedLine := line
    cleanedLine = timestampPattern1.ReplaceAllString(cleanedLine, "")
    cleanedLine = timestampPattern2.ReplaceAllString(cleanedLine, "")
    cleanedLine = timestampPattern3.ReplaceAllString(cleanedLine, "")
    cleanedLine = logLevelPattern.ReplaceAllString(cleanedLine, "")
    cleanedLine = strings.TrimSpace(cleanedLine)
    return cleanedLine
}

Occurrence 2: pkg/cli/copilot_agent.go:297

func extractErrorMessage(line string) string {
    // Compiles regex inline
    line = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}[T\s]...`).ReplaceAllString(line, "")
    line = regexp.MustCompile(`^\[\d{4}-\d{2}-\d{2}\s+...`).ReplaceAllString(line, "")
    line = regexp.MustCompile(`^(ERROR|WARN|WARNING|...)...`).ReplaceAllString(line, "")
    line = strings.TrimSpace(line)
    return line
}

Analysis:

  • Both functions clean error messages from log lines
  • Metrics version uses pre-compiled patterns (more efficient)
  • Copilot agent version compiles inline (less efficient)
  • Impact: Code duplication, maintenance burden, performance inconsistency

Recommendation: Create shared utility function in pkg/logger/ or pkg/workflow/strings.go


Priority 2: isMCPType() - Near-Duplicate with Inconsistency

Occurrence 1: pkg/workflow/mcp-config.go:954

func isMCPType(typeStr string) bool {
    switch typeStr {
    case "stdio", "http", "local":  // Supports 3 types
        return true
    default:
        return false
    }
}

Occurrence 2: pkg/parser/frontmatter.go:72

func isMCPType(typeStr string) bool {
    switch typeStr {
    case "stdio", "http":  // Supports only 2 types
        return true
    default:
        return false
    }
}

Analysis:

  • Critical inconsistency: Different valid type sets
  • mcp-config.go accepts "local" but frontmatter.go does not
  • Impact: Validation inconsistency, potential bugs

Recommendation: Single source of truth in pkg/parser/mcp.go or as constant


Priority 3: parseRunURL() - Function Wrapping

Occurrence 1: pkg/cli/audit.go:103

func parseRunURL(input string) (RunURLInfo, error) {
    runID, owner, repo, hostname, err := parser.ParseRunURL(input)
    if err != nil {
        return RunURLInfo{}, err
    }
    return RunURLInfo{
        RunID: runID,
        Owner: owner,
        Repo: repo,
        Hostname: hostname,
    }, nil
}

Occurrence 2: pkg/parser/github_urls.go:177

func parseRunURL(host, owner, repo string, parts []string) (*GitHubURLComponents, error) {
    // Different signature - internal parser
    ...
}

Analysis:

  • Different signatures (not true duplicates)
  • CLI version wraps parser's public function
  • Impact: Acceptable wrapper pattern, but review if necessary

Recommendation: Review if wrapper adds value or just adds indirection


2. Function Clustering Analysis

Clustering by Naming Patterns

Prefix Count Common Files Organization Status
get* 88 scripts.go (31), mcp_servers.go (9), js.go (6) ✅ Good
extract* 62 frontmatter.go (11), logs_metrics.go (5), tools.go (4) ⚠️ Review
parse* 60 Multiple files ✅ Good
is* 46 Various validation files ✅ Good
generate* 40 Various files ✅ Good
build* 37 Compiler/builder files ✅ Good
render* 34 Console/output files ✅ Good
validate* 76 Dedicated *_validation.go files ✅ Excellent

Notable Observation: The 31 get* functions in scripts.go form a script registry pattern - this is well-organized ✅


3. Validation Functions - Excellent Organization ✅

The codebase demonstrates best-practice validation organization:

pkg/workflow/
├── agent_validation.go
├── bundler_validation.go
├── docker_validation.go
├── engine_validation.go
├── expression_validation.go
├── mcp_config_validation.go
├── npm_validation.go
├── pip_validation.go
├── repository_features_validation.go
├── runtime_validation.go
├── schema_validation.go
├── step_order_validation.go
├── strict_mode_validation.go
├── template_validation.go
└── validation.go (general)

Status: Each validation concern has a dedicated file following the *_validation.go pattern. No refactoring needed.


4. Small Files Analysis (Potential Consolidation)

Files with ≤2 functions and <100 lines in pkg/workflow/:

File Lines Funcs Purpose
temp_folder.go 13 1 Temp folder management
xpia.go 13 1 XPIA detection
safe_outputs_prompt.go 14 1 Prompt text
detection.go 17 1 Detection logic
github_context.go 18 1 GitHub context
cache_memory_prompt.go 19 1 Prompt text
git_patch.go 19 1 Git patch handling
edit_tool_prompt.go 22 2 Prompt text
playwright_prompt.go 22 2 Prompt text
safe_output_config.go 25 1 Config struct
yaml_generation.go 25 2 YAML helpers
domain_sanitization.go 29 1 Sanitization
... and 18 more

Analysis:

  • Many prompt-related files could be grouped: *_prompt.goprompts.go
  • Small utility functions scattered
  • Impact: Many files to navigate, harder to discover related functions

Recommendation: Consider consolidating related small files:

  • Merge prompt files → prompts.go or prompt_templates.go
  • Group small helpers → helpers.go or enhance existing config_helpers.go

5. Package Organization Assessment

pkg/cli/ (75 files) - ✅ Well-Organized

Patterns observed:

  • ✅ Command pattern: *_command.go (add_command, compile_command, etc.)
  • ✅ Feature grouping: mcp_*.go (11 MCP-related files)
  • ✅ Feature grouping: logs_*.go (5 logs-related files)
  • ✅ Clear responsibilities per file

Status: Excellent organization, no refactoring needed


pkg/workflow/ (155 files) - ⚠️ Mixed

Good patterns:

  • ✅ Creation operations: create_*.go pattern
  • ✅ Validation: *_validation.go pattern
  • ✅ Feature files: docker.go, npm.go, pip.go, nodejs.go
  • ✅ Safe outputs: safe_*.go grouping
  • ✅ Engine files: *_engine.go pattern

Areas for improvement:

  • ⚠️ Many small files with 1-2 functions
  • ⚠️ 30+ files with <100 lines
  • ⚠️ Some generic utilities not centralized

Status: Generally good, but could benefit from consolidation of small files


pkg/parser/ (10 files) - ✅ Excellent

Clear responsibilities:

  • frontmatter.go (11 extract functions - makes sense)
  • github_urls.go
  • import_cache.go
  • schema.go
  • yaml_error.go
  • etc.

Status: Well-organized, no refactoring needed


pkg/console/ (5 files) - ✅ Excellent

console/
├── banner.go
├── console.go
├── format.go (all FormatXXXMessage functions here - good!)
├── render.go
└── spinner.go

Status: Perfect organization, no refactoring needed


6. Function Distribution Analysis

Extract Functions (62 total):

  • pkg/parser/frontmatter.go: 11 functions (makes sense - frontmatter parsing)
  • pkg/cli/logs_metrics.go: 5 functions (log processing)
  • pkg/workflow/tools.go: 4 functions
  • Scattered across 20+ other files

Status: Frontmatter and logs metrics concentrations are justified by purpose


Refactoring Recommendations

Priority 1: High Impact, Low Effort

1. Consolidate extractErrorMessage() Duplicates

Action: Create single implementation

  • Location: pkg/logger/error_formatting.go or pkg/workflow/strings.go
  • Implementation: Use pre-compiled regex patterns (metrics.go version)
  • Effort: 1-2 hours
  • Benefits:
    • Single source of truth
    • Better performance (pre-compiled patterns)
    • Easier maintenance

Example:

// pkg/logger/error_formatting.go
package logger

var (
    timestampPattern1 = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}[T\s]\d{2}:\d{2}:\d{2}(\.\d+)?([+-]\d{2}:\d{2}|Z)?\s*`)
    timestampPattern2 = regexp.MustCompile(`^\[\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\]\s*`)
    logLevelPattern   = regexp.MustCompile(`^(ERROR|WARN|WARNING|INFO|DEBUG):\s*`)
)

func ExtractErrorMessage(line string) string {
    cleaned := timestampPattern1.ReplaceAllString(line, "")
    cleaned = timestampPattern2.ReplaceAllString(cleaned, "")
    cleaned = logLevelPattern.ReplaceAllString(cleaned, "")
    return strings.TrimSpace(cleaned)
}

2. Unify isMCPType() Implementations

Action: Create single source of truth with consistent validation

  • Location: pkg/parser/mcp.go or create pkg/workflow/mcp_types.go
  • Critical: Determine correct set of valid types (stdio, http, local?)
  • Effort: 1 hour
  • Benefits:
    • Consistent validation across codebase
    • Fix potential bugs from inconsistent type checking
    • Easier to add new MCP types in future

Example:

// pkg/parser/mcp.go
package parser

// ValidMCPTypes defines all valid MCP server types
var ValidMCPTypes = []string{"stdio", "http", "local"}

// IsMCPType checks if a type string represents an MCP-compatible type
func IsMCPType(typeStr string) bool {
    switch typeStr {
    case "stdio", "http", "local":
        return true
    default:
        return false
    }
}

Then update both files to use:

import "github.com/githubnext/gh-aw/pkg/parser"

// Use parser.IsMCPType() instead of local function

Priority 2: Medium Impact, Medium Effort

3. Consolidate Small Prompt Files

Current state:

  • safe_outputs_prompt.go (14 lines, 1 func)
  • cache_memory_prompt.go (19 lines, 1 func)
  • edit_tool_prompt.go (22 lines, 2 funcs)
  • playwright_prompt.go (22 lines, 2 funcs)
  • pr_prompt.go (61 lines, 2 funcs)

Recommendation: Merge into pkg/workflow/prompts.go or prompt_templates.go

  • Effort: 2-3 hours
  • Benefits:
    • Easier to find all prompts in one place
    • Reduce file count
    • Better overview of all prompt templates

4. Review and Consolidate Small Utility Files

Candidates:

  • temp_folder.go (13 lines, 1 func)
  • xpia.go (13 lines, 1 func)
  • detection.go (17 lines, 1 func)
  • github_context.go (18 lines, 1 func)
  • git_patch.go (19 lines, 1 func)
  • domain_sanitization.go (29 lines, 1 func)

Recommendation: Group related utilities

  • Option A: Create pkg/workflow/utilities.go for miscellaneous helpers
  • Option B: Merge into existing related files (e.g., xpia.govalidation.go)
  • Effort: 3-4 hours
  • Benefits: Fewer files to navigate, related functions together

Priority 3: Long-term Improvements

5. Consider Extract Function Consolidation

Current state: 62 extract* functions across 20+ files

Recommendation: Review for patterns

  • Many frontmatter extract functions (11) - justified
  • Many logs metrics extract functions (5) - justified
  • Others scattered - review if some could be consolidated

Effort: 4-6 hours (analysis + refactoring)
Benefits: Potentially discover more duplicates or near-duplicates


Implementation Checklist

Phase 1: Quick Wins (Priority 1)

  • Create pkg/logger/error_formatting.go with unified ExtractErrorMessage()
  • Update pkg/workflow/metrics.go to use new function
  • Update pkg/cli/copilot_agent.go to use new function
  • Run tests to verify no regressions
  • Create IsMCPType() unified function in pkg/parser/mcp.go
  • Update both mcp-config.go and frontmatter.go to use it
  • Verify which types should be valid (local included?)
  • Run tests to verify validation consistency

Phase 2: File Consolidation (Priority 2)

  • Review and plan prompt file consolidation
  • Create pkg/workflow/prompts.go or prompt_templates.go
  • Migrate prompt functions from individual files
  • Update imports across codebase
  • Remove old prompt files
  • Review small utility files for consolidation opportunities
  • Implement chosen consolidation strategy
  • Run full test suite

Phase 3: Future Considerations (Priority 3)

  • Deep analysis of 62 extract* functions
  • Identify additional consolidation opportunities
  • Consider generic helpers for common patterns
  • Document file organization patterns in CONTRIBUTING.md

Analysis Metadata

  • Analysis Date: 2025-12-04
  • Tool: Semantic function clustering + manual code review
  • Files Analyzed: 259 Go files (excluding tests)
  • Total Functions: 1,568
  • Duplicates Found: 3 cases (2 high priority, 1 low priority)
  • Small Files (<100 lines, ≤2 funcs): 30+ in pkg/workflow/
  • Validation Files: 15 (excellently organized)
  • Creation Files: 6 create_*.go (well organized)

Success Criteria

This refactoring will be successful when:

  1. ✅ Zero duplicate extractErrorMessage() implementations
  2. ✅ Single source of truth for isMCPType() validation
  3. ✅ Consistent MCP type validation across codebase
  4. ✅ Reduced file count through consolidation of small files
  5. ✅ Maintained or improved test coverage
  6. ✅ No functionality broken or changed (refactoring only)

Positive Findings (Celebrate What Works!) 🎉

The codebase demonstrates many excellent patterns:

  • Validation organization: Each concern has dedicated *_validation.go file
  • Create operations: Clear create_*.go pattern
  • Command pattern: CLI commands follow *_command.go convention
  • Feature grouping: MCP files grouped with mcp_* prefix
  • Console formatting: All format functions in console/format.go
  • Parser package: Clean, focused responsibilities
  • Scripts registry: 31 get functions organized in scripts.go

Overall Assessment: The codebase is well-organized with clear patterns. The identified issues are minor and easily addressable.


Quick Action Items

For immediate impact, focus on:

  1. Consolidate extractErrorMessage() - Two nearly identical functions

    • Files: pkg/workflow/metrics.go:462 and pkg/cli/copilot_agent.go:297
    • Impact: Code duplication, performance inconsistency
  2. Unify isMCPType() - Inconsistent validation (critical!)

    • Files: pkg/workflow/mcp-config.go:954 and pkg/parser/frontmatter.go:72
    • Impact: Validation bug - different valid type sets
  3. Consider consolidating small prompt files - 5+ files with single functions

    • Files: *_prompt.go files in pkg/workflow/
    • Impact: Easier discoverability, fewer files

Estimated Total Effort:

  • Priority 1 (High Impact): 2-3 hours
  • Priority 2 (Medium Impact): 5-7 hours
  • Priority 3 (Future): 4-6 hours

Total: ~11-16 hours for full implementation

AI generated by Semantic Function Refactoring

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions