-
Notifications
You must be signed in to change notification settings - Fork 296
Description
Analysis Summary
Comprehensive semantic analysis of all Go source files in the repository to identify refactoring opportunities through function clustering, duplicate detection, and code organization assessment.
Scope: 259 Go files (excluding tests) containing 1,568 functions across pkg/cli, pkg/workflow, pkg/parser, and other packages.
Full Analysis Report
Executive Summary
The codebase demonstrates generally good organization with clear patterns:
- ✅ Well-organized validation files (
*_validation.go) - ✅ Clear creation operation files (
create_*.go) - ✅ Feature-specific grouping (MCP, logs, Docker, npm, pip)
⚠️ Some duplicate functions identified (3 cases)⚠️ Many small files with 1-2 functions inpkg/workflow/⚠️ Scattered helper functions could benefit from consolidation
Detailed Findings
1. Duplicate Functions Detected
Priority 1: extractErrorMessage() - Near-Duplicate (90% similar)
Occurrence 1: pkg/workflow/metrics.go:462
func extractErrorMessage(line string) string {
// Uses pre-compiled regex patterns
cleanedLine := line
cleanedLine = timestampPattern1.ReplaceAllString(cleanedLine, "")
cleanedLine = timestampPattern2.ReplaceAllString(cleanedLine, "")
cleanedLine = timestampPattern3.ReplaceAllString(cleanedLine, "")
cleanedLine = logLevelPattern.ReplaceAllString(cleanedLine, "")
cleanedLine = strings.TrimSpace(cleanedLine)
return cleanedLine
}Occurrence 2: pkg/cli/copilot_agent.go:297
func extractErrorMessage(line string) string {
// Compiles regex inline
line = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}[T\s]...`).ReplaceAllString(line, "")
line = regexp.MustCompile(`^\[\d{4}-\d{2}-\d{2}\s+...`).ReplaceAllString(line, "")
line = regexp.MustCompile(`^(ERROR|WARN|WARNING|...)...`).ReplaceAllString(line, "")
line = strings.TrimSpace(line)
return line
}Analysis:
- Both functions clean error messages from log lines
- Metrics version uses pre-compiled patterns (more efficient)
- Copilot agent version compiles inline (less efficient)
- Impact: Code duplication, maintenance burden, performance inconsistency
Recommendation: Create shared utility function in pkg/logger/ or pkg/workflow/strings.go
Priority 2: isMCPType() - Near-Duplicate with Inconsistency
Occurrence 1: pkg/workflow/mcp-config.go:954
func isMCPType(typeStr string) bool {
switch typeStr {
case "stdio", "http", "local": // Supports 3 types
return true
default:
return false
}
}Occurrence 2: pkg/parser/frontmatter.go:72
func isMCPType(typeStr string) bool {
switch typeStr {
case "stdio", "http": // Supports only 2 types
return true
default:
return false
}
}Analysis:
- Critical inconsistency: Different valid type sets
mcp-config.goaccepts"local"butfrontmatter.godoes not- Impact: Validation inconsistency, potential bugs
Recommendation: Single source of truth in pkg/parser/mcp.go or as constant
Priority 3: parseRunURL() - Function Wrapping
Occurrence 1: pkg/cli/audit.go:103
func parseRunURL(input string) (RunURLInfo, error) {
runID, owner, repo, hostname, err := parser.ParseRunURL(input)
if err != nil {
return RunURLInfo{}, err
}
return RunURLInfo{
RunID: runID,
Owner: owner,
Repo: repo,
Hostname: hostname,
}, nil
}Occurrence 2: pkg/parser/github_urls.go:177
func parseRunURL(host, owner, repo string, parts []string) (*GitHubURLComponents, error) {
// Different signature - internal parser
...
}Analysis:
- Different signatures (not true duplicates)
- CLI version wraps parser's public function
- Impact: Acceptable wrapper pattern, but review if necessary
Recommendation: Review if wrapper adds value or just adds indirection
2. Function Clustering Analysis
Clustering by Naming Patterns
| Prefix | Count | Common Files | Organization Status |
|---|---|---|---|
get* |
88 | scripts.go (31), mcp_servers.go (9), js.go (6) |
✅ Good |
extract* |
62 | frontmatter.go (11), logs_metrics.go (5), tools.go (4) |
|
parse* |
60 | Multiple files | ✅ Good |
is* |
46 | Various validation files | ✅ Good |
generate* |
40 | Various files | ✅ Good |
build* |
37 | Compiler/builder files | ✅ Good |
render* |
34 | Console/output files | ✅ Good |
validate* |
76 | Dedicated *_validation.go files |
✅ Excellent |
Notable Observation: The 31 get* functions in scripts.go form a script registry pattern - this is well-organized ✅
3. Validation Functions - Excellent Organization ✅
The codebase demonstrates best-practice validation organization:
pkg/workflow/
├── agent_validation.go
├── bundler_validation.go
├── docker_validation.go
├── engine_validation.go
├── expression_validation.go
├── mcp_config_validation.go
├── npm_validation.go
├── pip_validation.go
├── repository_features_validation.go
├── runtime_validation.go
├── schema_validation.go
├── step_order_validation.go
├── strict_mode_validation.go
├── template_validation.go
└── validation.go (general)
Status: Each validation concern has a dedicated file following the *_validation.go pattern. No refactoring needed.
4. Small Files Analysis (Potential Consolidation)
Files with ≤2 functions and <100 lines in pkg/workflow/:
| File | Lines | Funcs | Purpose |
|---|---|---|---|
temp_folder.go |
13 | 1 | Temp folder management |
xpia.go |
13 | 1 | XPIA detection |
safe_outputs_prompt.go |
14 | 1 | Prompt text |
detection.go |
17 | 1 | Detection logic |
github_context.go |
18 | 1 | GitHub context |
cache_memory_prompt.go |
19 | 1 | Prompt text |
git_patch.go |
19 | 1 | Git patch handling |
edit_tool_prompt.go |
22 | 2 | Prompt text |
playwright_prompt.go |
22 | 2 | Prompt text |
safe_output_config.go |
25 | 1 | Config struct |
yaml_generation.go |
25 | 2 | YAML helpers |
domain_sanitization.go |
29 | 1 | Sanitization |
| ... and 18 more |
Analysis:
- Many prompt-related files could be grouped:
*_prompt.go→prompts.go - Small utility functions scattered
- Impact: Many files to navigate, harder to discover related functions
Recommendation: Consider consolidating related small files:
- Merge prompt files →
prompts.goorprompt_templates.go - Group small helpers →
helpers.goor enhance existingconfig_helpers.go
5. Package Organization Assessment
pkg/cli/ (75 files) - ✅ Well-Organized
Patterns observed:
- ✅ Command pattern:
*_command.go(add_command, compile_command, etc.) - ✅ Feature grouping:
mcp_*.go(11 MCP-related files) - ✅ Feature grouping:
logs_*.go(5 logs-related files) - ✅ Clear responsibilities per file
Status: Excellent organization, no refactoring needed
pkg/workflow/ (155 files) - ⚠️ Mixed
Good patterns:
- ✅ Creation operations:
create_*.gopattern - ✅ Validation:
*_validation.gopattern - ✅ Feature files:
docker.go,npm.go,pip.go,nodejs.go - ✅ Safe outputs:
safe_*.gogrouping - ✅ Engine files:
*_engine.gopattern
Areas for improvement:
⚠️ Many small files with 1-2 functions⚠️ 30+ files with <100 lines⚠️ Some generic utilities not centralized
Status: Generally good, but could benefit from consolidation of small files
pkg/parser/ (10 files) - ✅ Excellent
Clear responsibilities:
frontmatter.go(11 extract functions - makes sense)github_urls.goimport_cache.goschema.goyaml_error.go- etc.
Status: Well-organized, no refactoring needed
pkg/console/ (5 files) - ✅ Excellent
console/
├── banner.go
├── console.go
├── format.go (all FormatXXXMessage functions here - good!)
├── render.go
└── spinner.go
Status: Perfect organization, no refactoring needed
6. Function Distribution Analysis
Extract Functions (62 total):
pkg/parser/frontmatter.go: 11 functions (makes sense - frontmatter parsing)pkg/cli/logs_metrics.go: 5 functions (log processing)pkg/workflow/tools.go: 4 functions- Scattered across 20+ other files
Status: Frontmatter and logs metrics concentrations are justified by purpose
Refactoring Recommendations
Priority 1: High Impact, Low Effort
1. Consolidate extractErrorMessage() Duplicates
Action: Create single implementation
- Location:
pkg/logger/error_formatting.goorpkg/workflow/strings.go - Implementation: Use pre-compiled regex patterns (metrics.go version)
- Effort: 1-2 hours
- Benefits:
- Single source of truth
- Better performance (pre-compiled patterns)
- Easier maintenance
Example:
// pkg/logger/error_formatting.go
package logger
var (
timestampPattern1 = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}[T\s]\d{2}:\d{2}:\d{2}(\.\d+)?([+-]\d{2}:\d{2}|Z)?\s*`)
timestampPattern2 = regexp.MustCompile(`^\[\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\]\s*`)
logLevelPattern = regexp.MustCompile(`^(ERROR|WARN|WARNING|INFO|DEBUG):\s*`)
)
func ExtractErrorMessage(line string) string {
cleaned := timestampPattern1.ReplaceAllString(line, "")
cleaned = timestampPattern2.ReplaceAllString(cleaned, "")
cleaned = logLevelPattern.ReplaceAllString(cleaned, "")
return strings.TrimSpace(cleaned)
}2. Unify isMCPType() Implementations
Action: Create single source of truth with consistent validation
- Location:
pkg/parser/mcp.goor createpkg/workflow/mcp_types.go - Critical: Determine correct set of valid types (
stdio,http,local?) - Effort: 1 hour
- Benefits:
- Consistent validation across codebase
- Fix potential bugs from inconsistent type checking
- Easier to add new MCP types in future
Example:
// pkg/parser/mcp.go
package parser
// ValidMCPTypes defines all valid MCP server types
var ValidMCPTypes = []string{"stdio", "http", "local"}
// IsMCPType checks if a type string represents an MCP-compatible type
func IsMCPType(typeStr string) bool {
switch typeStr {
case "stdio", "http", "local":
return true
default:
return false
}
}Then update both files to use:
import "github.com/githubnext/gh-aw/pkg/parser"
// Use parser.IsMCPType() instead of local functionPriority 2: Medium Impact, Medium Effort
3. Consolidate Small Prompt Files
Current state:
safe_outputs_prompt.go(14 lines, 1 func)cache_memory_prompt.go(19 lines, 1 func)edit_tool_prompt.go(22 lines, 2 funcs)playwright_prompt.go(22 lines, 2 funcs)pr_prompt.go(61 lines, 2 funcs)
Recommendation: Merge into pkg/workflow/prompts.go or prompt_templates.go
- Effort: 2-3 hours
- Benefits:
- Easier to find all prompts in one place
- Reduce file count
- Better overview of all prompt templates
4. Review and Consolidate Small Utility Files
Candidates:
temp_folder.go(13 lines, 1 func)xpia.go(13 lines, 1 func)detection.go(17 lines, 1 func)github_context.go(18 lines, 1 func)git_patch.go(19 lines, 1 func)domain_sanitization.go(29 lines, 1 func)
Recommendation: Group related utilities
- Option A: Create
pkg/workflow/utilities.gofor miscellaneous helpers - Option B: Merge into existing related files (e.g.,
xpia.go→validation.go) - Effort: 3-4 hours
- Benefits: Fewer files to navigate, related functions together
Priority 3: Long-term Improvements
5. Consider Extract Function Consolidation
Current state: 62 extract* functions across 20+ files
Recommendation: Review for patterns
- Many frontmatter extract functions (11) - justified ✅
- Many logs metrics extract functions (5) - justified ✅
- Others scattered - review if some could be consolidated
Effort: 4-6 hours (analysis + refactoring)
Benefits: Potentially discover more duplicates or near-duplicates
Implementation Checklist
Phase 1: Quick Wins (Priority 1)
- Create
pkg/logger/error_formatting.gowith unifiedExtractErrorMessage() - Update
pkg/workflow/metrics.goto use new function - Update
pkg/cli/copilot_agent.goto use new function - Run tests to verify no regressions
- Create
IsMCPType()unified function inpkg/parser/mcp.go - Update both
mcp-config.goandfrontmatter.goto use it - Verify which types should be valid (
localincluded?) - Run tests to verify validation consistency
Phase 2: File Consolidation (Priority 2)
- Review and plan prompt file consolidation
- Create
pkg/workflow/prompts.goorprompt_templates.go - Migrate prompt functions from individual files
- Update imports across codebase
- Remove old prompt files
- Review small utility files for consolidation opportunities
- Implement chosen consolidation strategy
- Run full test suite
Phase 3: Future Considerations (Priority 3)
- Deep analysis of 62
extract*functions - Identify additional consolidation opportunities
- Consider generic helpers for common patterns
- Document file organization patterns in CONTRIBUTING.md
Analysis Metadata
- Analysis Date: 2025-12-04
- Tool: Semantic function clustering + manual code review
- Files Analyzed: 259 Go files (excluding tests)
- Total Functions: 1,568
- Duplicates Found: 3 cases (2 high priority, 1 low priority)
- Small Files (<100 lines, ≤2 funcs): 30+ in
pkg/workflow/ - Validation Files: 15 (excellently organized)
- Creation Files: 6
create_*.go(well organized)
Success Criteria
This refactoring will be successful when:
- ✅ Zero duplicate
extractErrorMessage()implementations - ✅ Single source of truth for
isMCPType()validation - ✅ Consistent MCP type validation across codebase
- ✅ Reduced file count through consolidation of small files
- ✅ Maintained or improved test coverage
- ✅ No functionality broken or changed (refactoring only)
Positive Findings (Celebrate What Works!) 🎉
The codebase demonstrates many excellent patterns:
- ✅ Validation organization: Each concern has dedicated
*_validation.gofile - ✅ Create operations: Clear
create_*.gopattern - ✅ Command pattern: CLI commands follow
*_command.goconvention - ✅ Feature grouping: MCP files grouped with
mcp_*prefix - ✅ Console formatting: All format functions in
console/format.go - ✅ Parser package: Clean, focused responsibilities
- ✅ Scripts registry: 31 get functions organized in
scripts.go
Overall Assessment: The codebase is well-organized with clear patterns. The identified issues are minor and easily addressable.
Quick Action Items
For immediate impact, focus on:
-
Consolidate
extractErrorMessage()- Two nearly identical functions- Files:
pkg/workflow/metrics.go:462andpkg/cli/copilot_agent.go:297 - Impact: Code duplication, performance inconsistency
- Files:
-
Unify
isMCPType()- Inconsistent validation (critical!)- Files:
pkg/workflow/mcp-config.go:954andpkg/parser/frontmatter.go:72 - Impact: Validation bug - different valid type sets
- Files:
-
Consider consolidating small prompt files - 5+ files with single functions
- Files:
*_prompt.gofiles inpkg/workflow/ - Impact: Easier discoverability, fewer files
- Files:
Estimated Total Effort:
- Priority 1 (High Impact): 2-3 hours
- Priority 2 (Medium Impact): 5-7 hours
- Priority 3 (Future): 4-6 hours
Total: ~11-16 hours for full implementation
AI generated by Semantic Function Refactoring