Description
🔧 Semantic Function Clustering Analysis
Automated analysis of repository: githubnext/gh-aw
Overview
This report analyzes Go source code organization across the repository to identify refactoring opportunities through semantic function clustering, outlier detection, and duplicate analysis.
Key Findings:
- ✅ Strong file organization by feature (compiler_, mcp_, runtime_, etc.)
- ⚠️ 30+ parse*Config functions with similar boilerplate that could be consolidated
- ⚠️ 14 helper files (2,198 lines) with utility functions - some consolidation opportunities
- ⚠️ 34 validation files but validation functions scattered across 14+ non-validation files
- ℹ️ Excellent string utility organization in pkg/stringutil package
Full Report
Analysis Summary
Scope:
- Total Go Files Analyzed: 422 (excluding tests)
- Total Functions Cataloged: 3,330+ (pkg/workflow alone)
- Function Clusters Identified: 21 major clusters
- Analysis Date: 2026-01-14
Package Distribution:
pkg/workflow: 224 files (largest package)
pkg/cli: 132 files
pkg/parser: 26 files
pkg/campaign: 11 files
pkg/console: 10 files
Other utils: 19 files (stringutil, logger, mathutil, etc.)
Function Clustering Results
Cluster Analysis by Prefix
pkg/workflow shows excellent organization by feature:
| Prefix | Count | Status |
|---|---|---|
| compiler_* | 21 | ✅ Well-organized |
| safe_* | 19 | ✅ Good grouping |
| create_* | 8 | ✅ Creation functions grouped |
| copilot_* | 8 | ✅ Engine-specific |
| update_* | 7 | ✅ Update operations grouped |
| runtime_* | 6 | ✅ Runtime detection/validation |
| action_* | 6 | ✅ Action processing |
| mcp_* | 5 | ✅ MCP server integration |
| github_* | 5 | ✅ GitHub API integration |
| frontmatter_* | 5 | ✅ Frontmatter parsing |
| expression_* | 5 | ✅ Expression parsing/validation |
| engine_* | 4 | ✅ Engine management |
| claude_* | 4 | ✅ Claude engine |
| bundler_* | 4 | ✅ JS bundling |
| codex_* | 3 | ✅ Codex engine |
| *_helpers | 14 files | |
| *_validation | 34 files | |
pkg/cli also shows good organization:
| Prefix | Count | Status |
|---|---|---|
| mcp_* | 18 | ✅ MCP CLI commands |
| compile_* | 14 | ✅ Compilation commands |
| logs_* | 11 | ✅ Logging functionality |
| update_* | 9 | ✅ Update operations |
| run_* | 7 | ✅ Workflow execution |
| Others | Various | ✅ Generally well-organized |
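The prefix counts in the tables above could be reproduced with a small tally over file names. The following is a minimal sketch, not the tool that generated this report: `prefixOf` and `countByPrefix` are hypothetical helpers, the sample list is a handful of names taken from this report plus one made-up test file, and a bare `compiler.go` is treated as having no prefix, which may differ from how the report counted it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixOf returns the text before the first underscore in a Go file name,
// or "" when the name (minus ".go") contains no underscore.
func prefixOf(name string) string {
	base := strings.TrimSuffix(name, ".go")
	i := strings.Index(base, "_")
	if i <= 0 {
		return ""
	}
	return base[:i]
}

// countByPrefix tallies file names per prefix, skipping test files.
func countByPrefix(names []string) map[string]int {
	counts := make(map[string]int)
	for _, n := range names {
		if strings.HasSuffix(n, "_test.go") {
			continue
		}
		if p := prefixOf(n); p != "" {
			counts[p]++
		}
	}
	return counts
}

func main() {
	// Sample names from this report; compiler_jobs_test.go is hypothetical.
	names := []string{
		"compiler_jobs.go", "compiler_types.go", "compiler_orchestrator.go",
		"runtime_detection.go", "runtime_validation.go",
		"create_issue.go", "compiler.go", "compiler_jobs_test.go",
	}
	counts := countByPrefix(names)

	// Sort prefixes so the output order is stable.
	prefixes := make([]string, 0, len(counts))
	for p := range counts {
		prefixes = append(prefixes, p)
	}
	sort.Strings(prefixes)
	for _, p := range prefixes {
		fmt.Printf("%s_*: %d\n", p, counts[p])
	}
}
```

Suffix-based groups such as *_helpers and *_validation would need a separate `strings.HasSuffix` pass, since this sketch only looks at the leading segment.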
Identified Issues
1. Repetitive parse*Config Pattern (Medium Priority)
Issue: 30+ parse*Config functions follow nearly identical patterns with boilerplate code.
Examples:
// Pattern repeated across 30+ files:

// pkg/workflow/add_labels.go
func (c *Compiler) parseAddLabelsConfig(outputMap map[string]any) *AddLabelsConfig {
	if _, exists := outputMap["add-labels"]; !exists {
		return nil
	}
	addLabelsLog.Print("Parsing add-labels configuration")
	var config AddLabelsConfig
	if err := unmarshalConfig(outputMap, "add-labels", &config, addLabelsLog); err != nil {
		addLabelsLog.Printf("Failed to unmarshal config: %v", err)
		return &AddLabelsConfig{}
	}
	addLabelsLog.Printf("Parsed configuration: ...")
	return &config
}

// pkg/workflow/add_reviewer.go
func (c *Compiler) parseAddReviewerConfig(outputMap map[string]any) *AddReviewerConfig {
	if _, exists := outputMap["add-reviewer"]; !exists {
		return nil
	}
	addReviewerLog.Print("Parsing add-reviewer configuration")
	var config AddReviewerConfig
	if err := unmarshalConfig(outputMap, "add-reviewer", &config, addReviewerLog); err != nil {
		addReviewerLog.Printf("Failed to unmarshal config: %v", err)
		config = AddReviewerConfig{}
	}
	if config.Max == 0 {
		config.Max = 3
	}
	// ... more boilerplate
}

// Similar pattern in:
// - parseAssignMilestoneConfig
// - parseAssignToAgentConfig
// - parseCloseIssuesConfig
// - parseCreateProjectsConfig
// - parseDispatchWorkflowConfig
// - parseHideCommentConfig
// - parsePullRequestsConfig
// ... and 20+ more
Impact:
- Code duplication: Similar error handling and logging across all functions
- Maintenance burden: Changes to parsing logic require updates in 30+ places
- Inconsistency risk: Subtle differences in error handling between implementations
Recommendation:
Consider using Go generics (Go 1.18+) to create a unified config parser:
// parseConfig decodes outputMap[key] into a new T and applies optional defaults.
// It returns nil when the key is absent.
func parseConfig[T any](
	c *Compiler,
	outputMap map[string]any,
	key string,
	log *logger.Logger, // named log so it does not shadow the logger package
	defaults func(*T),
) *T {
	if _, exists := outputMap[key]; !exists {
		return nil
	}
	log.Printf("Parsing %s configuration", key)
	var config T
	if err := unmarshalConfig(outputMap, key, &config, log); err != nil {
		log.Printf("Failed to unmarshal config: %v", err)
		return &config // or handle error as needed
	}
	if defaults != nil {
		defaults(&config)
	}
	return &config
}
Estimated Impact:
- Reduced code by ~500-800 lines
- Single source of truth for config parsing logic
- Easier to add new config types
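To make the migration concrete, here is a self-contained sketch of how one existing parser could shrink to a thin wrapper around a generic helper. It is an illustration under assumptions rather than the repository's code: `unmarshalConfig` below is a hypothetical JSON round-trip stand-in for the real helper, logging and the `*Compiler` receiver are left out, `AddReviewerConfig` models only the `Max` field, and the error path follows the add-reviewer variant (reset to the zero value, then apply defaults).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AddReviewerConfig is a stand-in for the real config type; only Max is modeled.
type AddReviewerConfig struct {
	Max int `json:"max"`
}

// unmarshalConfig is a hypothetical stand-in for the repository helper of the
// same name: it round-trips outputMap[key] through JSON into out.
func unmarshalConfig(outputMap map[string]any, key string, out any) error {
	raw, err := json.Marshal(outputMap[key])
	if err != nil {
		return err
	}
	return json.Unmarshal(raw, out)
}

// parseConfig returns nil when key is absent, otherwise a decoded *T with
// defaults applied. On a decode error it falls back to the zero value of T.
func parseConfig[T any](outputMap map[string]any, key string, defaults func(*T)) *T {
	if _, exists := outputMap[key]; !exists {
		return nil
	}
	var config T
	if err := unmarshalConfig(outputMap, key, &config); err != nil {
		config = *new(T) // discard any partially decoded fields
	}
	if defaults != nil {
		defaults(&config)
	}
	return &config
}

// parseAddReviewerConfig is the per-type wrapper reduced to a single call;
// T is inferred from the type of the defaults function.
func parseAddReviewerConfig(outputMap map[string]any) *AddReviewerConfig {
	return parseConfig(outputMap, "add-reviewer", func(c *AddReviewerConfig) {
		if c.Max == 0 {
			c.Max = 3
		}
	})
}

func main() {
	withMax := map[string]any{"add-reviewer": map[string]any{"max": 5}}
	noMax := map[string]any{"add-reviewer": map[string]any{}}

	fmt.Println(parseAddReviewerConfig(withMax).Max)             // explicit value kept
	fmt.Println(parseAddReviewerConfig(noMax).Max)               // default applied
	fmt.Println(parseAddReviewerConfig(map[string]any{}) == nil) // key absent
}
```

Parsers that need no defaults would pass `nil` and name the type explicitly, for example `parseConfig[AddLabelsConfig](outputMap, "add-labels", nil)`. Functions whose error handling differs from this shape (such as returning an empty config without defaults) would need a closer look before migrating.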
2. Helper File Proliferation (Low-Medium Priority)
Issue: 14 separate *_helpers.go files with 2,198 lines of utility functions.
Helper Files Found:
| File | Functions | Lines | Purpose |
|---|---|---|---|
| config_helpers.go | 14 | ~350 | Config parsing utilities |
| error_helpers.go | 15 | ~300 | Error construction |
| engine_helpers.go | 8 | ~200 | Engine setup utilities |
| safe_outputs_config_generation_helpers.go | 10 | ~250 | Safe output config generation |
| safe_outputs_config_helpers.go | 3 | ~100 | Safe output utilities |
| safe_outputs_config_helpers_reflection.go | Various | ~300 | Reflection-based helpers |
| compiler_yaml_helpers.go | 7 | ~200 | YAML generation utilities |
| compiler_test_helpers.go | 3 | ~150 | Test utilities (appropriate location) |
| close_entity_helpers.go | 4 | ~100 | Entity closing logic |
| update_entity_helpers.go | 5 | ~150 | Entity update logic |
| validation_helpers.go | 1 | ~30 | Single validation function |
| git_helpers.go | 1 | ~20 | Single git function |
| map_helpers.go | 2 | ~48 | Map utilities |
| CLI: compile_helpers.go | Various | ~200 | Compilation helpers |
Analysis:
✅ Good Practices:
- Most helpers are appropriately scoped to their feature area
- config_helpers.go centralizes config parsing logic
- error_helpers.go provides consistent error handling
⚠️ Consolidation Opportunities:
- validation_helpers.go - Only 1 function (30 lines) - could be inlined or moved
- git_helpers.go - Only 1 function (20 lines) - consider moving to pkg/gitutil
- map_helpers.go - Only 2 functions (48 lines) - consider moving to a utils package
- Multiple "config helpers" files - could potentially be consolidated
Recommendation:
Priority 1: Consolidate Single-Function Helpers
- Move GetCurrentGitTag() from git_helpers.go to pkg/gitutil/gitutil.go
- Move validateIntRange() from validation_helpers.go to validation.go or inline it
- Move parseIntValue() and filterMapKeys() from map_helpers.go to config_helpers.go
Priority 2: Consider Consolidation
- Merge safe_outputs_config_helpers*.go files into a single safe_outputs_config_utilities.go
- Review if close_entity_helpers.go and update_entity_helpers.go could share common logic
Estimated Impact:
- Remove 3 tiny helper files
- Reduce file count by 3-5 files
- Improve discoverability of utility functions
3. Validation Functions Outside Validation Files (Medium Priority)
Issue: 34 dedicated validation files exist, but validation functions are scattered across 14+ non-validation files.
Validation Files (Well-Organized):
pkg/workflow/agent_validation.go
pkg/workflow/bundler_runtime_validation.go
pkg/workflow/bundler_safety_validation.go
pkg/workflow/bundler_script_validation.go
pkg/workflow/compiler_filters_validation.go
pkg/workflow/dangerous_permissions_validation.go
pkg/workflow/dispatch_workflow_validation.go
pkg/workflow/docker_validation.go
pkg/workflow/engine_validation.go
pkg/workflow/expression_validation.go
pkg/workflow/features_validation.go
pkg/workflow/firewall_validation.go
pkg/workflow/mcp_config_validation.go
pkg/workflow/mcp_gateway_schema_validation.go
pkg/workflow/npm_validation.go
pkg/workflow/pip_validation.go
pkg/workflow/repository_features_validation.go
pkg/workflow/runtime_validation.go
pkg/workflow/safe_output_validation_config.go
pkg/workflow/safe_outputs_domains_validation.go
pkg/workflow/sandbox_validation.go
pkg/workflow/schema_validation.go
pkg/workflow/secrets_validation.go
pkg/workflow/step_order_validation.go
pkg/workflow/strict_mode_validation.go
pkg/workflow/template_validation.go
pkg/workflow/validation.go
pkg/workflow/validation_helpers.go
... and more in cli/parser packages
Files with Validation Functions (Not Dedicated Validation Files):
action_sha_checker.go - ValidateActionSHAsInLockFile()
agentic_engine.go - validateHTTPTransportSupport(), validateMaxTurnsSupport(), validateWebSearchSupport()
artifact_manager.go - ValidateDownload(), ValidateAllDownloads()
compiler_types.go - (validation in types file)
config_helpers.go - (validation functions mixed with parsing)
error_helpers.go - (error validation)
github_tool_to_toolset.go - (toolset validation)
imports.go - (import validation)
jobs.go - (job validation)
js.go - validateNoRuntimeMixing(), validateNoLocalRequires(), validateNoModuleReferences()
mcp_renderer.go - (MCP validation)
permissions_validator.go - (permission validation - good name but not *_validation.go)
repo_memory.go - (memory validation)
safe_outputs_app.go - (app validation)
Specific Examples:
// pkg/workflow/agentic_engine.go - validation functions in engine file
func (c *Compiler) validateAgentFile(workflowData *WorkflowData, markdownPath string) error { ... }
func (c *Compiler) validateHTTPTransportSupport(tools map[string]any, engine CodingAgentEngine) error { ... }
func (c *Compiler) validateMaxTurnsSupport(frontmatter map[string]any, engine CodingAgentEngine) error { ... }
func (c *Compiler) validateWebSearchSupport(tools map[string]any, engine CodingAgentEngine) { ... }
func (c *Compiler) validateWorkflowRunBranches(workflowData *WorkflowData, markdownPath string) error { ... }
// Recommendation: Move to agent_validation.go

// pkg/workflow/js.go - validation functions in JavaScript bundler file
func validateNoRuntimeMixing(mainScript string, sources map[string]string, targetMode RuntimeMode) error { ... }
func validateRuntimeModeRecursive(content string, currentPath string, sources map[string]string, targetMode RuntimeMode, checked map[string]bool) error { ... }
func validateNoLocalRequires(bundledContent string) error { ... }
func validateNoModuleReferences(bundledContent string) error { ... }
func ValidateEmbeddedResourceRequires(sources map[string]string) error { ... }
// Recommendation: Move to js_validation.go or bundler_validation.go

// pkg/workflow/artifact_manager.go - validation in manager file
func (am *ArtifactManager) ValidateDownload(download *ArtifactDownload) error { ... }
func (am *ArtifactManager) ValidateAllDownloads() []error { ... }
// These are fine - methods on the manager struct
Recommendation:
Priority 1: Move Validation to Dedicated Files
- Create js_validation.go and move all validation functions from js.go
- Move validation functions from agentic_engine.go to agent_validation.go
- Create github_toolset_validation.go and move validation from github_tool_to_toolset.go
Priority 2: Review and Consolidate
- Review permissions_validator.go - rename to permissions_validation.go for consistency
- Consider if validation functions in small utility files should be colocated or moved
Why This Matters:
- Consistency: Developers expect validation logic in *_validation.go files
- Discoverability: Easier to find validation functions when they follow naming conventions
- Testing: Validation tests are easier to organize when validation functions are grouped
Estimated Impact:
- Create 2-3 new validation files
- Move 10-15 validation functions to appropriate files
- Improve code organization consistency
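A convention like this is easier to keep if it can be checked mechanically. The sketch below uses the standard `go/parser` and `go/ast` packages to list functions whose names start with "validate" (case-insensitive) in a file that is not a `*_validation.go` file. `misplacedValidators` is a hypothetical helper, the sample source is a trimmed stand-in for pkg/workflow/js.go with placeholder bodies, and `bundle` is an invented non-validation function added for contrast.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// misplacedValidators parses one Go source file and returns the names of
// functions or methods starting with "validate"/"Validate". Files already
// named *_validation.go are skipped and yield no results.
func misplacedValidators(filename, src string) ([]string, error) {
	if strings.HasSuffix(filename, "_validation.go") {
		return nil, nil
	}
	file, err := parser.ParseFile(token.NewFileSet(), filename, src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip type, var, const, and import declarations
		}
		if strings.HasPrefix(strings.ToLower(fn.Name.Name), "validate") {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	// Trimmed-down stand-in for pkg/workflow/js.go; bodies are placeholders.
	src := `package workflow

func validateNoLocalRequires(bundledContent string) error { return nil }
func validateNoModuleReferences(bundledContent string) error { return nil }
func bundle(content string) string { return content }
`
	names, err := misplacedValidators("js.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

A check like this would also flag methods such as ValidateDownload() on ArtifactManager, which this report considers acceptable where they are, so a real version would likely need an allowlist or a receiver-based exemption.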
4. String Utility Organization (Exemplary - No Action Needed)
Status: ✅ Excellent Organization
The pkg/stringutil package demonstrates exemplary organization:
pkg/stringutil/
├── identifiers.go - Workflow name normalization, file path conversions
├── paths.go - Path normalization
├── sanitize.go - String sanitization functions
└── stringutil.go - General string utilities (Truncate, NormalizeWhitespace)
Well-Documented Pattern:
- Clear separation between "sanitize" (character validity) and "normalize" (format conversion)
- Documented in pkg/workflow/strings.go with guidance on when to use each pattern
- Separate test files for each concern
This is a model for other packages to follow!
Detailed Function Clusters
Cluster 1: Config Parsing Functions ⚠️
Pattern: parse*Config functions
Files: 30+ files in pkg/workflow
Total Functions: ~35-40
Functions:
- parseAddLabelsConfig, parseAddReviewerConfig, parseAssignMilestoneConfig
- parseAssignToAgentConfig, parseAssignToUserConfig, parseCloseIssuesConfig
- parseCloseDiscussionsConfig, parseCommentsConfig, parseCopyProjectsConfig
- parseCreateProjectsConfig, parseDiscussionsConfig, parseDispatchWorkflowConfig
- parseHideCommentConfig, parseIssuesConfig, parseLinkSubIssueConfig
- parseMissingDataConfig, parseMissingToolConfig, parseNoOpConfig
- parsePullRequestsConfig, parsePushToPullRequestBranchConfig
- parseUpdateEntityConfig, parseUpdateIssuesConfig, parseUpdateDiscussionsConfig
- ... and more
Analysis: Could benefit from generic implementation to reduce duplication.
Cluster 2: Engine-Specific Functions ✅
Pattern: {engine}_* (claude_, copilot_, codex_)
Files: Well-organized by engine type
Status: ✅ Good organization
Examples:
- claude_engine.go, claude_logs.go, claude_mcp.go, claude_tools.go
- copilot_engine.go, copilot_logs.go, copilot_mcp.go, copilot_engine_execution.go
- codex_engine.go, codex_logs.go, codex_mcp.go
Analysis: Excellent separation by AI engine with consistent file naming.
Cluster 3: Compiler Functions ✅
Pattern: compiler_*
Files: 21 files in pkg/workflow
Status: ✅ Well-organized
Files:
- compiler.go - Main compilation entry point
- compiler_activation_jobs.go - Activation job generation
- compiler_filters_validation.go - Filter validation
- compiler_jobs.go - Job compilation
- compiler_orchestrator.go - Orchestration logic
- compiler_safe_output*.go (7 files) - Safe output compilation
- compiler_types.go - Compiler types and structs
- compiler_yaml*.go (4 files) - YAML generation
- ... and more
Analysis: Excellent breakdown of compiler functionality into focused files.
Cluster 4: Safe Outputs ✅
Pattern: safe_output* and safe_outputs_*
Files: 19 files
Status: ✅ Good organization with room for minor consolidation
Files:
- safe_outputs.go - Main safe outputs logic
- safe_outputs_app.go - Application-specific
- safe_outputs_config*.go (7 files) - Configuration handling
- safe_outputs_domains_validation.go - Domain validation
- safe_outputs_env.go - Environment variables
- safe_outputs_jobs.go, safe_outputs_steps.go - Job/step generation
- safe_output_builder.go, safe_output_config.go, safe_output_validation_config.go
Minor Suggestion: Consider consolidating the 3 safe_outputs_config_helpers*.go files into one.
Cluster 5: Runtime Detection ✅
Pattern: runtime_*
Files: 6 files
Status: ✅ Good organization
Files:
- runtime_definitions.go - Runtime type definitions
- runtime_detection.go - Auto-detection logic
- runtime_deduplication.go - Deduplication logic
- runtime_overrides.go - Manual overrides
- runtime_step_generator.go - Step generation
- runtime_validation.go - Validation
Analysis: Clear separation of concerns within runtime handling.
Cluster 6: Create Operations ✅
Pattern: create_*
Files: 8 files
Status: ✅ Well-organized
Files:
- create_agent_session.go
- create_code_scanning_alert.go
- create_discussion.go
- create_issue.go
- create_pr_review_comment.go
- create_project.go
- create_project_status_update.go
- create_pull_request.go
Analysis: Each creation operation has its own file - excellent pattern!
Cluster 7: Update Operations ✅
Pattern: update_*
Files: 7 files (including helpers)
Status: ✅ Good organization
Files:
- update_discussion.go
- update_entity_helpers.go
- update_issue.go
- update_project.go
- update_project_job.go
- update_pull_request.go
- update_release.go
Analysis: Consistent pattern matching create operations.
Summary of Recommendations
🔴 High Priority
None identified - overall code organization is strong.
🟡 Medium Priority
- Consolidate parse*Config Pattern (Issue 1)
  - Use generics to reduce 30+ similar functions to a single implementation
  - Estimated effort: 4-6 hours
  - Benefits: 500-800 lines reduced, single source of truth
- Move Validation Functions to Dedicated Files (Issue 3)
  - Create js_validation.go for JavaScript validation
  - Move engine validation to agent_validation.go
  - Estimated effort: 2-3 hours
  - Benefits: Improved consistency and discoverability
🟢 Low Priority
- Consolidate Single-Function Helper Files (Issue 2)
  - Move the contents of the 3 smallest helper files (git_helpers.go, validation_helpers.go, map_helpers.go) to appropriate locations
  - Estimated effort: 1-2 hours
  - Benefits: Reduced file count, better discoverability
- Consider Helper File Consolidation (Issue 2)
  - Review and potentially merge safe_outputs_config_helpers*.go files
  - Estimated effort: 2-3 hours
  - Benefits: Reduced file count
Implementation Checklist
Phase 1: Quick Wins (Low Effort, Clear Benefit)
- Move GetCurrentGitTag() from pkg/workflow/git_helpers.go to pkg/gitutil/gitutil.go
- Inline or move validateIntRange() from validation_helpers.go
- Move map utility functions to config_helpers.go or create pkg/maputil package
- Remove empty or near-empty helper files
Phase 2: Validation Organization (Medium Effort)
- Create pkg/workflow/js_validation.go for JavaScript validation functions
- Move validation functions from agentic_engine.go to agent_validation.go
- Rename permissions_validator.go to permissions_validation.go for consistency
- Document validation file naming convention
Phase 3: Config Parser Refactoring (Higher Effort, High Impact)
- Design generic parseConfig[T]() function using Go generics
- Migrate 5-10 parse functions to use generic implementation
- Test and validate approach
- Migrate remaining parse functions
- Update tests to cover generic implementation
- Document new pattern for future config additions
Phase 4: Helper File Review (Optional)
- Audit all *_helpers.go files for consolidation opportunities
- Consider merging safe_outputs_config_helpers*.go files
- Document when to create vs. when to extend helper files
Positive Findings
✅ Excellent Patterns Observed
- Feature-Based Organization: The codebase follows a clear pattern of organizing files by feature (compiler_, mcp_, runtime_)
- Engine Isolation: AI engine implementations are cleanly separated (claude_, copilot_, codex_)
- Operation Grouping: Create/update operations follow consistent naming and file organization
- String Utilities: The pkg/stringutil package is exceptionally well-organized with clear separation of concerns
- Validation Files: Most validation logic is properly organized into dedicated *_validation.go files
- CLI Organization: The pkg/cli package shows good organization with clear command grouping
📊 Code Organization Metrics
- File Naming Consistency: 95%+ of files follow clear naming conventions
- Feature Cohesion: High - related functionality is properly grouped
- Code Duplication: Low-Medium - most duplication is in boilerplate (parse functions)
- Documentation: Good inline documentation in key files (e.g., strings.go)
Conclusion
The gh-aw codebase demonstrates strong architectural organization with clear patterns and conventions. The primary opportunities for improvement are:
- Reducing boilerplate in config parsing functions through generics
- Moving outlier validation functions to dedicated validation files for consistency
- Minor consolidation of small helper files
These refactoring opportunities are relatively low-risk and would improve maintainability without requiring significant architectural changes.
Overall Assessment: 🟢 Well-Organized Codebase with targeted improvement opportunities.
Next Steps
Please review these findings and prioritize based on:
- Current development velocity
- Team familiarity with Go generics
- Upcoming feature work that might benefit from refactoring
I recommend starting with Phase 1: Quick Wins as these are low-risk changes with immediate benefits.
Analysis Metadata:
- Repository: githubnext/gh-aw
- Workflow Run: §20998672341
- Analysis Date: 2026-01-14
- Files Analyzed: 422 Go source files
- Functions Cataloged: 3,330+ functions
- Primary Packages: pkg/workflow (224 files), pkg/cli (132 files), pkg/parser (26 files)
AI generated by Semantic Function Refactoring