[refactor] Semantic Function Clustering Analysis: Duplicates and Outliers in pkg/ #19255

@github-actions

Automated semantic analysis of 553 non-test Go files across pkg/ identified confirmed cross-package duplicates, outlier functions, and structural patterns that warrant attention.

Analysis date: 2026-03-02 | Workflow run: §22587119823


Executive Summary

  • Total Go files analyzed: 553 (across 18 packages)
  • Primary packages: pkg/workflow (272+ files), pkg/cli (188+ files), pkg/parser (31 files)
  • Confirmed exact duplicates: 1 function (identical implementation in two packages)
  • Confirmed near-duplicates: 2 functions (>90% similar across package boundaries)
  • Structural duplication patterns: 2 (engine log parsers, git root detection)
  • Underutilized utility packages: 1 (pkg/gitutil)

Identified Issues

1. Exact Duplicate: normalizeGitHubHostURL

Severity: High — identical private function implemented in two separate packages.

| Location | Package |
| --- | --- |
| pkg/parser/github.go:42 | Parser package |
| pkg/cli/github.go:37 | CLI package |

Both implementations are byte-for-byte identical:

View duplicate code
// pkg/parser/github.go:42
func normalizeGitHubHostURL(rawHostURL string) string {
    // Remove all trailing slashes
    normalized := strings.TrimRight(rawHostURL, "/")

    // Add https:// scheme if no scheme is present
    if !strings.HasPrefix(normalized, "https://") && !strings.HasPrefix(normalized, "http://") {
        normalized = "https://" + normalized
    }

    return normalized
}

// pkg/cli/github.go:37 — IDENTICAL
func normalizeGitHubHostURL(rawHostURL string) string {
    // Remove all trailing slashes
    normalized := strings.TrimRight(rawHostURL, "/")

    // Add https:// scheme if no scheme is present
    if !strings.HasPrefix(normalized, "https://") && !strings.HasPrefix(normalized, "http://") {
        normalized = "https://" + normalized
    }

    return normalized
}

Recommendation: Extract to pkg/stringutil as NormalizeGitHubHostURL(rawHostURL string) string (exported). This fits alongside the existing URL utilities in pkg/stringutil/urls.go.
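A minimal sketch of what the extracted helper might look like. The body is the logic quoted above; the exported name follows the recommendation. Two things are assumptions: the second prefix check is taken to be `http://` (inferred from the "no scheme is present" comment), and `package main` is used only so the snippet stands alone, since the existing contents of pkg/stringutil are not shown in this issue.

```go
package main

import "strings"

// NormalizeGitHubHostURL strips all trailing slashes from rawHostURL and
// prepends "https://" when the value carries neither an https:// nor an
// http:// scheme (the http:// check is an assumption, see the lead-in).
func NormalizeGitHubHostURL(rawHostURL string) string {
	// Remove all trailing slashes.
	normalized := strings.TrimRight(rawHostURL, "/")

	// Add the https:// scheme if no scheme is present.
	if !strings.HasPrefix(normalized, "https://") && !strings.HasPrefix(normalized, "http://") {
		normalized = "https://" + normalized
	}

	return normalized
}
```

With a helper like this in place, pkg/parser/github.go and pkg/cli/github.go could each drop their private copy and call the shared function instead.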


2. Near-Duplicate: extractBaseRepo

Severity: High. The same private function appears in two packages with an identical algorithm; only the parameter name and comments differ.

| Location | Package | Parameter |
| --- | --- | --- |
| pkg/workflow/action_resolver.go:111 | Workflow package | repo string |
| pkg/cli/update_actions.go:21 | CLI package | actionPath string |

View near-duplicate code
// pkg/workflow/action_resolver.go:111
func extractBaseRepo(repo string) string {
    parts := strings.Split(repo, "/")
    if len(parts) >= 2 {
        // Take first two parts (owner/repo)
        return parts[0] + "/" + parts[1]
    }
    return repo
}

// pkg/cli/update_actions.go:21 — same logic, different parameter name and comment
func extractBaseRepo(actionPath string) string {
    parts := strings.Split(actionPath, "/")
    if len(parts) >= 2 {
        // Return owner/repo (first two segments)
        return parts[0] + "/" + parts[1]
    }
    // If less than 2 parts, return as-is (shouldn't happen in practice)
    return actionPath
}

Recommendation: Extract to pkg/gitutil as ExtractBaseRepo(repoPath string) string. The pkg/gitutil package already exists for shared git-related utilities.
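As a rough sketch, the merged helper could look like the following. The logic is the shared algorithm from the two copies above; the exported name and the `repoPath` parameter name come from the recommendation, and `package main` is used only to keep the snippet standalone rather than to suggest where the code lives.

```go
package main

import "strings"

// ExtractBaseRepo returns the first two slash-separated segments of
// repoPath (owner/repo). Inputs with fewer than two segments are
// returned unchanged, matching both existing implementations.
func ExtractBaseRepo(repoPath string) string {
	parts := strings.Split(repoPath, "/")
	if len(parts) >= 2 {
		// Keep only owner/repo and drop any sub-path.
		return parts[0] + "/" + parts[1]
	}
	return repoPath
}
```

Note that the function splits on "/" only, so a suffix such as a version reference attached to the second segment is kept as-is, exactly as in the current copies.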


3. Similar Logic: findGitRoot in Three Locations

Severity: Medium — the same git rev-parse --show-toplevel shell invocation is independently implemented in two packages, with a third stub for WASM.

| Location | Return type | Error handling |
| --- | --- | --- |
| pkg/workflow/git_helpers.go:53 | string | Returns "" on error |
| pkg/cli/git.go:23 | (string, error) | Idiomatic Go error return |
| pkg/workflow/git_helpers_wasm.go:9 | string | Always returns "." (WASM stub) |

The pkg/gitutil package exists (with IsAuthError and IsHexString) but does not contain git root detection despite being the natural home for it.

Recommendation: Add FindGitRoot() (string, error) to pkg/gitutil/gitutil.go. The pkg/cli/git.go version (with proper error return) is the more idiomatic implementation to use as the basis.
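One possible shape for the shared function, sketched under assumptions: the actual bodies of the existing findGitRoot implementations are not shown in this issue, so the error wrapping, the empty-output check, and the `findGitRootOrEmpty` wrapper (a hypothetical name) are illustrative only. `package main` is used so the snippet stands alone.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// FindGitRoot asks git for the top-level directory of the current work
// tree by running `git rev-parse --show-toplevel`.
func FindGitRoot() (string, error) {
	out, err := exec.Command("git", "rev-parse", "--show-toplevel").Output()
	if err != nil {
		return "", fmt.Errorf("finding git root: %w", err)
	}
	root := strings.TrimSpace(string(out))
	if root == "" {
		// Defensive check (an assumption): treat empty output as a failure.
		return "", errors.New("finding git root: git returned an empty path")
	}
	return root, nil
}

// findGitRootOrEmpty is a hypothetical wrapper that keeps the
// pkg/workflow behavior described in the table (empty string on error)
// for callers that are not ready to handle an error value.
func findGitRootOrEmpty() string {
	root, err := FindGitRoot()
	if err != nil {
		return ""
	}
	return root
}
```

A thin wrapper of this kind would let pkg/workflow callers migrate without changing their signatures, while the WASM stub would presumably stay where it is because it cannot shell out to git.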


4. Engine Log Parser Structural Duplication

Severity: Medium. Four engine log parsers each independently implement the same ParseLogMetrics interface, and three of them (Claude, Codex, Copilot) add a private tool call parser method (parseToolCallsWithSequence or parseXxxToolCallsWithSequence), with no shared base utilities.

| Engine | File | Lines | Tool call parser method |
| --- | --- | --- | --- |
| Claude | pkg/workflow/claude_logs.go | ~470 | parseToolCallsWithSequence |
| Codex | pkg/workflow/codex_logs.go | ~383 | parseCodexToolCallsWithSequence |
| Copilot | pkg/workflow/copilot_logs.go | ~458 | parseCopilotToolCallsWithSequence |
| Gemini | pkg/workflow/gemini_logs.go | ~106 | (no tool call parser) |

View pattern repetition across files

All three full engines follow the same pattern:

  1. ParseLogMetrics(logContent string, verbose bool) LogMetrics — public interface method
  2. A line-by-line log scanner loop
  3. A parseXxxToolCallsWithSequence(...) private method that populates a map[string]*ToolCallInfo

The ToolCallInfo population and duration distribution logic in particular shows structural similarity that could be abstracted into shared helper functions.

Recommendation: Extract common log parsing utilities (e.g., distributeToolCallDurations, initToolCallMap) into a shared log_parsing_helpers.go file in the pkg/workflow package. Each engine's unique format detection can remain in its own file.


5. pkg/gitutil Is Underutilized

Severity: Low — the pkg/gitutil package was presumably created as a home for shared git utilities, but currently contains only 2 functions (IsAuthError, IsHexString). Meanwhile, git-related functions are implemented independently in pkg/workflow and pkg/cli.

Functions that would naturally belong in pkg/gitutil:

  • findGitRoot (see issue 3 above)
  • extractBaseRepo (see issue 2 above, related to git repo paths)
  • GetCurrentGitTag — currently only in pkg/workflow/git_helpers.go, but likely useful to pkg/cli as well

Recommendation: Use pkg/gitutil as the canonical home for cross-cutting git utility functions. This gives clear discoverability and avoids future re-duplication.


Refactoring Recommendations

| Priority | Action | Files Affected | Estimated Effort |
| --- | --- | --- | --- |
| High | Extract NormalizeGitHubHostURL to pkg/stringutil/urls.go | pkg/parser/github.go, pkg/cli/github.go | 30 min |
| High | Extract ExtractBaseRepo to pkg/gitutil/gitutil.go | pkg/workflow/action_resolver.go, pkg/cli/update_actions.go | 30 min |
| Medium | Add FindGitRoot to pkg/gitutil, update callers | pkg/workflow/git_helpers.go, pkg/cli/git.go | 1–2 hours |
| Medium | Extract shared log parsing helpers for engine parsers | pkg/workflow/*_logs.go (4 files) | 2–3 hours |
| Low | Audit pkg/gitutil for other functions that should live there | pkg/workflow/git_helpers.go | 1 hour |

Implementation Checklist

  • Extract normalizeGitHubHostURL to pkg/stringutil as exported NormalizeGitHubHostURL
  • Update pkg/parser/github.go and pkg/cli/github.go to use the shared function
  • Extract extractBaseRepo to pkg/gitutil as exported ExtractBaseRepo
  • Update pkg/workflow/action_resolver.go and pkg/cli/update_actions.go callers
  • Add FindGitRoot() (string, error) to pkg/gitutil/gitutil.go
  • Migrate pkg/cli/git.go's findGitRoot to use pkg/gitutil.FindGitRoot
  • Review engine log parsers for extractable common utilities
  • Run full test suite to verify no regressions

Generated by Semantic Function Refactoring

  • expires on Mar 4, 2026, 5:20 PM UTC
