refactor: eliminate cross-package duplicate functions via shared utilities #19309

Merged
pelikhan merged 4 commits into main from copilot/refactor-semantic-function-clustering-78d16a6e-58f6-4964-a509-d22f4050fb53
Mar 3, 2026


Copilot AI commented Mar 3, 2026

  • Plan established
  • Extract normalizeGitHubHostURL → pkg/stringutil.NormalizeGitHubHostURL
    • Add exported function to pkg/stringutil/urls.go
    • Update doc comment to mention both http:// and https:// schemes
    • Add unit tests for NormalizeGitHubHostURL in pkg/stringutil/urls_test.go
    • Update pkg/parser/github.go to use shared function
    • Update pkg/cli/github.go to use shared function
  • Extract extractBaseRepo → pkg/gitutil.ExtractBaseRepo
    • Add exported function to pkg/gitutil/gitutil.go
    • Update pkg/workflow/action_resolver.go to use shared function
    • Update pkg/cli/update_actions.go to use shared function
    • Update test files (action_resolver_test.go, update_actions_test.go, update_actions_integration_test.go) to use shared function
  • Add FindGitRoot() (string, error) to pkg/gitutil/gitutil.go
    • Update pkg/cli/git.go to delegate to gitutil.FindGitRoot
    • Update pkg/workflow/git_helpers.go to delegate to gitutil.FindGitRoot
  • Run make fmt, make build, make lint — all passing
  • Fix CI build failure: update_actions_integration_test.go was still calling the old private extractBaseRepo; updated to gitutil.ExtractBaseRepo
Original prompt

This section details the original issue you should resolve

<issue_title>[refactor] Semantic Function Clustering Analysis: Duplicates and Outliers in pkg/</issue_title>
<issue_description>Automated semantic analysis of 553 non-test Go files across pkg/ identified confirmed cross-package duplicates, outlier functions, and structural patterns that warrant attention.

Analysis date: 2026-03-02 | Workflow run: §22587119823


Executive Summary

  • Total Go files analyzed: 553 (across 18 packages)
  • Primary packages: pkg/workflow (272+ files), pkg/cli (188+ files), pkg/parser (31 files)
  • Confirmed exact duplicates: 1 function (identical implementation in two packages)
  • Confirmed near-duplicates: 2 functions (>90% similar across package boundaries)
  • Structural duplication patterns: 2 (engine log parsers, git root detection)
  • Underutilized utility packages: 1 (pkg/gitutil)

Identified Issues

1. Exact Duplicate: normalizeGitHubHostURL

Severity: High — identical private function implemented in two separate packages.

Location                 File
pkg/parser/github.go:42  Parser package
pkg/cli/github.go:37     CLI package

Both implementations are byte-for-byte identical:

View duplicate code
// pkg/parser/github.go:42
func normalizeGitHubHostURL(rawHostURL string) string {
    // Remove all trailing slashes
    normalized := strings.TrimRight(rawHostURL, "/")

    // Add https:// scheme if no scheme is present
    if !strings.HasPrefix(normalized, "https://") && !strings.HasPrefix(normalized, "http://") {
        normalized = "https://" + normalized
    }

    return normalized
}

// pkg/cli/github.go:37 — IDENTICAL
func normalizeGitHubHostURL(rawHostURL string) string {
    // Remove all trailing slashes
    normalized := strings.TrimRight(rawHostURL, "/")

    // Add https:// scheme if no scheme is present
    if !strings.HasPrefix(normalized, "https://") && !strings.HasPrefix(normalized, "http://") {
        normalized = "https://" + normalized
    }

    return normalized
}

Recommendation: Extract to pkg/stringutil as NormalizeGitHubHostURL(rawHostURL string) string (exported). This fits alongside the existing URL utilities in pkg/stringutil/urls.go.


2. Near-Duplicate: extractBaseRepo

Severity: High — nearly identical private function across two packages, same algorithm, different variable names only.

Location                             File              Parameter name
pkg/workflow/action_resolver.go:111  Workflow package  repo string
pkg/cli/update_actions.go:21         CLI package       actionPath string
View near-duplicate code
// pkg/workflow/action_resolver.go:111
func extractBaseRepo(repo string) string {
    parts := strings.Split(repo, "/")
    if len(parts) >= 2 {
        // Take first two parts (owner/repo)
        return parts[0] + "/" + parts[1]
    }
    return repo
}

// pkg/cli/update_actions.go:21 — same logic, different parameter name and comment
func extractBaseRepo(actionPath string) string {
    parts := strings.Split(actionPath, "/")
    if len(parts) >= 2 {
        // Return owner/repo (first two segments)
        return parts[0] + "/" + parts[1]
    }
    // If less than 2 parts, return as-is (shouldn't happen in practice)
    return actionPath
}

Recommendation: Extract to pkg/gitutil as ExtractBaseRepo(repoPath string) string. The pkg/gitutil package already exists for shared git-related utilities.
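As a rough illustration of the recommendation above, the shared helper could look like the following. This is a minimal sketch that copies the quoted logic and uses the suggested parameter name repoPath; the sample inputs in main are made up for illustration, and the block is written as a standalone program rather than as the actual pkg/gitutil source.

```go
package main

import (
	"fmt"
	"strings"
)

// ExtractBaseRepo returns the first two slash-separated segments
// (owner/repo) of repoPath. Inputs with fewer than two segments are
// returned unchanged, matching both quoted implementations.
func ExtractBaseRepo(repoPath string) string {
	parts := strings.Split(repoPath, "/")
	if len(parts) >= 2 {
		return parts[0] + "/" + parts[1]
	}
	return repoPath
}

func main() {
	// Illustrative inputs only; they are not taken from the repository's tests.
	fmt.Println(ExtractBaseRepo("actions/checkout"))                  // actions/checkout
	fmt.Println(ExtractBaseRepo("github/codeql-action/upload-sarif")) // github/codeql-action
	fmt.Println(ExtractBaseRepo("checkout"))                          // checkout
}
```

Because both private versions differ only in naming and comments, callers in pkg/workflow and pkg/cli should see no behavior change from switching to one exported function.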


3. Similar Logic: findGitRoot in Three Locations

Severity: Medium — the same git rev-parse --show-toplevel shell invocation is independently implemented in two packages, with a third stub for WASM.

Location                            Return type      Error handling
pkg/workflow/git_helpers.go:53      string           Returns "" on error
pkg/cli/git.go:23                   (string, error)  Idiomatic Go error return
pkg/workflow/git_helpers_wasm.go:9  string           Always returns "." (WASM stub)

The pkg/gitutil package exists (with IsAuthError and IsHexString) but does not contain git root detection despite being the natural home for it.

Recommendation: Add FindGitRoot() (string, error) to pkg/gitutil/gitutil.go. The pkg/cli/git.go version (with proper error return) is the more idiomatic implementation to use as the basis.
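A shared git root helper with the recommended signature might be sketched as below. The thread does not show the body of the pkg/cli/git.go implementation, so this is an assumption built only on the git rev-parse --show-toplevel invocation named above; the error wrapping text and the main function are illustrative.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// FindGitRoot asks git for the top-level directory of the current
// working tree. It returns an error when git is unavailable or the
// current directory is not inside a working tree.
func FindGitRoot() (string, error) {
	out, err := exec.Command("git", "rev-parse", "--show-toplevel").Output()
	if err != nil {
		return "", fmt.Errorf("finding git root: %w", err)
	}
	// git terminates its output with a newline; strip it.
	return strings.TrimSpace(string(out)), nil
}

func main() {
	root, err := FindGitRoot()
	if err != nil {
		fmt.Println("could not determine git root:", err)
		return
	}
	fmt.Println("git root:", root)
}
```

Returning (string, error) lets each caller decide how to treat failure, which is what allows a string-only wrapper to keep returning "" on error.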


4. Engine Log Parser Structural Duplication

Severity: Medium — four engine log parsers each independently implement the same ParseLogMetrics interface plus a private parseXxxToolCallsWithSequence method, with no shared base utilities.

Engine File Lines Tool call parser method
Claude `pkg/workfl...


… gitutil)"

- Add NormalizeGitHubHostURL to pkg/stringutil/urls.go
- Add ExtractBaseRepo and FindGitRoot to pkg/gitutil/gitutil.go
- Update pkg/parser/github.go and pkg/cli/github.go to use stringutil.NormalizeGitHubHostURL
- Update pkg/workflow/action_resolver.go and pkg/cli/update_actions.go to use gitutil.ExtractBaseRepo
- Update pkg/cli/git.go to delegate findGitRoot to gitutil.FindGitRoot
- Update pkg/workflow/git_helpers.go to delegate findGitRoot to gitutil.FindGitRoot
- Update test files to use the shared functions

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Refactor semantic function clustering analysis with duplicates and outliers" to "refactor: eliminate cross-package duplicate functions via shared utilities" Mar 3, 2026
@pelikhan pelikhan marked this pull request as ready for review March 3, 2026 02:36
Copilot AI review requested due to automatic review settings March 3, 2026 02:36

Copilot AI left a comment

Pull request overview

Refactors duplicated helper functions across packages by extracting shared GitHub URL normalization and git-related utilities into canonical utility packages, and updating all callers/tests to use them.

Changes:

  • Added stringutil.NormalizeGitHubHostURL and replaced duplicated private normalizers in pkg/parser and pkg/cli.
  • Added gitutil.ExtractBaseRepo and gitutil.FindGitRoot, updating CLI/workflow callers and removing local duplicates.
  • Updated affected tests to call the shared utilities.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File                                  Description
pkg/stringutil/urls.go                Adds exported NormalizeGitHubHostURL utility for consistent host normalization.
pkg/gitutil/gitutil.go                Adds shared ExtractBaseRepo and FindGitRoot utilities.
pkg/parser/github.go                  Switches host normalization to stringutil.NormalizeGitHubHostURL and removes duplicate helper.
pkg/cli/github.go                     Switches host normalization to stringutil.NormalizeGitHubHostURL and removes duplicate helper.
pkg/cli/git.go                        Delegates findGitRoot() to gitutil.FindGitRoot().
pkg/workflow/git_helpers.go           Delegates internal findGitRoot() to gitutil.FindGitRoot() while preserving wrapper signature.
pkg/workflow/action_resolver.go       Uses gitutil.ExtractBaseRepo and removes local duplicate implementation.
pkg/cli/update_actions.go             Uses gitutil.ExtractBaseRepo and removes local duplicate implementation.
pkg/workflow/action_resolver_test.go  Updates tests to call gitutil.ExtractBaseRepo.
pkg/cli/update_actions_test.go        Updates tests to call gitutil.ExtractBaseRepo.

Comment on lines +8 to +9
// NormalizeGitHubHostURL ensures the host URL has https:// scheme and no trailing slashes.
// It is safe to call with URLs that already have the https:// scheme.
Copilot AI Mar 3, 2026

The doc comment says NormalizeGitHubHostURL is safe to call with URLs that already have the https:// scheme, but the implementation also explicitly preserves http://. Update the comment to reflect both schemes (or refer to “a scheme”).

Suggested change
- // NormalizeGitHubHostURL ensures the host URL has https:// scheme and no trailing slashes.
- // It is safe to call with URLs that already have the https:// scheme.
+ // NormalizeGitHubHostURL ensures the host URL has a scheme (defaulting to https://) and no trailing slashes.
+ // It is safe to call with URLs that already have an http:// or https:// scheme.

Comment on lines +8 to +19
// NormalizeGitHubHostURL ensures the host URL has https:// scheme and no trailing slashes.
// It is safe to call with URLs that already have the https:// scheme.
func NormalizeGitHubHostURL(rawHostURL string) string {
	// Remove all trailing slashes
	normalized := strings.TrimRight(rawHostURL, "/")

	// Add https:// scheme if no scheme is present
	if !strings.HasPrefix(normalized, "https://") && !strings.HasPrefix(normalized, "http://") {
		normalized = "https://" + normalized
	}

	return normalized
Copilot AI Mar 3, 2026

NormalizeGitHubHostURL is a newly exported helper and is now used by both pkg/parser and pkg/cli, but there are no unit tests covering its behavior (scheme preservation/addition, trailing-slash trimming). Adding focused tests in pkg/stringutil/urls_test.go would help prevent regressions in host resolution across the app.


pelikhan commented Mar 3, 2026

@copilot apply comments, review test errors

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Mar 3, 2026

@copilot apply comments, review test errors

Done in d475bcd:

  • Updated the NormalizeGitHubHostURL doc comment to mention both http:// and https:// schemes (per the reviewer suggestion)
  • Added TestNormalizeGitHubHostURL to pkg/stringutil/urls_test.go covering scheme addition, scheme preservation (both http and https), single/multiple trailing slash removal, and the combined case
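The test file itself is not shown in this thread. As a hedged sketch of the cases listed above, a table of inputs and expected outputs could be exercised as follows. The function body is the one quoted in the review comment; the host names and case labels are made-up examples, and the block is a standalone program rather than the real TestNormalizeGitHubHostURL.

```go
package main

import (
	"fmt"
	"strings"
)

// NormalizeGitHubHostURL ensures the host URL has a scheme (defaulting
// to https://) and no trailing slashes, as quoted in the review above.
func NormalizeGitHubHostURL(rawHostURL string) string {
	normalized := strings.TrimRight(rawHostURL, "/")
	if !strings.HasPrefix(normalized, "https://") && !strings.HasPrefix(normalized, "http://") {
		normalized = "https://" + normalized
	}
	return normalized
}

// normalizeCase pairs an input with the output expected from it.
type normalizeCase struct{ name, input, want string }

// normalizeCases uses hypothetical hosts to cover scheme addition,
// scheme preservation, and trailing-slash removal.
var normalizeCases = []normalizeCase{
	{"adds scheme", "github.example.com", "https://github.example.com"},
	{"preserves https", "https://github.com", "https://github.com"},
	{"preserves http", "http://ghe.internal", "http://ghe.internal"},
	{"single trailing slash", "https://github.com/", "https://github.com"},
	{"multiple trailing slashes", "https://github.com///", "https://github.com"},
	{"combined", "github.example.com//", "https://github.example.com"},
}

func main() {
	for _, c := range normalizeCases {
		got := NormalizeGitHubHostURL(c.input)
		status := "ok"
		if got != c.want {
			status = "FAIL"
		}
		fmt.Printf("%-26s %q -> %q [%s]\n", c.name, c.input, got, status)
	}
}
```

In a real _test.go file the same table would typically be driven through t.Run subtests so each case is reported separately.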


pelikhan commented Mar 3, 2026

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Mar 3, 2026

@copilot fix tests https://github.com/github/gh-aw/actions/runs/22606160802/

Fixed in 2d8c14b. The CI failure was pkg/cli/update_actions_integration_test.go still calling the old private extractBaseRepo: I had updated the unit test file but missed its integration test counterpart. Updated it to import gitutil and call gitutil.ExtractBaseRepo instead.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/actions/github-script/git/ref/tags/v8
    • Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq .object.sha (http block)
    • Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq .object.sha ub/workflows (http block)
    • Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq .object.sha 0 -j ACCEPT (http block)
  • https://api.github.com/repos/githubnext/agentics/contents/workflows/shared/reporting.md
    • Triggering command: /tmp/go-build2988552058/b001/cli.test /tmp/go-build2988552058/b001/cli.test -test.testlogfile=/tmp/go-build2988552058/b001/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.run=TestCompile|TestPoutine|TestExtractBaseRepo (http block)

If you need me to access, download, or install something from one of these locations, you can either:

@pelikhan pelikhan merged commit bf0d2b5 into main Mar 3, 2026
77 checks passed
@pelikhan pelikhan deleted the copilot/refactor-semantic-function-clustering-78d16a6e-58f6-4964-a509-d22f4050fb53 branch March 3, 2026 03:25
Development

Successfully merging this pull request may close these issues.

[refactor] Semantic Function Clustering Analysis: Duplicates and Outliers in pkg/

3 participants