Skip to content

CI Pipeline to verify skill counts match source repos#3

Merged
mosheabr merged 4 commits intoNVIDIA:mainfrom
shljessie:add-ci-pipeline
Apr 7, 2026
Merged

CI Pipeline to verify skill counts match source repos#3
mosheabr merged 4 commits intoNVIDIA:mainfrom
shljessie:add-ci-pipeline

Conversation

@shljessie
Copy link
Copy Markdown
Contributor

@shljessie shljessie commented Apr 6, 2026

Summary

  • Adds a GitHub Actions workflow that checks whether the skill counts listed in the README actually match the skill directories in each source repo
  • Queries each product repo's skills path via the GitHub API and compares the actual number of skill subdirectories against the count claimed in the Available Skills table
  • Outputs a clear drift report table showing expected vs. actual counts per product

Triggers

  • Pull requests to main — ensures README changes don't introduce incorrect counts
  • Daily schedule (08:00 UTC) — catches upstream repos adding/removing skills

@shljessie shljessie requested a review from mosheabr as a code owner April 6, 2026 19:08
@mosheabr mosheabr self-assigned this Apr 6, 2026
Copy link
Copy Markdown
Collaborator

@mosheabr mosheabr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Request Changes

Great idea — automated drift detection is exactly what this catalog needs. A few issues to address before merging:

1. NeMo Evaluator has skills in two paths (will report wrong count)

The workflow only checks packages/nemo-evaluator-launcher/.claude/skills (3 skills) but NeMo Evaluator also has skills at packages/nemo-evaluator/.claude/skills (1 skill: byob). The expected count is 4, but the CI will find 3 and report drift.

Fix: Either add a second entry for the second path, or sum both paths for NeMo Evaluator.

2. sed pattern won't match README labels

The sed command uses labels like Megatron-Core and Megatron-Bridge (with hyphens), but the README uses Megatron Core and Megatron Bridge (with spaces inside **bold** markers). The regex \*\*${label}\*\* won't match.

Fix: Use the display name with spaces in the label field, or adjust the sed pattern.

3. GITHUB_TOKEN may not access private source repos

Several source repos (Model-Optimizer, Megatron-LM, etc.) may be private or have restricted access. The default GITHUB_TOKEN only has permissions for the repo the workflow runs in. The gh api calls to other repos may return 404.

Fix: May need a PAT or GitHub App token with cross-repo read access stored as a repo secret.

4. Permissions are broader than needed for PR checks

permissions: contents: write, pull-requests: write grants write access on every trigger, but PR checks only need read. The write permissions are only needed for the scheduled auto-fix path.

Suggestion: Consider splitting into two jobs — a read-only check job (runs on PRs) and a write job (runs on schedule/dispatch only).


The concept is solid. Issues #1 and #2 will cause false drift reports on day one, so those should be fixed before merge.

Signed-off-by: Seonghee Lee <seongheel@nvidia.com>
Signed-off-by: Seonghee Lee <seongheel@nvidia.com>
Signed-off-by: Seonghee Lee <seongheel@nvidia.com>
Signed-off-by: Seonghee Lee <seongheel@nvidia.com>
Copy link
Copy Markdown
Collaborator

@mosheabr mosheabr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated Review

Thanks for the fixes — the split permissions, NeMo Evaluator dual-path handling, and private repo token fallback all look good. Three of the four original issues are resolved. ✓

Remaining issue: Product names should not be hyphenated in README

The latest commit changes display names in the README from spaces to hyphens:

  • "Model Optimizer" → "Model-Optimizer"
  • "Megatron Core" → "Megatron-Core"
  • "Megatron Bridge" → "Megatron-Bridge"

This was done to make the sed pattern matching work in the CI, but it introduces inconsistency — every other product uses natural names with spaces (cuOpt, TensorRT-LLM, Nemotron Voice Agent, NeMo Gym, NeMo Evaluator).

The fix-drift job already has the right approach with escaped_label using sed 's/ /[[:space:]]/g'. Apply the same escaping in the check job's label definitions and keep the README display names with spaces.

In short: fix the CI to handle spaces, don't change the README to match the CI.

@shljessie
Copy link
Copy Markdown
Contributor Author

The hyphens are actually correct for these products (Megatron-LM, Megatron-Bridge, Model-Optimizer) so I've updated the README.md to match the official repo names.

This is different from products like "Nemotron Voice Agent" or "NeMo Evaluator" where the display name may use spaces. Not under our org so I'm not sure what the convention is here.

@shljessie shljessie requested a review from mosheabr April 7, 2026 17:46
Copy link
Copy Markdown
Collaborator

@mosheabr mosheabr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — the CI pipeline logic is solid and addresses the key issues (split permissions, dual-path NeMo Evaluator, private repo token fallback).

Note: The hyphenated product names in README (Model-Optimizer, Megatron-Core, Megatron-Bridge) are inconsistent with other entries that use spaces, but we can fix that in a follow-up. Not blocking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants