OSV artifact generation for use in vulnerabilities repository#42203

Merged
ksykulev merged 16 commits into main from 41571-osv-cron
Mar 25, 2026

Conversation

@ksykulev
Contributor

@ksykulev ksykulev commented Mar 20, 2026

Related issue: Resolves #41571

Full Artifacts:
Ubuntu 14.04: 901 KB
Ubuntu 16.04: 2.0 MB
Ubuntu 18.04: 4.3 MB
Ubuntu 20.04: 5.9 MB
Ubuntu 22.04: 5.6 MB
Ubuntu 24.04: 1.7 MB
Ubuntu 24.10: 4.4 KB
Ubuntu 25.04: 6.0 KB
Ubuntu 25.10: 207 KB

Total Size:
All artifacts (full + deltas): 31 MB (was 54 MB)
Full artifacts only: ~20 MB (was ~27 MB)
Delta artifacts: ~11 MB (was ~27 MB)

Testing

  • Added/updated automated tests
  • QA'd all new/changed functionality manually

Summary by CodeRabbit

  • New Features
    • Added a vulnerability data processor that scans OSV JSON inputs, aggregates per-Ubuntu-version artifacts, supports inclusive/exclusive version filters, and can emit optional “today”/“yesterday” delta artifacts.
    • Added a repository sync-and-change-detection tool that generates de-duplicated lists of CVE-related files changed today and yesterday.
    • Processor expands certain package names (e.g., emacs) into additional package entries for broader coverage.

@ksykulev ksykulev requested a review from a team as a code owner March 20, 2026 23:08
Copilot AI review requested due to automatic review settings March 20, 2026 23:08

@claude claude Bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review.

@ksykulev
Contributor Author

ksykulev commented Mar 20, 2026

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai Bot commented Mar 20, 2026

✅ Actions performed

Full review triggered.


Contributor

Copilot AI left a comment


Pull request overview

Adds a new cmd/osv-processor utility to convert Canonical Ubuntu OSV CVE JSON into Fleet-friendly, version-scoped vulnerability artifacts (including delta artifacts) intended for syncing into a vulnerabilities repository.

Changes:

  • Introduces a Go-based processor that walks OSV CVE JSON files, groups vulns per Ubuntu version and package, and writes compressed JSON artifacts (full + optional “today/yesterday” deltas).
  • Adds a shell script to shallow-sync Canonical’s ubuntu-security-notices repo and emit “changed files” lists for delta generation.
  • Adds a transformation hook to expand/override package mappings and CVE-specific adjustments.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
cmd/osv-processor/main.go Core artifact generation (full + delta), OSV parsing, version/CVE extraction, gzip JSON writing.
cmd/osv-processor/transforms.go Transformation/filter hook for package expansion and CVE-specific tweaks.
cmd/osv-processor/sync-and-detect-changes.sh Shallow clone/fetch Canonical OSV repo and produce changed-file manifests for delta builds.


@coderabbitai
Contributor

coderabbitai Bot commented Mar 20, 2026

Walkthrough

Adds a new OSV processing tool: a Go CLI (cmd/osv-processor/main.go) that recursively scans an input directory for .json OSV files, deserializes them into typed models, extracts Ubuntu versions and CVE identifiers, optionally restricts processing using include/exclude version flags or changed-file lists (today/yesterday), aggregates processed vulnerabilities into per-Ubuntu-version in-memory artifacts, computes totals, and writes gzip-compressed, indented JSON artifact files (including optional deltas). Also adds a sync script (cmd/osv-processor/sync-and-detect-changes.sh) that syncs the ubuntu-security-notices repo and produces changed-file lists, and cmd/osv-processor/transforms.go with transformVuln, which expands package names (e.g., emacs → emacs-common, emacs-el).
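The emacs expansion mentioned above can be illustrated with a small sketch. The map and function names here are hypothetical stand-ins for the real transformVuln in transforms.go:

```go
package main

import "fmt"

// packageExpansions is a hypothetical stand-in for the mapping applied
// in transforms.go: some source packages fan out to extra binary-package
// entries so the artifacts cover all installed names.
var packageExpansions = map[string][]string{
	"emacs": {"emacs-common", "emacs-el"},
}

// expandPackageName returns the original name plus any expansions.
func expandPackageName(name string) []string {
	pkgs := []string{name}
	pkgs = append(pkgs, packageExpansions[name]...)
	return pkgs
}

func main() {
	fmt.Println(expandPackageName("emacs")) // [emacs emacs-common emacs-el]
	fmt.Println(expandPackageName("bash"))  // [bash]
}
```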

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The pull request description is minimal and lacks required template sections including issue reference, checklist items, testing details, database migrations, and configuration settings guidance. Complete the pull request description using the repository template. Include Related issue reference, relevant checklist items (changes files, validation, testing, migrations), and confirmation of manual testing.
Linked Issues check ❓ Inconclusive The linked issue #41571 provides minimal detail beyond requesting 'OSV feed Ubuntu create vulnerabilities repo cron sync', making it difficult to fully assess whether all requirements are met. Provide more detailed requirements from the linked issue or clarify what specific cron behavior, sync mechanism, and repository integration objectives must be validated.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'OSV artifact generation for use in vulnerabilities repository' directly and clearly summarizes the main change: the PR adds OSV artifact generation functionality.
Out of Scope Changes check ✅ Passed All code changes (OSV processing CLI, bash sync script, and transformation logic) are directly related to OSV artifact generation and repository synchronization as stated in the PR objectives.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (5)
cmd/osv-processor/main.go (3)

206-246: Multiple time.Now() calls may cause timestamp inconsistencies across artifacts.

time.Now().UTC() is called repeatedly inside the loop for the Generated field. If processing takes significant time, artifacts created later will have different timestamps. Consider capturing the generation time once at the start of main() for consistency.

💡 Optional: Use consistent timestamp
+	generationTime := time.Now().UTC()
+	generationTimeStr := generationTime.Format(time.RFC3339)
+
 	err := filepath.Walk(*inputDir, func(path string, info os.FileInfo, err error) error {
 		// ...
 		for _, pkg := range packages {
 			if _, exists := artifacts[ubuntuVer]; !exists {
 				artifacts[ubuntuVer] = &ArtifactData{
 					SchemaVersion:   "1.0",
 					UbuntuVersion:   ubuntuVer,
-					Generated:       time.Now().UTC().Format(time.RFC3339),
+					Generated:       generationTimeStr,
 					Vulnerabilities: make(map[string][]ProcessedVuln),
 				}
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/main.go` around lines 206 - 246, Capture a single
generation timestamp once (e.g., generationTime :=
time.Now().UTC().Format(time.RFC3339)) near the start of main() and use that
value when constructing ArtifactData instead of calling
time.Now().UTC().Format(time.RFC3339) inline; update the three spots that create
ArtifactData for artifacts[ubuntuVer], todayArtifacts[ubuntuVer], and
yesterdayArtifacts[ubuntuVer] so their Generated field is set to generationTime
for consistent timestamps across all ArtifactData instances.

144-153: Path separator mismatch may cause delta detection failures on Windows.

filepath.Join and filepath.Rel use OS-specific separators, but the changed files from git log use forward slashes. On Windows, this would cause the map lookup to fail (e.g., osv\cve\file.json vs osv/cve/file.json).

If Windows support is needed, normalize to forward slashes:

♻️ Proposed fix: Normalize path separators
 		if generateTodayDeltas || generateYesterdayDeltas {
 			relPath, err := filepath.Rel(*inputDir, path)
 			if err == nil {
-				fullRelPath := filepath.Join("osv/cve", relPath)
+				// Normalize to forward slashes for consistency with git output
+				fullRelPath := "osv/cve/" + filepath.ToSlash(relPath)
 				if generateTodayDeltas {
 					inToday = todayCVEFiles[fullRelPath]
 				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/main.go` around lines 144 - 153, The delta lookup fails on
Windows because filepath.Join (and filepath.Rel) produce OS-specific separators
while the git-derived keys use forward slashes; update the code around relPath
and fullRelPath so fullRelPath is normalized to use forward slashes before
looking up todayCVEFiles/yesterdayCVEFiles (e.g., build the prefix with
"osv/cve" and then convert any backslashes in relPath to '/' or use path.Join
from the path package which always uses forward slashes) so the map lookups for
generateTodayDeltas/generateYesterdayDeltas succeed cross-platform.

357-368: Version extraction may miss non-standard version formats.

The check len(part) == 5 && strings.Contains(part, ".") works for versions like 24.04 but would miss:

  • Older Ubuntu versions with shorter formats (e.g., 8.04)
  • Future versions if Ubuntu ever exceeds 99.99

Given the PR mentions Ubuntu 14.04 as the oldest version, this is likely fine for current use, but consider a more robust regex pattern for future-proofing.

💡 Optional: More robust version detection
+import "regexp"
+
+var ubuntuVersionRegex = regexp.MustCompile(`^\d{1,2}\.\d{2}$`)
+
 func extractUbuntuVersion(ecosystem string) string {
-	// Example: "Ubuntu:24.04:LTS" -> "24.04"
-	// Example: "Ubuntu:Pro:22.04:LTS" -> "22.04"
 	parts := strings.Split(ecosystem, ":")
 	for _, part := range parts {
-		// Look for version pattern like "24.04", "22.04", "20.04"
-		if len(part) == 5 && strings.Contains(part, ".") {
+		if ubuntuVersionRegex.MatchString(part) {
 			return part
 		}
 	}
 	return ""
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/main.go` around lines 357 - 368, The extractUbuntuVersion
function currently checks parts with len(part) == 5 which misses versions like
"8.04" or future multi-digit versions; replace the fixed-length check with a
regex match that accepts one or more digits, a dot, then one or more digits
(e.g. using regexp.MustCompile(`^\d+\.\d+$`).MatchString) to identify version
tokens robustly and return the first matching part.
cmd/osv-processor/sync-and-detect-changes.sh (1)

60-74: Working directory coupling may cause issues if script is invoked from different locations.

The script writes output files using relative paths (../changed_files_today.txt) and changes directories multiple times. This coupling assumes the script is always run from the same location.

Consider deriving an absolute base path from the script's location for more robust operation:

♻️ Suggested improvement: Use script directory as base
+# Get script directory for consistent path resolution
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+OUTPUT_DIR="${SCRIPT_DIR}"
+
 # Configuration
 REPO_URL="https://github.com/canonical/ubuntu-security-notices.git"
-REPO_DIR="ubuntu-security-notices"
+REPO_DIR="${SCRIPT_DIR}/ubuntu-security-notices"

Then use ${OUTPUT_DIR}/changed_files_today.txt instead of ../changed_files_today.txt.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/sync-and-detect-changes.sh` around lines 60 - 74, The
script currently relies on relative paths and cd'ing into "$REPO_DIR", then
writing "../changed_files_today.txt" and "../changed_files_yesterday.txt", which
breaks when invoked from other locations; change it to compute an absolute
OUTPUT_DIR based on the script location (e.g., derive from $0) and/or make
REPO_DIR absolute, avoid changing working directories, and write outputs to
"${OUTPUT_DIR}/changed_files_today.txt" and
"${OUTPUT_DIR}/changed_files_yesterday.txt" instead of "../...", updating uses
of TODAY_UTC, YESTERDAY_UTC, TODAY_COUNT and YESTERDAY_COUNT to read from those
absolute files.
cmd/osv-processor/transforms.go (1)

4-29: Unused parameters may trigger linter warnings.

The cveID and vuln parameters are currently only referenced in commented-out code. While the comments illustrate intended extension points, unused parameters may trigger linter warnings (e.g., ineffassign, unparam).

Consider either:

  1. Prefixing with _ to indicate intentional non-use: _ string, _ *ProcessedVuln
  2. Adding a //nolint directive if you prefer to keep the parameter names for documentation
💡 Optional: Silence unused parameter warnings
-func transformVuln(packageName, cveID string, vuln *ProcessedVuln) (packages []string, modifiedVuln *ProcessedVuln) {
+func transformVuln(packageName, _ string, _ *ProcessedVuln) (packages []string, modifiedVuln *ProcessedVuln) {

Or keep names with a nolint directive:

//nolint:unparam // cveID and vuln reserved for future CVE-specific transformations
func transformVuln(packageName, cveID string, vuln *ProcessedVuln) (packages []string, modifiedVuln *ProcessedVuln) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/transforms.go` around lines 4 - 29, The function
transformVuln currently doesn't use the cveID and vuln parameters which can
trigger unparam/unused warnings; update the signature or annotate it: either
rename parameters to use blank identifier (e.g., packageName, _ string, _
*ProcessedVuln) to signal intentional non-use, or add a function-level nolint
directive (e.g., //nolint:unparam // cveID and vuln reserved for future
CVE-specific transformations) above transformVuln to suppress the warning while
keeping the current names for documentation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/osv-processor/main.go`:
- Around line 414-428: The function writeArtifact currently defers
gzWriter.Close() which ignores any error from closing the gzip writer; change
the cleanup to explicitly close gzWriter and file and return any Close errors:
call encoder.Encode(artifact) first, then call gzWriter.Close() and if it
returns an error return that error (or wrap/append it), then call file.Close()
and return any file.Close() error (or the prior error if present). Update
writeArtifact to remove the deferred gzWriter.Close() and ensure both
gzip.Writer.Close() and os.File.Close() errors are checked and returned
(referencing writeArtifact, gzWriter.Close, file.Close, and encoder.Encode).

In `@cmd/osv-processor/sync-and-detect-changes.sh`:
- Around line 18-19: The variable DELTA_DAYS is defined but never used; either
remove the unused DELTA_DAYS definition or use it to drive the delta-generation
loop instead of the current hardcoded "today + yesterday" logic—update the
script's delta generation section to iterate DELTA_DAYS times (e.g., for i in
0..DELTA_DAYS-1) to generate deltas for N days, or simply delete the DELTA_DAYS
line if you prefer a simpler cleanup; keep DAYS_TO_KEEP unchanged if still used
elsewhere.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 94cd94e0-b7af-4e23-b4fe-4a85c561c89d

📥 Commits

Reviewing files that changed from the base of the PR and between 795f63b and 2f1a052.

📒 Files selected for processing (3)
  • cmd/osv-processor/main.go
  • cmd/osv-processor/sync-and-detect-changes.sh
  • cmd/osv-processor/transforms.go

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
cmd/osv-processor/main.go (1)

414-428: ⚠️ Potential issue | 🟠 Major

Propagate Close() errors when finishing the gzip stream.

A nil return from encoder.Encode does not mean the gzip footer and final file buffers were written successfully. Deferring both closes drops those flush errors, so disk-full or short-write cases can still leave a corrupt artifact reported as success.

🐛 Proposed fix
 func writeArtifact(path string, artifact *ArtifactData) error {
 	file, err := os.Create(path)
 	if err != nil {
 		return err
 	}
-	defer file.Close()
 
 	gzWriter := gzip.NewWriter(file)
-	defer gzWriter.Close()
-
 	encoder := json.NewEncoder(gzWriter)
 	encoder.SetIndent("", "  ")
 
-	return encoder.Encode(artifact)
+	if err := encoder.Encode(artifact); err != nil {
+		_ = gzWriter.Close()
+		_ = file.Close()
+		return err
+	}
+	if err := gzWriter.Close(); err != nil {
+		_ = file.Close()
+		return err
+	}
+	return file.Close()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/main.go` around lines 414 - 428, The writeArtifact function
currently defers gzWriter.Close() and file.Close(), which can swallow final
write/flush errors; change it to capture and return close errors instead of
deferring both: call encoder.Encode(artifact) and store its error, then
explicitly close the gzip writer (gzipWriter.Close()) and check its error, then
close the file (file.Close()) and check its error, returning the first non-nil
error among encodeErr, gzipCloseErr, fileCloseErr (ensure gzip.Close is called
before file.Close so flush happens while file is open); reference the
writeArtifact function, the encoder.Encode call, gzWriter.Close, and file.Close
when making this change.
🧹 Nitpick comments (2)
cmd/osv-processor/sync-and-detect-changes.sh (1)

16-19: Anchor repo and output paths to a stable base directory.

These paths all depend on the caller’s current working directory, so invoking the script from a different location changes where the repo is cloned, where the changed-file lists are written, and what the emitted OSV_DIR / CHANGED_FILES_* values mean. For the cron flow, resolve one base dir up front and build all paths from it.

Also applies to: 64-70, 88-90

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/sync-and-detect-changes.sh` around lines 16 - 19, The
script uses relative paths (REPO_DIR, OSV_DIR, CHANGED_FILES_*, etc.) that vary
with the caller's CWD; resolve a stable BASE_DIR at the top (e.g., realpath of
the script's parent or a fixed workspace path) and rebuild all path variables
from that base so every path is anchored consistently; update REPO_DIR, OSV_DIR,
any CHANGED_FILES_* variables and any other file/directory constants referenced
around REPO_URL/REPO_DIR/DAYS_TO_KEEP/DELTA_DAYS and in the sections noted
(lines ~64–70 and ~88–90) to be BASE_DIR + relative subpaths rather than plain
relative names.
cmd/osv-processor/transforms.go (1)

5-8: Trim the commented-out rule templates.

The inactive if cveID == ... examples read like dead code and will drift. Keep a short extension-point comment here, then add real rules when the first CVE-specific override is needed.

Also applies to: 21-26

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/transforms.go` around lines 5 - 8, Remove the commented-out
CVE rule template lines (the inactive "if cveID == 'CVE-YYYY-XXXXX' { return
nil, nil }" block) and replace them with a single brief extension-point comment
such as "/* Add CVE-specific overrides here when needed */"; apply the same
change for the similar commented template at lines around the second occurrence
(the block referencing cveID and returning nil, nil). Keep references to the
cveID override concept so future additions are obvious but avoid leaving dead
example code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/osv-processor/main.go`:
- Around line 268-270: The artifact filenames are being generated with
time.Now() (local wall clock) which can diverge from the UTC date window used to
compute changed-file lists; replace time.Now() calls in filename construction
(e.g., the outputFile fmt.Sprintf that currently uses time.Now()) and the
related filename code block (the other occurrences around the delta/name
generation) with a single explicit UTC run date value (the same time.Time used
to compute the changed-file lists), formatting it with
runDate.UTC().Format("2006-01-02") and for prior-day names use
runDate.AddDate(0,0,-1).UTC().Format(...), so all filenames consistently derive
from that one UTC runDate rather than the host local clock.
- Around line 144-151: relPath from filepath.Rel can already include an
"osv/cve" prefix, causing double-prefixing when you unconditionally do
fullRelPath := filepath.Join("osv/cve", relPath); normalize relPath first by
cleaning it (filepath.Clean) and removing any leading "osv/cve" path segment
(e.g. check strings.HasPrefix(relPathClean,
filepath.Join("osv","cve")+string(os.PathSeparator)) or equals
filepath.Join("osv","cve") and strip that prefix) before constructing
fullRelPath, then use that normalized fullRelPath for the todayCVEFiles and
yesterdayCVEFiles lookups (references: filepath.Rel, relPath, fullRelPath,
todayCVEFiles, yesterdayCVEFiles, generateTodayDeltas, generateYesterdayDeltas).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6e24022f-a121-475a-a7a9-42d467bc177d


@codecov

codecov Bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 72.93233% with 72 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.56%. Comparing base (3c300e9) to head (0244bf1).
⚠️ Report is 35 commits behind head on main.

Files with missing lines Patch % Lines
cmd/osv-processor/main.go 72.41% 48 Missing and 24 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #42203      +/-   ##
==========================================
+ Coverage   66.51%   66.56%   +0.05%     
==========================================
  Files        2526     2523       -3     
  Lines      202660   202799     +139     
  Branches     9026     9030       +4     
==========================================
+ Hits       134793   134989     +196     
+ Misses      55694    55608      -86     
- Partials    12173    12202      +29     
Flag Coverage Δ
backend 68.31% <72.93%> (+<0.01%) ⬆️


@ksykulev ksykulev marked this pull request as draft March 20, 2026 23:48
@ksykulev
Contributor Author

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai Bot commented Mar 21, 2026

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (4)
cmd/osv-processor/main.go (3)

421-450: Remove redundant defer statements.

The explicit Close() calls (lines 440-447) properly handle error checking, making the defer statements on lines 426 and 429 redundant. While double-closing is harmless, removing the defers improves clarity.

♻️ Proposed cleanup
 func writeArtifact(path string, artifact *ArtifactData) error {
 	file, err := os.Create(path)
 	if err != nil {
 		return err
 	}
-	defer file.Close()

 	gzWriter := gzip.NewWriter(file)
-	defer gzWriter.Close()

 	encoder := json.NewEncoder(gzWriter)
 	encoder.SetIndent("", "  ")

 	if err := encoder.Encode(artifact); err != nil {
 		_ = gzWriter.Close()
 		_ = file.Close()
 		return err
 	}

 	if err := gzWriter.Close(); err != nil {
 		_ = file.Close()
 		return err
 	}

 	if err := file.Close(); err != nil {
 		return err
 	}

 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/main.go` around lines 421 - 450, In writeArtifact, remove
the redundant defer calls that close file and gzWriter (the defer file.Close()
and defer gzWriter.Close()) because the function already performs explicit
gzWriter.Close() and file.Close() with error handling; keep the explicit Close()
calls and their error checks so resources are closed deterministically and
errors are propagated.

212-246: Multiple time.Now() calls may produce inconsistent timestamps.

time.Now().UTC() is called separately for each artifact's Generated field (lines 212, 226, 241). If processing takes significant time, artifacts may have slightly different timestamps. Consider capturing a single run timestamp at the start.

♻️ Proposed fix: Use single run timestamp

Add near the top of main():

runTime := time.Now().UTC()

Then replace each time.Now().UTC().Format(time.RFC3339) with:

-Generated:       time.Now().UTC().Format(time.RFC3339),
+Generated:       runTime.Format(time.RFC3339),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/main.go` around lines 212 - 246, The ArtifactData Generated
timestamps are created with multiple time.Now().UTC() calls (seen when
initializing ArtifactData for artifacts, todayArtifacts, and
yesterdayArtifacts), causing inconsistent timestamps; capture a single run
timestamp (e.g., runTime := time.Now().UTC()) near the start of main() and
replace all occurrences of time.Now().UTC().Format(time.RFC3339) used for
ArtifactData.Generated with runTime.Format(time.RFC3339) so artifacts,
todayArtifacts, and yesterdayArtifacts share the same Generated value.

269-271: Filename dates should derive from a single UTC run timestamp.

The artifact filenames use time.Now().UTC() at write time. If processing crosses midnight UTC, full artifacts and delta artifacts could have mismatched dates (e.g., full dated 2026-03-21, delta dated 2026-03-22). Use the same runTime variable suggested above for filenames.

♻️ Proposed fix: Use consistent date for filenames
 outputFile := filepath.Join(*outputDir, fmt.Sprintf("osv-ubuntu-%s-%s.json.gz",
     strings.ReplaceAll(ver, ".", ""),
-    time.Now().UTC().Format("2006-01-02")))
+    runTime.Format("2006-01-02")))

Apply similar changes to delta artifact filenames (lines 283, 290, 302, 309).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/main.go` around lines 269 - 271, The filename generation
currently uses time.Now().UTC() at write time (see outputFile created with
filepath.Join and fmt.Sprintf), which can cause inconsistent dates if the run
crosses UTC midnight; change the code to use a single runTime variable (e.g.,
runTime := time.Now().UTC() set once at startup) and replace all occurrences of
time.Now().UTC().Format("2006-01-02") in the filename fmt.Sprintf calls
(including the outputFile for "osv-ubuntu-..." and the delta artifact filename
constructions referenced in the diff) to use runTime.Format("2006-01-02") so all
artifacts share the same UTC date.
cmd/osv-processor/sync-and-detect-changes.sh (1)

63-69: Consider handling empty git log output gracefully.

When there are no commits in the time window (e.g., no changes today), git log outputs nothing and sort -u creates an empty file. This is technically correct, but consider adding a brief log message when the resulting file is empty to aid debugging.

♻️ Optional: Add informational log for empty results
 git log --since="${TODAY_UTC}T00:00:00Z" --name-only --pretty="" -- osv/cve \
     | sort -u > "../changed_files_today.txt"

+if [ ! -s "../changed_files_today.txt" ]; then
+    echo "  Note: No CVE files changed today"
+fi
+
 # Get files changed yesterday (from midnight yesterday to midnight today UTC)
 YESTERDAY_UTC=$(date -u -v-1d +%Y-%m-%d 2>/dev/null || date -u -d "yesterday" +%Y-%m-%d)
 git log --since="${YESTERDAY_UTC}T00:00:00Z" --until="${TODAY_UTC}T00:00:00Z" --name-only --pretty="" -- osv/cve \
     | sort -u > "../changed_files_yesterday.txt"
+
+if [ ! -s "../changed_files_yesterday.txt" ]; then
+    echo "  Note: No CVE files changed yesterday"
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osv-processor/sync-and-detect-changes.sh` around lines 63 - 69, After
generating "../changed_files_today.txt" and "../changed_files_yesterday.txt"
from the git log commands that use TODAY_UTC and YESTERDAY_UTC, add a simple
existence/size check for each file (e.g., test -s "../changed_files_today.txt")
and emit a brief informational log when the file is empty so it's clear there
were no commits in that window; update the script around the git log blocks that
produce those files to perform the check and print concise messages referencing
the corresponding filename and time window.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3027ea95-74b9-4c9d-8f70-9df587cd3f4d

📥 Commits

Reviewing files that changed from the base of the PR and between 795f63b and aa061d0.

📒 Files selected for processing (3)
  • cmd/osv-processor/main.go
  • cmd/osv-processor/sync-and-detect-changes.sh
  • cmd/osv-processor/transforms.go

@ksykulev ksykulev marked this pull request as ready for review March 21, 2026 01:28
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.



Comment thread cmd/osv-processor/main.go Outdated
Comment thread cmd/osv-processor/sync-and-detect-changes.sh Outdated
Comment thread cmd/osv-processor/sync-and-detect-changes.sh Outdated
Comment thread cmd/osv-processor/sync-and-detect-changes.sh Outdated
Comment thread cmd/osv-processor/main.go
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.



Comment thread cmd/osv-processor/sync-and-detect-changes.sh Outdated
Comment thread cmd/osv-processor/sync-and-detect-changes.sh
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.



Comment thread cmd/osv-processor/sync-and-detect-changes.sh Outdated
Comment thread cmd/osv-processor/main.go
Comment thread cmd/osv-processor/main_test.go Outdated
@getvictor getvictor self-assigned this Mar 23, 2026
getvictor
getvictor previously approved these changes Mar 24, 2026
Member

@getvictor getvictor left a comment


Looks good overall. I left a few comments, but they are non-blocking and can be addressed in subsequent PRs.

Comment thread cmd/osv-processor/main.go
Comment on lines +74 to +75
changedFilesToday := flag.String("changed-files-today", "", "Path to file containing CVE files changed today (generates today's deltas)")
changedFilesYesterday := flag.String("changed-files-yesterday", "", "Path to file containing CVE files changed yesterday (generates yesterday's deltas)")
Member


This tool doesn't actually care about calendar days, right? So maybe something like this would be better for these switches: --delta-list / --prior-delta-list

Contributor Author


Technically, no. However, later in the method I do hard-code the date when generating the artifacts. For example:
osv-ubuntu-2204-delta-2024-01-03.json.gz

So today's date and yesterday's date are baked into the script. Changing the names of the params to --delta-list / --prior-delta-list would let us generalize the timeframe, but then we would probably have to add additional parameters to modify the dates of the artifacts, or remove the dates entirely.
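One way to decouple the stamped date from "today", sketched with a hypothetical --artifact-date flag and deltaName helper (neither exists in the PR):

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// deltaName builds a delta artifact filename from an explicit date string
// rather than calling time.Now() inline at each write site.
func deltaName(ver, date string) string {
	return fmt.Sprintf("osv-ubuntu-%s-delta-%s.json.gz", ver, date)
}

func main() {
	// Hypothetical flag: pin the date embedded in artifact names; it
	// defaults to the current UTC date, matching today's behavior.
	artifactDate := flag.String("artifact-date",
		time.Now().UTC().Format("2006-01-02"),
		"Date (YYYY-MM-DD) embedded in generated artifact filenames")
	flag.Parse()

	fmt.Println(deltaName("2204", *artifactDate))
}
```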

package main

// transformVuln applies transformations and filters to OSV vulnerability data.
func transformVuln(packageName, cveID string, vuln *ProcessedVuln) (packages []string, modifiedVuln *ProcessedVuln) {
Member


Did you want to centralize transforms somehow? So that there is one location (package?) we can go to where we see and pick the right transform to use? Not for this PR. Just asking.

Contributor Author


👍 Ugh yes, that would be great. We have so many places we do some sort of transformation.
Closely related is transformVuln in server/vulnerabilities/nvd/sync/cve_syncer.go

But then we have an assortment of other ones that run during vulnerability scanning
softwareTransformers in server/vulnerabilities/nvd/cpe.go
server/vulnerabilities/nvd/cpe_translations.json
expandCPEAliases in server/vulnerabilities/nvd/cve.go
GetKnownNVDBugRules in server/vulnerabilities/nvd/cpe_matching_rules.go
getCVEMatchingRules in server/vulnerabilities/customcve/matching_rules.go

And this is just adding to the mess. It would be nice to think about how to organize these better.

Comment thread cmd/osv-processor/main.go
Vulnerabilities map[string][]ProcessedVuln `json:"vulnerabilities"`
}

func main() {
Member


This main loop doesn't have a test. I recommend refactoring it into 1 or more functions and adding some basic tests for that.

Contributor Author


I have a bunch of tests in cmd/osv-processor/main_test.go for the assorted methods used in main().

Do you want me to extract even more out of main? I think trying to create a full integration test for main with all the file system jazz going on in there would be a lot of work / stubbing methods.

Happy to give it a try if you think it's worth it.

Member


It is up to you. Codecov says 21% coverage, which is pretty low.

cmd/osv-processor/main.go 21.68% 194 Missing and 1 partial ⚠️

Contributor Author


a little better.

Files with missing lines Patch % Lines
cmd/osv-processor/main.go 72.41% 48 Missing and 24 partials ⚠️

I extracted the functionality of main and put it into a run method that is more easily tested. Then added some integration tests for the run method.
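The extraction described above is commonly shaped like the following sketch; the run signature here is illustrative, not the PR's actual one:

```go
package main

import (
	"fmt"
	"os"
)

// run carries the program logic so tests can call it directly with
// injected arguments, instead of exercising main and the real filesystem.
func run(args []string) (string, error) {
	if len(args) == 0 {
		return "", fmt.Errorf("no input directories given")
	}
	return fmt.Sprintf("processing %d inputs", len(args)), nil
}

func main() {
	// main stays a thin shell: parse process-level inputs, delegate, exit.
	msg, err := run(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(msg)
}
```

With this split, integration tests can cover the success and error paths of run without stubbing os.Exit or standard streams.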

@ksykulev ksykulev merged commit 8eaecfc into main Mar 25, 2026
48 checks passed
@ksykulev ksykulev deleted the 41571-osv-cron branch March 25, 2026 18:02


Development

Successfully merging this pull request may close these issues.

OSV feed Ubuntu create vulnerabilities repo cron sync

3 participants