OSV artifact generation for use in vulnerabilities repository#42203
OSV artifact generation for use in vulnerabilities repository#42203
Conversation
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
1 similar comment
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Pull request overview
Adds a new cmd/osv-processor utility to convert Canonical Ubuntu OSV CVE JSON into Fleet-friendly, version-scoped vulnerability artifacts (including delta artifacts) intended for syncing into a vulnerabilities repository.
Changes:
- Introduces a Go-based processor that walks OSV CVE JSON files, groups vulns per Ubuntu version and package, and writes compressed JSON artifacts (full + optional “today/yesterday” deltas).
- Adds a shell script to shallow-sync Canonical’s
ubuntu-security-noticesrepo and emit “changed files” lists for delta generation. - Adds a transformation hook to expand/override package mappings and CVE-specific adjustments.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| cmd/osv-processor/main.go | Core artifact generation (full + delta), OSV parsing, version/CVE extraction, gzip JSON writing. |
| cmd/osv-processor/transforms.go | Transformation/filter hook for package expansion and CVE-specific tweaks. |
| cmd/osv-processor/sync-and-detect-changes.sh | Shallow clone/fetch Canonical OSV repo and produce changed-file manifests for delta builds. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
WalkthroughAdds a new OSV processing tool: a Go CLI ( 🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can validate your CodeRabbit configuration file in your editor.If your editor has YAML language server, you can enable auto-completion and validation by adding |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (5)
cmd/osv-processor/main.go (3)
206-246: Multipletime.Now()calls may cause timestamp inconsistencies across artifacts.
time.Now().UTC()is called repeatedly inside the loop for theGeneratedfield. If processing takes significant time, artifacts created later will have different timestamps. Consider capturing the generation time once at the start ofmain()for consistency.💡 Optional: Use consistent timestamp
+ generationTime := time.Now().UTC() + generationTimeStr := generationTime.Format(time.RFC3339) + err := filepath.Walk(*inputDir, func(path string, info os.FileInfo, err error) error { // ... for _, pkg := range packages { if _, exists := artifacts[ubuntuVer]; !exists { artifacts[ubuntuVer] = &ArtifactData{ SchemaVersion: "1.0", UbuntuVersion: ubuntuVer, - Generated: time.Now().UTC().Format(time.RFC3339), + Generated: generationTimeStr, Vulnerabilities: make(map[string][]ProcessedVuln), } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/main.go` around lines 206 - 246, Capture a single generation timestamp once (e.g., generationTime := time.Now().UTC().Format(time.RFC3339)) near the start of main() and use that value when constructing ArtifactData instead of calling time.Now().UTC().Format(time.RFC3339) inline; update the three spots that create ArtifactData for artifacts[ubuntuVer], todayArtifacts[ubuntuVer], and yesterdayArtifacts[ubuntuVer] so their Generated field is set to generationTime for consistent timestamps across all ArtifactData instances.
144-153: Path separator mismatch may cause delta detection failures on Windows.
filepath.Joinandfilepath.Reluse OS-specific separators, but the changed files fromgit loguse forward slashes. On Windows, this would cause the map lookup to fail (e.g.,osv\cve\file.jsonvsosv/cve/file.json).If Windows support is needed, normalize to forward slashes:
♻️ Proposed fix: Normalize path separators
if generateTodayDeltas || generateYesterdayDeltas { relPath, err := filepath.Rel(*inputDir, path) if err == nil { - fullRelPath := filepath.Join("osv/cve", relPath) + // Normalize to forward slashes for consistency with git output + fullRelPath := "osv/cve/" + filepath.ToSlash(relPath) if generateTodayDeltas { inToday = todayCVEFiles[fullRelPath] }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/main.go` around lines 144 - 153, The delta lookup fails on Windows because filepath.Join (and filepath.Rel) produce OS-specific separators while the git-derived keys use forward slashes; update the code around relPath and fullRelPath so fullRelPath is normalized to use forward slashes before looking up todayCVEFiles/yesterdayCVEFiles (e.g., build the prefix with "osv/cve" and then convert any backslashes in relPath to '/' or use path.Join from the path package which always uses forward slashes) so the map lookups for generateTodayDeltas/generateYesterdayDeltas succeed cross-platform.
357-368: Version extraction may miss non-standard version formats.The check
len(part) == 5 && strings.Contains(part, ".")works for versions like24.04but would miss:
- Older Ubuntu versions with shorter formats (e.g.,
8.04)- Future versions if Ubuntu ever exceeds
99.99Given the PR mentions Ubuntu 14.04 as the oldest version, this is likely fine for current use, but consider a more robust regex pattern for future-proofing.
💡 Optional: More robust version detection
+import "regexp" + +var ubuntuVersionRegex = regexp.MustCompile(`^\d{1,2}\.\d{2}$`) + func extractUbuntuVersion(ecosystem string) string { - // Example: "Ubuntu:24.04:LTS" -> "24.04" - // Example: "Ubuntu:Pro:22.04:LTS" -> "22.04" parts := strings.Split(ecosystem, ":") for _, part := range parts { - // Look for version pattern like "24.04", "22.04", "20.04" - if len(part) == 5 && strings.Contains(part, ".") { + if ubuntuVersionRegex.MatchString(part) { return part } } return "" }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/main.go` around lines 357 - 368, The extractUbuntuVersion function currently checks parts with len(part) == 5 which misses versions like "8.04" or future multi-digit versions; replace the fixed-length check with a regex match that accepts one or more digits, a dot, then one or more digits (e.g. using regexp.MustCompile(`^\d+\.\d+$`).MatchString) to identify version tokens robustly and return the first matching part.cmd/osv-processor/sync-and-detect-changes.sh (1)
60-74: Working directory coupling may cause issues if script is invoked from different locations.The script writes output files using relative paths (
../changed_files_today.txt) and changes directories multiple times. This coupling assumes the script is always run from the same location.Consider deriving an absolute base path from the script's location for more robust operation:
♻️ Suggested improvement: Use script directory as base
+# Get script directory for consistent path resolution +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +OUTPUT_DIR="${SCRIPT_DIR}" + # Configuration REPO_URL="https://github.com/canonical/ubuntu-security-notices.git" -REPO_DIR="ubuntu-security-notices" +REPO_DIR="${SCRIPT_DIR}/ubuntu-security-notices"Then use
${OUTPUT_DIR}/changed_files_today.txtinstead of../changed_files_today.txt.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/sync-and-detect-changes.sh` around lines 60 - 74, The script currently relies on relative paths and cd'ing into "$REPO_DIR", then writing "../changed_files_today.txt" and "../changed_files_yesterday.txt", which breaks when invoked from other locations; change it to compute an absolute OUTPUT_DIR based on the script location (e.g., derive from $0) and/or make REPO_DIR absolute, avoid changing working directories, and write outputs to "${OUTPUT_DIR}/changed_files_today.txt" and "${OUTPUT_DIR}/changed_files_yesterday.txt" instead of "../...", updating uses of TODAY_UTC, YESTERDAY_UTC, TODAY_COUNT and YESTERDAY_COUNT to read from those absolute files.cmd/osv-processor/transforms.go (1)
4-29: Unused parameters may trigger linter warnings.The
cveIDandvulnparameters are currently only referenced in commented-out code. While the comments illustrate intended extension points, unused parameters may trigger linter warnings (e.g.,ineffassign,unparam).Consider either:
- Prefixing with
_to indicate intentional non-use:_ string, _ *ProcessedVuln- Adding a
//nolintdirective if you prefer to keep the parameter names for documentation💡 Optional: Silence unused parameter warnings
-func transformVuln(packageName, cveID string, vuln *ProcessedVuln) (packages []string, modifiedVuln *ProcessedVuln) { +func transformVuln(packageName, _ string, _ *ProcessedVuln) (packages []string, modifiedVuln *ProcessedVuln) {Or keep names with a nolint directive:
//nolint:unparam // cveID and vuln reserved for future CVE-specific transformations func transformVuln(packageName, cveID string, vuln *ProcessedVuln) (packages []string, modifiedVuln *ProcessedVuln) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/transforms.go` around lines 4 - 29, The function transformVuln currently doesn't use the cveID and vuln parameters which can trigger unparam/unused warnings; update the signature or annotate it: either rename parameters to use blank identifier (e.g., packageName, _ string, _ *ProcessedVuln) to signal intentional non-use, or add a function-level nolint directive (e.g., //nolint:unparam // cveID and vuln reserved for future CVE-specific transformations) above transformVuln to suppress the warning while keeping the current names for documentation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/osv-processor/main.go`:
- Around line 414-428: The function writeArtifact currently defers
gzWriter.Close() which ignores any error from closing the gzip writer; change
the cleanup to explicitly close gzWriter and file and return any Close errors:
call encoder.Encode(artifact) first, then call gzWriter.Close() and if it
returns an error return that error (or wrap/append it), then call file.Close()
and return any file.Close() error (or the prior error if present). Update
writeArtifact to remove the deferred gzWriter.Close() and ensure both
gzip.Writer.Close() and os.File.Close() errors are checked and returned
(referencing writeArtifact, gzWriter.Close, file.Close, and encoder.Encode).
In `@cmd/osv-processor/sync-and-detect-changes.sh`:
- Around line 18-19: The variable DELTA_DAYS is defined but never used; either
remove the unused DELTA_DAYS definition or use it to drive the delta-generation
loop instead of the current hardcoded "today + yesterday" logic—update the
script's delta generation section to iterate DELTA_DAYS times (e.g., for i in
0..DELTA_DAYS-1) to generate deltas for N days, or simply delete the DELTA_DAYS
line if you prefer a simpler cleanup; keep DAYS_TO_KEEP unchanged if still used
elsewhere.
---
Nitpick comments:
In `@cmd/osv-processor/main.go`:
- Around line 206-246: Capture a single generation timestamp once (e.g.,
generationTime := time.Now().UTC().Format(time.RFC3339)) near the start of
main() and use that value when constructing ArtifactData instead of calling
time.Now().UTC().Format(time.RFC3339) inline; update the three spots that create
ArtifactData for artifacts[ubuntuVer], todayArtifacts[ubuntuVer], and
yesterdayArtifacts[ubuntuVer] so their Generated field is set to generationTime
for consistent timestamps across all ArtifactData instances.
- Around line 144-153: The delta lookup fails on Windows because filepath.Join
(and filepath.Rel) produce OS-specific separators while the git-derived keys use
forward slashes; update the code around relPath and fullRelPath so fullRelPath
is normalized to use forward slashes before looking up
todayCVEFiles/yesterdayCVEFiles (e.g., build the prefix with "osv/cve" and then
convert any backslashes in relPath to '/' or use path.Join from the path package
which always uses forward slashes) so the map lookups for
generateTodayDeltas/generateYesterdayDeltas succeed cross-platform.
- Around line 357-368: The extractUbuntuVersion function currently checks parts
with len(part) == 5 which misses versions like "8.04" or future multi-digit
versions; replace the fixed-length check with a regex match that accepts one or
more digits, a dot, then one or more digits (e.g. using
regexp.MustCompile(`^\d+\.\d+$`).MatchString) to identify version tokens
robustly and return the first matching part.
In `@cmd/osv-processor/sync-and-detect-changes.sh`:
- Around line 60-74: The script currently relies on relative paths and cd'ing
into "$REPO_DIR", then writing "../changed_files_today.txt" and
"../changed_files_yesterday.txt", which breaks when invoked from other
locations; change it to compute an absolute OUTPUT_DIR based on the script
location (e.g., derive from $0) and/or make REPO_DIR absolute, avoid changing
working directories, and write outputs to
"${OUTPUT_DIR}/changed_files_today.txt" and
"${OUTPUT_DIR}/changed_files_yesterday.txt" instead of "../...", updating uses
of TODAY_UTC, YESTERDAY_UTC, TODAY_COUNT and YESTERDAY_COUNT to read from those
absolute files.
In `@cmd/osv-processor/transforms.go`:
- Around line 4-29: The function transformVuln currently doesn't use the cveID
and vuln parameters which can trigger unparam/unused warnings; update the
signature or annotate it: either rename parameters to use blank identifier
(e.g., packageName, _ string, _ *ProcessedVuln) to signal intentional non-use,
or add a function-level nolint directive (e.g., //nolint:unparam // cveID and
vuln reserved for future CVE-specific transformations) above transformVuln to
suppress the warning while keeping the current names for documentation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 94cd94e0-b7af-4e23-b4fe-4a85c561c89d
📒 Files selected for processing (3)
cmd/osv-processor/main.gocmd/osv-processor/sync-and-detect-changes.shcmd/osv-processor/transforms.go
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
cmd/osv-processor/main.go (1)
414-428:⚠️ Potential issue | 🟠 MajorPropagate
Close()errors when finishing the gzip stream.A nil return from
encoder.Encodedoes not mean the gzip footer and final file buffers were written successfully. Deferring both closes drops those flush errors, so disk-full or short-write cases can still leave a corrupt artifact reported as success.🐛 Proposed fix
func writeArtifact(path string, artifact *ArtifactData) error { file, err := os.Create(path) if err != nil { return err } - defer file.Close() gzWriter := gzip.NewWriter(file) - defer gzWriter.Close() - encoder := json.NewEncoder(gzWriter) encoder.SetIndent("", " ") - return encoder.Encode(artifact) + if err := encoder.Encode(artifact); err != nil { + _ = gzWriter.Close() + _ = file.Close() + return err + } + if err := gzWriter.Close(); err != nil { + _ = file.Close() + return err + } + return file.Close() }In Go's standard library, can `compress/gzip.Writer.Close()` or `os.File.Close()` return the final flush/write error when producing an output file, and does that mean those errors must be checked instead of ignored in a deferred close?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/main.go` around lines 414 - 428, The writeArtifact function currently defers gzWriter.Close() and file.Close(), which can swallow final write/flush errors; change it to capture and return close errors instead of deferring both: call encoder.Encode(artifact) and store its error, then explicitly close the gzip writer (gzipWriter.Close()) and check its error, then close the file (file.Close()) and check its error, returning the first non-nil error among encodeErr, gzipCloseErr, fileCloseErr (ensure gzip.Close is called before file.Close so flush happens while file is open); reference the writeArtifact function, the encoder.Encode call, gzWriter.Close, and file.Close when making this change.
🧹 Nitpick comments (2)
cmd/osv-processor/sync-and-detect-changes.sh (1)
16-19: Anchor repo and output paths to a stable base directory.These paths all depend on the caller’s current working directory, so invoking the script from a different location changes where the repo is cloned, where the changed-file lists are written, and what the emitted
OSV_DIR/CHANGED_FILES_*values mean. For the cron flow, resolve one base dir up front and build all paths from it.Also applies to: 64-70, 88-90
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/sync-and-detect-changes.sh` around lines 16 - 19, The script uses relative paths (REPO_DIR, OSV_DIR, CHANGED_FILES_*, etc.) that vary with the caller's CWD; resolve a stable BASE_DIR at the top (e.g., realpath of the script's parent or a fixed workspace path) and rebuild all path variables from that base so every path is anchored consistently; update REPO_DIR, OSV_DIR, any CHANGED_FILES_* variables and any other file/directory constants referenced around REPO_URL/REPO_DIR/DAYS_TO_KEEP/DELTA_DAYS and in the sections noted (lines ~64–70 and ~88–90) to be BASE_DIR + relative subpaths rather than plain relative names.cmd/osv-processor/transforms.go (1)
5-8: Trim the commented-out rule templates.The inactive
if cveID == ...examples read like dead code and will drift. Keep a short extension-point comment here, then add real rules when the first CVE-specific override is needed.Also applies to: 21-26
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/transforms.go` around lines 5 - 8, Remove the commented-out CVE rule template lines (the inactive "if cveID == 'CVE-YYYY-XXXXX' { return nil, nil }" block) and replace them with a single brief extension-point comment such as "/* Add CVE-specific overrides here when needed */"; apply the same change for the similar commented template at lines around the second occurrence (the block referencing cveID and returning nil, nil). Keep references to the cveID override concept so future additions are obvious but avoid leaving dead example code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/osv-processor/main.go`:
- Around line 268-270: The artifact filenames are being generated with
time.Now() (local wall clock) which can diverge from the UTC date window used to
compute changed-file lists; replace time.Now() calls in filename construction
(e.g., the outputFile fmt.Sprintf that currently uses time.Now()) and the
related filename code block (the other occurrences around the delta/name
generation) with a single explicit UTC run date value (the same time.Time used
to compute the changed-file lists), formatting it with
runDate.UTC().Format("2006-01-02") and for prior-day names use
runDate.AddDate(0,0,-1).UTC().Format(...), so all filenames consistently derive
from that one UTC runDate rather than the host local clock.
- Around line 144-151: relPath from filepath.Rel can already include an
"osv/cve" prefix, causing double-prefixing when you unconditionally do
fullRelPath := filepath.Join("osv/cve", relPath); normalize relPath first by
cleaning it (filepath.Clean) and removing any leading "osv/cve" path segment
(e.g. check strings.HasPrefix(relPathClean,
filepath.Join("osv","cve")+string(os.PathSeparator)) or equals
filepath.Join("osv","cve") and strip that prefix) before constructing
fullRelPath, then use that normalized fullRelPath for the todayCVEFiles and
yesterdayCVEFiles lookups (references: filepath.Rel, relPath, fullRelPath,
todayCVEFiles, yesterdayCVEFiles, generateTodayDeltas, generateYesterdayDeltas).
---
Duplicate comments:
In `@cmd/osv-processor/main.go`:
- Around line 414-428: The writeArtifact function currently defers
gzWriter.Close() and file.Close(), which can swallow final write/flush errors;
change it to capture and return close errors instead of deferring both: call
encoder.Encode(artifact) and store its error, then explicitly close the gzip
writer (gzipWriter.Close()) and check its error, then close the file
(file.Close()) and check its error, returning the first non-nil error among
encodeErr, gzipCloseErr, fileCloseErr (ensure gzip.Close is called before
file.Close so flush happens while file is open); reference the writeArtifact
function, the encoder.Encode call, gzWriter.Close, and file.Close when making
this change.
---
Nitpick comments:
In `@cmd/osv-processor/sync-and-detect-changes.sh`:
- Around line 16-19: The script uses relative paths (REPO_DIR, OSV_DIR,
CHANGED_FILES_*, etc.) that vary with the caller's CWD; resolve a stable
BASE_DIR at the top (e.g., realpath of the script's parent or a fixed workspace
path) and rebuild all path variables from that base so every path is anchored
consistently; update REPO_DIR, OSV_DIR, any CHANGED_FILES_* variables and any
other file/directory constants referenced around
REPO_URL/REPO_DIR/DAYS_TO_KEEP/DELTA_DAYS and in the sections noted (lines
~64–70 and ~88–90) to be BASE_DIR + relative subpaths rather than plain relative
names.
In `@cmd/osv-processor/transforms.go`:
- Around line 5-8: Remove the commented-out CVE rule template lines (the
inactive "if cveID == 'CVE-YYYY-XXXXX' { return nil, nil }" block) and replace
them with a single brief extension-point comment such as "/* Add CVE-specific
overrides here when needed */"; apply the same change for the similar commented
template at lines around the second occurrence (the block referencing cveID and
returning nil, nil). Keep references to the cveID override concept so future
additions are obvious but avoid leaving dead example code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6e24022f-a121-475a-a7a9-42d467bc177d
📒 Files selected for processing (3)
cmd/osv-processor/main.gocmd/osv-processor/sync-and-detect-changes.shcmd/osv-processor/transforms.go
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #42203 +/- ##
==========================================
+ Coverage 66.51% 66.56% +0.05%
==========================================
Files 2526 2523 -3
Lines 202660 202799 +139
Branches 9026 9030 +4
==========================================
+ Hits 134793 134989 +196
+ Misses 55694 55608 -86
- Partials 12173 12202 +29
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
cmd/osv-processor/main.go (3)
421-450: Remove redundantdeferstatements.The explicit
Close()calls (lines 440-447) properly handle error checking, making thedeferstatements on lines 426 and 429 redundant. While double-closing is harmless, removing the defers improves clarity.♻️ Proposed cleanup
func writeArtifact(path string, artifact *ArtifactData) error { file, err := os.Create(path) if err != nil { return err } - defer file.Close() gzWriter := gzip.NewWriter(file) - defer gzWriter.Close() encoder := json.NewEncoder(gzWriter) encoder.SetIndent("", " ") if err := encoder.Encode(artifact); err != nil { _ = gzWriter.Close() _ = file.Close() return err } if err := gzWriter.Close(); err != nil { _ = file.Close() return err } if err := file.Close(); err != nil { return err } return nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/main.go` around lines 421 - 450, In writeArtifact, remove the redundant defer calls that close file and gzWriter (the defer file.Close() and defer gzWriter.Close()) because the function already performs explicit gzWriter.Close() and file.Close() with error handling; keep the explicit Close() calls and their error checks so resources are closed deterministically and errors are propagated.
212-246: Multipletime.Now()calls may produce inconsistent timestamps.
time.Now().UTC()is called separately for each artifact'sGeneratedfield (lines 212, 226, 241). If processing takes significant time, artifacts may have slightly different timestamps. Consider capturing a single run timestamp at the start.♻️ Proposed fix: Use single run timestamp
Add near the top of
main():runTime := time.Now().UTC()Then replace each
time.Now().UTC().Format(time.RFC3339)with:-Generated: time.Now().UTC().Format(time.RFC3339), +Generated: runTime.Format(time.RFC3339),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/main.go` around lines 212 - 246, The ArtifactData Generated timestamps are created with multiple time.Now().UTC() calls (seen when initializing ArtifactData for artifacts, todayArtifacts, and yesterdayArtifacts), causing inconsistent timestamps; capture a single run timestamp (e.g., runTime := time.Now().UTC()) near the start of main() and replace all occurrences of time.Now().UTC().Format(time.RFC3339) used for ArtifactData.Generated with runTime.Format(time.RFC3339) so artifacts, todayArtifacts, and yesterdayArtifacts share the same Generated value.
269-271: Filename dates should derive from a single UTC run timestamp.The artifact filenames use
time.Now().UTC()at write time. If processing crosses midnight UTC, full artifacts and delta artifacts could have mismatched dates (e.g., full dated2026-03-21, delta dated2026-03-22). Use the samerunTimevariable suggested above for filenames.♻️ Proposed fix: Use consistent date for filenames
outputFile := filepath.Join(*outputDir, fmt.Sprintf("osv-ubuntu-%s-%s.json.gz", strings.ReplaceAll(ver, ".", ""), - time.Now().UTC().Format("2006-01-02"))) + runTime.Format("2006-01-02")))Apply similar changes to delta artifact filenames (lines 283, 290, 302, 309).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/main.go` around lines 269 - 271, The filename generation currently uses time.Now().UTC() at write time (see outputFile created with filepath.Join and fmt.Sprintf), which can cause inconsistent dates if the run crosses UTC midnight; change the code to use a single runTime variable (e.g., runTime := time.Now().UTC() set once at startup) and replace all occurrences of time.Now().UTC().Format("2006-01-02") in the filename fmt.Sprintf calls (including the outputFile for "osv-ubuntu-..." and the delta artifact filename constructions referenced in the diff) to use runTime.Format("2006-01-02") so all artifacts share the same UTC date.cmd/osv-processor/sync-and-detect-changes.sh (1)
63-69: Consider handling empty git log output gracefully.When there are no commits in the time window (e.g., no changes today),
git logoutputs nothing andsort -ucreates an empty file. This is technically correct, but consider adding a brief log message when the resulting file is empty to aid debugging.♻️ Optional: Add informational log for empty results
git log --since="${TODAY_UTC}T00:00:00Z" --name-only --pretty="" -- osv/cve \ | sort -u > "../changed_files_today.txt" +if [ ! -s "../changed_files_today.txt" ]; then + echo " Note: No CVE files changed today" +fi + # Get files changed yesterday (from midnight yesterday to midnight today UTC) YESTERDAY_UTC=$(date -u -v-1d +%Y-%m-%d 2>/dev/null || date -u -d "yesterday" +%Y-%m-%d) git log --since="${YESTERDAY_UTC}T00:00:00Z" --until="${TODAY_UTC}T00:00:00Z" --name-only --pretty="" -- osv/cve \ | sort -u > "../changed_files_yesterday.txt" + +if [ ! -s "../changed_files_yesterday.txt" ]; then + echo " Note: No CVE files changed yesterday" +fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/osv-processor/sync-and-detect-changes.sh` around lines 63 - 69, After generating "../changed_files_today.txt" and "../changed_files_yesterday.txt" from the git log commands that use TODAY_UTC and YESTERDAY_UTC, add a simple existence/size check for each file (e.g., test -s "../changed_files_today.txt") and emit a brief informational log when the file is empty so it's clear there were no commits in that window; update the script around the git log blocks that produce those files to perform the check and print concise messages referencing the corresponding filename and time window.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cmd/osv-processor/main.go`:
- Around line 421-450: In writeArtifact, remove the redundant defer calls that
close file and gzWriter (the defer file.Close() and defer gzWriter.Close())
because the function already performs explicit gzWriter.Close() and file.Close()
with error handling; keep the explicit Close() calls and their error checks so
resources are closed deterministically and errors are propagated.
- Around line 212-246: The ArtifactData Generated timestamps are created with
multiple time.Now().UTC() calls (seen when initializing ArtifactData for
artifacts, todayArtifacts, and yesterdayArtifacts), causing inconsistent
timestamps; capture a single run timestamp (e.g., runTime := time.Now().UTC())
near the start of main() and replace all occurrences of
time.Now().UTC().Format(time.RFC3339) used for ArtifactData.Generated with
runTime.Format(time.RFC3339) so artifacts, todayArtifacts, and
yesterdayArtifacts share the same Generated value.
- Around line 269-271: The filename generation currently uses time.Now().UTC()
at write time (see outputFile created with filepath.Join and fmt.Sprintf), which
can cause inconsistent dates if the run crosses UTC midnight; change the code to
use a single runTime variable (e.g., runTime := time.Now().UTC() set once at
startup) and replace all occurrences of time.Now().UTC().Format("2006-01-02") in
the filename fmt.Sprintf calls (including the outputFile for "osv-ubuntu-..."
and the delta artifact filename constructions referenced in the diff) to use
runTime.Format("2006-01-02") so all artifacts share the same UTC date.
In `@cmd/osv-processor/sync-and-detect-changes.sh`:
- Around line 63-69: After generating "../changed_files_today.txt" and
"../changed_files_yesterday.txt" from the git log commands that use TODAY_UTC
and YESTERDAY_UTC, add a simple existence/size check for each file (e.g., test
-s "../changed_files_today.txt") and emit a brief informational log when the
file is empty so it's clear there were no commits in that window; update the
script around the git log blocks that produce those files to perform the check
and print concise messages referencing the corresponding filename and time
window.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3027ea95-74b9-4c9d-8f70-9df587cd3f4d
📒 Files selected for processing (3)
cmd/osv-processor/main.gocmd/osv-processor/sync-and-detect-changes.shcmd/osv-processor/transforms.go
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
getvictor
left a comment
There was a problem hiding this comment.
Looks good overall. I left a few comments, but they are non-blocking and can be addressed in subsequent PRs.
| changedFilesToday := flag.String("changed-files-today", "", "Path to file containing CVE files changed today (generates today's deltas)") | ||
| changedFilesYesterday := flag.String("changed-files-yesterday", "", "Path to file containing CVE files changed yesterday (generates yesterday's deltas)") |
There was a problem hiding this comment.
This tool doesn't actually care about calendar days, right? So maybe something like this would be better for these switches: --delta-list / --prior-delta-list
There was a problem hiding this comment.
technically no. However later in the method I do hard code the date when generating the artifacts. For example
osv-ubuntu-2204-delta-2024-01-03.json.gz
So today's date and yesterday's date are baked into the script. Changing the names of the params to --delta-list / --prior-delta-list would allow us to generify the timeframe, but then we would probably have to add additional parameters to modify the dates of the artifacts, or remove the dates entirely.
| package main | ||
|
|
||
| // transformVuln applies transformations and filters to OSV vulnerability data. | ||
| func transformVuln(packageName, cveID string, vuln *ProcessedVuln) (packages []string, modifiedVuln *ProcessedVuln) { |
There was a problem hiding this comment.
Did you want to centralize transforms somehow? So that there is one location (package?) we can go to where we see and pick the right transform to use? Not for this PR. Just asking.
There was a problem hiding this comment.
👍 Ugh yes, that would be great. We have so many places we do some sort of transformation.
Closely related is transformVuln in server/vulnerabilities/nvd/sync/cve_syncer.go
But then we have an assortment of other ones that run during vulnerability scanning
softwareTransformers in server/vulnerabilities/nvd/cpe.go
server/vulnerabilities/nvd/cpe_translations.json
expandCPEAliases in server/vulnerabilities/nvd/cve.go
GetKnownNVDBugRules in server/vulnerabilities/nvd/cpe_matching_rules.go
getCVEMatchingRules in server/vulnerabilities/customcve/matching_rules.go
And this is just adding to the mess. It would be nice to think about how to organize these better.
| Vulnerabilities map[string][]ProcessedVuln `json:"vulnerabilities"` | ||
| } | ||
|
|
||
| func main() { |
There was a problem hiding this comment.
This main loop doesn't have a test. I recommend refactoring it into 1 or more functions and adding some basic tests for that.
There was a problem hiding this comment.
I have a bunch of tests in cmd/osv-processor/main_test.go for the assorted methods used in main().
Do you want me to extract even more out of main? I think trying to create a full integration test for main with all the file system jazz going on in there would be a lot of work / stubbing methods.
Happy to give it a try if you think it's worth it.
There was a problem hiding this comment.
It is up to you. Codecov says 21% coverage, which is pretty low.
| cmd/osv-processor/main.go | 21.68% | 194 Missing and 1 partial |
|---|
cmd/osv-processor/main.go 21.68% 194 Missing and 1 partial
There was a problem hiding this comment.
a little better.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| cmd/osv-processor/main.go | 72.41% | 48 Missing and 24 partials |
I extracted the functionality of main and put it into a run method that is more easily tested. Then added some integration tests for the run method.
Related issue: Resolves #41571
Full Artifacts:
Ubuntu 14.04: 901 KB
Ubuntu 16.04: 2.0 MB
Ubuntu 18.04: 4.3 MB
Ubuntu 20.04: 5.9 MB
Ubuntu 22.04: 5.6 MB
Ubuntu 24.04: 1.7 MB
Ubuntu 24.10: 4.4 KB
Ubuntu 25.04: 6.0 KB
Ubuntu 25.10: 207 KB
Total Size:
All artifacts (full + deltas): 31 MB (was 54 MB)
Full artifacts only: ~20 MB (was ~27 MB)
Delta artifacts: ~11 MB (was ~27 MB)
Testing
Summary by CodeRabbit