New APIs for all failures from a PR, and test failure messages from a PR #3133
Conversation
Pipeline controller notification
For optional jobs, comment. This repository is configured in: automatic mode
Walkthrough
Introduces PR test results API functionality by adding a new module to retrieve and serve test failure data from BigQuery, integrating it into the sippyserver with appropriate HTTP handlers and parameter validation for date ranges and PR identifiers.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
pkg/api/prtestresults.go (1 hunks)
pkg/sippyserver/server.go (2 hunks)
pkg/util/param/param.go (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
pkg/api/prtestresults.go
pkg/sippyserver/server.go
pkg/util/param/param.go
🧬 Code graph analysis (1)
pkg/sippyserver/server.go (2)
pkg/api/prtestresults.go (1)
PrintPRTestResultsJSON (290-402)
pkg/sippyserver/capabilities.go (1)
ComponentReadinessCapability (15-15)
🔇 Additional comments (5)
pkg/util/param/param.go (1)
29-29: LGTM!
The new parameter validations follow the established patterns in this file. The dateRegexp validates the YYYY-MM-DD format structurally, and semantic date validation (e.g., rejecting invalid dates like 2024-99-99) is correctly handled by time.Parse in the API handler.
Also applies to: 44-46, 52-53
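For reference, a minimal standalone sketch of that two-layer check (the regexp and function names here are assumptions for illustration, not the repository's actual identifiers):

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// dateRegexp is an assumed structural check for YYYY-MM-DD.
var dateRegexp = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}$`)

// parseDateParam rejects malformed strings structurally, then relies on
// time.Parse to reject semantically invalid dates such as 2024-99-99.
func parseDateParam(value string) (time.Time, error) {
	if !dateRegexp.MatchString(value) {
		return time.Time{}, fmt.Errorf("invalid date format, expected YYYY-MM-DD: %q", value)
	}
	return time.Parse("2006-01-02", value)
}

func main() {
	for _, v := range []string{"2024-01-02", "2024-99-99", "not-a-date"} {
		_, err := parseDateParam(v)
		fmt.Printf("%s -> %v\n", v, err)
	}
}
```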
pkg/sippyserver/server.go (2)
1026-1032: LGTM!
The handler follows the established pattern for BigQuery-dependent endpoints, with a proper nil check for bigQueryClient and delegation to the API implementation.
2028-2033: LGTM!
The endpoint registration is well-structured with appropriate capability requirements and clear documentation.
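A rough, self-contained sketch of the guard-then-delegate pattern and registration these two comments describe (the struct, route path, and delegation call are placeholders, not the repository's actual wiring):

```go
package main

import (
	"net/http"

	"cloud.google.com/go/bigquery"
)

// server stands in for the sippy server type; only the relevant field is shown.
type server struct {
	bigQueryClient *bigquery.Client // assumed field type for illustration
}

// jsonPRTestResults guards a BigQuery-backed endpoint: if no client is
// configured, respond with an error instead of delegating to the API layer.
func (s *server) jsonPRTestResults(w http.ResponseWriter, req *http.Request) {
	if s.bigQueryClient == nil {
		http.Error(w, "this endpoint requires a BigQuery connection", http.StatusBadRequest)
		return
	}
	// The real handler would delegate to api.PrintPRTestResultsJSON here.
	w.Header().Set("Content-Type", "application/json")
	_, _ = w.Write([]byte("[]"))
}

func main() {
	s := &server{}
	// Hypothetical route; the actual path and capability gating live in server.go.
	http.HandleFunc("/api/pull_requests/test_failures", s.jsonPRTestResults)
	_ = http.ListenAndServe(":8080", nil)
}
```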
pkg/api/prtestresults.go (2)
39-75: LGTM!
The function correctly queries both BigQuery tables and combines results. The logging is comprehensive for debugging.
212-240: LGTM!
The query execution follows proper BigQuery iteration patterns with appropriate error handling.
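For context, the standard iteration pattern with the cloud.google.com/go/bigquery client looks roughly like this (project, query, and table names are placeholders, not the PR's actual queries):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // placeholder project
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Placeholder query; the PR's queries select per-PR test failures.
	q := client.Query("SELECT prowjob_name, test_name FROM `dataset.junit` LIMIT 10")
	it, err := q.Read(ctx)
	if err != nil {
		log.Fatalf("query failed: %v", err)
	}
	for {
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			log.Fatalf("row iteration failed: %v", err)
		}
		fmt.Println(row)
	}
}
```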
```go
for i, field := range schema {
	switch field.Name {
	case "prowjob_build_id":
		result.ProwJobBuildID = row[i].(string)
	case "prowjob_name":
		result.ProwJobName = row[i].(string)
	case "prowjob_url":
		if row[i] != nil {
			result.ProwJobURL = row[i].(string)
		}
	case "pr_sha":
		if row[i] != nil {
			result.PRSha = row[i].(string)
		}
	case "prowjob_start":
		if row[i] != nil {
			// BigQuery returns civil.DateTime for DATETIME columns
			civilDT := row[i].(civil.DateTime)
			layout := "2006-01-02T15:04:05"
			parsedTime, err := time.Parse(layout, civilDT.String())
			if err != nil {
				return PRTestResult{}, errors.Wrap(err, "failed to parse prowjob_start")
			}
			result.ProwJobStart = parsedTime
		}
	case "test_name":
		result.TestName = row[i].(string)
	case "testsuite":
		result.TestSuite = row[i].(string)
	case "flaked":
		result.Flaked = row[i].(bool)
	case "success":
		result.Success = row[i].(bool)
	}
}
```
Type assertions may panic on nil values.
Fields like prowjob_build_id, prowjob_name, test_name, testsuite, flaked, and success are type-asserted without nil checks. If BigQuery returns NULL for any of these columns, the code will panic at runtime.
Consider adding nil checks for all fields, consistent with how prowjob_url and pr_sha are handled:
case "prowjob_build_id":
- result.ProwJobBuildID = row[i].(string)
+ if row[i] != nil {
+ result.ProwJobBuildID = row[i].(string)
+ }
case "prowjob_name":
- result.ProwJobName = row[i].(string)
+ if row[i] != nil {
+ result.ProwJobName = row[i].(string)
+ }Apply similar changes to test_name, testsuite, flaked, and success.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In pkg/api/prtestresults.go around lines 250 to 284, several case branches
perform direct type assertions that will panic if row[i] is nil; make them
consistent with prowjob_url and pr_sha by checking row[i] != nil before
asserting. For string fields (prowjob_build_id, prowjob_name, test_name,
testsuite) wrap the assignment in an if row[i] != nil { result.Field =
row[i].(string) } else leave as empty string; for boolean fields (flaked,
success) likewise check if row[i] != nil before asserting and assign only when
non-nil (otherwise leave default false). Keep the existing prowjob_start parsing
as-is.
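An alternative to repeating the nil check in every case is a pair of small conversion helpers, sketched here (helper names are illustrative and not part of the PR; assumed to live alongside the existing code in pkg/api/prtestresults.go):

```go
import "cloud.google.com/go/bigquery"

// asString returns the cell as a string, or "" when the BigQuery value is
// NULL or not a string, avoiding the panics described above.
func asString(v bigquery.Value) string {
	if s, ok := v.(string); ok {
		return s
	}
	return ""
}

// asBool returns the cell as a bool, defaulting to false for NULL values.
func asBool(v bigquery.Value) bool {
	if b, ok := v.(bool); ok {
		return b
	}
	return false
}
```

Each case then collapses to a single line, e.g. result.ProwJobBuildID = asString(row[i]).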
pkg/api/prtestresults.go
Outdated
```go
// Add one day to end_date to make it inclusive
endDate = endDate.AddDate(0, 0, 1)

// Validate date range
if endDate.Before(startDate) {
	RespondWithJSON(http.StatusBadRequest, w, map[string]interface{}{
		"code":    http.StatusBadRequest,
		"message": "end_date must be after start_date",
	})
	return
}
```
Date validation order issue: adding a day before range check may allow invalid ranges.
The code adds 1 day to end_date (line 365) before validating that end_date is after start_date (line 368). This allows edge cases where the original end_date equals start_date minus 1 day to pass validation.
Example: start_date=2024-01-02, end_date=2024-01-01
- After line 365: endDate = 2024-01-02
- Line 368: endDate.Before(startDate) is false, so validation passes
- But the user intended an invalid range where end is before start.
Consider validating the original dates before adjusting:
```diff
+	// Validate date range before adjustment
+	if endDate.Before(startDate) {
+		RespondWithJSON(http.StatusBadRequest, w, map[string]interface{}{
+			"code":    http.StatusBadRequest,
+			"message": "end_date must be after or equal to start_date",
+		})
+		return
+	}
+
 	// Add one day to end_date to make it inclusive
 	endDate = endDate.AddDate(0, 0, 1)
-
-	// Validate date range
-	if endDate.Before(startDate) {
-		RespondWithJSON(http.StatusBadRequest, w, map[string]interface{}{
-			"code":    http.StatusBadRequest,
-			"message": "end_date must be after start_date",
-		})
-		return
-	}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Validate date range before adjustment
if endDate.Before(startDate) {
	RespondWithJSON(http.StatusBadRequest, w, map[string]interface{}{
		"code":    http.StatusBadRequest,
		"message": "end_date must be after or equal to start_date",
	})
	return
}

// Add one day to end_date to make it inclusive
endDate = endDate.AddDate(0, 0, 1)
```
🤖 Prompt for AI Agents
In pkg/api/prtestresults.go around lines 364 to 374, the code currently adds one
day to endDate before checking the date range which can make originally-invalid
ranges appear valid; to fix this, validate that the original endDate is not
before startDate first (or compare startDate to a copy of endDateBeforeAdjust :=
endDate), return the bad request if endDate < startDate, and only after that add
one day to endDate to make the range inclusive.
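A tiny runnable demonstration of the ordering problem, using the example dates from the comment above:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	startDate, _ := time.Parse("2006-01-02", "2024-01-02")
	endDate, _ := time.Parse("2006-01-02", "2024-01-01") // end before start: invalid

	// Buggy order: adjust first, then check. 2024-01-01 + 1 day == 2024-01-02,
	// which is not Before(startDate), so the invalid range is accepted.
	adjusted := endDate.AddDate(0, 0, 1)
	fmt.Println("check after adjusting: ", adjusted.Before(startDate)) // false

	// Correct order: check the original dates first, then adjust.
	fmt.Println("check before adjusting:", endDate.Before(startDate)) // true -> rejected
}
```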
Scheduling required tests:
@dgoodwin: all tests passed!

Full PR test history. Your PR dashboard.

Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Starting to drift toward a Claude Code focus for some of our tooling, to help QSEs and others interpret and work with regressions and PRs.
The first API pulls back all failures for a given PR, across both presubmits and /payload jobs.
Example:
[ { "prowjob_build_id": "2013067627370385408", "prowjob_name": "openshift-origin-30654-nightly-4.21-e2e-vsphere-ovn-techpreview-serial", "prowjob_url": "https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-origin-30654-nightly-4.21-e2e-vsphere-ovn-techpreview-serial/2013067627370385408", "pr_sha": "8deb29e6c5ffa7de214caac7c659575f502111bf", "prowjob_start": "2026-01-19T01:54:57Z", "test_name": "[Monitor:legacy-test-framework-invariants-pathological][sig-arch] events should not repeat pathologically", "test_suite": "openshift-tests", "success": false, "flaked": false, "failure_content": "4 events happened too frequently\n\nevent happened 21 times, something is wrong: node/ci-op-2bn7n163-32075-kwbbl-master-0 hmsg/f99f0db431 - reason/AddSigtermProtection Adding SIGTERM protection (06:16:55Z) result=reject \nevent happened 21 times, something is wrong: node/ci-op-2bn7n163-32075-kwbbl-master-1 hmsg/f99f0db431 - reason/AddSigtermProtection Adding SIGTERM protection (06:17:13Z) result=reject \nevent happened 23 times, something is wrong: node/ci-op-2bn7n163-32075-kwbbl-worker-0-vzklw hmsg/f99f0db431 - reason/AddSigtermProtection Adding SIGTERM protection (06:17:15Z) result=reject \nevent happened 21 times, something is wrong: node/ci-op-2bn7n163-32075-kwbbl-master-2 hmsg/f99f0db431 - reason/AddSigtermProtection Adding SIGTERM protection (06:17:30Z) result=reject" }, { "prowjob_build_id": "2013067628293132288", "prowjob_name": "openshift-origin-30654-nightly-4.21-e2e-vsphere-ovn-techpreview-serial", "prowjob_url": "https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-origin-30654-nightly-4.21-e2e-vsphere-ovn-techpreview-serial/2013067628293132288", "pr_sha": "8deb29e6c5ffa7de214caac7c659575f502111bf", "prowjob_start": "2026-01-19T01:54:57Z", "test_name": "[Monitor:legacy-test-framework-invariants-pathological][sig-arch] events should not repeat pathologically", "test_suite": "openshift-tests", "success": false, "flaked": false, "failure_content": "4 events happened too frequently\n\nevent happened 21 times, something is wrong: node/ci-op-2bn7n163-c3221-nkfbd-master-2 hmsg/f99f0db431 - reason/AddSigtermProtection Adding SIGTERM protection (06:25:36Z) result=reject \nevent happened 21 times, something is wrong: node/ci-op-2bn7n163-c3221-nkfbd-master-1 hmsg/f99f0db431 - reason/AddSigtermProtection Adding SIGTERM protection (06:24:59Z) result=reject \nevent happened 21 times, something is wrong: node/ci-op-2bn7n163-c3221-nkfbd-master-0 hmsg/f99f0db431 - reason/AddSigtermProtection Adding SIGTERM protection (06:25:17Z) result=reject \nevent happened 23 times, something is wrong: node/ci-op-2bn7n163-c3221-nkfbd-worker-0-m2wpd hmsg/f99f0db431 - reason/AddSigtermProtection Adding SIGTERM protection (06:25:18Z) result=reject" }, ...Goal is to have AI tooling scan the list of failures looking for patterns, optionally with a new test name it detected to make sure it was running successfully.
The second is a new version of the test outputs scanner, taking a test ID from Component Readiness (CR) and a list of job runs. This will be used to see if a test is failing the same way across all failures in a test details report.
Example:
[ { "url": "https://prow.ci.openshift.org/view/gs/test-platform-results/logs/2013242091152347136", "output": "1 events happened too frequently\n\nevent happened 33 times, something is wrong: namespace/openshift-cluster-csi-drivers deployment/vmware-vsphere-csi-driver-operator hmsg/f08cbd1e38 - reason/DaemonSetUpdated Updated DaemonSet.apps/vmware-vsphere-csi-driver-node -n openshift-cluster-csi-drivers because it changed (16:59:53Z) result=reject", "test_name": "[Monitor:legacy-test-framework-invariants-pathological][sig-arch] events should not repeat pathologically for ns/openshift-cluster-csi-drivers" }, { "url": "https://prow.ci.openshift.org/view/gs/test-platform-results/logs/2013519360701763584", "output": "4 events happened too frequently\n\nevent happened 65 times, something is wrong: namespace/openshift-cluster-csi-drivers deployment/vmware-vsphere-csi-driver-operator hmsg/4ef95af7b6 - reason/SecretUpdated Updated Secret/vsphere-csi-config-secret -n openshift-cluster-csi-drivers because it changed (11:09:57Z) result=reject \nevent happened 49 times, something is wrong: namespace/openshift-cluster-csi-drivers deployment/vmware-vsphere-csi-driver-operator hmsg/5fc8006e70 - reason/DeploymentUpdated Updated Deployment.apps/vmware-vsphere-csi-driver-controller -n openshift-cluster-csi-drivers because it changed (11:09:57Z) result=reject \nevent happened 58 times, something is wrong: namespace/openshift-cluster-csi-drivers deployment/vmware-vsphere-csi-driver-operator hmsg/f08cbd1e38 - reason/DaemonSetUpdated Updated DaemonSet.apps/vmware-vsphere-csi-driver-node -n openshift-cluster-csi-drivers because it changed (11:10:10Z) result=reject \nevent happened 26 times, something is wrong: namespace/openshift-cluster-csi-drivers deployment/vmware-vsphere-csi-driver-controller hmsg/33b34fff13 - reason/ScalingReplicaSet (combined from similar events): Scaled down replica set vmware-vsphere-csi-driver-controller-6c5f784757 from 2 to 1 (11:09:57Z) result=reject", "test_name": "[Monitor:legacy-test-framework-invariants-pathological][sig-arch] events should not repeat pathologically for ns/openshift-cluster-csi-drivers" }, { "url": "https://prow.ci.openshift.org/view/gs/test-platform-results/logs/2013111257472700416", "output": "3 events happened too frequently\n\nevent happened 31 times, something is wrong: namespace/openshift-cluster-csi-drivers deployment/vmware-vsphere-csi-driver-operator hmsg/4ef95af7b6 - reason/SecretUpdated Updated Secret/vsphere-csi-config-secret -n openshift-cluster-csi-drivers because it changed (08:03:00Z) result=reject \nevent happened 35 times, something is wrong: namespace/openshift-cluster-csi-drivers deployment/vmware-vsphere-csi-driver-operator hmsg/f08cbd1e38 - reason/DaemonSetUpdated Updated DaemonSet.apps/vmware-vsphere-csi-driver-node -n openshift-cluster-csi-drivers because it changed (08:03:13Z) result=reject \nevent happened 25 times, something is wrong: namespace/openshift-cluster-csi-drivers deployment/vmware-vsphere-csi-driver-operator hmsg/5fc8006e70 - reason/DeploymentUpdated Updated Deployment.apps/vmware-vsphere-csi-driver-controller -n openshift-cluster-csi-drivers because it changed (08:03:02Z) result=reject", "test_name": "[Monitor:legacy-test-framework-invariants-pathological][sig-arch] events should not repeat pathologically for ns/openshift-cluster-csi-drivers" }, ...Assisted by: Claude