volcano job collector #207
Conversation
Walkthrough

Adds support for collecting Volcano Job resources: a new VolcanoJobCollector with a dynamic informer watching `batch.volcano.sh` jobs, the ResourceType enum extended, an RBAC rule added, and PolicyConfig updated to support excluded Volcano jobs and registration in controller flows.
Sequence Diagram

```mermaid
sequenceDiagram
    participant DynClient as DynamicClient
    participant Inf as SharedInformer
    participant Col as VolcanoJobCollector
    participant Batch as ResourcesBatcher
    participant Ch as ResourceChannel
    rect rgb(200,240,200)
        Note over DynClient,Col: Start
        Col->>DynClient: create dynamic informer factory
        DynClient->>Inf: build informer for batch.volcano.sh/jobs
        Inf->>Col: attach Add/Update/Delete handlers
        Col->>Inf: start informer & wait for cache sync
    end
    rect rgb(240,220,200)
        Note over Inf,Batch: Event handling
        Inf->>Col: event (Add/Update/Delete)
        Col->>Col: type assert & isExcluded?
        alt excluded
            Col->>Col: drop event
        else not excluded
            Col->>Col: processJob -> build payload
            Col->>Batch: queue CollectedResource
        end
    end
    rect rgb(220,240,240)
        Note over Batch,Ch: Batching output
        Batch->>Batch: accumulate & flush
        Batch->>Ch: emit batched resources
    end
    rect rgb(240,200,240)
        Note over Col,Inf: Stop
        Col->>Inf: stop informer
        Col->>Batch: stop & drain
        Col->>Ch: close channel
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/collectionpolicy_controller.go (1)
79-138: ExcludedVolcanoJobs is never populated from the CRD
`ExcludedVolcanoJobs []collector.ExcludedVolcanoJob` exists in `PolicyConfig` and is checked in `identifyAffectedCollectors` and passed to collector constructors, but it has no corresponding field in the `Exclusions` struct (`api/v1/collectionpolicy_types.go`) and is never populated in `createNewConfig`. Any Volcano job exclusions configured in the CRD will be silently ignored.

Add `ExcludedVolcanoJobs []ExcludedVolcanoJob` to the `Exclusions` struct and populate it in `createNewConfig` similar to other excluded resources:

```diff
+// VolcanoJobs
+for _, job := range envSpec.Exclusions.ExcludedVolcanoJobs {
+	newConfig.ExcludedVolcanoJobs = append(newConfig.ExcludedVolcanoJobs, collector.ExcludedVolcanoJob{
+		Namespace: job.Namespace,
+		Name:      job.Name,
+	})
+}
```

(Note: the same issue affects `ExcludedDatadogReplicaSets`, `ExcludedArgoRollouts`, and `ExcludedKubeflowNotebooks`; all are missing from the CRD Exclusions struct and should be added as well.)
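The shape of that CRD-to-collector conversion can be exercised standalone. The struct names below are stand-ins for the `api/v1` and `internal/collector` types, not the real definitions:

```go
package main

import "fmt"

// crdExcludedVolcanoJob stands in for the CRD-side Exclusions entry;
// excludedVolcanoJob stands in for collector.ExcludedVolcanoJob.
type crdExcludedVolcanoJob struct{ Namespace, Name string }
type excludedVolcanoJob struct{ Namespace, Name string }

// convertExclusions mirrors the createNewConfig pattern: copy each CRD
// exclusion entry into the collector-side config type.
func convertExclusions(in []crdExcludedVolcanoJob) []excludedVolcanoJob {
	out := make([]excludedVolcanoJob, 0, len(in))
	for _, job := range in {
		out = append(out, excludedVolcanoJob{Namespace: job.Namespace, Name: job.Name})
	}
	return out
}

func main() {
	spec := []crdExcludedVolcanoJob{{Namespace: "ml", Name: "tune-job"}}
	fmt.Printf("%+v\n", convertExclusions(spec)) // [{Namespace:ml Name:tune-job}]
}
```

The point of the explicit copy (rather than reusing the CRD type) is that the collector package stays decoupled from the API types, which is why the missing conversion block silently drops the configuration.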
🧹 Nitpick comments (1)
internal/collector/volcano_job_collector.go (1)
89-118: VolcanoJob collector behavior and wiring look solid overall
- Uses the correct GVR `{Group: "batch.volcano.sh", Version: "v1alpha1", Resource: "jobs"}` and dynamic informer.
- Namespace handling (single-namespace informer vs. all-namespace informer plus `isExcluded` namespace filter) is consistent and safe, albeit slightly over-broad in terms of watch scope for multi-namespace configs.
- `ExcludedVolcanoJob` mapping to a `map[types.NamespacedName]bool` provides efficient exclusion checks.
- `IsAvailable` probes the resource with a bounded `List` and reports telemetry on failure.
- `AddResource` validates type and reuses `handleJobEvent`, matching other collectors' patterns.

Apart from the Stop/batchChan race, the implementation fits well into the existing collector framework.

If you want to tighten scope for multi-namespace setups, you could eventually create per-namespace informers (one per entry in `namespaces`) instead of relying on a cluster-wide informer plus `isExcluded`.

Also applies to: 215-235, 288-313, 364-387, 389-409
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)
- `dist/backend-install.yaml` is excluded by `!**/dist/**`
- `dist/install.yaml` is excluded by `!**/dist/**`
- `dist/installer_updater.yaml` is excluded by `!**/dist/**`
- `dist/zxporter.yaml` is excluded by `!**/dist/**`
- `gen/api/v1/apiv1connect/k8s.connect.go` is excluded by `!**/gen/**`
- `gen/api/v1/common.pb.go` is excluded by `!**/*.pb.go`, `!**/gen/**`
- `gen/api/v1/k8s.pb.go` is excluded by `!**/*.pb.go`, `!**/gen/**`
- `gen/api/v1/k8s_grpc.pb.go` is excluded by `!**/*.pb.go`, `!**/gen/**`
- `gen/api/v1/metrics_collector.pb.go` is excluded by `!**/*.pb.go`, `!**/gen/**`
- `proto/dakr_proto_descriptor.bin` is excluded by `!**/*.bin`
📒 Files selected for processing (6)
- `config/rbac/role.yaml` (1 hunks)
- `internal/collector/interface.go` (3 hunks)
- `internal/collector/types.go` (1 hunks)
- `internal/collector/volcano_job_collector.go` (1 hunks)
- `internal/controller/collectionpolicy_controller.go` (6 hunks)
- `proto/metrics_collector.proto` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
internal/collector/interface.go (1)
- gen/api/v1/metrics_collector.pb.go (1)
  - `ResourceType_RESOURCE_TYPE_VOLCANO_JOB` (174-174)

internal/controller/collectionpolicy_controller.go (3)
- internal/collector/volcano_job_collector.go (2)
  - `ExcludedVolcanoJob` (39-42)
  - `NewVolcanoJobCollector` (45-87)
- internal/collector/batcher.go (2)
  - `DefaultMaxBatchSize` (16-16)
  - `DefaultMaxBatchTime` (19-19)
- internal/collector/interface.go (1)
  - `VolcanoJob` (142-142)

internal/collector/volcano_job_collector.go (4)
- internal/collector/interface.go (6)
  - `CollectedResource` (309-325)
  - `EventTypeAdd` (18-18)
  - `EventTypeUpdate` (20-20)
  - `EventTypeDelete` (22-22)
  - `EventType` (12-12)
  - `ResourceType` (87-87)
- internal/collector/batcher.go (2)
  - `ResourcesBatcher` (23-31)
  - `NewResourcesBatcher` (34-55)
- internal/logger/logger.go (1)
  - `Logger` (26-29)
- gen/api/v1/metrics_collector.pb.go (10)
  - `LogLevel_LOG_LEVEL_ERROR` (326-326)
  - `EventType` (27-27)
  - `EventType` (80-82)
  - `EventType` (84-86)
  - `EventType` (93-95)
  - `ResourceType` (98-98)
  - `ResourceType` (301-303)
  - `ResourceType` (305-307)
  - `ResourceType` (314-316)
  - `LogLevel_LOG_LEVEL_WARN` (325-325)

internal/collector/types.go (1)
- internal/collector/interface.go (3)
  - `VolumeAttachment` (140-140)
  - `KubeflowNotebook` (141-141)
  - `VolcanoJob` (142-142)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: Test on K8s v1.32.3 (manifest)
- GitHub Check: Test on K8s v1.30.8 (manifest)
- GitHub Check: Test on K8s v1.32.3 (helm)
- GitHub Check: Test on K8s v1.28.15 (manifest)
- GitHub Check: Test on K8s v1.30.8 (helm)
- GitHub Check: Test on K8s v1.29.14 (manifest)
- GitHub Check: Test on K8s v1.31.6 (helm)
- GitHub Check: Test on K8s v1.29.14 (helm)
- GitHub Check: Test on K8s v1.31.6 (manifest)
- GitHub Check: Test on K8s v1.27.16 (helm)
- GitHub Check: Test on K8s v1.25.16 (helm)
- GitHub Check: Test on K8s v1.26.15 (helm)
- GitHub Check: Test on K8s v1.28.15 (helm)
- GitHub Check: Test on K8s v1.27.16 (manifest)
- GitHub Check: Test on K8s v1.26.15 (manifest)
- GitHub Check: Test on K8s v1.25.16 (manifest)
- GitHub Check: Test Metrics Server Lifecycle on K8s v1.32.3
- GitHub Check: Analyze (go)
🔇 Additional comments (6)
proto/metrics_collector.proto (1)
125-129: VolcanoJob proto enum addition looks consistent
`RESOURCE_TYPE_VOLCANO_JOB = 50` cleanly extends `ResourceType` without renumbering existing values; comment placement and spacing match existing style. No issues here.

internal/collector/types.go (1)

5-16: `AllResourceTypes` correctly extended with VolcanoJob

Including `VolcanoJob` alongside `KubeflowNotebook` and `VolumeAttachment` keeps resource enumeration in sync with the new enum value and string mappings. No further changes needed here.

config/rbac/role.yaml (1)

126-133: RBAC for Volcano jobs is appropriate

Granting `get`, `list`, and `watch` on `batch.volcano.sh` `jobs` matches the collector's read-only usage and is consistent with existing batch job permissions.

internal/collector/interface.go (1)

90-143: VolcanoJob ResourceType wiring is consistent end-to-end

The new `VolcanoJob` constant, its `"volcano_job"` string, and `ProtoType()` mapping to `RESOURCE_TYPE_VOLCANO_JOB` are all aligned with the generated proto enum. This keeps resource type resolution and disabled-collector logic working as expected.

If you haven't already, please re-generate `gen/api/v1` to ensure the Go stubs for `ResourceType_RESOURCE_TYPE_VOLCANO_JOB` stay in sync with the proto.

Also applies to: 147-196, 205-303

internal/controller/collectionpolicy_controller.go (2)

196-199: Controller RBAC annotations for Volcano jobs align with manifest

The kubebuilder RBAC line for `groups=batch.volcano.sh,resources=jobs,verbs=get;list;watch` matches the ClusterRole rule in `config/rbac/role.yaml`, so generated manifests should stay consistent with the static YAML.

After running controller-gen, please verify the rendered ClusterRole includes the Volcano jobs rule once as expected (no duplication with the hand-written YAML).

Also applies to: 247-247

776-778: VolcanoJob collector is correctly integrated into restart, registration, and disabled-collector flows

- `identifyAffectedCollectors` keys off `ExcludedVolcanoJobs` and marks `"volcano_job"` for selective restarts.
- `restartCollectors` and `handleDisabledCollectorsChange` both recreate collectors via `collector.NewVolcanoJobCollector` with `TargetNamespaces` and `ExcludedVolcanoJobs`.
- `registerResourceCollectors` registers a `VolcanoJobCollector` with `name: collector.VolcanoJob`, so disabled-collector config based on `"volcano_job"` will work as with other types.

This wiring is consistent with the existing collector patterns.

Also applies to: 1455-1464, 2567-2578, 3197-3206
```go
// VolcanoJobCollector watches for Volcano Job resources
type VolcanoJobCollector struct {
	dynamicClient   dynamic.Interface
	batchChan       chan CollectedResource
	resourceChan    chan []CollectedResource
	batcher         *ResourcesBatcher
	stopCh          chan struct{}
	informers       map[string]cache.SharedIndexInformer
	informerStopChs map[string]chan struct{}
	namespaces      []string
	excludedJobs    map[types.NamespacedName]bool
	logger          logr.Logger
	telemetryLogger telemetry_logger.Logger
	mu              sync.RWMutex
}
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, get the file size to understand what we're working with
wc -l internal/collector/volcano_job_collector.go
```

Repository: devzero-inc/zxporter
Length of output: 111

🏁 Script executed:

```shell
# Read the key sections mentioned in the review
sed -n '22,36p' internal/collector/volcano_job_collector.go
echo "---"
sed -n '63,87p' internal/collector/volcano_job_collector.go
echo "---"
sed -n '250,273p' internal/collector/volcano_job_collector.go
echo "---"
sed -n '315,352p' internal/collector/volcano_job_collector.go
```

Length of output: 3087

🏁 Script executed:

```shell
# Find other collector files to compare patterns
fd -t f "collector\.go$" internal/collector/ | head -20
```

Length of output: 927

🏁 Script executed:

```shell
# Find the actual location of handleJobEvent definition
rg -n "func.*handleJobEvent" internal/collector/volcano_job_collector.go
```

Length of output: 167

🏁 Script executed:

```shell
# Read lines 315-352 to see what else is mentioned
sed -n '315,352p' internal/collector/volcano_job_collector.go
```

Length of output: 1122

🏁 Script executed:

```shell
# Compare with job_collector.go - check Stop() implementation
rg -A 30 "func.*\*JobCollector\).*Stop" internal/collector/job_collector.go | head -50
```

Length of output: 920

🏁 Script executed:

```shell
# Compare with job_collector.go - check handleJobEvent
rg -A 20 "func.*\*JobCollector\).*handleJobEvent" internal/collector/job_collector.go | head -40
```

Length of output: 678

🏁 Script executed:

```shell
# Check the Start() method in volcano_job_collector to see how informers are managed
rg -A 60 "func.*\*VolcanoJobCollector\).*Start" internal/collector/volcano_job_collector.go | head -80
```

Length of output: 1851

🏁 Script executed:

```shell
# Check if there's any WaitGroup or synchronization for informer goroutines
rg -n "WaitGroup|sync\.Wait|informer\.Run" internal/collector/volcano_job_collector.go
```

Length of output: 111

🏁 Script executed:

```shell
# Check the batcher.stop() implementation to see if it waits for pending sends
fd -t f "batcher.go" internal/collector/ -o internal/
```

Length of output: 201

🏁 Script executed:

```shell
# Find and read batcher.stop() implementation
rg -A 20 "func.*\*ResourcesBatcher\).*stop" internal/collector/
```

Length of output: 971

🏁 Script executed:

```shell
# Check job_collector.go Start() to see if it has same pattern
rg -A 30 "func.*\*JobCollector\).*Start" internal/collector/job_collector.go | head -50
```

Length of output: 1111

🏁 Script executed:

```shell
# Check how many lines in volcano_job_collector.go around line 213 to understand informer lifecycle
sed -n '200,230p' internal/collector/volcano_job_collector.go
```

Length of output: 862

🏁 Script executed:

```shell
# Verify if handleJobEvent has any stopCh checks in volcano_job_collector
sed -n '251,280p' internal/collector/volcano_job_collector.go
```

Length of output: 1064

🏁 Script executed:

```shell
# Double-check if job_collector.go also has similar race by looking at its Stop() more carefully
rg -B 5 -A 40 "func.*\*JobCollector\).*Stop" internal/collector/job_collector.go | tail -50
```

Length of output: 1283

🏁 Script executed:

```shell
# Check if job_collector uses informerFactory.Start() and how it's stopped
rg -n "informerFactory.Start|informerFactory.Stop" internal/collector/job_collector.go
```

Length of output: 102
Critical race condition: closing batchChan while event handlers may still be executing
handleJobEvent unconditionally sends to batchChan with no synchronization check. However, Stop() closes this channel immediately after signaling informer shutdown, without waiting for in-flight handler goroutines to complete. Because informer.Run() executes asynchronously in a background goroutine (line 213), a handler invocation can race with batchChan being closed, causing a panic on send to closed channel.
The current shutdown sequence in `Stop()`:

1. Close `informerStopChs` (signals informer to stop, but doesn't wait)
2. Close `stopCh`
3. Close `batchChan` ← handlers may still be executing here
4. Stop batcher

Fix by adding a `stopCh` check in `handleJobEvent`:

```diff
 func (c *VolcanoJobCollector) handleJobEvent(obj *unstructured.Unstructured, eventType EventType) {
 	name := obj.GetName()
 	namespace := obj.GetNamespace()
 	if c.isExcluded(namespace, name) {
 		return
 	}
 	processedObj := c.processJob(obj)
 	key := fmt.Sprintf("%s/%s", namespace, name)
 	c.logger.Info("Collected Volcano Job resource", "key", key, "eventType", eventType, "resource", processedObj)
+	select {
+	case <-c.stopCh:
+		return
+	case c.batchChan <- CollectedResource{
-	c.batchChan <- CollectedResource{
 		ResourceType: VolcanoJob,
 		Object:       processedObj,
 		Timestamp:    time.Now(),
 		EventType:    eventType,
 		Key:          key,
-	}
+	}:
+	}
 }
```

Avoid closing batchChan in Stop() (let the batcher manage its input channel lifecycle) or close it only after ensuring no more handlers can run. This same pattern should be reviewed and fixed consistently across other collectors (e.g., job_collector.go has the same vulnerability).
Also applies to: lines 63–87, 250–273, 315–352
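Distilled to a runnable form, the select-based guard behaves like this. A channel of `string` stands in for `chan CollectedResource`, and `safeSend` is a hypothetical helper name, not code from the repository:

```go
package main

import (
	"fmt"
	"sync"
)

// safeSend mirrors the suggested handleJobEvent fix: instead of sending to
// batchChan unconditionally, select between the send and stopCh so that a
// handler racing with shutdown drops the event rather than panicking on a
// send to a closed channel. It returns true if the value was delivered.
func safeSend(batchChan chan<- string, stopCh <-chan struct{}, key string) bool {
	select {
	case <-stopCh:
		return false // collector is stopping: drop the event
	case batchChan <- key:
		return true
	}
}

func main() {
	batchChan := make(chan string, 1)
	stopCh := make(chan struct{})

	fmt.Println(safeSend(batchChan, stopCh, "default/job-a")) // true: buffered send succeeds

	close(stopCh) // Stop() signaled; batchChan may be closed shortly after
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// With stopCh closed and batchChan full, the handler returns
		// false instead of blocking or panicking.
		fmt.Println(safeSend(batchChan, stopCh, "default/job-b"))
	}()
	wg.Wait()
}
```

One caveat worth knowing: if both cases are ready simultaneously, `select` picks one at random, so this guard narrows the window but only fully closes it when combined with not closing `batchChan` until handlers are quiesced (as the review also suggests).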
🤖 Prompt for AI Agents
In internal/collector/volcano_job_collector.go around lines 22–36 and the
referenced areas (lines 63–87, 250–273, 315–352), handlers unconditionally send
to batchChan while Stop() closes batchChan, causing a race and possible panic;
modify handleJobEvent to check stopCh (or otherwise ensure collector is
stopping) before sending to batchChan to avoid sends after shutdown, and
remove/avoid closing batchChan inside Stop() — let the batcher own/close that
channel or only close it after confirming no informer handlers can run (e.g.,
wait for informer goroutines to exit or use a sync.WaitGroup); apply the same
pattern/fix to the other collector files (job_collector.go and similar)
mentioned in the comment.
Force-pushed: ba09baa to 79b3dfe (Compare)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/collectionpolicy_controller.go (1)
77-138: `ExcludedVolcanoJobs` is never populated in `createNewConfig`

You added `ExcludedVolcanoJobs []collector.ExcludedVolcanoJob` to `PolicyConfig` and you wire it into:

- `identifyAffectedCollectors` (lines 780-782),
- selective restart / registration / disabled-collector re-enable flows,

but `createNewConfig` never populates `newConfig.ExcludedVolcanoJobs` from `envSpec.Exclusions`. That means any Volcano job exclusions defined in the CRD/env spec will be silently ignored, and exclusion changes will never be detected.

Consider adding a conversion block similar to the other `Excluded*` types, for example (assuming the spec has `envSpec.Exclusions.ExcludedVolcanoJobs` with `Namespace` and `Name` fields):

```diff
 // CronJobs
 for _, cron := range envSpec.Exclusions.ExcludedCronJobs {
 	newConfig.ExcludedCronJobs = append(newConfig.ExcludedCronJobs, collector.ExcludedCronJob{
 		Namespace: cron.Namespace,
 		Name:      cron.Name,
 	})
 }
+// VolcanoJobs
+for _, vj := range envSpec.Exclusions.ExcludedVolcanoJobs {
+	newConfig.ExcludedVolcanoJobs = append(newConfig.ExcludedVolcanoJobs, collector.ExcludedVolcanoJob{
+		Namespace: vj.Namespace,
+		Name:      vj.Name,
+	})
+}
```

Also applies to: 378-599
♻️ Duplicate comments (1)
internal/collector/volcano_job_collector.go (1)
250-273: Race: `handleJobEvent` can send on `batchChan` while `Stop` closes it, causing panics

`handleJobEvent` unconditionally does:

```go
c.batchChan <- CollectedResource{...}
```

while `Stop` closes `batchChan`:

```go
if c.batchChan != nil {
	close(c.batchChan)
	c.batchChan = nil
}
```

Because informers run in background goroutines, there is a real window where:

- `Stop` has closed `batchChan`, but
- an in-flight Add/Update/Delete handler is still executing `handleJobEvent`,

leading to a `send on closed channel` panic. This is the same issue that was already raised on an earlier revision of this file and in `job_collector.go`.

A robust fix is to:

- Track active handler executions with a `sync.WaitGroup`.
- Wait for all handlers to finish before closing `batchChan`.
- Make `Stop` idempotent with `sync.Once` to avoid double-closing channels.

Example patch sketch:

```diff
 type VolcanoJobCollector struct {
 	dynamicClient dynamic.Interface
 	batchChan     chan CollectedResource
@@
 	logger          logr.Logger
 	telemetryLogger telemetry_logger.Logger
-	mu sync.RWMutex
+	mu         sync.RWMutex
+	handlersWg sync.WaitGroup
+	stopOnce   sync.Once
 }
@@
 func (c *VolcanoJobCollector) Start(ctx context.Context) error {
@@
 	_, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
 		AddFunc: func(obj interface{}) {
+			c.handlersWg.Add(1)
+			defer c.handlersWg.Done()
@@
 		UpdateFunc: func(oldObj, newObj interface{}) {
+			c.handlersWg.Add(1)
+			defer c.handlersWg.Done()
@@
 		DeleteFunc: func(obj interface{}) {
+			c.handlersWg.Add(1)
+			defer c.handlersWg.Done()
@@
 func (c *VolcanoJobCollector) Stop() error {
-	c.logger.Info("Stopping Volcano Job collector")
-
-	// Stop all informers
-	for key, stopCh := range c.informerStopChs {
-		c.logger.Info("Stopping informer", "resource", key)
-		close(stopCh)
-	}
-
-	c.informers = make(map[string]cache.SharedIndexInformer)
-	c.informerStopChs = make(map[string]chan struct{})
-
-	// Close the main stop channel (signals informers to stop)
-	select {
-	case <-c.stopCh:
-		c.logger.Info("Volcano Job collector stop channel already closed")
-	default:
-		close(c.stopCh)
-		c.logger.Info("Closed Volcano Job collector stop channel")
-	}
-
-	// Close the batchChan (input to the batcher).
-	if c.batchChan != nil {
-		close(c.batchChan)
-		c.batchChan = nil
-		c.logger.Info("Closed Volcano Job collector batch input channel")
-	}
-
-	// Stop the batcher (waits for completion).
-	if c.batcher != nil {
-		c.batcher.stop()
-		c.logger.Info("Volcano Job collector batcher stopped")
-	}
-	// resourceChan is closed by the batcher's defer func.
-
-	return nil
+	c.stopOnce.Do(func() {
+		c.logger.Info("Stopping Volcano Job collector")
+
+		// Signal informers to stop.
+		for key, stopCh := range c.informerStopChs {
+			c.logger.Info("Stopping informer", "resource", key)
+			close(stopCh)
+		}
+		c.informers = make(map[string]cache.SharedIndexInformer)
+		c.informerStopChs = make(map[string]chan struct{})
+
+		// Wait for all in-flight event handlers to finish before touching batchChan.
+		c.handlersWg.Wait()
+		c.logger.Info("All Volcano Job event handlers completed")
+
+		// Close the main stop channel (used by the Start() helper goroutine).
+		select {
+		case <-c.stopCh:
+			c.logger.Info("Volcano Job collector stop channel already closed")
+		default:
+			close(c.stopCh)
+			c.logger.Info("Closed Volcano Job collector stop channel")
+		}
+
+		// Now it is safe to close the batchChan and stop the batcher.
+		if c.batchChan != nil {
+			close(c.batchChan)
+			c.batchChan = nil
+			c.logger.Info("Closed Volcano Job collector batch input channel")
+		}
+
+		if c.batcher != nil {
+			c.batcher.stop()
+			c.logger.Info("Volcano Job collector batcher stopped")
+		}
+		// resourceChan is closed by the batcher.
+	})
+	return nil
 }
```

This keeps the overall lifecycle the same but ensures no goroutine can be sending into `batchChan` at the moment it is closed, and prevents double-close panics if `Stop` is called more than once. You may also want to apply the same pattern (or similar) to `job_collector.go` to eliminate the same class of race there.

Also applies to: 315-352
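The WaitGroup-plus-Once ordering can be exercised standalone. `miniCollector` and `handleEvent` are simplified stand-ins for the sketch above; in the real collector the WaitGroup increments happen inside informer handlers, and the informers must be stopped before `Wait` is meaningful:

```go
package main

import (
	"fmt"
	"sync"
)

// miniCollector distills the shutdown ordering: handlers are tracked with a
// WaitGroup, Stop waits for them before closing the input channel, and
// sync.Once makes Stop idempotent. Names are simplified stand-ins for the
// real VolcanoJobCollector fields.
type miniCollector struct {
	batchChan  chan string
	handlersWg sync.WaitGroup
	stopOnce   sync.Once
}

func (c *miniCollector) handleEvent(key string) {
	c.handlersWg.Add(1)
	defer c.handlersWg.Done()
	c.batchChan <- key
}

func (c *miniCollector) Stop() {
	c.stopOnce.Do(func() {
		// In the real collector, informer stop channels are closed first so
		// no new handler can begin after this point.
		c.handlersWg.Wait() // no handler can be mid-send past this line
		close(c.batchChan)  // now safe to close
	})
}

func main() {
	c := &miniCollector{batchChan: make(chan string, 4)}
	c.handleEvent("default/job-a")
	c.Stop()
	c.Stop() // idempotent: second call is a no-op, no double-close panic
	for key := range c.batchChan {
		fmt.Println("drained:", key)
	}
}
```

The ordering guarantee here (Wait before close) is what eliminates the send-on-closed-channel window; the `Once` only addresses the separate double-close failure mode.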
🧹 Nitpick comments (1)
internal/collector/volcano_job_collector.go (1)
38-87: Constructor and exclusion map look good

Building a `map[types.NamespacedName]bool` from `[]ExcludedVolcanoJob` and passing buffered `batchChan`/`resourceChan` into `ResourcesBatcher` follows the existing collector patterns and gives O(1) exclusion lookups.

One stylistic note: all other `Excluded*` types live in `internal/collector/types.go`. For consistency, you might consider moving `ExcludedVolcanoJob` there as well and keeping this file focused on collector behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `dist/backend-install.yaml` is excluded by `!**/dist/**`
- `dist/install.yaml` is excluded by `!**/dist/**`
- `dist/installer_updater.yaml` is excluded by `!**/dist/**`
- `dist/zxporter.yaml` is excluded by `!**/dist/**`
📒 Files selected for processing (5)
- `config/rbac/role.yaml` (1 hunks)
- `internal/collector/interface.go` (3 hunks)
- `internal/collector/types.go` (1 hunks)
- `internal/collector/volcano_job_collector.go` (1 hunks)
- `internal/controller/collectionpolicy_controller.go` (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/collector/interface.go
- config/rbac/role.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
internal/collector/types.go (1)
- internal/collector/interface.go (3)
  - `VolumeAttachment` (140-140)
  - `KubeflowNotebook` (141-141)
  - `VolcanoJob` (142-142)

internal/controller/collectionpolicy_controller.go (3)
- internal/collector/volcano_job_collector.go (2)
  - `ExcludedVolcanoJob` (39-42)
  - `NewVolcanoJobCollector` (45-87)
- internal/collector/batcher.go (2)
  - `DefaultMaxBatchSize` (16-16)
  - `DefaultMaxBatchTime` (19-19)
- internal/collector/interface.go (1)
  - `VolcanoJob` (142-142)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build Docker Image
- GitHub Check: Analyze (go)
- GitHub Check: Run make test
- GitHub Check: Build Docker Image
🔇 Additional comments (9)
internal/collector/types.go (1)
4-16: VolcanoJob inclusion in `AllResourceTypes` is correct and necessary

Adding `VolcanoJob` to the central `AllResourceTypes` slice keeps the registry consistent with the `ResourceType` enum and allows generic flows (like restart and registration) to see the new type.

internal/controller/collectionpolicy_controller.go (5)

251-251: RBAC for Volcano jobs is aligned with the new collector

The kubebuilder RBAC annotation for `groups=batch.volcano.sh,resources=jobs,verbs=get;list;watch` matches the GVR used by `VolcanoJobCollector` and is scoped read-only, consistent with other optional third-party collectors.

780-782: Change detection for Volcano job exclusions is wired correctly

Adding the `ExcludedVolcanoJobs` comparison to `identifyAffectedCollectors` ensures that changing Volcano job exclusions triggers a selective restart of the `volcano_job` collector, matching the behavior of other resource types.

1459-1468: Selective restart wiring for `volcano_job` matches the collector constructor

The new `"volcano_job"` case correctly instantiates `NewVolcanoJobCollector` with:

- `r.DynamicClient`,
- `newConfig.TargetNamespaces`,
- `newConfig.ExcludedVolcanoJobs`,
- standard batch size/time and telemetry.

This mirrors how other dynamic collectors (e.g., VPA, Kubeflow) are wired.

2571-2582: VolcanoJobCollector registration and disable flow integration look consistent

The new entry in `collectors`:

- Uses `NewVolcanoJobCollector` with the same `(dynamicClient, TargetNamespaces, ExcludedVolcanoJobs, ...)` signature as in selective restart.
- Participates in the `disabledCollectorsMap` gating via `name: collector.VolcanoJob`, so `"volcano_job"` in `DisabledCollectors` will correctly skip registration.

This is consistent with the patterns used for other optional collectors.

3201-3210: Re-enabling `volcano_job` in `handleDisabledCollectorsChange` is wired correctly

The `"volcano_job"` branch recreates the collector with `DynamicClient`, `TargetNamespaces`, and `ExcludedVolcanoJobs`, then flows through the common register/start logic. This keeps enable/disable semantics for Volcano jobs aligned with all other collectors.

internal/collector/volcano_job_collector.go (3)
89-248: Informer setup and event handler wiring are solid

Using `dynamicinformer.NewFilteredDynamicSharedInformerFactory` on `batch.volcano.sh/v1alpha1` `jobs`, with:

- namespace scoping (single namespace vs all),
- proper Add/Update/Delete handlers,
- tombstone handling for deletes,
- cache sync with timeout and telemetry reporting,

matches the patterns of other dynamic collectors and should behave well even when the Volcano CRD is missing (guarded by `IsAvailable` at registration time).
288-313: Exclusion logic correctly combines namespace scoping and explicit job exclusions

`isExcluded`:

- Enforces `TargetNamespaces` semantics (treating a non-empty list as an allowlist, consistent with other collectors).
- Then checks the `excludedJobs` map under an `RLock`.

This matches how other exclusion types are handled and should be safe under concurrent use.
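A self-contained sketch of those semantics, with a local `namespacedName` standing in for k8s.io/apimachinery's `types.NamespacedName` (the `RLock` is omitted here for brevity, and `volcanoJobFilter` is an invented name, not the collector's type):

```go
package main

import "fmt"

// namespacedName is a local stand-in for types.NamespacedName.
type namespacedName struct {
	Namespace, Name string
}

type volcanoJobFilter struct {
	targetNamespaces []string // non-empty means: treat as an allowlist
	excludedJobs     map[namespacedName]bool
}

// isExcluded follows the review's description: enforce the namespace
// allowlist first, then do an O(1) lookup against the explicit exclusions.
func (f *volcanoJobFilter) isExcluded(namespace, name string) bool {
	if len(f.targetNamespaces) > 0 {
		allowed := false
		for _, ns := range f.targetNamespaces {
			if ns == namespace {
				allowed = true
				break
			}
		}
		if !allowed {
			return true // outside TargetNamespaces: excluded
		}
	}
	return f.excludedJobs[namespacedName{Namespace: namespace, Name: name}]
}

func main() {
	f := &volcanoJobFilter{
		targetNamespaces: []string{"team-a"},
		excludedJobs: map[namespacedName]bool{
			{Namespace: "team-a", Name: "noisy-job"}: true,
		},
	}
	fmt.Println(f.isExcluded("team-b", "any-job"))   // true: not in allowlist
	fmt.Println(f.isExcluded("team-a", "noisy-job")) // true: explicitly excluded
	fmt.Println(f.isExcluded("team-a", "train-job")) // false: collected
}
```

Using a struct-keyed map gives constant-time lookups regardless of how many exclusions are configured, which is why the reviewers call out the `map[types.NamespacedName]bool` choice favorably.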
364-387: Availability probe and `AddResource` helper are consistent with the rest of the framework

- `IsAvailable` does a cheap `List` with `Limit: 1` on the Volcano Job GVR and reports telemetry on failure, which is exactly what `registerResourceCollectors` expects.
- `AddResource` validates type, emits telemetry on misuse, and reuses `handleJobEvent` (`EventTypeAdd`).

Both align with patterns in other collectors.
Also applies to: 389-409
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/controller/collectionpolicy_controller.go (1)
80-141: Wire `ExcludedVolcanoJobs` into `createNewConfig`

You added `ExcludedVolcanoJobs []collector.ExcludedVolcanoJob` to `PolicyConfig` and pass it into `NewVolcanoJobCollector`, but `createNewConfig` never populates this slice from `envSpec.Exclusions`. Unless this is being set elsewhere, any CRD/env-driven excluded Volcano jobs will be ignored in practice. Consider mirroring the pattern used for `ExcludedJobs` (and other `Excluded*` types) here.

For example, assuming `envSpec.Exclusions` exposes an `ExcludedVolcanoJobs` slice analogous to `ExcludedJobs`, something like this would align with the existing conversions:

```diff
 // Jobs
 for _, job := range envSpec.Exclusions.ExcludedJobs {
 	newConfig.ExcludedJobs = append(newConfig.ExcludedJobs, collector.ExcludedJob{
 		Namespace: job.Namespace,
 		Name:      job.Name,
 	})
 }
+
+// VolcanoJobs
+for _, job := range envSpec.Exclusions.ExcludedVolcanoJobs {
+	newConfig.ExcludedVolcanoJobs = append(newConfig.ExcludedVolcanoJobs, collector.ExcludedVolcanoJob{
+		Namespace: job.Namespace,
+		Name:      job.Name,
+	})
+}
```

Please confirm whether the spec/env wiring for `ExcludedVolcanoJobs` exists elsewhere or needs to be added here.

Also applies to: 311-618
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `dist/backend-install.yaml` is excluded by `!**/dist/**`
- `dist/install.yaml` is excluded by `!**/dist/**`
- `dist/installer_updater.yaml` is excluded by `!**/dist/**`
- `dist/zxporter.yaml` is excluded by `!**/dist/**`
📒 Files selected for processing (2)
- `internal/collector/interface.go` (3 hunks)
- `internal/controller/collectionpolicy_controller.go` (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
internal/controller/collectionpolicy_controller.go (3)
- internal/collector/volcano_job_collector.go (2)
  - `ExcludedVolcanoJob` (39-42)
  - `NewVolcanoJobCollector` (45-87)
- internal/collector/batcher.go (2)
  - `DefaultMaxBatchSize` (16-16)
  - `DefaultMaxBatchTime` (19-19)
- internal/collector/interface.go (1)
  - `VolcanoJob` (142-142)

internal/collector/interface.go (1)
- gen/api/v1/metrics_collector.pb.go (1)
  - `ResourceType_RESOURCE_TYPE_VOLCANO_JOB` (174-174)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build Docker Image
- GitHub Check: Run make test
- GitHub Check: Analyze (go)
- GitHub Check: Build Docker Image
🔇 Additional comments (6)
internal/collector/interface.go (1)
141-143: VolcanoJob ResourceType wiring is consistent

Enum value, `String()` mapping (`"volcano_job"`), and `ProtoType()` mapping to `RESOURCE_TYPE_VOLCANO_JOB` are all aligned with the existing patterns and protobuf definitions. No changes needed.

Also applies to: 196-197, 304-305
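That three-way mapping (Go enum value, config string, proto enum number) can be sketched minimally. The constant values below are illustrative stand-ins, not the project's actual numbering; the review only confirms the `"volcano_job"` string and `RESOURCE_TYPE_VOLCANO_JOB = 50`:

```go
package main

import "fmt"

// resourceType stands in for the collector's ResourceType enum.
type resourceType int

const volcanoJob resourceType = iota // stand-in enum value, not the real one

// String returns the config-facing name used in DisabledCollectors entries.
func (r resourceType) String() string {
	switch r {
	case volcanoJob:
		return "volcano_job"
	default:
		return "unknown"
	}
}

// protoType maps the collector enum onto the wire enum number.
func (r resourceType) protoType() int32 {
	switch r {
	case volcanoJob:
		return 50 // RESOURCE_TYPE_VOLCANO_JOB per the proto change
	default:
		return 0
	}
}

func main() {
	fmt.Println(volcanoJob.String(), volcanoJob.protoType()) // volcano_job 50
}
```

Keeping these three representations in one place (or generated from one source) is what the reviewer's "re-generate `gen/api/v1`" reminder above is guarding: a drift in any leg silently breaks disabled-collector config or wire encoding.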
internal/controller/collectionpolicy_controller.go (5)
239-255: RBAC for Volcano jobs matches collector's read-only usage

The new kubebuilder RBAC annotation for `batch.volcano.sh` `jobs` (`get;list;watch`) is consistent with how the VolcanoJob collector is used as a read-only watcher alongside other optional third-party resources.

783-785: ExcludedVolcanoJobs correctly drive selective restart for `volcano_job` collector

Hooking `ExcludedVolcanoJobs` into `identifyAffectedCollectors` so that changes flag `"volcano_job"` keeps the selective-restart logic in line with other `Excluded*` lists.

1462-1471: Selective restart wiring for `volcano_job` mirrors other dynamic collectors

The `volcano_job` case in `restartCollectors` recreates `NewVolcanoJobCollector` with `DynamicClient`, `TargetNamespaces`, `ExcludedVolcanoJobs`, and the standard batch/logging parameters, matching the constructor signature and patterns of other namespaced collectors.

2593-2604: VolcanoJobCollector included in initial registration and DisabledCollectors flow

Adding `NewVolcanoJobCollector` to the `collectors` slice with `name: collector.VolcanoJob` ensures it's registered on initialization and can be controlled via `DisabledCollectors` using the `"volcano_job"` key, consistent with other resource types.

3232-3241: Disabled-collector re-enablement covers `volcano_job`

The new `"volcano_job"` branch in `handleDisabledCollectorsChange` reconstructs `NewVolcanoJobCollector` from `newConfig` with the same arguments as other paths, so re-enabling a previously disabled volcano job collector behaves consistently with the rest of the system.