Skip to content
This repository was archived by the owner on May 6, 2026. It is now read-only.

chore: Log device names which we are publishing#260

Merged
aojea merged 1 commit into
google:mainfrom
gauravkghildiyal:log-device-names
Oct 16, 2025
Merged

chore: Log device names which we are publishing#260
aojea merged 1 commit into
google:mainfrom
gauravkghildiyal:log-device-names

Conversation

@gauravkghildiyal
Copy link
Copy Markdown
Member

No description provided.

@gauravkghildiyal gauravkghildiyal force-pushed the log-device-names branch 3 times, most recently from 612b4d5 to 48efb54 Compare October 14, 2025 23:58
Comment thread pkg/driver/dra_hooks.go Outdated
select {
case devices := <-np.netdb.GetResources(ctx):
klog.V(4).Infof("Received %d devices", len(devices))
klog.V(3).Infof("Got %d devices from inventory: %s", len(devices), getDeviceNames(devices))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was on purpose not logging all devices, since that information can be unbounded, you can use level 7 if you want to log all the existing devices, but we've been beating by this in the past in several places, I still remember a bug in ingress-gce that logged all nodes and it caused major issues at scale

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid concern and one that I had myself as well, but the reason why I still went ahead with that was because I also assumed that the number of devices on a node would be bounded to a smaller number.

But given we've had this doubt, I've made it such that we only log an arbitrary subset of device names.

Copy link
Copy Markdown
Contributor

@aojea aojea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging entire list of names is very expensive, so if you still want to add that we need to put it at least at level 7 in a separate statement

@gauravkghildiyal
Copy link
Copy Markdown
Member Author

logging entire list of names is very expensive, so if you still want to add that we need to put it at least at level 7 in a separate statement

I've made adjustments to not log all device names. So this is what helped identify #263 so there's real value in having even some fixed size of names log (and likely this fixed size of names should serve most cases)

Comment thread pkg/driver/dra_hooks.go
klog.V(3).Infof("Got %d devices from inventory: %s", len(devices), formatDeviceNames(devices, 15))
devices = filter.FilterDevices(np.celProgram, devices)
klog.V(4).Infof("After filtering %d devices", len(devices))
klog.V(3).Infof("After filtering, publishing %d devices in ResourceSlice(s): %s", len(devices), formatDeviceNames(devices, 15))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some debug mode which prints all devices? That might be a good middle ground.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying debug mode in ADDITION to the subset of devices?

(Usually I've not found debug mode to be helpful, because by the issue we wanted to catch is already missed, so I would want something else as well, besides the debug mode.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's fair, debug mode can sometimes be helpful if we can repro the issue but can't see enough logs as is. Do we not usually expect to see > 15 devices in our inventory? If so then we can go ahead with this, but might want to make it something we can pass as a command line variable so if we need to we can adjust it if it's not too much work to plumb it here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks! I think 15 should be good for now. As you described, we've not even close to 15 so far. Mostly just 10.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is always something we can improve on, this seems good as is and adds value.

Copy link
Copy Markdown
Contributor

@aojea aojea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bounded list LGTM

@aojea aojea merged commit 3ad1a04 into google:main Oct 16, 2025
6 of 7 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants