Conversation

@dprotaso
Member

@dprotaso dprotaso commented Jun 25, 2025

This stacks onto the shared main PR - #3190

You can review the webhook changes here

  • webhook changes for observability
  • go mod
  • vendor

@knative-prow knative-prow bot requested review from creydr and skonto June 25, 2025 13:05
@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 25, 2025
@dprotaso dprotaso changed the title Webhook OTel changes [wip] Webhook OTel changes Jun 25, 2025
@knative-prow knative-prow bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jun 25, 2025
@dprotaso
Member Author

/assign @Cali0707 @evankanderson @skonto

@dprotaso dprotaso force-pushed the observability-webhook branch from 40c7046 to a3884bc Compare June 25, 2025 13:38
@dprotaso dprotaso force-pushed the observability-webhook branch 5 times, most recently from f32c228 to 62402c6 Compare June 26, 2025 04:19
@codecov

codecov bot commented Jun 26, 2025

Codecov Report

Attention: Patch coverage is 88.69565% with 13 lines in your changes missing coverage. Please review.

Project coverage is 76.00%. Comparing base (7a5377f) to head (624cd94).
Report is 1 commits behind head on main.

Files with missing lines    Patch %    Lines
webhook/conversion.go       68.57%     8 Missing and 3 partials ⚠️
webhook/metrics.go          91.66%     1 Missing and 1 partial ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3189      +/-   ##
==========================================
- Coverage   76.05%   76.00%   -0.05%     
==========================================
  Files         205      205              
  Lines       11751    11710      -41     
==========================================
- Hits         8937     8900      -37     
+ Misses       2541     2540       -1     
+ Partials      273      270       -3     

@dprotaso dprotaso force-pushed the observability-webhook branch 2 times, most recently from 0977ab8 to bc10cae Compare June 30, 2025 18:32
@dprotaso dprotaso changed the title [wip] Webhook OTel changes Webhook OTel changes Jun 30, 2025
@knative-prow knative-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 30, 2025
@dprotaso
Member Author

This is ready for a review cc @evankanderson @Cali0707

Based on feedback from the OTel folks, I removed the package-level instrument variables. This makes unit tests easier to write. FYI: the otel-go SDK de-dupes the instruments when multiple webhooks are created (excluding the pointers in our webhook struct).
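
A minimal sketch of why per-instance instruments are easier to test than package variables, using only the standard library. The names `histogram`, `memHistogram`, `webhookMetrics`, and `newWebhookMetrics` are hypothetical stand-ins for illustration; they are not the PR's types and not the OTel API.

```go
package main

import (
	"fmt"
	"time"
)

// histogram is a hypothetical stand-in for an OTel histogram instrument.
type histogram interface {
	Record(seconds float64)
}

// memHistogram stores recorded values in memory so a test can inspect them.
type memHistogram struct{ values []float64 }

func (h *memHistogram) Record(seconds float64) { h.values = append(h.values, seconds) }

// webhookMetrics owns its instruments instead of reading package variables,
// so each webhook (and each test) can be given its own instance.
type webhookMetrics struct {
	handlerDuration histogram
}

func newWebhookMetrics(h histogram) *webhookMetrics {
	return &webhookMetrics{handlerDuration: h}
}

func (m *webhookMetrics) recordHandlerDuration(d time.Duration) {
	m.handlerDuration.Record(d.Seconds())
}

func main() {
	h := &memHistogram{}
	m := newWebhookMetrics(h)
	m.recordHandlerDuration(1500 * time.Millisecond)
	fmt.Println(h.values) // prints [1.5]
}
```

Because nothing is shared at package scope, two tests can each build their own `webhookMetrics` without resetting global state between runs.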

attrs = append(attrs, allowedAttr)
labeler.Add(allowedAttr)

wh.metrics.recordHandlerDuration(ctx,
Contributor

We report time for bad requests too?

Member Author

yeah

Member

Bad requests may have a different time distribution than "good" requests, which can be a useful hint when debugging, so I'm +1 on separating these.

@dprotaso dprotaso changed the title Webhook OTel changes [webhook] OTel changes Jul 2, 2025
Member

@evankanderson evankanderson left a comment

/approve

One concern: admission.go uses LabelerFromContext without checking the second return value. Otherwise, not counting libraries, this seems to somewhat simplify our overall code.

}
r.Body = io.NopCloser(&bodyBuffer)

labeler, _ := otelhttp.LabelerFromContext(r.Context())
Member

If the second result is false, then:

In this case it is safe to use the Labeler but any attributes added to it will not be used.

Do we want / need to call otelhttp.ContextWithLabeler to set the labeler if this was false?

Member Author

If someone's using the handler without that middleware then the labeler is a noop

Member

Yeah, I didn't know if that was a concern. As another option, you could add a comment here noting that this relies on that middleware setting the labeler in the context, since this looks like ignoring an error value.

Member Author

done
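
The exchange above can be sketched with a small standard-library model of the context-labeler pattern. This is not the otelhttp implementation: `labeler`, `contextWithLabeler`, `labelerFromContext`, `handle`, and `attrsSeenByMiddleware` are hypothetical names that only mirror the documented behavior, where a missing labeler yields a usable but detached one and a false second value.

```go
package main

import (
	"context"
	"fmt"
)

// labeler is a toy model of a request-scoped attribute collector.
type labeler struct{ attrs []string }

func (l *labeler) Add(a ...string) { l.attrs = append(l.attrs, a...) }

type labelerKey struct{}

// contextWithLabeler is what the middleware is assumed to call per request.
func contextWithLabeler(ctx context.Context, l *labeler) context.Context {
	return context.WithValue(ctx, labelerKey{}, l)
}

// labelerFromContext returns the stored labeler and true, or a fresh,
// detached labeler and false when nothing was stored.
func labelerFromContext(ctx context.Context) (*labeler, bool) {
	l, ok := ctx.Value(labelerKey{}).(*labeler)
	if !ok {
		return &labeler{}, false
	}
	return l, true
}

// handle adds an attribute without checking the second value.
func handle(ctx context.Context) {
	// Relies on the middleware having stored a labeler in ctx. Without it,
	// Add is safe but nothing downstream ever reads the attribute.
	l, _ := labelerFromContext(ctx)
	l.Add("kn.webhook.admission.result.allowed=true")
}

// attrsSeenByMiddleware reports what the middleware's labeler holds
// after the handler has run.
func attrsSeenByMiddleware(withMiddleware bool) []string {
	ctx := context.Background()
	shared := &labeler{}
	if withMiddleware {
		ctx = contextWithLabeler(ctx, shared)
	}
	handle(ctx)
	return shared.attrs
}

func main() {
	fmt.Println(len(attrsSeenByMiddleware(true)))  // prints 1
	fmt.Println(len(attrsSeenByMiddleware(false))) // prints 0
}
```

In this model, ignoring the second value is harmless, which is why a code comment explaining the reliance on the middleware is enough.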

Comment on lines 87 to 94
labeler.Add(
ConversionResultStatus.With(strings.ToLower(response.Response.Result.Status)),
)
Member

Why do this after the recordHandlerDuration call?

Member Author

recordHandlerDuration will add the duration to our custom histogram metric that measures the call to the conversion controller.

The labeler, on the other hand, adds attributes that the otelhttp middleware reports. The order of these doesn't really matter in my mind.

Member

So the labeler labels won't have any relation to the metric attributes?

Member Author

So the labeler labels won't have any relation to the metric attributes?

There is no relationship. What we should probably do is get the attributes from the labeler when recording the metric. That way we'll have the exact same attributes (though there's a slice alloc).

Member Author

ok - i moved this invocation upward and record will now fetch the labeler attributes
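
A rough standard-library sketch of the resolution above: the labeler is populated first, and the record call reads its attributes so the custom histogram and the middleware report the same set. The types `labeler`, `point`, and `metrics`, the function `convert`, and the key name `result.status` are assumptions made for illustration, not the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// labeler is a toy model of a request-scoped attribute collector.
type labeler struct{ attrs []string }

func (l *labeler) Add(a ...string) { l.attrs = append(l.attrs, a...) }

// Get returns a copy of the attributes. The copy is the slice
// allocation mentioned in the thread.
func (l *labeler) Get() []string { return append([]string(nil), l.attrs...) }

// point is one recorded histogram measurement with its attributes.
type point struct {
	seconds float64
	attrs   []string
}

type metrics struct{ points []point }

// recordHandlerDuration takes its attributes from the labeler, so the
// metric carries exactly what the middleware will report.
func (m *metrics) recordHandlerDuration(l *labeler, seconds float64) {
	m.points = append(m.points, point{seconds: seconds, attrs: l.Get()})
}

// convert models the handler: label first, then record.
func convert(status string) point {
	l := &labeler{}
	m := &metrics{}
	l.Add("result.status=" + strings.ToLower(status)) // hypothetical key name
	m.recordHandlerDuration(l, 0.25)
	return m.points[0]
}

func main() {
	p := convert("Success")
	fmt.Println(p.attrs) // prints [result.status=success]
}
```

With this arrangement the order does matter: an attribute added to the labeler after the record call would reach the middleware but not the histogram.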

Comment on lines 43 to 48
AdmissionOperation = attributekey.String("kn.webhook.admission.operation")
AdmissionGroup = attributekey.String("kn.webhook.admission.group")
AdmissionVersion = attributekey.String("kn.webhook.admission.version")
AdmissionKind = attributekey.String("kn.webhook.admission.kind")
AdmissionSubresource = attributekey.String("kn.webhook.admission.subresource")
AdmissionAllowed = attributekey.Bool("kn.webhook.admission.result.allowed")
Member

Do we need admission in these keys? If not, they could cover all kinds of webhook metrics (conversion, defaulting, etc), with fewer definitions. (e.g. the admission and conversion status could use the same attribute)

Member Author

It's possible but a lot of the fields don't apply to conversion webhooks (maybe just two).

What are your concerns?

Member

Having to type a lot of long strings into Prometheus queries. (Yes, it's more human laziness than technical cost, which should be fairly minimal with a reasonable interning implementation.)

Member Author

ok - i did a pass take a look - i think it worked out
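
As a loose illustration of the suggestion in this thread, the sketch below shows one typed key being shared by the admission and conversion paths. It uses only the standard library. The `stringKey` type only imitates the `Key.With(value)` shape seen in the snippets, and `kn.webhook.result.status` is a hypothetical name, not necessarily what the PR ended up with.

```go
package main

import "fmt"

// keyValue is a toy attribute: a key plus a value.
type keyValue struct {
	Key   string
	Value any
}

// stringKey is a typed key that can only be paired with string values.
type stringKey string

func (k stringKey) With(v string) keyValue { return keyValue{Key: string(k), Value: v} }

const (
	// Admission-specific key, as in the snippet under review.
	admissionKind stringKey = "kn.webhook.admission.kind"
	// Hypothetical shared key usable by every webhook type.
	resultStatus stringKey = "kn.webhook.result.status"
)

// admissionAttrs combines an admission-only key with the shared one.
func admissionAttrs(kind, status string) []keyValue {
	return []keyValue{admissionKind.With(kind), resultStatus.With(status)}
}

// conversionAttrs needs only the shared key.
func conversionAttrs(status string) []keyValue {
	return []keyValue{resultStatus.With(status)}
}

func main() {
	fmt.Println(admissionAttrs("Service", "success"))
	fmt.Println(conversionAttrs("success"))
}
```

A shared key means one shorter label to type in queries, while fields that only make sense for admission keep their own prefix.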

Member

@Cali0707 Cali0707 left a comment

/lgtm
/hold in case @dprotaso wants any more reviews

@knative-prow knative-prow bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Jul 4, 2025
@dprotaso dprotaso force-pushed the observability-webhook branch from 7687b56 to 624cd94 Compare July 4, 2025 14:44
@knative-prow knative-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 4, 2025
@dprotaso
Member Author

dprotaso commented Jul 4, 2025

rebased

Member

@Cali0707 Cali0707 left a comment

/lgtm

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jul 4, 2025
@knative-prow

knative-prow bot commented Jul 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cali0707, dprotaso, evankanderson

@dprotaso
Member Author

dprotaso commented Jul 4, 2025

/hold cancel

@evankanderson if you have any follow up comments let me know and i can do them in a follow up PR next week.

@knative-prow knative-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 4, 2025
@knative-prow knative-prow bot merged commit f478764 into knative:main Jul 4, 2025
35 of 37 checks passed
@dprotaso dprotaso deleted the observability-webhook branch July 4, 2025 14:57
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

4 participants