Skip to content

tls: expose detailed certificate validation failure reasons in access logs#42420

Merged
ggreenway merged 7 commits intoenvoyproxy:mainfrom
prashanthjos:main
Dec 12, 2025
Merged

tls: expose detailed certificate validation failure reasons in access logs#42420
ggreenway merged 7 commits intoenvoyproxy:mainfrom
prashanthjos:main

Conversation

@prashanthjos
Copy link
Copy Markdown
Contributor

@prashanthjos prashanthjos commented Dec 4, 2025

Commit Message:
Propagate detailed certificate validation errors (e.g., "certificate has expired", "unable to get local issuer certificate", "SAN matcher failed") from DefaultCertValidator through SslExtendedSocketInfo to the transport failure reason string in access logs.

Additional Description:
Previously, when TLS certificate validation failed, access logs only showed the generic CERTIFICATE_VERIFY_FAILED error without actionable details. This made debugging TLS issues difficult in production where debug logging is not feasible. This change adds two new methods to SslExtendedSocketInfo (setCertificateValidationError/certificateValidationError) and wires them through:
ContextImpl::customVerifyCallback() stores error_details from ValidationResults
ValidateResultCallbackImpl::onCertValidationResult() stores error_details for async validation
SslSocket::drainErrorQueue() appends the detailed error when SSL_R_CERTIFICATE_VERIFY_FAILED is encountered

The detailed error is now visible in %UPSTREAM_TRANSPORT_FAILURE_REASON% and %DOWNSTREAM_TRANSPORT_FAILURE_REASON% access log formatters.

Risk Level: Low

Testing:
Added unit tests in default_validator_test.cc: TestCertificateValidationErrorDetailsForSanFailure, TestSslExtendedSocketInfoCertValidationError, TestEmptyCertChainErrorDetails

Added unit tests in handshaker_test.cc: SslExtendedSocketInfoCertValidationError, ValidateResultCallbackStoresErrorDetails
All existing TLS tests pass

Docs Changes: Uses existing documented access log formatters

Release Notes: Added detailed certificate validation failure reasons to %UPSTREAM_TRANSPORT_FAILURE_REASON% and %DOWNSTREAM_TRANSPORT_FAILURE_REASON% access log fields. When TLS certificate validation fails, the access log now includes specific error details such as "certificate has expired" or "unable to get local issuer certificate" instead of only the generic CERTIFICATE_VERIFY_FAILED code.

Fixes #42266

… logs

Previously, when TLS certificate validation failed, Envoy's access logs
only showed the generic "CERTIFICATE_VERIFY_FAILED" error code without
actionable details about why the validation failed. This made debugging
TLS issues difficult, especially in production environments where debug
logging is not feasible.

This change propagates the detailed certificate validation error (e.g.,
"certificate has expired", "unable to get local issuer certificate",
"SAN matcher failed") from the DefaultCertValidator through
SslExtendedSocketInfo to the transport failure reason string that
appears in access logs.

The detailed error is appended to the existing failure_reason_ when
SSL_R_CERTIFICATE_VERIFY_FAILED is encountered, resulting in access
log output like:

  TLS_error:|268435581:SSL:CERTIFICATE_VERIFY_FAILED:verify cert failed:
  X509_verify_cert: certificate verification error at depth 0:
  certificate has expired:TLS_error_end

This information is exposed via the existing access log formatters:
- %UPSTREAM_TRANSPORT_FAILURE_REASON% (for upstream/cluster connections)
- %DOWNSTREAM_TRANSPORT_FAILURE_REASON% (for downstream/listener mTLS)

Signed-off-by: Prashanth Josyula <prashanth.chaitanya@salesforce.com>
… logs

Previously, when TLS certificate validation failed, Envoy's access logs
only showed the generic "CERTIFICATE_VERIFY_FAILED" error code without
actionable details about why the validation failed. This made debugging
TLS issues difficult, especially in production environments where debug
logging is not feasible.

This change propagates the detailed certificate validation error (e.g.,
"certificate has expired", "unable to get local issuer certificate",
"SAN matcher failed") from the DefaultCertValidator through
SslExtendedSocketInfo to the transport failure reason string that
appears in access logs.

The detailed error is appended to the existing failure_reason_ when
SSL_R_CERTIFICATE_VERIFY_FAILED is encountered, resulting in access
log output like:

  TLS_error:|268435581:SSL:CERTIFICATE_VERIFY_FAILED:verify cert failed:
  X509_verify_cert: certificate verification error at depth 0:
  certificate has expired:TLS_error_end

This information is exposed via the existing access log formatters:
- %UPSTREAM_TRANSPORT_FAILURE_REASON% (for upstream/cluster connections)
- %DOWNSTREAM_TRANSPORT_FAILURE_REASON% (for downstream/listener mTLS)

Signed-off-by: Prashanth Josyula <prashanth.chaitanya@salesforce.com>
@prashanthjos
Copy link
Copy Markdown
Contributor Author

/retest

@ggreenway ggreenway self-assigned this Dec 4, 2025
…gs only showed the generic "CERTIFICATE_VERIFY_FAILED" error code without actionable details about why the validation failed. This made debugging TLS issues difficult, especially in production environments where debug logging is not feasible.

This change propagates the detailed certificate validation error (e.g., "certificate has expired", "unable to get local issuer certificate", "SAN matcher failed") from the DefaultCertValidator through
SslExtendedSocketInfo to the transport failure reason string that appears in access logs.

The detailed error is appended to the existing failure_reason_ when SSL_R_CERTIFICATE_VERIFY_FAILED is encountered, resulting in access log output like:

  TLS_error:|268435581:SSL:CERTIFICATE_VERIFY_FAILED:verify cert failed:
  X509_verify_cert: certificate verification error at depth 0:
  certificate has expired:TLS_error_end

This information is exposed via the existing access log formatters:
- %UPSTREAM_TRANSPORT_FAILURE_REASON% (for upstream/cluster connections)
- %DOWNSTREAM_TRANSPORT_FAILURE_REASON% (for downstream/listener mTLS)

Signed-off-by: Prashanth Josyula <prashanth.chaitanya@salesforce.com>
@prashanthjos prashanthjos requested a review from wbpcode as a code owner December 5, 2025 01:11
@prashanthjos
Copy link
Copy Markdown
Contributor Author

/retest

@prashanthjos
Copy link
Copy Markdown
Contributor Author

@ggreenway @wbpcode can you please take a look when you get sometime.

Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really nice improvement. Thanks!

/wait

while (uint64_t err = ERR_get_error()) {
if (ERR_GET_LIB(err) == ERR_LIB_SSL) {
if (ERR_GET_REASON(err) == SSL_R_PEER_DID_NOT_RETURN_A_CERTIFICATE) {
ctx_->stats().fail_verify_no_cert_.inc();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything in the details text right now if the peer didn't provide a cert? If not, should we add something here in this case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines +853 to +854
EXPECT_THAT(error_details, testing::HasSubstr("verify cert failed"));
EXPECT_THAT(error_details, testing::HasSubstr("SAN matcher"));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the entire string of error_details here? Can this match either the full string or a longer substring?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

…gs only showed the generic "CERTIFICATE_VERIFY_FAILED" error code without actionable details about why the validation failed. This made debugging TLS issues difficult, especially in production environments where debug logging is not feasible.

This change propagates the detailed certificate validation error (e.g., "certificate has expired", "unable to get local issuer certificate","SAN matcher failed") from the DefaultCertValidator through
SslExtendedSocketInfo to the transport failure reason string that appears in access logs.

The detailed error is appended to the existing failure_reason_ when SSL_R_CERTIFICATE_VERIFY_FAILED is encountered, resulting in access log output like:

  TLS_error:|268435581:SSL:CERTIFICATE_VERIFY_FAILED:verify cert failed:
  X509_verify_cert: certificate verification error at depth 0:
  certificate has expired:TLS_error_end

This information is exposed via the existing access log formatters:
- %UPSTREAM_TRANSPORT_FAILURE_REASON% (for upstream/cluster connections)
- %DOWNSTREAM_TRANSPORT_FAILURE_REASON% (for downstream/listener mTLS)

Signed-off-by: Prashanth Josyula <prashanth.chaitanya@salesforce.com>
@prashanthjos
Copy link
Copy Markdown
Contributor Author

/retest

@prashanthjos
Copy link
Copy Markdown
Contributor Author

@ggreenway please have a relook I have addressed your comments

@wbpcode wbpcode self-assigned this Dec 8, 2025
Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

} else if (saw_no_client_cert) {
absl::StrAppend(&failure_reason_, ":peer did not provide required client certificate");
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simplified as below? Anything we miss trying to remove the repetitive if check

  // Append detailed error info for certificate-related failures.
  if (saw_cert_verify_failed || saw_no_client_cert) {
    auto* extended_socket_info = reinterpret_cast<Envoy::Ssl::SslExtendedSocketInfo*>(
        SSL_get_ex_data(rawSsl(), ContextImpl::sslExtendedSocketInfoIndex()));
    if (saw_no_client_cert) {
        absl::StrAppend(&failure_reason_, ":peer did not provide required client certificate");
    } else if (extended_socket_info != nullptr) {
      absl::string_view cert_validation_error = extended_socket_info->certificateValidationError();
      if (!cert_validation_error.empty()) {
        absl::StrAppend(&failure_reason_, ":", cert_validation_error);
      }
    }
  }

Copy link
Copy Markdown
Contributor

@Pawan-Bishnoi Pawan-Bishnoi Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even

  if (saw_no_client_cert) {
        absl::StrAppend(&failure_reason_, ":peer did not provide required client certificate");
  } else if (saw_cert_verify_failed) {
  // Append detailed error info for certificate-related failures.
    auto* extended_socket_info = reinterpret_cast<Envoy::Ssl::SslExtendedSocketInfo*>(
        SSL_get_ex_data(rawSsl(), ContextImpl::sslExtendedSocketInfoIndex()));
    if (extended_socket_info != nullptr) {
      absl::string_view cert_validation_error = extended_socket_info->certificateValidationError();
      if (!cert_validation_error.empty()) {
        absl::StrAppend(&failure_reason_, ":", cert_validation_error);
      }
    }
  }

wbpcode
wbpcode previously approved these changes Dec 11, 2025
Copy link
Copy Markdown
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I think it would be better to merge the setCertificateValidationError and setCertificateValidationStatus. But current change also fine to me.

@wbpcode wbpcode removed the waiting label Dec 11, 2025
@wbpcode
Copy link
Copy Markdown
Member

wbpcode commented Dec 11, 2025

Defer this to @ggreenway for further review/approval.

And hello, @ggreenway, I see you left a require change, but I didn't see any new comment from you. Do I missed something?

@ggreenway
Copy link
Copy Markdown
Member

@wbpcode that's weird; github seems to have discarded the comment I left in that review.

Comment thread source/common/tls/ssl_socket.cc Outdated
absl::StrAppend(&failure_reason_, ":peer did not provide required client certificate");
}
} else if (saw_no_client_cert) {
absl::StrAppend(&failure_reason_, ":peer did not provide required client certificate");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refactor the logic so there's only one case that appends this error. Maybe something roughly like this:

bool added_reason = false;
if (saw_cert_verify_failed) {
  // checks for null
  StrAppend(failure_reason_, "the text");
  added_reason = true;
}
if (!added_reason && saw_no_client_cert) {
  StrAppend(......);
}

… logs

Previously, when TLS certificate validation failed, Envoy's access logs
only showed the generic "CERTIFICATE_VERIFY_FAILED" error code without
actionable details about why the validation failed. This made debugging
TLS issues difficult, especially in production environments where debug
logging is not feasible.

This change propagates the detailed certificate validation error (e.g.,
"certificate has expired", "unable to get local issuer certificate",
"SAN matcher failed") from the DefaultCertValidator through
SslExtendedSocketInfo to the transport failure reason string that
appears in access logs.

The detailed error is appended to the existing failure_reason_ when
SSL_R_CERTIFICATE_VERIFY_FAILED is encountered, resulting in access
log output like:

  TLS_error:|268435581:SSL:CERTIFICATE_VERIFY_FAILED:verify cert failed:
  X509_verify_cert: certificate verification error at depth 0:
  certificate has expired:TLS_error_end

This information is exposed via the existing access log formatters:
- %UPSTREAM_TRANSPORT_FAILURE_REASON% (for upstream/cluster connections)
- %DOWNSTREAM_TRANSPORT_FAILURE_REASON% (for downstream/listener mTLS)

Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a release note changelogs/current.yaml

/wait

@ggreenway ggreenway enabled auto-merge (squash) December 11, 2025 22:56
Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
auto-merge was automatically disabled December 11, 2025 23:12

Head branch was pushed to by a user without write access

@ggreenway
Copy link
Copy Markdown
Member

There's a merge conflict on the changelog. Please merge main into your PR and resolve.

/wait

Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
@prashanthjos
Copy link
Copy Markdown
Contributor Author

/retest

@ggreenway ggreenway merged commit ab62d9e into envoyproxy:main Dec 12, 2025
24 checks passed
MayorFaj pushed a commit to MayorFaj/envoy that referenced this pull request Dec 15, 2025
… logs (envoyproxy#42420)

Propagate detailed certificate validation errors (e.g., "certificate has
expired", "unable to get local issuer certificate", "SAN matcher
failed") from DefaultCertValidator through SslExtendedSocketInfo to the
transport failure reason string in access logs.

Previously, when TLS certificate validation failed, access logs only
showed the generic CERTIFICATE_VERIFY_FAILED error without actionable
details. This made debugging TLS issues difficult in production where
debug logging is not feasible. 

The detailed error is now visible in %UPSTREAM_TRANSPORT_FAILURE_REASON%
and %DOWNSTREAM_TRANSPORT_FAILURE_REASON% access log formatters.

Fixes envoyproxy#42266

Signed-off-by: Prashanth Josyula <prashanth.chaitanya@salesforce.com>
Signed-off-by: MayorFaj <mayorfaj@gmail.com>
grnmeira pushed a commit to grnmeira/envoy that referenced this pull request Mar 20, 2026
… logs (envoyproxy#42420)

Propagate detailed certificate validation errors (e.g., "certificate has
expired", "unable to get local issuer certificate", "SAN matcher
failed") from DefaultCertValidator through SslExtendedSocketInfo to the
transport failure reason string in access logs.

Previously, when TLS certificate validation failed, access logs only
showed the generic CERTIFICATE_VERIFY_FAILED error without actionable
details. This made debugging TLS issues difficult in production where
debug logging is not feasible.

The detailed error is now visible in %UPSTREAM_TRANSPORT_FAILURE_REASON%
and %DOWNSTREAM_TRANSPORT_FAILURE_REASON% access log formatters.

Fixes envoyproxy#42266

Signed-off-by: Prashanth Josyula <prashanth.chaitanya@salesforce.com>
Signed-off-by: Gustavo <grnmeira@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Improve TLS Certificate Validation Error Visibility in Logs

5 participants