tls: Add CRL support#2255
Conversation
4d76125 to
181790a
Compare
There was a problem hiding this comment.
This is (common?) interface and not implementation, so I don't think you should use BoringSSL-specific types here.
There was a problem hiding this comment.
So, there was a request in the previous PR to have the class hold a bssl::UniquePtr<X509_CRL> so that invalid data would be caught sooner. Agreed that it's a bit ugly to have BoringSSL-specific types in the interface, so happy to make whichever changes you / @ggreenway would prefer.
There was a problem hiding this comment.
I agree with @PiotrSikora that we shouldn't expose boringssl at this interface level. I haven't fully reviewed the change but can we find a way to hide this? We might need to add a CRL interface also.
There was a problem hiding this comment.
So, here's three possible options:
-
Have
ContextConfigcontain aDataSource, or just the raw bytes for a CRL.
Pros: we don't expose BoringSSL in this interface; simple.
Cons: if the CRL is invalid, we won't find out until we try to instantiate a listener (since that's when we turn the bytes into a CRL). -
Have
ContextConfigcontain the BoringSSL types.
Pros: we parse the CRL ~immediately and thus can error at config load time; already exists.
Cons: doesn't feel likeContextConfigshould have BoringSSL types. -
Introduce a
CRLinterface (or something similar), parse the CRL up-front, wrap into the interface and use that here.
Pros: probably can get the best of both worlds (verifying the CRL up-front and hiding the concrete type).
Cons: complicated; need to make sure that whatever parses the config does so into a format that theContextConfigimplementation understands (i.e. there's coupling between config loading and theContextConfigImpl).
Thoughts?
There was a problem hiding this comment.
Yeah agreed this is a little tricky. Some random ideas:
-
Variant of (1) where we expose the raw bytes, but also verify that it's valid inside the config and throw if not. (Basically we would end up loading the CRL twice, but I think that's probably not a big deal).
-
Some type of visitor pattern, where the
ContextImplcalls back into the config and the config sets up the SSL context. This would require some dynamic_cast to get at theSSLinside the config impl, but we are already doing that in various places inside config impl for similar reasons, so I think it's probably OK.
There was a problem hiding this comment.
FWIW, #2248 is just loading and exposing raw bytes, without verifying the content. We could start loading them twice, but this will start being costly at scale, and will hurt loading times.
There was a problem hiding this comment.
I think the most pragmatic solution is to just expose raw bytes in this interface, and catch any data-validation errors in the ContextImpl. We already catch many similar types of errors there.
There was a problem hiding this comment.
v1 APIs are deprecated, so I'm not sure if we're supposed to keep adding things here? cc @mattklein123 @htuch
There was a problem hiding this comment.
At this point v2 is required, v1 is optional but not blocked. If folks want to add and doc no problem.
There was a problem hiding this comment.
You should probably wait for ContextConfigImpl::readDataSource() from #2248.
There was a problem hiding this comment.
+1
If not though, change to "if (!crl_conf.filename().empty()) {"
There was a problem hiding this comment.
You might want to look at similar code in #2248, and use the list iterator from bssl::UniquePtr.
There was a problem hiding this comment.
Ooh, that's good to know - I didn't realize there was an iterator on STACKs. Also, regarding the similar code: as I read it, #2248 will load CRLs from the file passed as a CA cert; I don't want to duplicate work, but $WORK definitely needs to be able to load CRLs from a different file from CAs (we essentially fetch CRLs on a cronjob, whereas the CA certs are placed with config management). What are your thoughts on reducing duplication here?
There was a problem hiding this comment.
Yeah, CRLs will be loaded from trusted_ca as well, since that's what the filesystem-based API is currently doing, and I didn't want to change the behavior and suddenly start ignoring CRLs from the trusted_ca file... Having said that, I don't think that's a common use case and handling CRLs via separate field is definitely OK.
There was a problem hiding this comment.
You should use X509_CRL_up_ref() instead of this hack.
There was a problem hiding this comment.
Thanks for the tip - switched to that!
|
@ggreenway do you mind also taking a look? |
ggreenway
left a comment
There was a problem hiding this comment.
Overall looks good, except I think this should wait for readDataSource(), as @PiotrSikora suggested.
There was a problem hiding this comment.
Reminder: Point at your real data-plane-api commit
There was a problem hiding this comment.
Done, thanks! I just pointed at the current commit on master.
There was a problem hiding this comment.
(rebased to fix merge conflict)
There was a problem hiding this comment.
I think the most pragmatic solution is to just expose raw bytes in this interface, and catch any data-validation errors in the ContextImpl. We already catch many similar types of errors there.
There was a problem hiding this comment.
+1
If not though, change to "if (!crl_conf.filename().empty()) {"
There was a problem hiding this comment.
Please also add a test case for invalid crl (empty file maybe?)
There was a problem hiding this comment.
Done! I think an empty file is actually okay (list with 0 elements), so I added a file with invalid PEM data.
d852d77 to
b4252e3
Compare
|
Okay, I believe that I've made all the review changes that have been requested, except for using |
|
#2248 is waiting on some other things, so it will be at least another week. @adunham-stripe would you rather wait on that, or proceed without it? Or maybe you can check with @PiotrSikora and just add readDataSource() in this PR to unblock this? |
|
I'm happy to take either approach! We don't have an immediate need for CRLs yet, so I'm okay waiting and seeing how #2248 lands first. Though, if you'd rather get this PR moving sooner, and if @PiotrSikora is onboard, happy to add |
|
I'm fine with however you want to proceed; I don't think anyone is blocked on CRLs. I just wanted to let you know the status of things and give you options. |
|
Thank you! 😁 I'm going to hold off for now, then; I've subscribed to the other PR and will follow up after it gets merged 👍 |
|
@adunham-stripe the other PR merged if you want to finish this up. Thank you! |
|
Yep, saw that - nice job on the quick turnaround from you folks! I've rebased on master and am just running tests on the changes that @ggreenway requested. Hoping to push the changes by EOD. |
966eb23 to
fbc693a
Compare
|
@mattklein123 - Okay, I just pushed the changes that @ggreenway requested in their review, and fixed tests. Let me know if there's anything else you folks would like! |
mattklein123
left a comment
There was a problem hiding this comment.
Quick skim LGTM, small comment.
@PiotrSikora can you review? @ggreenway is out this week. Thank you!
There was a problem hiding this comment.
We won't ever support inline in v1. I would remove this TODO. Did you add docs for v1 also in data-plane-api? I can't remember.
There was a problem hiding this comment.
Removed! And yes, I did add documentation here.
There was a problem hiding this comment.
Could you move this (and all other changes, really) to right after handing of trusted CA? It pretty much complements it, and the code is so similar that it doesn't make sense to separate it by handling of certificate chain and private key.
There was a problem hiding this comment.
Erm, I meant to move the certificateRevocationList / certificateRevocationListPath handling in all the files, not just this one (i.e. you should also move all the new code in context_config.h, tls_context_json.cc, context_config_impl.h and context_impl.cc).
There was a problem hiding this comment.
Ah, gotcha - moved the rest!
There was a problem hiding this comment.
X509_V_FLAG_CRL_CHECK|X509_V_FLAG_CRL_CHECK_ALL
There was a problem hiding this comment.
Do we actually want to enable X509_V_FLAG_CRL_CHECK_ALL? I had strange failures when turning that flag on during testing, and there's at least one bug still open about it.
There was a problem hiding this comment.
I'm pretty sure we want it. What was the error, do you recall?
There was a problem hiding this comment.
Could you replace this with EXPECT_THROW_WITH_MESSAGE and proper message, to make sure it fails because of invalid CA and not for other reason (like CRL defined, but no trusted CA)?
There was a problem hiding this comment.
Could you add test for v2 with inlined CRL as well? See similar tests in test/server/listener_manager_impl_test.cc , (...which probably should be moved to this file at some point).
There was a problem hiding this comment.
Could you replace this with inline test?
There was a problem hiding this comment.
I don't think I can do that easily since the v1 config doesn't support inline data, AFAIK. Happy to write out a temporary file during the test, perhaps?
There was a problem hiding this comment.
It's fine to replace the existing v1 test with v2 test, since that's the recommended syntax anyway...
Also, there should be v2 tests for success / invalid content (i.e. replacement for this file) / invalid file path.
There was a problem hiding this comment.
Added success / failure cases to the v2 tests - do you want me to just remove the v1 tests in test/common/ssl/context_impl_test.cc?
There was a problem hiding this comment.
Up to you, I'm fine either way.
There was a problem hiding this comment.
We could simply revoke one of the existing certs and add it to the CRL.
If you prefer revoked_cert.pem, that's fine with me, but perhaps reuse san_with_dns.cfg to produce cert instead of adding another copy of the same .cfg file?
There was a problem hiding this comment.
Good point - I just pushed a commit that revokes an existing cert and tweaks the tests.
There was a problem hiding this comment.
s/revoked_cert.crl/ca_cert.crl
453872b to
0da2edd
Compare
…nfiguration Signed-off-by: Andrew Dunham <adunham@stripe.com>
Signed-off-by: Andrew Dunham <adunham@stripe.com>
Signed-off-by: Andrew Dunham <adunham@stripe.com>
Signed-off-by: Andrew Dunham <adunham@stripe.com>
Signed-off-by: Andrew Dunham <adunham@stripe.com>
Signed-off-by: Andrew Dunham <adunham@stripe.com>
Signed-off-by: Andrew Dunham <adunham@stripe.com>
|
@PiotrSikora - Okay, I rebased to fix a merge conflict, and added a couple commits that address almost all of your feedback. I'm going to test the |
Signed-off-by: Andrew Dunham <adunham@stripe.com>
9181a27 to
204fa7c
Compare
|
Well, after much testing, it appears that I can't reproduce the previous error I was seeing, so I just pushed a commit to add the |
Signed-off-by: Andrew Dunham <adunham@stripe.com>
204fa7c to
ee2530f
Compare
|
|
||
| rm *csr | ||
| rm *srl | ||
| rm -r crl |
| @@ -0,0 +1,36 @@ | |||
| [req] | |||
There was a problem hiding this comment.
This file is no longer being used.
Signed-off-by: Andrew Dunham <adunham@stripe.com>
Signed-off-by: Andrew Dunham <adunham@stripe.com>
| fmt::format("Failed to load CRL from {}", config.certificateRevocationListPath())); | ||
| } | ||
|
|
||
| X509_STORE* store_ctx = SSL_CTX_get_cert_store(ctx_.get()); |
There was a problem hiding this comment.
Sorry, I know that I suggested it, but this should be actually store, not store_ctx.
There was a problem hiding this comment.
Also, I know that's not your code (and I cannot comment on it), but in the CA handling, there is a call to SSL_CTX_get_cert_store() on each iteration, instead of before the loop, like you're doing here.
Could you update the old code to match yours? Thanks!
|
|
||
| rm *csr | ||
| rm *srl | ||
| rm -r crl_* |
There was a problem hiding this comment.
You don't need -r anymore.
There was a problem hiding this comment.
Up to you, I'm fine either way.
…' from script Signed-off-by: Andrew Dunham <adunham@stripe.com>
| ca_cert_(readDataSource(config.validation_context().trusted_ca(), true)), | ||
| ca_cert_path_(getDataSourcePath(config.validation_context().trusted_ca())), | ||
| certificate_revocation_list_(readDataSource(config.validation_context().crl(), true)), | ||
| certificate_revocation_list_path_(getDataSourcePath(config.validation_context().crl())), |
There was a problem hiding this comment.
There is one more thing missing here. You should throw an exception (down in the function body) if CRL is set, but there is no CA. All v1 tests are doing this, which obviously shouldn't work.
There was a problem hiding this comment.
Yeah, good point - I've pushed a commit that adds this, along with some tests.
…icate Signed-off-by: Andrew Dunham <adunham@stripe.com>
| // TODO(htuch): Support multiple hashes. | ||
| ASSERT(config.validation_context().verify_certificate_hash().size() <= 1); | ||
| if (ca_cert_.empty() && !certificate_revocation_list_.empty()) { | ||
| throw EnvoyException("Cannot have a CRL without a CA certificate"); |
There was a problem hiding this comment.
Sigh, it looks that this comment got lost somehow.
This should be:
throw EnvoyException(fmt::format("Failed to load CRL from {} without trusted CA certificates",
config.certificateRevocationListPath()));
to match other errors.
There was a problem hiding this comment.
@PiotrSikora - Hah, no worries! Just pushed a commit that changes this and updates the tests.
Signed-off-by: Andrew Dunham <adunham@stripe.com>
| )EOF"; | ||
|
|
||
| EXPECT_THROW_WITH_REGEX(loadConfigJson(json), EnvoyException, | ||
| "Failed to load CRL from .*/not_a_crl.crl$"); |
There was a problem hiding this comment.
Nit: missing ^ to indicate start of the message (you're using it in other places).
PiotrSikora
left a comment
There was a problem hiding this comment.
One last thing (I promise!), you might want to add some comment to test/common/ssl/test_data/README.md to mention (probably in CA's description) that there is CRL that revokes san_with_dns_cert.pem.
Signed-off-by: Andrew Dunham <adunham@stripe.com>
|
@PiotrSikora - Nit fixed, and comment added! |
| - **CA**: Certificate Authority for **No SAN**, **SAN With URI** and **SAN With | ||
| DNS**. It has the self-signed certificate *ca_cert.pem*. *ca_key.pem* is its | ||
| private key. | ||
| private key. Additionally, we create a CRL for this CA (*ca_cert.crl*) that |
There was a problem hiding this comment.
One space should be enough for everybody ;)
There was a problem hiding this comment.
FWIW, this also breaks ci/circleci: format check.
There was a problem hiding this comment.
Aaaaaaagh 😛
Okay, fixed.
Signed-off-by: Andrew Dunham <adunham@stripe.com>
PiotrSikora
left a comment
There was a problem hiding this comment.
LGTM, @mattklein123 please check style and language.
mattklein123
left a comment
There was a problem hiding this comment.
LGTM. Thank you @adunham-stripe and @PiotrSikora!
|
@adunham-stripe sorry one last thing, it would be cool if you could do a quick doc PR to this page: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/ssl to mention that we now support CRLs (with link to config option). Thank you if you have time. cc @PiotrSikora |
|
Thanks very much for helping get this merged, everyone! I submitted the docs PR over at envoyproxy/data-plane-api#437 |
Description:
Adds support for CRLs. Fixes issue #1496. Duplicate version of #2077.
API Changes:
See Data Plane PR: envoyproxy/data-plane-api#358
Risk Level: Medium (TLS is scary!)
Testing:
Wrote some automated tests that test that revoked certs can't connect while non-revoked certs can.