PARQUET-1301: [C++] Crypto package in parquet-cpp #464
src/parquet/util/crypto.h
Outdated
| #include "parquet/types.h" | ||
|
|
||
| namespace parquet { | ||
|
|
Can the definition of encrypt and decrypt be pushed into a .cc file? There's enough going on here that call overhead is negligible and splitting it up will make it easier to read and reduce build times.
src/parquet/util/crypto.h
Outdated
|
|
||
| if (Encryption::PARQUET_AES_GCM_V1 != alg_id) { | ||
| std::stringstream ss; | ||
| ss << "Crypto algorithm " << alg_id << " currently unsupported\n"; |
Nit: remove the trailing newline in the exception descriptions to be consistent with other error strings.
src/parquet/util/crypto.h
Outdated
| int plaintext_len; | ||
| int ret; | ||
|
|
||
| int tag_len = 16; |
Make these const? Same thing in encrypt.
src/parquet/util/crypto.h
Outdated
| throw parquet::ParquetException(ss.str()); | ||
| } | ||
|
|
||
| EVP_CIPHER_CTX *ctx; |
Consider initializing this set of variables to something invalid. Makes it a little easier to figure out what's going on when looking at core files.
src/parquet/util/crypto.h
Outdated
| throw parquet::ParquetException(ss.str()); | ||
| } | ||
|
|
||
| EVP_CIPHER_CTX *ctx; |
Manage ctx with RAII; otherwise any exception after line 63 is going to leak memory.
You could just use a unique_ptr with a custom deleter, e.g.
struct EVPContextDeleter {
  void operator()(EVP_CIPHER_CTX *ctx) {
    if (ctx) {  // would need to initialize to null in case there's a throw before the init
      EVP_CIPHER_CTX_free(ctx);
    }
  }
};
... stuff ...
std::unique_ptr<EVP_CIPHER_CTX, EVPContextDeleter> ctx;
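To make the leak concern concrete, here is a minimal sketch of the unique_ptr-with-deleter pattern. It does not link against OpenSSL: `FakeCtx`, `FakeCtxNew`, and `FakeCtxFree` are hypothetical stand-ins for `EVP_CIPHER_CTX` and its new/free functions, and the live-handle counter exists only so the cleanup can be observed.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for an opaque C handle such as EVP_CIPHER_CTX.
struct FakeCtx {};

// Counts live handles so the cleanup can be observed.
static int g_live_handles = 0;

FakeCtx* FakeCtxNew() {
  ++g_live_handles;
  return new FakeCtx();
}

void FakeCtxFree(FakeCtx* ctx) {
  --g_live_handles;
  delete ctx;
}

// Deleter invoked by unique_ptr when the owner goes out of scope.
struct FakeCtxDeleter {
  void operator()(FakeCtx* ctx) const { FakeCtxFree(ctx); }
};

using FakeCtxPtr = std::unique_ptr<FakeCtx, FakeCtxDeleter>;

// Acquires a handle, then fails; the handle is released during unwinding.
void WorkThatThrows() {
  FakeCtxPtr ctx(FakeCtxNew());
  assert(g_live_handles == 1);
  throw std::runtime_error("simulated failure after acquiring the context");
}

// Returns the number of handles still alive after the failure.
int LiveHandlesAfterFailure() {
  try {
    WorkThatThrows();
  } catch (const std::runtime_error&) {
    // Swallow the simulated error; only the cleanup matters here.
  }
  return g_live_handles;
}
```

Calling `LiveHandlesAfterFailure()` should report zero live handles. Note that `std::unique_ptr` only invokes its deleter when the stored pointer is non-null, so the null check inside the deleter is harmless but not strictly required.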
src/parquet/util/crypto.h
Outdated
| int len; | ||
| int ciphertext_len; | ||
| uint8_t tag[tag_len]; | ||
| uint8_t iv[iv_len]; |
I think tag_len and iv_len should be const. With a non-const bound, tag and iv are variable-length arrays, which are not standard C++.
src/parquet/util/crypto.cc
Outdated
| #include "parquet/exception.h" | ||
| #include "parquet/types.h" | ||
|
|
||
| #include "parquet/util/crypto.h" |
I have a build issue on Windows in types.h (line 93): the compiler treats OPTIONAL as something other than an enum value. When I include the openssl/*.h files below line 28 in crypto.cc (so that they come after types.h), it works. There seems to be a macro named OPTIONAL reachable through the openssl headers. Could you please change the order of the includes here?
Yup, will do.
src/parquet/util/crypto.cc
Outdated
| #include <iostream> | ||
| #include "parquet/exception.h" | ||
| #include "parquet/util/crypto.h" | ||
| #include "parquet/types.h" |
Actually, the line #include "parquet/types.h" is not necessary, because that file is already included by crypto.h. What I really meant is:
#include "parquet/util/crypto.h"
#include <openssl/aes.h>
#include <openssl/evp.h>
#include <openssl/rand.h>
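As a rough illustration of why the include order matters, the sketch below simulates the collision without any real headers. It assumes the conflicting definition is an empty object-like macro named OPTIONAL; the `Repetition` enum and the helper functions are hypothetical stand-ins, not the actual contents of types.h.

```cpp
#include <cassert>

// Stand-in for the enum in parquet/types.h, declared BEFORE the macro exists.
enum class Repetition { REQUIRED = 0, OPTIONAL = 1, REPEATED = 2 };

// Any use of the OPTIONAL enumerator must also appear before the macro.
constexpr int OptionalValue() { return static_cast<int>(Repetition::OPTIONAL); }

// Stand-in for a header included later that defines OPTIONAL as an empty
// macro. From here on, the token OPTIONAL expands to nothing.
#define OPTIONAL

// Enumerators with other names are unaffected by the macro.
constexpr int RepeatedValue() { return static_cast<int>(Repetition::REPEATED); }

static_assert(OptionalValue() == 1, "enum was parsed before the macro was defined");
```

If the `#define` were moved above the enum, the enumerator list would contain an empty entry and fail to compile, which matches the symptom described above. Reordering only protects code that appears before the macro; later uses of the enumerator in the same translation unit would still break.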
Done.
src/parquet/util/crypto.cc
Outdated
|
|
||
| namespace parquet { | ||
|
|
||
| const int gcm_tag_len = 16; |
constexpr is more suited here since the values are known at compile time.
Same for the other two below
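A small sketch of what constexpr buys here. The names `kGcmTagLen`, `kGcmIvLen`, and `kCtrIvLen` are hypothetical spellings of the constants in the diff, and `TagBufferSize` is an illustrative helper only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for the three length constants in the diff.
constexpr int kGcmTagLen = 16;
constexpr int kGcmIvLen = 12;
constexpr int kCtrIvLen = 16;

// constexpr values can be checked by the compiler, not at run time.
static_assert(kGcmIvLen == 12, "GCM IV length is fixed at compile time");
static_assert(kCtrIvLen == kGcmTagLen, "both are 16 bytes in this sketch");

// A constexpr value is a constant expression, so it can size a stack array.
// A plain non-const int bound would make this a non-standard
// variable-length array.
std::size_t TagBufferSize() {
  uint8_t tag[kGcmTagLen];
  return sizeof(tag);
}
```

`TagBufferSize()` returns 16 in this sketch, and a wrong constant would be caught by the `static_assert` lines at build time.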
src/parquet/util/crypto.cc
Outdated
| const int ctr_iv_len = 16; | ||
|
|
||
| void handleError(const char *message, EVP_CIPHER_CTX *ctx) { | ||
| if (NULL != ctx) EVP_CIPHER_CTX_free(ctx); |
nullptr instead of NULL
please use {} even for a single statement if condition.
handleError does not need EVP_CIPHER_CTX *ctx argument if we use EvpCipherCtxPtr
src/parquet/util/crypto.cc
Outdated
| uint8_t *key, int key_len, uint8_t *aad, int aad_len, | ||
| uint8_t *ciphertext) | ||
| { | ||
| EVP_CIPHER_CTX *ctx = NULL; |
nullptr
using a smart pointer with a custom deleter will simplify freeing resources.
Define the following on top and use EvpCipherCtxPtr ctx(EVP_CIPHER_CTX_new()); everywhere
struct EvpCipherCtxDeleter {
  void operator()(EVP_CIPHER_CTX *ctx) const {
    if (nullptr != ctx) {
      EVP_CIPHER_CTX_free(ctx);
    }
  }
};
using EvpCipherCtxPtr = std::unique_ptr<EVP_CIPHER_CTX, EvpCipherCtxDeleter>;
src/parquet/util/crypto.cc
Outdated
| uint8_t iv[gcm_iv_len]; | ||
|
|
||
| // Random IV | ||
| RAND_load_file("/dev/urandom", 32); |
declare constexpr long max_bytes = 32; on top and then use it here?
src/parquet/util/crypto.cc
Outdated
| uint8_t iv[ctr_iv_len]; | ||
|
|
||
| // Random IV | ||
| RAND_load_file("/dev/urandom", 32); |
declare constexpr long max_bytes = 32; on top and then use it here?
src/parquet/util/crypto.cc
Outdated
| throw parquet::ParquetException(ss.str()); | ||
| } | ||
|
|
||
| if (16 != key_len) { |
declare constexpr int key_length = 16; on top and then use it here?
src/parquet/util/crypto.cc
Outdated
| uint8_t *key, int key_len, uint8_t *aad, int aad_len, | ||
| uint8_t *plaintext) | ||
| { | ||
| EVP_CIPHER_CTX *ctx = NULL; |
use EvpCipherCtxPtr ctx(EVP_CIPHER_CTX_new());
src/parquet/util/crypto.cc
Outdated
| } | ||
|
|
||
| int gcm_decrypt(const uint8_t *ciphertext, int ciphertext_len, | ||
| uint8_t *key, int key_len, uint8_t *aad, int aad_len, |
is key_len used in the body?
src/parquet/util/crypto.cc
Outdated
| } | ||
|
|
||
| int ctr_decrypt(const uint8_t *ciphertext, int ciphertext_len, | ||
| uint8_t *key, int key_len, uint8_t *plaintext) |
I don't see key_len being used in the body
src/parquet/util/crypto.cc
Outdated
| throw parquet::ParquetException(ss.str()); | ||
| } | ||
|
|
||
| if (16 != key_len) { |
use key_length declared above.
Travis CI failure has been fixed on master. Can you please rebase?
b9eabae to
9beef24
Compare
wesm
left a comment
Made some stylistic comments. clang-format (make format) will fix many of these problems
| #include <string> | ||
| #include <sstream> | ||
| #include <iostream> | ||
| #include "parquet/exception.h" |
Order of includes does not follow our style guide
src/parquet/util/crypto.cc
Outdated
| constexpr int gcm_tag_len = 16; | ||
| constexpr int gcm_iv_len = 12; | ||
| constexpr int ctr_iv_len = 16; | ||
| constexpr long max_bytes = 32; |
use kFooBarBaz naming style for global const/constexpr
src/parquet/util/crypto.cc
Outdated
| constexpr long max_bytes = 32; | ||
|
|
||
| struct EvpCipherCtxDeleter { | ||
| void operator()(EVP_CIPHER_CTX *ctx) const { |
Consistent style for pointers is EVP_CIPHER_CTX* ctx
src/parquet/util/crypto.cc
Outdated
| int gcm_encrypt(const uint8_t *plaintext, int plaintext_len, | ||
| uint8_t *key, int key_len, uint8_t *aad, int aad_len, | ||
| uint8_t *ciphertext) | ||
| { |
Can you run clang-format on this file? Will fix a lot of the style issues
src/parquet/util/crypto.cc
Outdated
| if (16 == key_len) { | ||
| // Init AES-GCM with 128-bit key | ||
| if(1 != EVP_EncryptInit_ex(ctx.get(), EVP_aes_128_gcm(), nullptr, nullptr, nullptr)) { | ||
| throw ParquetException("Couldn't init AES_GCM_128 encryption"); |
Perhaps create a macro like
ENCRYPT_CHECK(EXPR, 1, "AES_GCM_128");
This will help with code verbosity throughout this file
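One possible shape for such a macro, as a sketch only. The three-argument form follows the suggestion above, but the expansion is an assumption; `std::runtime_error` stands in for `ParquetException`, and `FakeInit` stands in for an OpenSSL call that returns 1 on success.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: throws if EXPR does not evaluate to EXPECTED.
// std::runtime_error stands in for ParquetException.
#define ENCRYPT_CHECK(EXPR, EXPECTED, WHAT) \
  do { \
    if ((EXPECTED) != (EXPR)) { \
      throw std::runtime_error(std::string("Couldn't init ") + (WHAT) + \
                               " encryption"); \
    } \
  } while (0)

// Stand-in for an OpenSSL-style call that returns 1 on success.
int FakeInit(bool succeed) { return succeed ? 1 : 0; }

// Returns the error message produced by a failing check, or "" on success.
std::string RunCheck(bool succeed) {
  try {
    ENCRYPT_CHECK(FakeInit(succeed), 1, "AES_GCM_128");
  } catch (const std::runtime_error& e) {
    return e.what();
  }
  return "";
}
```

The `do { ... } while (0)` wrapper lets the macro be used like a single statement, including inside an unbraced if/else. Each call site then shrinks to one line per OpenSSL call.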
src/parquet/util/crypto.h
Outdated
|
|
||
| int decrypt(std::shared_ptr<EncryptionProperties> encryption_props, bool metadata, | ||
| const uint8_t *ciphertext, int ciphertext_len, | ||
| uint8_t *plaintext); |
Are these public APIs? If not public should perhaps put them in an internal namespace
Also, can you capitalize the names of these functions to conform with the Google style guide? Thanks
src/parquet/util/crypto.cc
Outdated
| } | ||
| }; | ||
|
|
||
| using EvpCipherCtxPtr = std::unique_ptr<EVP_CIPHER_CTX, EvpCipherCtxDeleter>; |
Why not simply define a class/struct with a destructor? Why is std::unique_ptr necessary?
This was suggested by @majetideepak and @jamesclampffer. Please let me know which one I should use.
I think this is overly elaborate. Unless you expect to need to transfer ownership of a smart pointer to some other call frame, using RAII with a simple class with a destructor that frees the resource is the simplest solution
I usually prefer the std::unique_ptr approach to avoid creating a new class. std::unique_ptr follows the RAII design principle. I am OK with either RAII approach.
Well the deleter has already created a new class. Which of these do you prefer?
struct EvpCipherCtxDeleter {
  void operator()(EVP_CIPHER_CTX *ctx) const {
    if (nullptr != ctx) {
      EVP_CIPHER_CTX_free(ctx);
    }
  }
};

using EvpCipherCtxPtr = std::unique_ptr<EVP_CIPHER_CTX, EvpCipherCtxDeleter>;

int gcm_encrypt(const uint8_t *plaintext, int plaintext_len,
                uint8_t *key, int key_len, uint8_t *aad, int aad_len,
                uint8_t *ciphertext)
{
  int len;
  int ciphertext_len;
  uint8_t tag[gcm_tag_len];
  uint8_t iv[gcm_iv_len];

  // Random IV
  RAND_load_file("/dev/urandom", max_bytes);
  RAND_bytes(iv, sizeof(iv));

  // Init cipher context
  EvpCipherCtxPtr ctx(EVP_CIPHER_CTX_new());
  if (nullptr == ctx.get()) {
    throw ParquetException("Couldn't init cipher context");
  }

  if (16 == key_len) {
    // Init AES-GCM with 128-bit key
    if (1 != EVP_EncryptInit_ex(ctx.get(), EVP_aes_128_gcm(), nullptr, nullptr, nullptr)) {
      throw ParquetException("Couldn't init AES_GCM_128 encryption");
    }
  }
  else if (24 == key_len) {
    // Init AES-GCM with 192-bit key
    if (1 != EVP_EncryptInit_ex(ctx.get(), EVP_aes_192_gcm(), nullptr, nullptr, nullptr)) {
      throw ParquetException("Couldn't init AES_GCM_192 encryption");
    }
  }
  else if (32 == key_len) {
    // Init AES-GCM with 256-bit key
    if (1 != EVP_EncryptInit_ex(ctx.get(), EVP_aes_256_gcm(), nullptr, nullptr, nullptr)) {
      throw ParquetException("Couldn't init AES_GCM_256 encryption");
    }
  }

  // Setting key and IV
  if (1 != EVP_EncryptInit_ex(ctx.get(), nullptr, nullptr, key, iv)) {
    throw ParquetException("Couldn't set key and IV");
  }
  ...
or
class EvpCipher {
 public:
  // key and iv are passed in so that Init can set them on the context
  EvpCipher(int key_len, const uint8_t *key, const uint8_t *iv) {
    ctx_ = EVP_CIPHER_CTX_new();
    if (nullptr == ctx_) {
      throw ParquetException("Couldn't init cipher context");
    }
    Init(key_len, key, iv);
  }

  ~EvpCipher() {
    if (nullptr != ctx_) {
      EVP_CIPHER_CTX_free(ctx_);
    }
  }

  void Init(int key_len, const uint8_t *key, const uint8_t *iv) {
    if (16 == key_len) {
      // Init AES-GCM with 128-bit key
      if (1 != EVP_EncryptInit_ex(ctx_, EVP_aes_128_gcm(), nullptr, nullptr, nullptr)) {
        throw ParquetException("Couldn't init AES_GCM_128 encryption");
      }
    }
    else if (24 == key_len) {
      // Init AES-GCM with 192-bit key
      if (1 != EVP_EncryptInit_ex(ctx_, EVP_aes_192_gcm(), nullptr, nullptr, nullptr)) {
        throw ParquetException("Couldn't init AES_GCM_192 encryption");
      }
    }
    else if (32 == key_len) {
      // Init AES-GCM with 256-bit key
      if (1 != EVP_EncryptInit_ex(ctx_, EVP_aes_256_gcm(), nullptr, nullptr, nullptr)) {
        throw ParquetException("Couldn't init AES_GCM_256 encryption");
      }
    }

    // Setting key and IV
    if (1 != EVP_EncryptInit_ex(ctx_, nullptr, nullptr, key, iv)) {
      throw ParquetException("Couldn't set key and IV");
    }
  }

 private:
  EVP_CIPHER_CTX *ctx_;
};

int gcm_encrypt(const uint8_t *plaintext, int plaintext_len,
                uint8_t *key, int key_len, uint8_t *aad, int aad_len,
                uint8_t *ciphertext)
{
  int len;
  int ciphertext_len;
  uint8_t tag[gcm_tag_len];
  uint8_t iv[gcm_iv_len];

  // Random IV
  RAND_load_file("/dev/urandom", max_bytes);
  RAND_bytes(iv, sizeof(iv));

  EvpCipher cipher(key_len, key, iv);
  ...
I think that encapsulation is a good idea
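One detail worth considering with the class-based option: a wrapper that owns a raw handle should not be copyable, or two objects would free the same context. The sketch below shows that shape under stated assumptions. `FakeHandle`, `FakeHandleNew`, `FakeHandleFree`, and `ScopedHandle` are hypothetical stand-ins (no OpenSSL involved), and `std::runtime_error` replaces `ParquetException`.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for an opaque C handle such as EVP_CIPHER_CTX.
struct FakeHandle {};

// Counts live handles so the cleanup can be observed.
static int g_live = 0;

FakeHandle* FakeHandleNew() {
  ++g_live;
  return new FakeHandle();
}

void FakeHandleFree(FakeHandle* h) {
  --g_live;
  delete h;
}

// Simple RAII owner: acquires in the constructor, frees in the destructor.
class ScopedHandle {
 public:
  ScopedHandle() : handle_(FakeHandleNew()) {
    if (nullptr == handle_) {
      throw std::runtime_error("Couldn't init cipher context");
    }
  }
  ~ScopedHandle() { FakeHandleFree(handle_); }

  // Copying would lead to a double free, so it is disabled.
  ScopedHandle(const ScopedHandle&) = delete;
  ScopedHandle& operator=(const ScopedHandle&) = delete;

  FakeHandle* get() const { return handle_; }

 private:
  FakeHandle* handle_;
};

static_assert(!std::is_copy_constructible<ScopedHandle>::value,
              "ScopedHandle must not be copyable");

// Returns the number of live handles after a scoped owner is destroyed.
int LiveAfterScope() {
  {
    ScopedHandle h;
    assert(g_live == 1);
    assert(h.get() != nullptr);
  }
  return g_live;
}
```

`std::unique_ptr` gives the same non-copyable behavior for free, which is one practical difference between the two options discussed in this thread.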
@wesm , @majetideepak Thank you for the comments and suggestions. I'm posting a new commit that takes them into account, let me know if additional changes are required.
Thanks, can you fix the cpplint failures while we review in the meantime?
Sure, will do.
ca10c85 to
ad17fbe
Compare
src/parquet/util/crypto.cc
Outdated
| constexpr int rndMaxBytes = 32; | ||
|
|
||
| #define ENCRYPT_INIT(ALG) \ | ||
| if (1 != EVP_EncryptInit_ex(ctx_, ALG, nullptr, nullptr, nullptr)) { \ |
I think the coding conventions require only class member names to end with an underscore. Same below
|
|
||
| class EvpCipher { | ||
| public: | ||
| explicit EvpCipher(int cipher, int key_len, int type) { |
ctx_ = nullptr here so that destructor does not see an uninitialized value
src/parquet/util/crypto.cc
Outdated
|
|
||
| // Setting additional authenticated data | ||
| if (nullptr != aad) { | ||
| if (1 != EVP_DecryptUpdate(cipher.get(), nullptr, &len, aad, aad_len)) { |
Nit: combine the two if conditions?
| int len; | ||
| int plaintext_len; | ||
|
|
||
| uint8_t tag[gcmTagLen]; |
safer to zero-initialize all of them.
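For reference, a minimal sketch of zero-initializing a stack buffer with empty braces. `kTagLen` is a hypothetical name for the tag-length constant, and the helper exists only to make the effect checkable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name for the tag-length constant.
constexpr int kTagLen = 16;

// Returns true if every byte of a freshly initialized buffer is zero.
bool FreshBufferIsZeroed() {
  uint8_t tag[kTagLen] = {};  // empty braces zero every element
  for (uint8_t b : tag) {
    if (b != 0) {
      return false;
    }
  }
  return true;
}
```

The same `= {}` form works for the scalar locals (`int len = {};` or simply `int len = 0;`), so all of the declarations in this hunk can be given a defined starting value.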
majetideepak
left a comment
I think developers will benefit from a description of some of the related security concepts in the code or in the documentation.
You could also copy material from the design document:
https://docs.google.com/document/d/1T89G7xR0zHFV1f2pjTO28jtfVm8qoNVGEJQ70Rsk-bY/edit
@majetideepak, your comments are factored in. As for the documentation, I'm working on an encryption.md doc that will be published in the parquet-format repo, similar to docs for other Parquet features.
majetideepak
left a comment
+1 LGTM. Thanks for making all the changes!