alts: Fix TsiSocket doWrite on short writes#15962
Conversation
Signed-off-by: yihuaz <yihuaz@google.com>
Force-pushed 21ac793 to 1647c44
@antoniovicente F.Y.I.
antoniovicente left a comment
Many thanks for taking on improvements to how alts TsiSocket::doWrite interacts with buffer watermarks. First round of comments to improve my understanding of what the code is doing and some potential edge cases that I noticed.
    tsi_result status = frame_protector_->protect(buffer, raw_write_buffer_);
    ENVOY_CONN_LOG(debug, "TSI: protected buffer left: {} result: {}", callbacks_->connection(),
                   buffer.length(), tsi_result_to_string(status));
    return repeatProtectAndWrite(buffer, end_stream);
What about the raw_write_buffer_.length() > 0 code below this early return?
I think we still need it to send handshake data to its peer.
I agree, but think that it could be moved to the "if (!handshake_complete_) {" branch.
    while (true) {
      uint64_t bytes_to_drain_this_iteration =
          prev_bytes_to_drain_ > 0 ? prev_bytes_to_drain_
                                   : std::min(buffer.length(), actual_frame_size_to_use_);
Does actual_frame_size_to_use_ include framing overhead or not?
We need bytes excluding overhead in order to emit a single protected frame instead of a large frame followed by a tiny frame.
It includes the framing overhead. You are right. When computing bytes_to_drain_this_iteration, we should exclude the overhead from actual_frame_size_to_use_.
    ENVOY_CONN_LOG(debug, "TSI: protecting buffer size: {}", callbacks_->connection(),
                   bytes_to_protect_this_iteration.length());
    tsi_result status =
        frame_protector_->protect(bytes_to_protect_this_iteration, raw_write_buffer_);
Consider changing the protect method to accept a string_view with the input bytes to avoid copying into bytes_to_protect_this_iteration just to copy again inside protect as it populates the grpc slice.
I think that ExternallyManagedSlice could be used instead of UnmanagedMemorySlice to save another copy during protect, but we can sort that out later.
Good point. Yeah, it does not make sense to copy the bytes twice.
    // Short write. Exit.
    if (raw_write_buffer_.length() > 0) {
      prev_bytes_to_drain_ =
          prev_bytes_to_drain_ > 0 ? prev_bytes_to_drain_ : bytes_to_drain_this_iteration;
I think that when prev_bytes_to_drain_ > 0 the following is true:
prev_bytes_to_drain_ == bytes_to_drain_this_iteration
this could be: prev_bytes_to_drain_ = bytes_to_drain_this_iteration;
A comment would be useful.
Ah nice catch. We can simply do prev_bytes_to_drain_ = bytes_to_drain_this_iteration;
    ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(),
                   raw_write_buffer_.length(), end_stream);
    result = raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
    total_bytes_written += result.bytes_processed_;
total_bytes_written should be tracking bytes_to_drain_this_iteration, not the encrypted bytes.
Updates to total_bytes_written should happen around the buffer.drain below and look something like:
total_bytes_written += bytes_to_drain_this_iteration;
    Network::IoResult result = {Network::PostIoAction::KeepOpen, 0, false};

    while (true) {
      uint64_t bytes_to_drain_this_iteration =
There's an odd case that merits special consideration:
prev_bytes_to_drain_ == 0 && raw_write_buffer_.length() > 0
I think this can happen in the case where doHandshake adds bytes to raw_write_buffer_ but also completes the handshake. When this happens, the protect call below is skipped, but bytes_to_drain_this_iteration ends up being > 0, which could result in bytes in the input buffer being discarded without being sent.
Ways to detect:
- ASSERT((prev_bytes_to_drain_ == 0) == (raw_write_buffer_.length() == 0));
- ASSERT(prev_bytes_to_drain_ >= buffer.length());
- ASSERT(buffer.length() >= bytes_to_drain_this_iteration) before the call to buffer.drain() further down.
- Possibly adding an ASSERT to OwnedImpl::drainImpl to detect attempts to drain more bytes than are in the buffer.
Per this code, I think we have a guarantee that raw_write_buffer_.length() is 0 when entering doWrite() for the first time after the handshake completes. Also, it does not make sense for a peer to send non-handshake data before confirming that the handshake completed successfully. In other words, during the handshake, peer A sends whatever it receives from peer B to the handshake service in order to get the bytes to send back to peer B. Peer A will not concatenate non-handshake data onto the data received from peer B before sending it to the handshake service, because peer A has not yet received confirmation from the handshake service that the handshake completed successfully.
raw_write_buffer_ will be non-empty after the doWrite just after handshake completion if that doWrite results in a partial write. I know this may be really unlikely, but it is possible for raw_write_buffer_ to be non-empty after the handshake completes. I think it is also true that the peer that completes the handshake first will need to do a write after the handshake completes locally, to provide the remote peer the information it needs to complete its own handshake.
Possible solution: branch on raw_write_buffer_.length() > 0 instead of prev_bytes_to_drain_.
When raw_write_buffer_.length() > 0, then bytes_to_drain_this_iteration = prev_bytes_to_drain_, and we should attempt a write even if bytes_to_drain_this_iteration is 0, which happens when the bytes in the buffer are handshake bytes.
When raw_write_buffer_.length() == 0, then bytes_to_drain_this_iteration = std::min(buffer.length(), max_unprotected_frame_size_). If > 0, attempt to protect and write those bytes; else break.
I took a slightly different approach by introducing a new field, prev_handshake_bytes_to_drain_, that indicates whether we need to drain handshake data before doing regular protect+write operations. PTAL.
Signed-off-by: yihuaz <yihuaz@google.com>
Force-pushed cc09a39 to b9d5778
Signed-off-by: yihuaz <yihuaz@google.com>
Force-pushed f2a42d0 to a104741
    // Include 4 bytes frame message type and 16 bytes tag length.
    // It is consistent with gRPC ALTS zero copy frame protector implementation.
    static constexpr int FrameOverheadSize = 0;
    static constexpr int FrameOverheadSize = 20;
The use of max_unprotected_frame_size_ and setMaxUnprotectedFrameSize seems to make testing for this overhead constant difficult. Or at least that's the impression I get from no tests needing update when changing this overhead value. max_unprotected_frame_size_ is read in a single location in repeatProtectAndWrite, consider changing that to "actual_frame_size_to_use_ - FrameOverheadSize"
We cannot use a hard-coded FrameOverheadSize in the code because the tests use a fake frame protector which has a different frame overhead size (i.e., 4 bytes). The code is updated. PTAL.
Signed-off-by: yihuaz <yihuaz@google.com>
Force-pushed 4581f5a to db79a59
        ? prev_handshake_bytes_to_drain_
        : (prev_bytes_to_drain_ > 0
               ? prev_bytes_to_drain_
               : std::min(buffer.length(), actual_frame_size_to_use_ - frame_overhead_size_));
This expression seems very complex. Consider the following instead:
- ASSERT(prev_handshake_bytes_to_drain_ == 0) when entering repeatProtectAndWrite
- in TsiSocket::doWrite, check if prev_handshake_bytes_to_drain_ > 0 and if so attempt to flush the buffer and only call repeatProtectAndWrite if the flush was successful.
Nice suggestion. Done.
    if (raw_write_buffer_.length() > 0) {
      ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(),
                     raw_write_buffer_.length(), end_stream);
      return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
No bytes from buffer were consumed so bytes written should be reported as being 0 regardless of what doWrite on the raw socket says.
Also, in cases where handshake completes successfully, it would be ideal to fall through and execute repeatProtectAndWrite
Good point. Done.
Regarding fall-through, I do not think it will happen, because ALTS TSI is an async implementation: doHandshake() calls next() and returns immediately, which leaves handshake_complete_ still false. When the response from the handshake service comes back, handshake_complete_ is set to true if the handshake completed successfully.
Can you add ASSERT(!handshake_complete_) after the doHandshake() call with an explanation?
Signed-off-by: yihuaz <yihuaz@google.com>
Force-pushed 41e78c1 to a4c3230
    }

    Network::IoResult TsiSocket::doWrite(Buffer::Instance& buffer, bool end_stream) {
      Network::IoResult result = {Network::PostIoAction::KeepOpen, 0, false};
nit: result is never used without it being written by a raw_buffer_socket_->doWrite. Consider removing this local variable and instead declaring it when doWrite is called.
Network::IoResult result = raw_buffer_socket_->doWrite(...);
    raw_write_buffer_.length(), end_stream);
    return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
    // Check if we need to flush outstanding handshake bytes.
    if (prev_handshake_bytes_to_drain_ > 0) {
nit: Nothing reads the exact number of bytes in prev_handshake_bytes_to_drain_, only whether or not it is > 0.
Consider using a boolean instead to indicate that raw_write_buffer_ contains handshake data that needs to be flushed.
nit: ASSERT(raw_write_buffer_.length() > 0) since prev_handshake_bytes_to_drain_ > 0
    if (raw_write_buffer_.length() > 0) {
      ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(),
                     raw_write_buffer_.length(), end_stream);
      result = raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
The coverage report shows that this branch is not covered by tests.
Ah, nice finding! It turns out this code path should not be triggered at all. In an async TSI, the buffer to be sent to the peer (i.e., raw_write_buffer_) is filled by the handshake service and sent to the peer in the callback function (i.e., doHandshakeNextDone) once the handshake request finishes processing. After scheduling a handshake request, we do not know whether raw_write_buffer_ is ready to be sent to the peer, so we should not attempt the network write.
    EXPECT_CALL(*server_.raw_socket_, doWrite(_, false))
        .WillOnce(Invoke([&](Buffer::Instance& buffer, bool) {
          Network::IoResult result = {Network::PostIoAction::KeepOpen, 4, false};
          server_to_client_.add(buffer.linearize(0), 4);
I don't think 0 is a valid argument to linearize. I think you need linearize(4) to match the number of bytes you're adding.
Alternatively, consider:
server_to_client_.move(buffer, 4);
which is a nicer version of add(..., 4) followed by drain(4)
Nice suggestion. The code is much simpler now.
Signed-off-by: yihuaz <yihuaz@google.com>
Sorry for the late response. All comments are addressed. PTAL.
    // Envoy ALTS implements asynchronous tsi_handshaker_next() interface
    // which returns immediately after scheduling a handshake request to
    // the handshake service. The handshake response will be handled by a
    // dedicated thread in a seperate API within which handshake_complete_
Format checks are failing due to a spelling error: seperate -> separate.
Signed-off-by: yihuaz <yihuaz@google.com>
Thanks @antoniovicente for the detailed review! I am wondering if there is anything else remaining to be addressed before this PR gets merged?
Signed-off-by: yihuaz <yihuaz@google.com> Signed-off-by: Gokul Nair <gnair@twitter.com>
This PR resolves the third issue described in #15296.