alts: Fix TsiSocket doWrite on short writes#15962
Conversation
Signed-off-by: yihuaz <yihuaz@google.com>
Force-pushed 21ac793 to 1647c44
@antoniovicente F.Y.I.
antoniovicente left a comment
Many thanks for taking on improvements to how alts TsiSocket::doWrite interacts with buffer watermarks. First round of comments to improve my understanding of what the code is doing and some potential edge cases that I noticed.
    tsi_result status = frame_protector_->protect(buffer, raw_write_buffer_);
    ENVOY_CONN_LOG(debug, "TSI: protected buffer left: {} result: {}", callbacks_->connection(),
                   buffer.length(), tsi_result_to_string(status));
    return repeatProtectAndWrite(buffer, end_stream);
What about the raw_write_buffer_.length() > 0 code below this early return?
I think we still need it to send handshake data to its peer.
I agree, but think that it could be moved to the "if (!handshake_complete_) {" branch.
    while (true) {
      uint64_t bytes_to_drain_this_iteration =
          prev_bytes_to_drain_ > 0 ? prev_bytes_to_drain_
                                   : std::min(buffer.length(), actual_frame_size_to_use_);
Does actual_frame_size_to_use_ include framing overhead or not?
We need bytes excluding overhead in order to emit a single protected frame instead of a large frame followed by a tiny frame.
It includes the framing overhead. You are right. When computing bytes_to_drain_this_iteration, we should exclude the overhead from actual_frame_size_to_use_.
    ENVOY_CONN_LOG(debug, "TSI: protecting buffer size: {}", callbacks_->connection(),
                   bytes_to_protect_this_iteration.length());
    tsi_result status =
        frame_protector_->protect(bytes_to_protect_this_iteration, raw_write_buffer_);
Consider changing the protect method to accept a string_view with the input bytes to avoid copying into bytes_to_protect_this_iteration just to copy again inside protect as it populates the grpc slice.
I think that ExternallyManagedSlice could be used instead of UnmanagedMemorySlice to save another copy during protect, but we can sort that out later.
Good point. Yeah, it does not make sense to copy the bytes twice.
    // Short write. Exit.
    if (raw_write_buffer_.length() > 0) {
      prev_bytes_to_drain_ =
          prev_bytes_to_drain_ > 0 ? prev_bytes_to_drain_ : bytes_to_drain_this_iteration;
I think that when prev_bytes_to_drain_ > 0 the following is true:
prev_bytes_to_drain_ == bytes_to_drain_this_iteration
this could be: prev_bytes_to_drain_ = bytes_to_drain_this_iteration;
A comment would be useful.
Ah nice catch. We can simply do prev_bytes_to_drain_ = bytes_to_drain_this_iteration;
    ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(),
                   raw_write_buffer_.length(), end_stream);
    result = raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
    total_bytes_written += result.bytes_processed_;
total_bytes_written should be tracking bytes_to_drain_this_iteration, not the encrypted bytes.
Updates to total_bytes_written should happen around the buffer.drain below and look something like:
total_bytes_written += bytes_to_drain_this_iteration;
    Network::IoResult result = {Network::PostIoAction::KeepOpen, 0, false};

    while (true) {
      uint64_t bytes_to_drain_this_iteration =
There's an odd case that merits special consideration:
prev_bytes_to_drain_ == 0 && raw_write_buffer_.length() > 0
I think this can happen in the case where doHandshake adds bytes to raw_write_buffer_ but also completes the handshake. When this happens, the protect call below is skipped, but bytes_to_drain_this_iteration ends up being > 0, which could result in bytes in the input buffer being discarded without being sent.
Ways to detect:
- ASSERT((prev_bytes_to_drain_ == 0) == (raw_write_buffer_.length() == 0));
- ASSERT(prev_bytes_to_drain_ >= buffer.length());
- ASSERT(buffer.length() >= bytes_to_drain_this_iteration) before the call to buffer.drain() further down.
- Possibly adding an ASSERT to OwnedImpl::drainImpl to detect attempts to drain more bytes than are in the buffer.
Per this code, I think we have a guarantee that raw_write_buffer_.length() is 0 when entering doWrite() for the first time after the handshake completes. Also, it does not make sense for a peer to send non-handshake data before confirming that the handshake completed successfully. In other words, during the handshake, peer A sends whatever it receives from peer B to the handshake service in order to get the bytes to send back to peer B. Peer A will not concatenate non-handshake data onto the data received from peer B before sending it to the handshake service, because peer A has not yet received confirmation from the handshake service that the handshake completed successfully.
raw_write_buffer_ will be non-empty after the doWrite just after handshake completion if that doWrite results in a partial write. I know this may be really unlikely, but it is possible for raw_write_buffer_ to be non-empty after the handshake completes. I think it is also true that the peer that completes the handshake first will need to do a write after the handshake completes locally, to provide the remote peer the information it needs to complete its own handshake.
Possible solution: branch on raw_write_buffer_.length() > 0 instead of prev_bytes_to_drain_.
When raw_write_buffer_.length() > 0, then bytes_to_drain_this_iteration = prev_bytes_to_drain_, and we should attempt a write even if bytes_to_drain_this_iteration is 0, which happens when the bytes in the buffer are handshake bytes.
When raw_write_buffer_.length() == 0, then bytes_to_drain_this_iteration = std::min(buffer.length(), max_unprotected_frame_size_). If > 0, attempt to protect and write those bytes; else break.
I took a slightly different approach by introducing a new field, prev_handshake_bytes_to_drain_, that indicates whether we need to drain handshake data before doing regular protect+write operations. PTAL.
Signed-off-by: yihuaz <yihuaz@google.com>
Force-pushed cc09a39 to b9d5778
Signed-off-by: yihuaz <yihuaz@google.com>
Force-pushed f2a42d0 to a104741
    // Include 4 bytes frame message type and 16 bytes tag length.
    // It is consistent with gRPC ALTS zero copy frame protector implementation.
    static constexpr int FrameOverheadSize = 0;
    static constexpr int FrameOverheadSize = 20;
The use of max_unprotected_frame_size_ and setMaxUnprotectedFrameSize seems to make testing for this overhead constant difficult. Or at least that's the impression I get from no tests needing update when changing this overhead value. max_unprotected_frame_size_ is read in a single location in repeatProtectAndWrite, consider changing that to "actual_frame_size_to_use_ - FrameOverheadSize"
We cannot use a hard-coded FrameOverheadSize in the code because the tests use a fake frame protector which has a different frame overhead size (i.e., 4 bytes). The code is updated. PTAL.
Signed-off-by: yihuaz <yihuaz@google.com>
Force-pushed 4581f5a to db79a59
        ? prev_handshake_bytes_to_drain_
        : (prev_bytes_to_drain_ > 0
               ? prev_bytes_to_drain_
               : std::min(buffer.length(), actual_frame_size_to_use_ - frame_overhead_size_));
This expression seems very complex. Consider the following instead:
- ASSERT(prev_handshake_bytes_to_drain_ == 0) when entering repeatProtectAndWrite
- in TsiSocket::doWrite, check if prev_handshake_bytes_to_drain_ > 0 and if so attempt to flush the buffer and only call repeatProtectAndWrite if the flush was successful.
Nice suggestion. Done.
    if (raw_write_buffer_.length() > 0) {
      ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(),
                     raw_write_buffer_.length(), end_stream);
      return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
No bytes from buffer were consumed so bytes written should be reported as being 0 regardless of what doWrite on the raw socket says.
Also, in cases where handshake completes successfully, it would be ideal to fall through and execute repeatProtectAndWrite
Good point. Done.
Regarding fall-through, I do not think it will happen, because ALTS TSI is an async implementation: doHandshake() calls next() and returns immediately, which leaves handshake_complete_ still false. When the response from the handshake service comes back, handshake_complete_ is set to true if the handshake completed successfully.
Can you add ASSERT(!handshake_complete_) after the doHandshake() call with an explanation?
Signed-off-by: yihuaz <yihuaz@google.com>
Force-pushed 41e78c1 to a4c3230
    }

    Network::IoResult TsiSocket::doWrite(Buffer::Instance& buffer, bool end_stream) {
      Network::IoResult result = {Network::PostIoAction::KeepOpen, 0, false};
nit: result is never used without it being written by a raw_buffer_socket_->doWrite. Consider removing this local variable and instead declaring it when doWrite is called.
Network::IoResult result = raw_buffer_socket_->doWrite(...);
    raw_write_buffer_.length(), end_stream);
    return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
    // Check if we need to flush outstanding handshake bytes.
    if (prev_handshake_bytes_to_drain_ > 0) {
nit: Nothing reads the exact number of bytes in prev_handshake_bytes_to_drain_, only whether or not it is > 0.
Consider using a boolean instead to indicate that raw_write_buffer_ contains handshake data that needs to be flushed.
nit: ASSERT(raw_write_buffer_.length() > 0) since prev_handshake_bytes_to_drain_ > 0
    if (raw_write_buffer_.length() > 0) {
      ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(),
                     raw_write_buffer_.length(), end_stream);
      result = raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
The coverage report shows that this branch is not covered by tests.
Ah, nice finding! It turns out this code path should not be triggered at all. In an async TSI, the buffer to be sent to the peer (i.e., raw_write_buffer_) is filled by the handshake service and sent to the peer in the callback function (i.e., doHandshakeNextDone) once the handshake request finishes processing. After scheduling a handshake request, we do not know whether raw_write_buffer_ is ready to be sent to the peer, so we should not attempt the network write.
    EXPECT_CALL(*server_.raw_socket_, doWrite(_, false))
        .WillOnce(Invoke([&](Buffer::Instance& buffer, bool) {
          Network::IoResult result = {Network::PostIoAction::KeepOpen, 4, false};
          server_to_client_.add(buffer.linearize(0), 4);
I don't think 0 is a valid argument to linearize. I think you need linearize(4) to match the number of bytes you're adding.
Alternatively, consider:
server_to_client_.move(buffer, 4);
which is a nicer version of add(..., 4) followed by drain(4)
Nice suggestion. The code is much simpler now.
Signed-off-by: yihuaz <yihuaz@google.com>
Sorry for the late response. All comments are addressed. PTAL.
    // Envoy ALTS implements asynchronous tsi_handshaker_next() interface
    // which returns immediately after scheduling a handshake request to
    // the handshake service. The handshake response will be handled by a
    // dedicated thread in a seperate API within which handshake_complete_
Format checks are failing due to a spelling error: seperate -> separate.
Signed-off-by: yihuaz <yihuaz@google.com>
Thanks @antoniovicente for the detailed review! I am wondering if there is anything else remaining to be addressed before this PR gets merged?
Signed-off-by: yihuaz <yihuaz@google.com> Signed-off-by: Gokul Nair <gnair@twitter.com>
This PR resolves the third issue described in #15296.