Skip to content

Reset inactivity timeout in SSL read loop.#2242

Closed
shinrich wants to merge 190 commits intoapache:masterfrom
shinrich:post_no_activity
Closed

Reset inactivity timeout in SSL read loop.#2242
shinrich wants to merge 190 commits intoapache:masterfrom
shinrich:post_no_activity

Conversation

@shinrich
Copy link
Copy Markdown
Member

@shinrich shinrich commented Jul 6, 2017

Reset inactivity timeout after every successful SSL_read. Hopefully avoids timeouts during POST.

zwoop and others added 30 commits January 13, 2017 20:08
BIO and BIO_METHOD were made opaque in OpenSSL 1.1 [1].

[1] https://www.openssl.org/news/changelog#x4

(cherry picked from commit 742f25c)
time_t may or may not be equivalent to long,
it's up to the implementation.

Fixes apache#1311

(cherry picked from commit 8ace435)
Change-Id: I45c9cdbe305e8b3d186beafb87ab5f8c5b03b757
(cherry picked from commit 6e6607a)
Modify TSHttpTxnHookAdd to add hook only once.

Change-Id: I737ca666cc05f2e2e2006cbef7f6b5a68b0eb71f
(cherry picked from commit 78bd265)
(cherry picked from commit 9b8b25c)
…nal timeout event to SM

(cherry picked from commit 977e27d)
complexity of ::size is not O(1).

(cherry picked from commit 87ca6b4)
Fixes apache#1307

(cherry picked from commit d1f44c3)
This reverts commit 9e37494.

(cherry picked from commit 11bfc4b)
This reverts commit f3e37f8.

(cherry picked from commit ffc906a)
This reverts commit ac0db84.

(cherry picked from commit 97723e7)
init_customizations() was added in Docutils 0.13.

(cherry picked from commit 111e7f5)
We can use it in 0.12 and 0.13.

(cherry picked from commit 977aa2c)
This will work better with the various OpenSSL forks.

(cherry picked from commit 40310af)
maskit and others added 14 commits June 26, 2017 19:02
HPACK Huffman code didn't care whether all padding bits are one.
This is prohibited by the spec(Sec 5.2), and it must be treated as a decode error.

(cherry picked from commit b228c13)
…st transform start.

The request transform would be applied to POST and/or PUT request.
The server_vc should be established (writeable) before request transform start.
The CheckConnect is created by connect_s,
  It will callback NET_EVENT_OPEN to HttpSM if server_vc is WRITE_READY,
  Otherwise NET_EVENT_OPEN_FAILED is callbacked.

(cherry picked from commit 91724db)
(cherry picked from commit 55e576b)

 Conflicts:
	tests/gold_tests/body_factory/gold/http-get-304.gold
Based on prior patch TS-3109.

(cherry picked from commit adf906d)
@shinrich shinrich added the TLS label Jul 6, 2017
@shinrich shinrich added this to the 7.1.0 milestone Jul 6, 2017
@shinrich shinrich self-assigned this Jul 6, 2017
@realpdm
Copy link
Copy Markdown

realpdm commented Jul 6, 2017

I applied this patch and still see timeouts happening that match the timeout timer, even though data is flowing up with POST

@shinrich
Copy link
Copy Markdown
Member Author

shinrich commented Jul 6, 2017

I am able to reproduce the effect that @realpdm is seeing by setting up a remap rule to httpbin

map https://mytestmachine.com/post http://httpbin.org/post

I set the timeouts to 3 and 4 seconds.

CONFIG proxy.config.http.transaction_no_activity_timeout_in INT 3
CONFIG proxy.config.http.transaction_no_activity_timeout_out INT 4

Then I issue a curl request with a 10MB post body.

curl -H 'Expect:' -v  -k --data-binary @trafficserver-TS-4180.zip https://mytestmachine.com/post -o /tmp/out

Curl returns "curl: (52) Empty reply from server" and I see ERR_CLIENT_ABORT in our access log.

I added warning messages when we update the next_inactivity_timeout_at in the ssl_read_from_net and when we trigger the inactivity event. The logs show lots and lots of timeout updates (1 every ms or so). Then 3 seconds and the timeout event is sent.

Then I did a tcpdump on the client side and server side. It shows that both sides are busy for quite a while uploading the post body. Then evidently the origin thinks about things for 3-4 seconds. Then ATS shuts down the connection to the client.

Is it possible that your origin is just a bit overwhelmed with the large upload and is taking a couple seconds to process before responding?

@realpdm
Copy link
Copy Markdown

realpdm commented Jul 6, 2017

I don't think it is related to the origin because I can make it fail at particular times by adjusting the timeout. Ie, with the same file. It seems to be a total time on the transaction. If a file is small enough to upload within the timer, it works fine. If it is large enough it will fail at the value the timer is set to as the total transaction time (ttms).

Example, I upload a 256MB file and set timer to 30s it will fail at
"ttms":34745 with ERR_CLIENT_ABORT

Change timer to 5 seconds..
"ttms":9554 with ERR_CLIENT_ABORT

Change timer to 2 seconds..

"ttms":6411 with same error

@maskit
Copy link
Copy Markdown
Member

maskit commented Jul 7, 2017

I thought moving netActivity() into the loop is sufficient but it wasn't

Even if we update next_inactivity_timeout_at in the loop, we are still in the loop. This means that read data from the client isn't send to the origin server at all. Also, the inactivity timeout will be canceled when ATS has received a response header from the origin server.
https://github.com/apache/trafficserver/blob/7.0.x/proxy/http/HttpSM.cc#L1856

Therefore, the inactive timeout is still active while ATS is sending data to the origin server (red arrow in the figure), and it won't be done within transaction_no_active_timeout_in.

image-44

So, I think we have to get out of the loop at times, and should send the received data to the origin server, and probably the timeout should be canceled when ATS received the last part of POST data.

@maskit
Copy link
Copy Markdown
Member

maskit commented Jul 7, 2017

@realpdm Can you try #2247?

@shinrich
Copy link
Copy Markdown
Member Author

shinrich commented Jul 7, 2017

@maskit I see your point. If the client is super aggressive and never leaves the loop, we will suck the entire 10MB post body into a buffer (which is bad on several levels). That isn't what I was seeing from my packet traces for a curl off-box.

@shinrich shinrich force-pushed the post_no_activity branch from 2c8075e to 44029f4 Compare July 7, 2017 14:19
@shinrich
Copy link
Copy Markdown
Member Author

shinrich commented Jul 7, 2017

Bleah. Tried to update with my latest experiment. Forgot that I was working against 7.1.x. Close and will reopen.

@shinrich shinrich closed this Jul 7, 2017
@zwoop zwoop removed this from the 7.1.0 milestone Jul 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.