Ignore read and write errors if vio has been cleared #1522
bryancall merged 1 commit into apache:master
|
Here is a workaround for issue #1401. I found that the read path was also coring. |
|
FreeBSD build successful! See https://ci.trafficserver.apache.org/job/freebsd-github/1660/ for details. |
|
Linux build successful! See https://ci.trafficserver.apache.org/job/linux-github/1556/ for details. |
|
Intel CC build successful! See https://ci.trafficserver.apache.org/job/icc-github/92/ for details. |
|
clang-analyzer build successful! See https://ci.trafficserver.apache.org/job/clang-analyzer-github/224/ for details. |
|
Looks reasonable to me. |
|
Testing this on Docs. It's similar to #1444, which did not solve the problems for us there (at least not completely); hopefully the additions to the read case help. |
|
Still looking good on Docs, but would like to give it at least another 24 hours before we declare victory. |
|
To review #947 again, my suggestion is: Before PR#947:
The goal of PR#947 is to make iocore notify the SM of EPOLLERR even when read.enabled is set to 0. vc->net_read_io is designed to call back the SM, and the Net sub-system must ensure that the vio can be called back. |
|
Or we can keep the old condition and then add the new condition. |
|
I think it never calls the handler when read.enabled = 0 (or write.enabled = 0). That setting means the handler does not want this type of event. It may never handle this event and just assert! |
|
@oknet Thanks for the suggestion. I am running 7.1.0 in production with the change you mention above instead of this PR. |
|
FreeBSD build failed! See https://ci.trafficserver.apache.org/job/freebsd-github/1680/ for details. |
|
I updated the PR to use oknet's recommendation. I am going to test another version of this fix: |
|
Linux build failed! See https://ci.trafficserver.apache.org/job/linux-github/1576/ for details. |
|
Given the mixing of || and &&, I'd highly suggest superfluous parentheses for clarity. nm... looks like this was done. |
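
To make the parenthesization point concrete, here is a minimal sketch of a gating condition that mixes || and &&. It is not the condition from this PR: `DirState`, `vio_cleared`, and `should_signal` are hypothetical names invented for illustration, and the only things taken from the thread are the `enabled` flag and the idea of ignoring errors once the vio has been cleared.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-direction state discussed in this thread.
// Field names are illustrative, not the real Traffic Server members.
struct DirState {
  bool enabled;     // models read.enabled / write.enabled
  bool vio_cleared; // models a vio that has already been cleared
};

// "Keep the old condition, then add the new one": signal when the direction
// is enabled (old) or when an error occurred (new), but never once the vio
// has been cleared. The parentheses make the || / && grouping explicit.
inline bool should_signal(const DirState &s, bool is_error)
{
  return (s.enabled || is_error) && (!s.vio_cleared);
}
```

Without the parentheses, `a || b && c` groups as `a || (b && c)`, which would signal an enabled direction even after its vio was cleared.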
|
Intel CC build successful! See https://ci.trafficserver.apache.org/job/icc-github/112/ for details. |
|
Linux build successful! See https://ci.trafficserver.apache.org/job/linux-github/1577/ for details. |
|
Intel CC build successful! See https://ci.trafficserver.apache.org/job/icc-github/113/ for details. |
|
FreeBSD build successful! See https://ci.trafficserver.apache.org/job/freebsd-github/1681/ for details. |
|
Actually, there is still a problem in my test after this patch, according to jtest. We got the error event and called back the SM, but the SM does not want to handle this write event because write.enabled is 0, so the SM asserts! The backtrace is the same as #1531. Here is 6.x.x
recommendation from Oknet
|
@scw00 What are |
|
FreeBSD build successful! See https://ci.trafficserver.apache.org/job/freebsd-github/1684/ for details. |
|
|
Linux build successful! See https://ci.trafficserver.apache.org/job/linux-github/1580/ for details. |
|
Intel CC build successful! See https://ci.trafficserver.apache.org/job/icc-github/116/ for details. |
|
@bryancall @scw00 After #947, we must change the SM to handle the EVENT_ERROR event just like EVENT_TIMEOUT. Before #947, an ERROR indicated a VIO error; after #947, an ERROR indicates a VC error. We should also combine vc->read.error and vc->write.error into one event if vc->read._cont and vc->write._cont point to the same SM, just as we do for the EVENT_TIMEOUT callback in UnixNetVC::mainEvent(). |
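
The idea of combining the two error notifications can be sketched with a toy model. This is not Traffic Server code: `Continuation`, `Vio`, `Vc`, and `signal_vc_error` are simplified, hypothetical types that only model "one callback per distinct continuation".

```cpp
#include <cassert>

// Toy model: a continuation (the SM) that counts the error events it receives.
struct Continuation {
  int errors_seen = 0;
};

// Toy vio that only remembers which continuation to call back.
struct Vio {
  Continuation *cont = nullptr;
};

// Toy VC with one read vio and one write vio.
struct Vc {
  Vio read;
  Vio write;
};

// Deliver a VC-level error once per distinct continuation.
// Returns the number of callbacks that were made.
inline int signal_vc_error(Vc &vc)
{
  int calls = 0;
  if (vc.read.cont != nullptr) {
    ++vc.read.cont->errors_seen;
    ++calls;
  }
  // Skip the write side when it points at the same SM as the read side.
  if (vc.write.cont != nullptr && vc.write.cont != vc.read.cont) {
    ++vc.write.cont->errors_seen;
    ++calls;
  }
  return calls;
}
```

When both vios point at the same SM, the SM sees a single error; when they point at different continuations, each one is still notified.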
|
clang-analyzer build successful! See https://ci.trafficserver.apache.org/job/clang-analyzer-github/248/ for details. |
|
I think the issue raised by @scw00 is different. It looks like what @zwoop identified in issue #1531. In that case the write vio errored, so the write vio is being sent up as data to a handler expecting only a read vio. In the error case it shouldn't matter whether it is a read or a write vio and the error clean up should occur regardless. |
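
A minimal sketch of that last point, assuming a toy handler rather than the real HTTP state machine: `ToySm`, `Event`, and `handle` are hypothetical names. The error branch runs its cleanup without checking which vio was passed as data, while the non-error branch still expects the read vio.

```cpp
#include <cassert>

// Hypothetical event kinds; the real event codes are not modeled here.
enum class Event { ReadReady, Error };

struct Vio {
  int id;
};

// Toy state machine with one read vio and one write vio.
struct ToySm {
  Vio read_vio{1};
  Vio write_vio{2};
  bool cleaned_up = false;

  // Returns true if the event was handled.
  bool handle(Event ev, const Vio *data)
  {
    if (ev == Event::Error) {
      // Error cleanup happens regardless of which vio reported the error.
      cleaned_up = true;
      return true;
    }
    // Non-error events are only expected on the read vio in this toy handler.
    return data == &read_vio;
  }
};
```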
|
I ran my production box overnight with the latest workaround. I think this change is a step in the right direction, and we should merge it and backport it to 7.1. |
|
Linux build successful! See https://ci.trafficserver.apache.org/job/RAT-github/2/ for details. |
|
Agree!! |
After apache#947 (c1ac5f) and apache#1522 (a128d5), the EVENT_ERROR caused by EPOLLERR is sent to read.vio._cont first and then to write.vio._cont. The reader SM could close or shutdown(WRITE) the VC, but we do not check for these operations before we call back write.vio._cont. The SM would receive EVENT_ERROR twice if write.vio._cont == read.vio._cont.
After #947 (c1ac5f) and #1522 (a128d5), the EVENT_ERROR caused by EPOLLERR is sent to read.vio._cont first and then to write.vio._cont. The reader SM could close or shutdown(WRITE) the VC, but we do not check for these operations before we call back write.vio._cont. The SM would receive EVENT_ERROR twice if write.vio._cont == read.vio._cont. (cherry picked from commit aee3f3b)
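
The missing check described in this commit message can be sketched as follows. This is a simplified model under stated assumptions, not the actual fix: `Vc`, `Continuation`, `deliver`, and `signal_error` are hypothetical, and the `closed` and `write_shutdown` flags stand in for whatever state the real VC records when the reader closes it or calls shutdown(WRITE).

```cpp
#include <cassert>

struct Vc;

// Toy continuation that counts errors and may close the VC when it gets one.
struct Continuation {
  int errors_seen = 0;
  bool close_on_error = false;
};

// Toy VC; the flags model "the reader closed or shut down the write side".
struct Vc {
  Continuation *read_cont = nullptr;
  Continuation *write_cont = nullptr;
  bool closed = false;
  bool write_shutdown = false;
};

// Call back one continuation with the error; it may close the VC in response.
inline void deliver(Vc &vc, Continuation *c)
{
  ++c->errors_seen;
  if (c->close_on_error) {
    vc.closed = true;
  }
}

// Reader first, then writer, re-checking the VC state in between.
inline void signal_error(Vc &vc)
{
  Continuation *r = vc.read_cont;
  Continuation *w = vc.write_cont;
  if (r != nullptr) {
    deliver(vc, r);
  }
  // The reader may have closed or shut down the VC during its callback.
  if (vc.closed || vc.write_shutdown) {
    return;
  }
  // Avoid a second EVENT_ERROR when both sides share one continuation.
  if (w != nullptr && w != r) {
    deliver(vc, w);
  }
}
```

With a reader that closes the VC on error, the writer is never called back; with a reader that leaves the VC open, the writer still receives its own notification.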
…s to the VConnection" this reverts PRs apache#1559, apache#1522 and apache#947 This reverts commit c1ac5f8.
This reverts PRs apache#1559, apache#1522 and apache#947. PR apache#947 made the HTTP state machine unstable and led to crashes in production such as apache#1930, apache#1559, apache#1522, apache#1531 and apache#1629. This reverts commit c1ac5f8.