Simplified the logic for ForceWriteThread after we introduced queue.drainTo() #3830
Conversation
// Sync and mark the journal up to the position of the last entry in the batch
ForceWriteRequest lastRequest = localRequests.get(requestsCount - 1);
syncJournal(lastRequest);
Is it possible to have two log files in the batch?
+1
If the localRequests queue contains multiple journal files and we only sync the lastRequest's journal file, the other journal files will skip the sync.
Yes, it's possible, though the Journal thread would have already closed the previous file, so we don't need to either fsync or close it.
Actually, you're correct. We need to ensure all the files are closed before the response is triggered. Fixed it.
The original logic executes forceWrite before close, which runs bestEffortRemoveFromPageCache. If there are two different journal files in the batch and we only force-write the last file, do we need to force-write the other journal file as well? Will the close do that?
+1
It has two bugs:
- The non-last journal files in the batch won't be removed from the OS PageCache
- The channels of the non-last journal files in the batch will just call `close()` instead of `force(false)`, which can't ensure the data is flushed to the disk.
All good points. I'll fix it.
I was actually 100% sure that close() implied an fsync, but that's not really the case.
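For illustration, here is a minimal, self-contained sketch of the point settled in this thread; it is not the actual `Journal` code, and the `syncAll` helper and its argument list are hypothetical. It demonstrates that a journal file rotated out mid-batch needs an explicit `FileChannel.force(false)` before it is closed, since `close()` alone does not guarantee the data is on disk:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Minimal sketch, not the actual Journal code: when a drained batch spans more
// than one journal file, every rotated-out file must be forced to disk
// explicitly, because FileChannel.close() alone does not guarantee durability.
public class BatchSyncSketch {

    // Hypothetical helper: fsync every channel in the batch, not just the last one.
    static void syncAll(List<FileChannel> channelsInBatch) throws IOException {
        for (int i = 0; i < channelsInBatch.size(); i++) {
            FileChannel ch = channelsInBatch.get(i);
            // force(false) flushes the file contents (not metadata) to the device.
            ch.force(false);
            if (i < channelsInBatch.size() - 1) {
                // Rotated-out files can be closed once their data is durable.
                ch.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("journal-sketch.txn"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("entry".getBytes(StandardCharsets.UTF_8)));
            syncAll(List.of(ch));
        }
    }
}
```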
Please rebase onto master after #3836 is merged to trigger the CI.
Codecov Report
@@ Coverage Diff @@
## master #3830 +/- ##
=========================================
Coverage 68.21% 68.22%
+ Complexity 6761 6751 -10
=========================================
Files 473 473
Lines 40950 40889 -61
Branches 5240 5229 -11
=========================================
- Hits 27935 27896 -39
+ Misses 10762 10734 -28
- Partials 2253 2259 +6
zymap left a comment
LGTM
### Motivation

Note: this is stacked on top of #3830 & #3835

This change improves the way the AddRequests responses are sent to clients. The current flow is:

* The journal-force-thread issues the fsync on the journal file
* We iterate over all the entries that were just synced and for each of them:
  1. Trigger `channel.writeAndFlush()`
  2. This will jump on the connection IO thread (Netty will use a `write()` to `eventfd` to post the task and wake the epoll)
  3. Write the object in the connection and trigger the serialization logic
  4. Grab a `ByteBuf` from the pool and write ~20 bytes with the response
  5. Write and flush the buffer on the channel
  6. With the flush consolidator we try to group multiple buffers into a single `writev()` syscall, though each call will have a long list of buffers, making the memcpy inefficient
  7. Release all the buffers and return them to the pool

All these steps are quite expensive when the bookie is receiving a lot of small requests.

This PR changes the flow into (a rough sketch follows below):
1. Journal fsync
2. Go through each request and prepare the response into a per-connection `ByteBuf`, which is not written on the channel as of yet
3. After preparing all the responses, flush them at once: trigger an event on all the connections that will write the accumulated buffers

The advantages are:
1. 1 `ByteBuf` allocated per connection instead of 1 per request
2. Fewer allocations and less stress on the buffer pool
3. More efficient socket `write()` operations
4. 1 task per connection posted on the Netty IO threads, instead of 1 per request
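The per-connection accumulation described above could look roughly like the sketch below. `ResponseBatcher` and its methods are illustrative placeholders rather than actual bookie classes; only the Netty calls (`Channel.alloc()`, `eventLoop().execute()`, `writeAndFlush()`) are real APIs, and thread-safety of the pending map is glossed over by assuming it is only touched from the force-write thread until the flush is handed off.

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch: instead of one writeAndFlush() per add-response, accumulate
// the responses for each connection into a single ByteBuf and flush once per
// connection after the journal fsync completes.
final class ResponseBatcher {

    // One pending buffer per client connection.
    // Not thread-safe: assumed to be used only from the force-write thread.
    private final Map<Channel, ByteBuf> pending = new IdentityHashMap<>();

    // Called for every request covered by the fsync.
    void addResponse(Channel ch, ByteBuf serializedResponse) {
        ByteBuf buf = pending.computeIfAbsent(ch, c -> c.alloc().directBuffer());
        buf.writeBytes(serializedResponse);
        serializedResponse.release();
    }

    // Called once, after all responses of the batch have been prepared:
    // a single task per connection is posted on its Netty IO thread.
    void flushAll() {
        List<Map.Entry<Channel, ByteBuf>> entries = new ArrayList<>(pending.entrySet());
        pending.clear();
        for (Map.Entry<Channel, ByteBuf> e : entries) {
            Channel ch = e.getKey();
            ByteBuf buf = e.getValue();
            ch.eventLoop().execute(() -> ch.writeAndFlush(buf));
        }
    }
}
```

The trade-off is deliberate: a little extra buffering inside the force-write batch buys one allocation and one IO-thread task per connection instead of one per request.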
Motivation

In #3545 we have switched the `ForceWriteThread` to take advantage of the `BlockingQueue.drainTo()` method for reducing contention, though the core logic of the force-write was not touched at the time.

The logic of force-write is quite complicated because it tries to group multiple force-write requests in the queue by sending a new marker and grouping them when the marker is received. This also leads to a bit of lag when there are many requests coming in and the IO is stressed, as we're waiting a bit more before issuing the fsync.

Instead, with the `drainTo()` approach we can greatly simplify the logic and maintain a strict fsync grouping (see the sketch after this section):
1. Drain all the force-write requests available in the queue into a local array list
2. Perform the fsync
3. Update the journal log mark to the position of the last force-write request
4. Trigger send-responses for all the requests
5. Go back to read from the queue

This refactoring will also enable further improvements, to optimize how the send-responses are prepared, since we now have a list of responses ready to send.
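A rough sketch of the simplified loop described above, assuming `ForceWriteRequest`, `syncJournal()` and `sendResponse()` are placeholders standing in for the real `Journal` internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hedged sketch of the simplified force-write loop; not the actual Journal code.
final class ForceWriteLoopSketch {

    // Placeholder for the real Journal.ForceWriteRequest.
    static final class ForceWriteRequest { /* position, callbacks, ... */ }

    private final BlockingQueue<ForceWriteRequest> queue = new LinkedBlockingQueue<>();
    private final List<ForceWriteRequest> localRequests = new ArrayList<>();

    void runOnce() throws InterruptedException {
        // 1. Block for the first request, then drain everything else that is queued.
        localRequests.add(queue.take());
        queue.drainTo(localRequests);

        // 2. + 3. A single fsync covers the whole batch, and the journal log
        //         mark is advanced to the position of the last request.
        syncJournal(localRequests.get(localRequests.size() - 1));

        // 4. Trigger send-responses for every request covered by that fsync.
        for (ForceWriteRequest r : localRequests) {
            sendResponse(r);
        }
        localRequests.clear();
        // 5. The caller loops back to read from the queue again.
    }

    private void syncJournal(ForceWriteRequest lastRequest) { /* fsync + update log mark */ }

    private void sendResponse(ForceWriteRequest request) { /* complete the client callback */ }
}
```

The key property is that every request drained before the fsync is acknowledged only after that fsync, so the grouping stays strict without any marker bookkeeping.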