Fix missing add request stats after group flush for add response #4179

wenbingshen · 2024-01-16T10:02:50Z

Motivation

After group flush was introduced for add responses, the add request stats are no longer updated:

org.apache.bookkeeper.proto.WriteEntryProcessor#writeComplete

    @Override
    public void writeComplete(int rc, long ledgerId, long entryId,
                              BookieId addr, Object ctx) {
        if (BookieProtocol.EOK == rc) {
            requestProcessor.getRequestStats().getAddEntryStats()
                .registerSuccessfulEvent(MathUtils.elapsedNanos(startTimeNanos), TimeUnit.NANOSECONDS);
        } else {
            requestProcessor.getRequestStats().getAddEntryStats()
                .registerFailedEvent(MathUtils.elapsedNanos(startTimeNanos), TimeUnit.NANOSECONDS);
        }
     
        // Removed when group flush for add responses was introduced:
        // sendWriteReqResponse(rc,
        //              ResponseBuilder.buildAddResponse(request),
        //              requestProcessor.getRequestStats().getAddRequestStats());

        requestHandler.prepareSendResponseV2(rc, request);
        requestProcessor.onAddRequestFinish();

        request.recycle();
        recycle();
    }

Changes

AddRequestStats covers the time from when an add entry request enters the writeThreadPool queue until its response is sent to the client over the network.

With group flush of add responses, the AddRequestStats entry for each add request has to be updated after the group flush completes.

So here I record, for each Group Add: the time when the first request entered the writeThreadPool queue; the number of successful and failed adds; and, separately for successful and for failed requests, the differences between each request's enqueue time and the first request's enqueue time.

The count of AddRequestStats reflects the number of requests, so registerEvent is called once per AddRequest in a loop, and the latency reported for each AddRequest is the average latency of the Group Add as a whole.
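The accounting described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the class `GroupAddStatsSketch`, the interface `LatencyRecorder`, and the method names `onRequestDone` and `onGroupFlushed` are all hypothetical, and the real change works against the bookie's own stats objects. The sketch keeps the first enqueue time plus per-outcome counts and enqueue-time offsets, then registers one event per request using the average latency for that outcome.

```java
class GroupAddStatsSketch {
    /** Hypothetical stand-in for a latency stats logger; not a BookKeeper type. */
    interface LatencyRecorder {
        void registerSuccessfulEvent(long nanos);
        void registerFailedEvent(long nanos);
    }

    private long firstEnqueueNanos = -1;   // enqueue time of the first request seen in the group
    private int successCount;
    private int failureCount;
    private long successOffsetSum;         // sum of (enqueue time - first enqueue time), successes
    private long failureOffsetSum;         // same sum for failures

    /** Called once per add request in the group, with its outcome and enqueue time. */
    void onRequestDone(boolean ok, long enqueueNanos) {
        if (firstEnqueueNanos < 0) {
            firstEnqueueNanos = enqueueNanos;
        }
        long offset = enqueueNanos - firstEnqueueNanos;
        if (ok) {
            successCount++;
            successOffsetSum += offset;
        } else {
            failureCount++;
            failureOffsetSum += offset;
        }
    }

    /** Called after the grouped responses are flushed; registers one event per request. */
    void onGroupFlushed(long flushNanos, LatencyRecorder recorder) {
        long groupElapsed = flushNanos - firstEnqueueNanos;
        if (successCount > 0) {
            // Average of (flush time - enqueue time) over all successful requests.
            long avg = groupElapsed - successOffsetSum / successCount;
            for (int i = 0; i < successCount; i++) {
                recorder.registerSuccessfulEvent(avg);
            }
        }
        if (failureCount > 0) {
            long avg = groupElapsed - failureOffsetSum / failureCount;
            for (int i = 0; i < failureCount; i++) {
                recorder.registerFailedEvent(avg);
            }
        }
        // Reset so the object can be reused for the next group.
        firstEnqueueNanos = -1;
        successCount = failureCount = 0;
        successOffsetSum = failureOffsetSum = 0;
    }

    public static void main(String[] args) {
        GroupAddStatsSketch group = new GroupAddStatsSketch();
        group.onRequestDone(true, 100L);
        group.onRequestDone(true, 200L);
        group.onRequestDone(false, 300L);
        group.onGroupFlushed(1000L, new LatencyRecorder() {
            public void registerSuccessfulEvent(long nanos) {
                System.out.println("success " + nanos);
            }
            public void registerFailedEvent(long nanos) {
                System.out.println("failure " + nanos);
            }
        });
    }
}
```

Registering once per request keeps the event count equal to the request count, while the single average per outcome avoids storing a timestamp for every request in the group.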

hangc0276 · 2024-01-22T08:04:47Z

Refer to the release note: https://github.com/apache/bookkeeper/releases/tag/release-4.16.0
Can we use bookkeeper_server_ADD_ENTRY and bookkeeper_server_READ_ENTRY instead?

wenbingshen · 2024-01-23T03:16:26Z

Refer to the release note: https://github.com/apache/bookkeeper/releases/tag/release-4.16.0 Can we use bookkeeper_server_ADD_ENTRY and bookkeeper_server_READ_ENTRY instead?

@hangc0276 These metrics have different meanings. With the V2 protocol:
ADD_ENTRY_REQUEST: the time from when the request enters the write queue until the response to the produce request is sent.
ADD_ENTRY: the time from the start of request processing until the write to the journal completes.
WRITE_THREAD_QUEUED_LATENCY: the time the produce request waits in the queue before processing starts.

From these metrics we derive the time it takes to send a produce response to the client over network IO:
ADD_ENTRY_REQUEST - ADD_ENTRY - WRITE_THREAD_QUEUED_LATENCY
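As a worked example of that subtraction, here is a small sketch. It assumes the three inputs are latencies taken over the same time window and in the same unit (milliseconds here); the class and method names are hypothetical and the numbers are made up purely for illustration.

```java
class SendLatencySketch {
    /**
     * Estimated time spent sending the add response over the network:
     * ADD_ENTRY_REQUEST - ADD_ENTRY - WRITE_THREAD_QUEUED_LATENCY.
     * All arguments are assumed to be in milliseconds for the same window.
     */
    static long responseSendMillis(long addEntryRequest, long addEntry, long writeThreadQueued) {
        return addEntryRequest - addEntry - writeThreadQueued;
    }

    public static void main(String[] args) {
        // Illustrative values only: 12 ms total, 7 ms journal write, 2 ms queued.
        System.out.println(responseSendMillis(12L, 7L, 2L));
    }
}
```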

wenbingshen · 2024-01-23T03:21:59Z

@hangc0276 bookkeeper_server_READ_ENTRY_REQUEST still works fine in 4.16.x. I noticed that batch read support will be released in 4.17.x, and I don't know whether READ_ENTRY_REQUEST can be supported under batch read. However, sending the read response uses a blocking send API, so I think this metric effectively reflects the network IO situation and can help us analyze whether read request latency occurs at the bookie service level, on the network, or on the broker side.

After thinking about it again, I think bookkeeper_server_ADD_ENTRY_REQUEST can be replaced by bookkeeper_server_ADD_ENTRY.

fix missing add request stats after group flush for add response

8157f5a

wenbingshen requested review from dlg99, eolivelli, hangc0276, merlimat and zymap January 16, 2024 10:04

wenbingshen self-assigned this Jan 16, 2024

fix checkstyle

63ff3d8

wenbingshen closed this Jan 23, 2024

wenbingshen deleted the wenbing/fix_missing_add_stats branch January 23, 2024 03:42

michaeljmarshall mentioned this pull request Apr 19, 2024

Fix ThreadRegistry#register behavior to ensure correct Prom metrics #4300

Merged
