@wenbingshen
Member

@wenbingshen wenbingshen commented Jan 16, 2024

Motivation

After group flushing of add responses was introduced, we no longer update the add request stats.

org.apache.bookkeeper.proto.WriteEntryProcessor#writeComplete

    @Override
    public void writeComplete(int rc, long ledgerId, long entryId,
                              BookieId addr, Object ctx) {
        if (BookieProtocol.EOK == rc) {
            requestProcessor.getRequestStats().getAddEntryStats()
                .registerSuccessfulEvent(MathUtils.elapsedNanos(startTimeNanos), TimeUnit.NANOSECONDS);
        } else {
            requestProcessor.getRequestStats().getAddEntryStats()
                .registerFailedEvent(MathUtils.elapsedNanos(startTimeNanos), TimeUnit.NANOSECONDS);
        }
     
        // The following call was removed when group flush of add responses was introduced:
        // sendWriteReqResponse(rc,
        //              ResponseBuilder.buildAddResponse(request),
        //              requestProcessor.getRequestStats().getAddRequestStats());

        requestHandler.prepareSendResponseV2(rc, request);
        requestProcessor.onAddRequestFinish();

        request.recycle();
        recycle();
    }

Changes

AddRequestStats is described as a metric that is updated once the add entry request has entered the writeThreadPool and its response has been sent to the client over the network.

We therefore need to update the AddRequestStats for each add request after the grouped add responses have been flushed.

So here I record the time at which the first request of a group add enters the writeThreadPool queue, along with the number of successful and failed adds. For the successful requests I accumulate the difference between each request's enqueue time and the first request's enqueue time, and I accumulate the same difference separately for the failed requests.

The count of AddRequestStats reflects the number of requests, so we call registerEvent once per AddRequest in a loop. The latency recorded for each AddRequest is the average latency of the group add as a whole.
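
A minimal sketch of the bookkeeping described above, to make the arithmetic concrete. This is not the code in this PR: the class `GroupAddStatsSketch`, its `Recorder` stand-in for the stats logger, and the method names are all hypothetical, and it assumes the per-request latency is measured from enqueue time to flush time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-group add request stats; not the BookKeeper implementation. */
public class GroupAddStatsSketch {

    /** Hypothetical stand-in for a stats logger: keeps one latency sample per registered event. */
    public static final class Recorder {
        public final List<Long> successNanos = new ArrayList<>();
        public final List<Long> failureNanos = new ArrayList<>();
    }

    private long firstEnqueueNanos = -1; // enqueue time of the first request in the group
    private int successCount;
    private int failureCount;
    private long successOffsetSum;       // sum of (enqueue time - first enqueue time) for successes
    private long failureOffsetSum;       // same sum for failures

    /** Called once per finished add request, before the group response is flushed. */
    public void onAddFinished(long enqueueNanos, boolean success) {
        if (firstEnqueueNanos < 0) {
            firstEnqueueNanos = enqueueNanos;
        }
        long offset = enqueueNanos - firstEnqueueNanos;
        if (success) {
            successCount++;
            successOffsetSum += offset;
        } else {
            failureCount++;
            failureOffsetSum += offset;
        }
    }

    /** Called after the grouped responses are flushed; registers one event per request. */
    public void onGroupFlushed(long nowNanos, Recorder recorder) {
        long sinceFirst = nowNanos - firstEnqueueNanos;
        if (successCount > 0) {
            // Average of (now - enqueue time) over all successful requests in the group.
            long avg = sinceFirst - successOffsetSum / successCount;
            for (int i = 0; i < successCount; i++) {
                recorder.successNanos.add(avg);
            }
        }
        if (failureCount > 0) {
            long avg = sinceFirst - failureOffsetSum / failureCount;
            for (int i = 0; i < failureCount; i++) {
                recorder.failureNanos.add(avg);
            }
        }
        // Reset so the next group starts from a fresh first-enqueue time.
        firstEnqueueNanos = -1;
        successCount = 0;
        failureCount = 0;
        successOffsetSum = 0;
        failureOffsetSum = 0;
    }
}
```

Registering the group average once per request keeps the event count equal to the number of requests, at the cost of losing the per-request latency spread within a group.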

@hangc0276
Contributor

Refer to the release note: https://github.com/apache/bookkeeper/releases/tag/release-4.16.0
Can we use bookkeeper_server_ADD_ENTRY and bookkeeper_server_READ_ENTRY instead?

@wenbingshen
Member Author

Refer to the release note: https://github.com/apache/bookkeeper/releases/tag/release-4.16.0 Can we use bookkeeper_server_ADD_ENTRY and bookkeeper_server_READ_ENTRY instead?

@hangc0276 These metrics have different meanings. When we use the V2 protocol:
ADD_ENTRY_REQUEST: the time from when the request enters the write queue to when the response to the produce request is sent.
ADD_ENTRY: the time from the start of request processing to the completion of the journal write.
WRITE_THREAD_QUEUED_LATENCY: the time the produce request waits in the queue before processing starts.

From these metrics we derive the time it takes to send the produce response to the client over network IO:
ADD_ENTRY_REQUEST - ADD_ENTRY - WRITE_THREAD_QUEUED_LATENCY
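
As a worked illustration of the subtraction above, here is a small sketch. The class and method names are hypothetical and the sample values are made up for the example; they are not measurements from a bookie. The subtraction is meaningful for average latencies over the same window, since percentiles of the three metrics do not add up in this way.

```java
/** Hypothetical helper illustrating the derived response-send latency. */
public class ResponseSendLatencySketch {

    /**
     * All arguments are average latencies in milliseconds over the same time window:
     * ADD_ENTRY_REQUEST, ADD_ENTRY and WRITE_THREAD_QUEUED_LATENCY.
     */
    public static double responseSendMillis(double addEntryRequest,
                                            double addEntry,
                                            double writeThreadQueued) {
        return addEntryRequest - addEntry - writeThreadQueued;
    }

    public static void main(String[] args) {
        // Made-up sample values: 12 ms end to end, 7 ms journal write, 3 ms queued.
        System.out.println(responseSendMillis(12.0, 7.0, 3.0));
    }
}
```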

@wenbingshen
Member Author

@hangc0276 bookkeeper_server_READ_ENTRY_REQUEST still works fine in 4.16.x. I noticed that batch read support will be released in 4.17.x, and I don't know whether READ_ENTRY_REQUEST can be supported under batch read. However, sending the read response uses a blocking send API, so I think this metric effectively reflects the network IO situation and can help us analyze whether read request latency occurs in the bookie service, on the network, or on the broker side.

After thinking about it again, I think bookkeeper_server_ADD_ENTRY_REQUEST can be replaced by bookkeeper_server_ADD_ENTRY.
