KAFKA-8580: Compute RocksDB metrics (#7263)
Conversation
Here, I set RocksDB's standard interval for its own regular statistics dump. If anybody has a better idea please speak up. In future, we could provide a user configuration to set this.
Unfortunately, I could not compute all metrics proposed in KIP-471 because some require functionality of RocksDB 6+ and Streams currently uses 5.18.
Call for review @guozhangwang @ableegoldman @mjsax @vvcephei @bbejeck @abbccdda

For JDK 11/Scala 2.12 the following tests failed: The other two builds exceeded the timeout. Retest this, please

In JDK 8/Scala 2.11 the following tests failed:

Retest this, please
Why do we want to replace spaces in the thread name?

Overthought it. I will remove the replacement.
The logic of closeMetricsRecorder and metricsRecorderIsRunning is a bit awkward here, since RocksDBMetricsRecorder#removeStatistics and #close are actually the same except for stopRecording. How about simplifying it to the following:

- Just have a hard-coded recording interval inside RocksDBMetricsRecorder instead of letting the caller pass in the value. Also maintain a reference to the background thread but do not start it at construction time.
- In addStatistics, if the map is no longer empty afterwards (in practice it should be), start the thread if it has not been started.
- In removeStatistics, if the map becomes empty afterwards, stop the thread.

Then in the caller, we only need addStatistics / removeStatistics and no longer need the flags for closing, and the logic of this function can be simplified as well.

For segmented stores, we need to make sure that if the last segment was indeed removed, the recorder would not try to create a new one; this means that at least two segments are maintained at the same time, which I think can be guaranteed programmatically. To be safe, we could also maintain a flag or an enum state of the recorder, created -> (addStatistics) -> started -> (removeStatistics) -> stopped, and throw an exception if addStatistics is called after the recorder has transitioned to stopped.
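For illustration, a minimal sketch of the suggested lifecycle (class, method, and state names are hypothetical, not the actual Kafka Streams code): the background thread is started lazily on the first addStatistics, stopped when the map becomes empty, and a late addStatistics fails fast.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the suggested recorder lifecycle; not the actual
// RocksDBMetricsRecorder implementation.
class RecorderLifecycleSketch {
    enum State { CREATED, STARTED, STOPPED }

    private final Map<String, Object> statisticsByStore = new ConcurrentHashMap<>();
    private State state = State.CREATED;
    private Thread recordingThread;

    synchronized void addStatistics(final String storeName, final Object statistics) {
        if (state == State.STOPPED) {
            // created -> started -> stopped is a one-way street in this sketch
            throw new IllegalStateException("recorder has already been stopped");
        }
        statisticsByStore.put(storeName, statistics);
        if (state == State.CREATED) {
            // start the background thread lazily, on the first registration
            recordingThread = new Thread(this::recordLoop, "rocksdb-metrics-recorder");
            recordingThread.start();
            state = State.STARTED;
        }
    }

    synchronized void removeStatistics(final String storeName) {
        statisticsByStore.remove(storeName);
        if (statisticsByStore.isEmpty() && state == State.STARTED) {
            recordingThread.interrupt(); // stop recording once nothing is registered
            state = State.STOPPED;
        }
    }

    private void recordLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(100L); // stands in for the hard-coded recording interval
            } catch (final InterruptedException e) {
                return; // interrupted by removeStatistics: exit
            }
            // here the real recorder would read each registered Statistics object
        }
    }

    synchronized State state() { return state; }
}
```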
That is really a good point. I changed the design of the recorder. It now starts a thread when the first Statistics object is added, stops the thread when the last Statistics object is removed, and starts a new thread when a Statistics object is re-added. This was necessary since a RocksDB instance may be opened and closed multiple times during its life cycle due to restoration.
If startRecording is called multiple times with different recording intervals, we would change the sleeping intervals; is that intentional? Or should we just stick with the first passed-in value?
Some metrics are removed here, is that intentional?
Yes it was intentional. Unfortunately, some metrics require functionality of RocksDB 6+ and Streams currently uses 5.18.
So there are metrics we would like to add but can't until we upgrade RocksDB? Can we create a 3.0 blocker ticket to add them back in when we bump RocksDB (and/or maybe a separate ticket to consider a major version bump of RocksDB with the next major version bump of Kafka)?
@ableegoldman I created ticket https://issues.apache.org/jira/browse/KAFKA-8897
I did not specify it as a blocker for 3.0 because a major version increase of Streams is not required for a major version increase of RocksDB. If I miss something regarding this, please feel free to comment on the ticket and change it to a blocker.
Should we check that the thread has eventually joined, say when the Streams instance is closed? With 10 minutes we should not block on waiting, but we should also make sure we are not accidentally leaking threads.
When the last Statistics object is removed, the thread is interrupted and stopped, and the removing thread waits for it to die. That should ensure no leak. Please review that code with special care.
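A minimal sketch of the interrupt-then-join pattern described here (illustrative names, not the actual recorder code): the stopping thread interrupts the recording thread and then joins it, so the recording thread is guaranteed to be dead before stop() returns.

```java
// Illustrative sketch of interrupt-then-join shutdown; not the actual
// Kafka Streams recorder code.
class StoppableRecorderSketch {
    private final Thread recordingThread = new Thread(() -> {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(60_000L); // recording interval
            } catch (final InterruptedException e) {
                return; // wake up and exit promptly when interrupted
            }
        }
    }, "metrics-recording-thread");

    void start() {
        recordingThread.start();
    }

    void stop() {
        recordingThread.interrupt(); // wake the thread out of its sleep
        try {
            recordingThread.join();  // wait until the thread has actually died
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    boolean isAlive() {
        return recordingThread.isAlive();
    }
}
```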
At the moment, the hit ratios consist only of the current value measured in the recording interval. Should we change this to AVG, MIN, and MAX?
guozhangwang left a comment
Made another pass, all are nit comments. @vvcephei could you take a look at this PR also?
nit: maybe we can just merge NEW into NOT_RUNNING? I.e. the initialized state is just NOT_RUNNING.
This function is only used for testing?
nit: the last parameter seems to always be true in non-testing code. I'm wondering if we can add a couple of test-only functions (e.g. the following error() seems test-only as well), like a setManual which would force the state to be MANUAL instead of setting it in the constructor.
Then we can remove the startRecordingThread flag, and in tests, after we've created the recorder object, call setManual so that later add/removeStatistics calls would not start or stop recording.
One caveat is that when we are closing the Kafka Streams instance with a specified timeout value, this function may violate that timeout and wait longer, since we call thread.join() without a timeout value.
Just to clarify: add/removeStatistics do not need to be thread-safe since they would only be called from a single thread at a given time, right?
vvcephei left a comment
Hey @cadonna ,
Thanks for the PR!
I'm curious about the threading model. Is there an advantage to having one thread per store? It seems like we could also just have one thread per Streams instance and have it iterate over all the RocksDBs. I'm not sure, but it seems like the management might be a little simpler, not to mention avoiding all those extra threads for an amount of work that is really quite small.
WDYT?
This seems suspicious...
- Shouldn't all blocking operations have timeouts?
- Should we be swallowing and ignoring InterruptedExceptions?
It seems like a recipe for Streams to hang forever un-killably. But I feel like I'm missing something.
Some of these don't reset the ticker, but they all use +=. Is this right?
Oh, thanks! I totally overlooked that. Now all ticker counts are reset after each recording, except for open files and file errors. Open files and file errors are recorded as plain values, i.e., no aggregation is applied to them on Kafka's metrics side. All other metrics apply some aggregation, so we have to record only the delta at each recording.
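The distinction can be sketched as follows (hypothetical names, not RocksDB's API): tickers that feed aggregating sensors are read and reset so only the per-interval delta is recorded, while gauge-like values such as open files are reported as-is.

```java
// Hypothetical sketch of delta vs. plain-value recording; not RocksDB's API.
class TickerSketch {
    private long bytesWrittenTicker; // monotonically increasing ticker
    private long openFiles;          // gauge-like current value

    void write(final long bytes) { bytesWrittenTicker += bytes; }
    void openFile()  { openFiles++; }
    void closeFile() { openFiles--; }

    // read-and-reset: returns only the delta since the last recording,
    // which is what aggregating sensors (rate, avg, ...) should see
    long getAndResetBytesWritten() {
        final long delta = bytesWrittenTicker;
        bytesWrittenTicker = 0L;
        return delta;
    }

    // plain value: recorded as-is, no reset and no aggregation needed
    long openFiles() { return openFiles; }
}
```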
(nit): not sure if this is any clearer than 1 (and 0) above.
I stored the values for delay and interval in variables named recordingDelay and recordingInterval, respectively.
This seems a little mis-named, since the recording level doesn't actually change when we remove the stats. Maybe something like isStatsRegistered?
On the other hand, would it be possible to just idempotently "remove" the stats even if they weren't registered? I.e., just no-op in that case?
It would be possible to idempotently remove the stats, and I also thought about it. However, I have IllegalStateExceptions in place in RocksDBMetricsRecorder and RocksDBMetricsRecordingTrigger that enforce the contract that only stats that were added before can be removed. I thought that would improve the correctness of the code.
I think it should be safe to call remove here multiple times; the important thing is just to make sure it's actually called at least once.
But @cadonna , will this remove the metrics whenever any store on the instance is closed? That seems to make sense with the current eager rebalancing, but with KIP-429 we will only close the ones that are migrated to another consumer.
We need to be able to remove just a few specific metrics without disrupting the others, while also making sure to actually close/cleanup during an actual shutdown
After a closer look, I don't think there's a conflict between 429 and how the metrics are currently closed 😄
Metrics of a store within a task are removed from the streams metrics whenever the store of this task is closed. The metrics of the stores that are still open remain. That should be fine with KIP-429.
I like this parity check. 👍
Hopefully, there's no danger of these counters overflowing, but regardless, I think the math would still work out if we did reset them here.
What if RocksDB opened 10 files and closed 0 files during a recording interval? The metrics would record 10. Then the counters are reset to 0. In the next recording interval, 0 files are opened and 10 are closed, so the metric would record -10. However, the metric is defined as a Value(), i.e., the value is recorded as a plain number. Your proposal would work if the metric were defined as CumulativeSum(). I will try to do that.
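The arithmetic from this exchange, sketched with hypothetical names: resetting the counter after each recording only works if the Kafka-side metric accumulates the deltas (CumulativeSum) instead of recording each delta as a plain value.

```java
// Hypothetical sketch of the open-files accounting discussed above.
class OpenFilesSketch {
    private long counter;        // RocksDB-side counter, reset after each recording
    private long cumulativeSum;  // Kafka-side CumulativeSum metric

    void open(final int files)  { counter += files; }
    void close(final int files) { counter -= files; }

    void record() {
        cumulativeSum += counter; // CumulativeSum adds each per-interval delta
        counter = 0L;             // reset so the next interval records only its delta
    }

    long metricValue() { return cumulativeSum; }
}
```

With the scenario above: interval one opens 10 files (metric reads 10), interval two closes 10 files (delta -10), and the accumulated metric correctly returns to 0.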
Ah, you're right. I was thinking that the numberOfOpenFiles variable was a field (i.e., persistent).
Metrics are now reset after each read and accumulated with CumulativeSum().
guozhangwang left a comment
Made another pass; only nit comments left, and I think after that it's good to merge.
Actually, shouldn't the previous sensors like bytesReadDuringCompactionSensor be an addRateOfSumMetricToSensor metric rather than a rate? My understanding is that the values read from RocksDB are already sums, so we probably only care about avg / max?
The statistics ticker that we read from RocksDB is a monotonically increasing sum of bytes that we reset after each read. I chose a rate because it should give an impression about the IO that is produced by compactions.
It's a bit confusing to have two storeNames variables: the first one is for the logical store name (which is the same as the physical store name if it is a kv-store), and the second is for the physical store name (e.g., for a segmented store its name is storeName-segmentID).
Maybe rename the second to rocksDBName?
Renaming is a good idea. I renamed the physical storeName to segmentName.
Just wanted to say I really like the way this unit test is written! With the right usage of mocks we would avoid having any time-dependent flakiness.
A metric recorder runs in its own thread and regularly records RocksDB metrics from RocksDB's statistics. For segmented state stores, the metrics are aggregated over the segments.
Changes the threading model from one recording thread per logical RocksDB instance to one recording thread per Kafka Streams instance.
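The final threading model can be sketched like this (illustrative names, not the actual trigger class): a single runnable per Kafka Streams instance iterates over all registered recorders, instead of each recorder owning its own thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative sketch of one recording trigger per Streams instance.
class RecordingTriggerSketch implements Runnable {
    interface Recorder {
        void record(); // records the metrics of one RocksDB instance
    }

    private final List<Recorder> recorders = new CopyOnWriteArrayList<>();

    void addRecorder(final Recorder recorder) {
        recorders.add(recorder);
    }

    void removeRecorder(final Recorder recorder) {
        recorders.remove(recorder);
    }

    @Override
    public void run() {
        // one pass of the single background thread records all stores
        for (final Recorder recorder : recorders) {
            recorder.record();
        }
    }
}
```

In practice such a trigger would be scheduled at a fixed interval, e.g. with a ScheduledExecutorService, so only one extra thread exists per Streams instance.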
Merged to trunk, thanks @cadonna! Please make sure we have another PR for the docs change as well :)
Conflicts:
- .gitignore: the addition of clients/src/generated-test was near local additions for support-metrics.
- checkstyle/suppressions.xml: upstream refactoring of exclusions for the generator was near the local changes for support-metrics.
- gradle.properties: the Scala version bump caused a minor conflict due to the local Kafka version change.
- gradle/dependencies.gradle: the bcpkix version bump was near the Avro additions in the local version.

Merged from apache-github/trunk (49 commits):
- KAFKA-8471: Replace control requests/responses with automated protocol (apache#7353)
- MINOR: Don't generate unnecessary strings for debug logging in FetchSessionHandler (apache#7394)
- MINOR: fixed typo and removed outdated variable name (apache#7402)
- KAFKA-8934: Create version file during build for Streams (apache#7397)
- KAFKA-8319: Make KafkaStreamsTest a non-integration test class (apache#7382)
- KAFKA-6883: Add toUpperCase support to sasl.kerberos.principal.to.local rule (KIP-309)
- KAFKA-8907: Return topic configs in CreateTopics response (KIP-525) (apache#7380)
- MINOR: Address review comments for KIP-504 authorizer changes (apache#7379)
- MINOR: add versioning to request and response headers (apache#7372)
- KAFKA-7273: Extend Connect Converter to support headers (apache#6362)
- MINOR: improve the Kafka RPC code generator (apache#7340)
- MINOR: Improve the org.apache.kafka.common.protocol code (apache#7344)
- KAFKA-8880: Docs on upgrade-guide (apache#7385)
- KAFKA-8179: do not suspend standby tasks during rebalance (apache#7321)
- KAFKA-8580: Compute RocksDB metrics (apache#7263)
- KAFKA-8880: Add overloaded function of Consumer.committed (apache#7304)
- HOTFIX: fix Kafka Streams upgrade note for broker backward compatibility (apache#7363)
- KAFKA-8848: Update system tests to use new AclAuthorizer (apache#7374)
- MINOR: remove unnecessary null check (apache#7299)
- KAFKA-6958: Overload methods for group and windowed stream to allow to name operation name using the new Named class (apache#6413)
- ...