-
Notifications
You must be signed in to change notification settings - Fork 331
SAMZA-2397: Updating gauge-val function on newGauge on same metric name #1223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SAMZA-2397: Updating gauge-val function on newGauge on same metric name #1223
Conversation
|
@bkonold can you review? |
Updating method in MetricsRegistryMap
| * @param overrideExistingGauge Overwrite any existing gauges present for the same group and gauge | ||
| * @return Gauge was registered | ||
| */ | ||
| <T> Gauge<T> newGauge(String group, Gauge<T> value, Boolean overrideExistingGauge); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we defining a new method here with the additional parameter to avoid a large code change in adding a boolean to the definition above?
Wondering whether the two methods will coexist long term or if we see a way to unify eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, making overriding to be the default behavior requires testing to ensure that it doesnt break existing metrics; I can work on that separately.
Although I tried with a test-job and everything looked ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to not add a new variant here and update the existing behavior instead. It seems to fall into "fixing undefined behavior" territory, in that I don't think a reasonable user expectation would be to retain the old metric. We can fix this here and call it out in the "backwards incompatible / API changes" section of the PR description.
@cameronlee314 What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If leaving them separate is what is needed to ensure the P2 timeline and we plan for the two living along side one another to be temporary, maybe we can open a ticket and indicate in a comment when / under what conditions clean up can happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added details in https://issues.apache.org/jira/browse/SAMZA-2417
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to not add a new variant here and update the existing behavior instead. It seems to fall into "fixing undefined behavior" territory, in that I don't think a reasonable user expectation would be to retain the old metric. We can fix this here and call it out in the "backwards incompatible / API changes" section of the PR description.
@cameronlee314 What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was debating doing that, but I'm not sure if it'd break some existing metrics-subsystem which relies on us retaining the first object rather than the last?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prateekm I rollbacked all most of the changes; we now just retain the newest metric.
can you approve it if it looks ok, so we can merge.
| }); | ||
| } | ||
|
|
||
| public <T> Gauge<T> newGauge(String name, final ValueFunction<T> valueFunc, Boolean overwrite) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a mixing of terminology here vs MetricsRegistry (overwrite vs overrideExistingGauge). All we're doing is passing the arg here on to the registry; is the difference in naming intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, made naming consistent.
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
Outdated
Show resolved
Hide resolved
samza-core/src/main/scala/org/apache/samza/metrics/MetricsHelper.scala
Outdated
Show resolved
Hide resolved
…e#1231) doc and blog related update on master branch for 1.3.0 release
…SamzaSqlRelRecord for all the nested records.
…ped during a rebalance (apache#1213) Stream processor should ensure previous container is stopped during a rebalance
…ing env variables in ShellCommandConfig Design: https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner Changes: 1. Add job.config.loader.factory to specify ConfigLoaderFactory. 2. Add job.config.loader.properties prefix to specify properties needed by ConfigLoader to fetch config. 3. Add ENV_SUBMISSION_CONFIG enviroment variable to pass job submission configs to Yarn. API Changes: N/A. This is part of a series PRs, detailed information will be provided in the last/main PR. Upgrade Instructions: N/A. This is part of a series PRs, detailed information will be provided in the last/main PR. Usage Instructions: N/A. This is part of a series PRs, detailed information will be provided in the last/main PR. Tests: N/A.
…ache#1240) Populate container exception from the listener only if it is null
…lt PropertiesConfigLoader as well as PropertiesConfigLoaderFactory implementations. Add ConfigLoader, ConfigLoaderFactory interface and default PropertiesConfigLoader as well as PropertiesConfigLoaderFactory implementations, which will be used on AM to fetch configs. It will also be used on LocalApplicationRunner to fetch configs as well. The existing ConfigFactory interface will be deprecated and removed as we do not fetch configs on start up script run-app.sh anymore. Design: https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner Changes: 1. Add ConfigLoader interface which will be used in job coordinator to retrieve config from. 2. Add ConfigLoaderFactory interface which will be used to create an instance of ConfigLoader. 3. Add PropertiesConfigLoader which is a default implementation of ConfigLoader that reads a local properties file given the file path. 4. Add PropertiesConfigLoaderFactory which is a default implementation of ConfigLoaderFactory that reads properties file path from start up config and creates a PropertiesConfigLoader with the provided path. API Changes: N/A. This is part of a series PRs, detailed information will be provided in the last/main PR. Upgrade Instructions: N/A. This is part of a series PRs, detailed information will be provided in the last/main PR. Usage Instructions: N/A. This is part of a series PRs, detailed information will be provided in the last/main PR. Tests: Add TestPropertiesConfigLoader to unit test PropertiesConfigLoader.
* Refactor LocalApplicationRunner Issues: LocalApplicationRunner has multiple constructors and follows different initialization logic. Changes: 1. Refactor all constructors to end up with the same private constructor. 2. Align "static final" and "final static" to "static final" 3. Group class variales based on accessibility 4. Minor fixs on inconsistent styling, return etc. Tests: Unit Tests * Revert LocalApplicationRunner(SamzaApplication app, Config config, MetadataStoreFactory metadataStoreFactory) from package private back to public
…che#1245) Issues: 1. org.apache.samza.sql.util.ConfigUtil is not used 2. it is the same name as org.apache.samza.util.ConfigUtil 3. its usage should be deprecated in favor of Config#subset(String prefix, boolean stripPrefix) Changes: delete org.apache.samza.sql.util.ConfigUtil Tests: build
Issues: samza-autoscaling module is not used. Changes: remove the unused module. API Changes: None Upgrade Instructions: None Usage Instructions: None Tests: build
…ache#1251) * Fix the checkpoint and changelog topic creation configurations. * Address review comments. * Address review comments.
…#1247) 1. Depending on the existence of job.config.loader.factory, RemoteApplicationRunner will alternatively only submit the job, without planning. 2. Late initialize RemoteJobPlanner only when needed to avoid executing user code in submitting only mode.
…from loader. (apache#1248) 1. Based on the existence of ENV_SUBMISSION_CONFIG, ClusterBasedJobCoordinator will alternatively deserilize ENV_SUBMISSION_CONFIG and load full job config from config loader factory. 2. Execute planning, create diagnostics stream and persist full job config back to coordinator stream when loading full job config from config loader.
prateekm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. Fix it and ship it.
Let's update the PR description with the "API changes" etc. sections (from SEP-25) and call this out as an API change.
| debug("Adding new gauge %s %s %s." format (group, gauge.getName, gauge)) | ||
| putAndGetGroup(group).putIfAbsent(gauge.getName, gauge) | ||
| if (putAndGetGroup(group).containsKey(gauge.getName)) { | ||
| info("Updating existing gauge %s %s %s" format (group, gauge.getName, gauge)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be debug.
|
|
||
| package org.apache.samza.metrics | ||
|
|
||
| import java.lang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove.
Updating gauge-val function on newGauge on same metric name.
API Change: MetricRegistryMap. newGauge will now overwrite any existing gauge, if one exists for the same group and gauge.name.
Symptom: RocksDB-specific metric value like estimate-table-readers-mem, cur-size-active-mem-table, etc, are not being emitted after Samza 1.1; only a 0 value is emitted instead.
Cause: Samza 1.1 uses an optimization where a RocksDBStore is created first in BulkLoad mode for optimizing bootstrap write, after bootstrapping is complete the store is closed (to trigger compaction), and a new store is created in ReadWrite mode. The rocksdb metrics are calculated using a function which checks if the store is open (see #588 by @xinyuiscool , SAMZA-1778), otherwise it emits 0.
See RocksdbKeyValueStore#95.
However, because the metric-gauge-object is not updated after the store is recreated, the gauge keeps checking the first (closed) store and emits 0.
Fix: Added logic that ensures that the second store's metric overrides the previous gauge for the same name, and logs a message.
Tests: Added unit test, test with a samza job that emits non-zero RocksDB metrics.