Skip to content

Conversation

@rmatharu-zz
Copy link
Contributor

@rmatharu-zz rmatharu-zz commented Nov 22, 2019

Updating gauge-val function on newGauge on same metric name.

API Change: MetricRegistryMap. newGauge will now overwrite any existing gauge, if one exists for the same group and gauge.name.

Symptom: RocksDB-specific metric value like estimate-table-readers-mem, cur-size-active-mem-table, etc, are not being emitted after Samza 1.1; only a 0 value is emitted instead.

Cause: Samza 1.1 uses an optimization where a RocksDBStore is created first in BulkLoad mode for optimizing bootstrap write, after bootstrapping is complete the store is closed (to trigger compaction), and a new store is created in ReadWrite mode. The rocksdb metrics are calculated using a function which checks if the store is open (see #588 by @xinyuiscool , SAMZA-1778), otherwise it emits 0.
See RocksdbKeyValueStore#95.
However, because the metric-gauge-object is not updated after the store is recreated, the gauge keeps checking the first (closed) store and emits 0.

Fix: Added logic that ensures that the second store's metric overrides the previous gauge for the same name, and logs a message.

Tests: Added unit test, test with a samza job that emits non-zero RocksDB metrics.

@prateekm
Copy link
Contributor

@bkonold can you review?

* @param overrideExistingGauge Overwrite any existing gauges present for the same group and gauge
* @return Gauge was registered
*/
<T> Gauge<T> newGauge(String group, Gauge<T> value, Boolean overrideExistingGauge);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we defining a new method here with the additional parameter to avoid a large code change in adding a boolean to the definition above?

Wondering whether the two methods will coexist long term or if we see a way to unify eventually.

Copy link
Contributor Author

@rmatharu-zz rmatharu-zz Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, making overriding to be the default behavior requires testing to ensure that it doesnt break existing metrics; I can work on that separately.
Although I tried with a test-job and everything looked ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to not add a new variant here and update the existing behavior instead. It seems to fall into "fixing undefined behavior" territory, in that I don't think a reasonable user expectation would be to retain the old metric. We can fix this here and call it out in the "backwards incompatible / API changes" section of the PR description.

@cameronlee314 What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If leaving them separate is what is needed to ensure the P2 timeline and we plan for the two living along side one another to be temporary, maybe we can open a ticket and indicate in a comment when / under what conditions clean up can happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to not add a new variant here and update the existing behavior instead. It seems to fall into "fixing undefined behavior" territory, in that I don't think a reasonable user expectation would be to retain the old metric. We can fix this here and call it out in the "backwards incompatible / API changes" section of the PR description.

@cameronlee314 What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating doing that, but I'm not sure if it'd break some existing metrics-subsystem which relies on us retaining the first object rather than the last?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prateekm I rollbacked all most of the changes; we now just retain the newest metric.
can you approve it if it looks ok, so we can merge.

});
}

public <T> Gauge<T> newGauge(String name, final ValueFunction<T> valueFunc, Boolean overwrite) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a mixing of terminology here vs MetricsRegistry (overwrite vs overrideExistingGauge). All we're doing is passing the arg here on to the registry; is the difference in naming intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, made naming consistent.

rmatharu and others added 19 commits December 12, 2019 16:02
…e#1231)

doc and blog related update on master branch for 1.3.0 release
…SamzaSqlRelRecord for all the nested records.
…ped during a rebalance (apache#1213)

Stream processor should ensure previous container is stopped during a rebalance
…ing env variables in ShellCommandConfig

Design:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner

Changes:

1. Add job.config.loader.factory to specify ConfigLoaderFactory.
2. Add job.config.loader.properties prefix to specify properties needed by ConfigLoader to fetch config.
3. Add ENV_SUBMISSION_CONFIG enviroment variable to pass job submission configs to Yarn.

API Changes:

N/A. This is part of a series PRs, detailed information will be provided in the last/main PR.
Upgrade Instructions:
N/A. This is part of a series PRs, detailed information will be provided in the last/main PR.
Usage Instructions:
N/A. This is part of a series PRs, detailed information will be provided in the last/main PR.

Tests:
N/A.
…ache#1240)

Populate container exception from the listener only if it is null
Ke Wu and others added 24 commits January 20, 2020 19:20
…lt PropertiesConfigLoader as well as PropertiesConfigLoaderFactory implementations.

Add ConfigLoader, ConfigLoaderFactory interface and default PropertiesConfigLoader as well as PropertiesConfigLoaderFactory implementations, which will be used on AM to fetch configs. It will also be used on LocalApplicationRunner to fetch configs as well.

The existing ConfigFactory interface will be deprecated and removed as we do not fetch configs on start up script run-app.sh anymore.

Design:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner

Changes:

1. Add ConfigLoader interface which will be used in job coordinator to retrieve config from.
2. Add ConfigLoaderFactory interface which will be used to create an instance of ConfigLoader.
3. Add PropertiesConfigLoader which is a default implementation of ConfigLoader that reads a local properties file given the file path.
4. Add PropertiesConfigLoaderFactory which is a default implementation of ConfigLoaderFactory that reads properties file path from start up config and creates a PropertiesConfigLoader with the provided path.
API Changes:

  N/A. This is part of a series PRs, detailed information will be provided in the last/main PR.
Upgrade Instructions:
  N/A. This is part of a series PRs, detailed information will be provided in the last/main PR.
Usage Instructions:
  N/A. This is part of a series PRs, detailed information will be provided in the last/main PR.

Tests:

Add TestPropertiesConfigLoader to unit test PropertiesConfigLoader.
* Refactor LocalApplicationRunner

Issues: LocalApplicationRunner has multiple constructors and follows different initialization logic.

Changes:

1. Refactor all constructors to end up with the same private constructor.
2. Align "static final" and "final static" to "static final"
3. Group class variales based on accessibility
4. Minor fixs on inconsistent styling, return etc.

Tests:
Unit Tests

* Revert LocalApplicationRunner(SamzaApplication app, Config config, MetadataStoreFactory metadataStoreFactory) from package private back to public
…che#1245)

Issues:

1. org.apache.samza.sql.util.ConfigUtil is not used
2. it is the same name as org.apache.samza.util.ConfigUtil
3. its usage should be deprecated in favor of Config#subset(String prefix, boolean stripPrefix)

Changes: delete org.apache.samza.sql.util.ConfigUtil

Tests: build
Issues: samza-autoscaling module is not used.

Changes: remove the unused module.

API Changes:
None

Upgrade Instructions:
None

Usage Instructions:
None

Tests: build
…ache#1251)

* Fix the checkpoint and changelog topic creation configurations.

* Address review comments.

* Address review comments.
…#1247)

1. Depending on the existence of job.config.loader.factory, RemoteApplicationRunner will alternatively only submit the job, without planning.
2. Late initialize RemoteJobPlanner only when needed to avoid executing user code in submitting only mode.
…from loader. (apache#1248)

1. Based on the existence of ENV_SUBMISSION_CONFIG, ClusterBasedJobCoordinator will alternatively deserilize ENV_SUBMISSION_CONFIG and load full job config from config loader factory.
2. Execute planning, create diagnostics stream and persist full job config back to coordinator stream when loading full job config from config loader.
Copy link
Contributor

@prateekm prateekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Fix it and ship it.

Let's update the PR description with the "API changes" etc. sections (from SEP-25) and call this out as an API change.

debug("Adding new gauge %s %s %s." format (group, gauge.getName, gauge))
putAndGetGroup(group).putIfAbsent(gauge.getName, gauge)
if (putAndGetGroup(group).containsKey(gauge.getName)) {
info("Updating existing gauge %s %s %s" format (group, gauge.getName, gauge))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be debug.


package org.apache.samza.metrics

import java.lang
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.