Introduce a CrawlerMetrics factory that routes metric registration to
Storm V1, V2 (Codahale/Dropwizard), or both APIs based on the config
property `stormcrawler.metrics.version` ("v1" default, "v2", "both").
This enables gradual migration from deprecated V1 metrics without
breaking existing deployments or dashboards.
- New metrics bridge infrastructure in core (ScopedCounter,
ScopedReducedMetric interfaces with V1/V2/Dual implementations)
- Migrated all bolt/spout metric registration across core and all
external modules (opensearch, sql, solr, aws, tika, warc, urlfrontier)
- Added V2 ScheduledStormReporter implementations for OpenSearch, SQL,
and Solr that write the same document schema as V1 MetricsConsumer
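
The routing described above can be illustrated with a small, dependency-free sketch. This is not the code from this PR: plain maps stand in for the Storm V1 and V2 registries, and `MetricsBridgeSketch`, `MapCounter`, `DualCounter`, `createCounter` and the `incrBy` signature are hypothetical names chosen for illustration. Only the property key, its three values, and the V1 default are taken from the description. Rejecting an unknown value with an exception is likewise an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for the Storm V1 and
// V2 (Codahale/Dropwizard) metric registries.
class MetricsBridgeSketch {

    static final String METRICS_VERSION_KEY = "stormcrawler.metrics.version";
    static final String VERSION_V1 = "v1";
    static final String VERSION_V2 = "v2";
    static final String VERSION_BOTH = "both";

    /** Counter with named scopes, e.g. "fetched" or "error" (assumed shape). */
    interface ScopedCounter {
        void incrBy(String scope, long value);
    }

    /** Writes to one backing store, standing in for a single metrics API. */
    static final class MapCounter implements ScopedCounter {
        private final Map<String, Long> store;

        MapCounter(Map<String, Long> store) {
            this.store = store;
        }

        @Override
        public void incrBy(String scope, long value) {
            store.merge(scope, value, Long::sum);
        }
    }

    /** Forwards every update to two delegates, as the "both" mode would. */
    static final class DualCounter implements ScopedCounter {
        private final ScopedCounter first;
        private final ScopedCounter second;

        DualCounter(ScopedCounter first, ScopedCounter second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void incrBy(String scope, long value) {
            first.incrBy(scope, value);
            second.incrBy(scope, value);
        }
    }

    /** Reads the configured version, defaulting to "v1" when the key is absent. */
    static String getVersion(Map<String, Object> conf) {
        Object value = conf.get(METRICS_VERSION_KEY);
        return value == null ? VERSION_V1 : value.toString();
    }

    /** Picks the implementation that matches the configured version. */
    static ScopedCounter createCounter(
            Map<String, Object> conf, Map<String, Long> v1Store, Map<String, Long> v2Store) {
        String version = getVersion(conf);
        switch (version) {
            case VERSION_V1:
                return new MapCounter(v1Store);
            case VERSION_V2:
                return new MapCounter(v2Store);
            case VERSION_BOTH:
                return new DualCounter(new MapCounter(v1Store), new MapCounter(v2Store));
            default:
                // Assumption: unknown values are rejected rather than silently ignored.
                throw new IllegalArgumentException("Unknown metrics version: " + version);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(METRICS_VERSION_KEY, VERSION_BOTH);
        Map<String, Long> v1Store = new HashMap<>();
        Map<String, Long> v2Store = new HashMap<>();

        ScopedCounter counter = createCounter(conf, v1Store, v2Store);
        counter.incrBy("fetched", 1);

        System.out.println("v1 store: " + v1Store);
        System.out.println("v2 store: " + v2Store);
    }
}
```

In this sketch a caller only ever holds a `ScopedCounter`, so switching a deployment from "v1" to "both" and later to "v2" would be a configuration change rather than a code change.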
    }

    private static String getVersion(Map<String, Object> stormConf) {
        return ConfUtils.getString(stormConf, METRICS_VERSION_KEY, VERSION_V1);
We should have this in the default config, with the default value shown.
Sure, will add some tests.
That's a fair point worth addressing directly. Thinking out loud, i.e. what could theoretically live upstream:
From my POV, the real value in this PR is not the interfaces themselves but it's
If we upstreamed only the interfaces to Storm,
There's also a timing argument: Storm already ships a V2 metrics API. What it's missing is not these wrapper interfaces, but a first-class migration path for existing V1 users. That's a larger design discussion for the (rather inactive) Storm community, and tying this PR to that would block a useful, self-contained improvement to SC indefinitely. If Storm eventually adopts a similar abstraction, we could migrate
Thank you for contributing to Apache StormCrawler.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes
Is there an issue associated with this PR? Is it referenced in the commit message?
Does your PR title start with `#XXXX` where `XXXX` is the issue number you are trying to resolve?
Has your PR been rebased against the latest commit within the target branch (typically main)?
Is your initial contribution a single, squashed commit?
Is the code properly formatted with `mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false`?
For code changes
mvn clean verify?