
#585 Add Storm V2 metrics support with backward-compatible bridge#1846

Merged
jnioche merged 2 commits into main from 585 on Apr 7, 2026

Conversation

@rzo1 (Contributor)

@rzo1 rzo1 commented Mar 27, 2026

Thank you for contributing to Apache StormCrawler.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes

  • Is there an issue associated with this PR? Is it referenced in the commit message?

  • Does your PR title start with #XXXX where XXXX is the issue number you are trying to resolve?

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

  • Is the code properly formatted with mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false?

For code changes

  • Have you ensured that the full suite of tests is executed via mvn clean verify?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file?

Note

Introduce a CrawlerMetrics factory that routes metric registration to Storm V1, V2 (Codahale/Dropwizard), or both APIs based on the config property stormcrawler.metrics.version ("v1" default, "v2", "both"). This enables gradual migration from deprecated V1 metrics without breaking existing deployments or dashboards.

  • New metrics bridge infrastructure in core (ScopedCounter, ScopedReducedMetric interfaces with V1/V2/Dual implementations)
  • Migrated all bolt/spout metric registration across core and all external modules (opensearch, sql, solr, aws, tika, warc, urlfrontier)
  • Added V2 ScheduledStormReporter implementations for OpenSearch, SQL, and Solr that write the same document schema as V1 MetricsConsumer
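The routing described in the note above can be sketched as follows. This is a hypothetical, self-contained illustration of the pattern only, not the actual StormCrawler API: the class and member names (`MetricsRoutingSketch`, `counterFor`, `V1Counter`, `V2Counter`, `DualCounter`) are invented stand-ins for `CrawlerMetrics` and the real `ScopedCounter` implementations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the CrawlerMetrics routing idea: choose a counter
// implementation based on "stormcrawler.metrics.version". All names here
// are illustrative stand-ins, not the real StormCrawler classes.
public class MetricsRoutingSketch {

    public interface ScopedCounter {
        void incr(String scope);
    }

    // Stand-in for a V1 registration (Storm's MultiCountMetric in the PR)
    public static class V1Counter implements ScopedCounter {
        public final Map<String, Long> counts = new HashMap<>();
        public void incr(String scope) { counts.merge(scope, 1L, Long::sum); }
    }

    // Stand-in for a V2 registration (a Codahale/Dropwizard Counter in the PR)
    public static class V2Counter implements ScopedCounter {
        public final Map<String, Long> counts = new HashMap<>();
        public void incr(String scope) { counts.merge(scope, 1L, Long::sum); }
    }

    // "both" mode: forward every increment to the V1 and V2 backends so
    // existing dashboards keep receiving data during the migration
    public static class DualCounter implements ScopedCounter {
        public final V1Counter v1 = new V1Counter();
        public final V2Counter v2 = new V2Counter();
        public void incr(String scope) {
            v1.incr(scope);
            v2.incr(scope);
        }
    }

    public static ScopedCounter counterFor(Map<String, Object> conf) {
        Object version = conf.getOrDefault("stormcrawler.metrics.version", "v1");
        switch (String.valueOf(version)) {
            case "v2":
                return new V2Counter();
            case "both":
                return new DualCounter();
            default: // "v1" is the backward-compatible default
                return new V1Counter();
        }
    }
}
```

The point of the factory is that bolt and spout code registers metrics through one call site; only the configuration decides which API (or both) actually receives the data.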

@rzo1 rzo1 requested a review from jnioche March 27, 2026 18:09
@rzo1 rzo1 added this to the 3.5.2 milestone Mar 27, 2026
@rzo1 rzo1 force-pushed the 585 branch 3 times, most recently from b257bbc to 74598bd on March 27, 2026 19:01
Introduce a CrawlerMetrics factory that routes metric registration to
Storm V1, V2 (Codahale/Dropwizard), or both APIs based on the config
property `stormcrawler.metrics.version` ("v1" default, "v2", "both").
This enables gradual migration from deprecated V1 metrics without
breaking existing deployments or dashboards.

- New metrics bridge infrastructure in core (ScopedCounter,
  ScopedReducedMetric interfaces with V1/V2/Dual implementations)
- Migrated all bolt/spout metric registration across core and all
  external modules (opensearch, sql, solr, aws, tika, warc, urlfrontier)
- Added V2 ScheduledStormReporter implementations for OpenSearch, SQL,
  and Solr that write the same document schema as V1 MetricsConsumer
@rzo1 rzo1 requested review from jnioche and removed request for jnioche April 6, 2026 17:28
Contributor

@jnioche jnioche left a comment


Thanks @rzo1
Can we have tests for the new classes at all?
Should some of them be in Apache Storm instead, as they are not specific to SC?

private static String getVersion(Map<String, Object> stormConf) {
    return ConfUtils.getString(stormConf, METRICS_VERSION_KEY, VERSION_V1);
}

We should have this in the default config with the default value shown
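For illustration, a default-config entry along the lines the reviewer is asking for might look like this (hypothetical snippet; the exact key placement in StormCrawler's default YAML config may differ):

```yaml
# Storm metrics API to register with: "v1" (default, deprecated API),
# "v2" (Codahale/Dropwizard), or "both" (bridge mode for gradual migration)
stormcrawler.metrics.version: "v1"
```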

@rzo1 (Contributor, Author)

rzo1 commented Apr 6, 2026

Can we have tests for the new classes at all?

Sure, will add some tests.

Should some of them be in Apache Storm instead as they are not specific to SC

That's a fair point worth addressing directly. Thinking out loud about what could theoretically live upstream:

ScopedCounter, ScopedReducedMetric, DualCounterMetric, and DualReducedMetric are thin abstractions over Storm's own MultiCountMetric/MultiReducedMetric. In principle, Storm could offer these as first-class interfaces to ease V1→V2 migration for any Storm user, not just SC.

From my POV, the real value in this PR is not the interfaces themselves but CrawlerMetrics, the factory that routes to V1, V2, or both based on configuration. That class has hard dependencies on SC-specific types:

  • PerSecondReducer (SC core)
  • CollectionMetric (SC core)
  • ConfUtils (SC core)

If we upstreamed only the interfaces to Storm, CrawlerMetrics would still have to live in SC anyway. The interfaces without the factory aren't particularly useful on their own; they'd just be empty contracts with no wiring.

There's also a timing argument: Storm already ships a V2 metrics API. What it's missing is not these wrapper interfaces, but a first-class migration path for existing V1 users. That's a larger design discussion for the (rather inactive) Storm community, and tying this PR to that would block a useful, self-contained improvement to SC indefinitely.

If Storm eventually adopts a similar abstraction, we could migrate CrawlerMetrics to use it and deprecate the SC-local interfaces at that point. That's a clean upgrade path. For now, keeping everything here would be a pragmatic choice as the interfaces are small, the coupling to Storm internals is intentionally light, and the alternative is a cross-project dependency that does not exist yet.

@jnioche jnioche merged commit b4e97ec into main Apr 7, 2026
2 checks passed
@jnioche jnioche deleted the 585 branch April 7, 2026 17:41