Introduce a CrawlerMetrics factory that routes metric registration to
Storm V1, V2 (Codahale/Dropwizard), or both APIs based on the config
property `stormcrawler.metrics.version` ("v1" default, "v2", "both").
This enables gradual migration from deprecated V1 metrics without
breaking existing deployments or dashboards.
- New metrics bridge infrastructure in core (ScopedCounter,
ScopedReducedMetric interfaces with V1/V2/Dual implementations)
- Migrated all bolt/spout metric registration across core and all
external modules (opensearch, sql, solr, aws, tika, warc, urlfrontier)
- Added V2 ScheduledStormReporter implementations for OpenSearch, SQL,
and Solr that write the same document schema as V1 MetricsConsumer
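The routing described above could look roughly like the following sketch. It is dependency-free and purely illustrative: the method shape of `ScopedCounter`, the `MapBackedCounter` stand-in for a real V1 or V2 backend, and the `create` factory signature are assumptions, not the code in this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, dependency-free sketch; not the code from this PR.
final class MetricsBridgeSketch {

    /** Assumed shape: a counter addressed by a scope name. */
    interface ScopedCounter {
        void incrBy(String scope, long amount);
    }

    /** Stand-in for one metrics backend (V1 or V2); counts live in a plain map. */
    static final class MapBackedCounter implements ScopedCounter {
        final Map<String, Long> counts = new HashMap<>();

        @Override
        public void incrBy(String scope, long amount) {
            counts.merge(scope, amount, Long::sum);
        }
    }

    /** Forwards every update to both backends so old and new dashboards see the same values. */
    static final class DualCounter implements ScopedCounter {
        private final ScopedCounter v1;
        private final ScopedCounter v2;

        DualCounter(ScopedCounter v1, ScopedCounter v2) {
            this.v1 = v1;
            this.v2 = v2;
        }

        @Override
        public void incrBy(String scope, long amount) {
            v1.incrBy(scope, amount);
            v2.incrBy(scope, amount);
        }
    }

    /** Factory: picks the implementation for "v1", "v2" or "both". */
    static ScopedCounter create(String version, ScopedCounter v1, ScopedCounter v2) {
        switch (version) {
            case "v1":
                return v1;
            case "v2":
                return v2;
            case "both":
                return new DualCounter(v1, v2);
            default:
                // Rejecting unknown values is a choice made for this sketch only.
                throw new IllegalArgumentException("Unknown metrics version: " + version);
        }
    }

    public static void main(String[] args) {
        MapBackedCounter v1 = new MapBackedCounter();
        MapBackedCounter v2 = new MapBackedCounter();
        ScopedCounter counter = create("both", v1, v2);
        counter.incrBy("fetched", 3);
        System.out.println(v1.counts + " " + v2.counts);
    }
}
```

With "both", callers keep a single call site while each backend receives the same update, which is what would let existing V1 dashboards keep working during a migration.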
}

private static String getVersion(Map<String, Object> stormConf) {
    return ConfUtils.getString(stormConf, METRICS_VERSION_KEY, VERSION_V1);
We should add this property to the default config, with its default value shown.
Sure, will add some tests.
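As an illustration of what such tests might cover, here is a minimal sketch of the version lookup with its "v1" default. The `Version` enum, the `resolve` method, and the decision to reject unknown values are assumptions made for this example; the PR itself reads the property through `ConfUtils.getString`, which is replaced here by a plain map lookup so the snippet runs on its own.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the version lookup; names other than the config key are illustrative.
final class MetricsVersionSketch {

    static final String METRICS_VERSION_KEY = "stormcrawler.metrics.version";

    enum Version { V1, V2, BOTH }

    /** Resolves the configured version; an unset property falls back to V1. */
    static Version resolve(Map<String, Object> stormConf) {
        Object raw = stormConf.get(METRICS_VERSION_KEY);
        if (raw == null) {
            return Version.V1;
        }
        String value = raw.toString().trim().toLowerCase(Locale.ROOT);
        switch (value) {
            case "v1":
                return Version.V1;
            case "v2":
                return Version.V2;
            case "both":
                return Version.BOTH;
            default:
                // Failing fast is an assumption of this sketch, not confirmed PR behavior.
                throw new IllegalArgumentException(
                        "Unsupported value for " + METRICS_VERSION_KEY + ": " + raw);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        System.out.println(resolve(conf)); // property unset
        conf.put(METRICS_VERSION_KEY, "both");
        System.out.println(resolve(conf));
    }
}
```

A test for the default would then assert that an empty configuration resolves to V1, and that each of the three documented values maps to the expected mode.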
That's a fair point worth addressing directly. Thinking out loud about what could theoretically live upstream:
From my POV, the real value in this PR is not the interfaces themselves but it's
If we upstreamed only the interfaces to Storm,
There's also a timing argument: Storm already ships a V2 metrics API. What it's missing is not these wrapper interfaces, but a first-class migration path for existing V1 users. That's a larger design discussion for the (rather inactive) Storm community, and tying this PR to that would block a useful, self-contained improvement to SC indefinitely. If Storm eventually adopts a similar abstraction, we could migrate
This commit aligns the opensearch-java module with recent legacy updates, completes the migration to HC5/API 3.x, and cleans up duplicated resources.
Refactors and Alignment:
- Ported DelegateRefresher for dynamic config reloading (apache#1870).
- Adopted Storm V2 metrics bridge via CrawlerMetrics (apache#1846).
- Aligned log messages and metric scopes to OpenSearch (apache#1871).
- Ported WaitAckCache extraction to centralize bulk-ack logic (apache#1869).
- Fixed a race condition in IndexerBolt by inverting the execution order, ensuring tuples are registered in waitAck before bulk dispatch.
- Refactored BulkItemResponseToFailedFlag to a Java record with a compact constructor for strict null-safety.
Maintenance and Cleanup:
- Removed duplicated archetype, dashboards, and opensearch-conf.yaml to prevent maintenance overhead.
- Updated README with a migration guide pointing to legacy resources.
- Removed dead rat-exclude in root pom.xml.
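For readers unfamiliar with the record refactoring mentioned in the commit message, the sketch below shows the general technique: a Java record (Java 16+) whose compact constructor rejects null components. The record name `ItemOutcome` and its components are hypothetical and do not reflect the actual fields of BulkItemResponseToFailedFlag.

```java
import java.util.Objects;

// Hypothetical illustration of a record with a compact constructor; not the PR's actual type.
final class RecordSketch {

    /** Immutable carrier; the compact constructor validates before the fields are assigned. */
    record ItemOutcome(String id, Object response, boolean failed) {
        ItemOutcome {
            Objects.requireNonNull(id, "id must not be null");
            Objects.requireNonNull(response, "response must not be null");
        }
    }

    public static void main(String[] args) {
        ItemOutcome ok = new ItemOutcome("doc-1", "payload", false);
        System.out.println(ok.id() + " failed=" + ok.failed());
        try {
            new ItemOutcome(null, "payload", true);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the checks run on every construction path, no instance with a null component can exist, which removes the need for defensive null checks at the call sites.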
Thank you for contributing to Apache StormCrawler.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes
- Is there an issue associated with this PR? Is it referenced in the commit message?
- Does your PR title start with `#XXXX` where `XXXX` is the issue number you are trying to resolve?
- Has your PR been rebased against the latest commit within the target branch (typically main)?
- Is your initial contribution a single, squashed commit?
- Is the code properly formatted with `mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false`?
For code changes
- `mvn clean verify`?