feat(observability) OTel groundwork — Meter, ActivitySource, OPDB metrics#73
Merged
Conversation
…rics
Phase 2 § Scope item 5 (Wave 3, last Phase 2 scope item). Establishes
the project-wide OpenTelemetry instrumentation pattern that Phase 3 / 4 /
5 services inherit, and applies it to OpdbSyncService as the first
real consumer.
What ships:
- src/PinballWizard.Application/Observability/PinballWizardTelemetry.cs —
single project-wide Meter ("PinballWizard") and ActivitySource
("PinballWizard"). All instruments live here under the
pinwiz.<domain>.<operation>.<measure> naming convention. New scrapers
/ services add their counters/activities alongside the existing OPDB
set rather than creating per-domain Meters.
- 6 OPDB sync instruments: Counter<long> for fetched / inserted /
updated / skipped / failed; Histogram<double> for duration_ms. All
carry a `pinwiz.opdb.sync.mode` attribute (apply | dry_run) so
dashboards filter operational charts to apply-only and validation
charts to dry-run-only.
- 1 ActivitySource activity name (pinwiz.opdb.sync) with run-summary
tags for fetched / inserted / updated / skipped / duration_ms.
- IIngestionSourceRepository.RecordRunResultAsync(sourceId, result, ct)
+ IngestionSourceRunResult DTO. Updates LastRunAt; sets
LastSuccessAt on success (preserves on failure); accumulates
TotalDocumentsDiscovered; increments TotalRunFailures on failure.
No-ops with a logged warning if the source isn't seeded yet.
- IngestionSourceIds static class — single source of truth for the
source-id literals used in the seed manifest and at write-back
call sites. Phase 3 manufacturer scrapers add constants here.
- OpdbSyncService instrumented in a try/catch/finally:
- try block runs the existing fetch-and-upsert loop unchanged
- catch block increments OpdbSyncFailed counter, sets the
ActivityStatusCode.Error tag, and rethrows
- finally block emits all per-run counter/histogram observations
AND calls RecordRunResultAsync (apply-mode only; dry-run skips
operator-visible state). Cancellation skips the write-back so
operator-driven Ctrl-C / ACA shutdown doesn't pollute failure
counters; documented in code.
- ServiceDefaults/Extensions.cs registers AddMeter("PinballWizard") +
AddSource("PinballWizard") with OTel. The string literal is
duplicated rather than typed-referenced to avoid a ServiceDefaults →
Application project reference (would invert layering); duplication
documented in both files + docs/observability.md.
- docs/observability.md — full instrument inventory; Activity inventory;
IngestionSource write-back semantics; how to consume in Aspire
dashboard / Log Analytics / Application Insights; standard tag
conventions; Phase 3/4/5 extension pattern; deferred work
(Cosmos RU charge capture → Phase 6).
Local review summary: 1 🔴 / 5 ⚠️ / 5 categories ✅.
🔴 — Fixed: missing failure-path test. Added
SyncAsync_ApplyMode_RepositoryThrows_RecordsFailedRunResultThenRethrows
which uses NSubstitute.ThrowsAsync to make the GetByOpdbIdAsync call
throw, and asserts that RecordRunResultAsync was called with
Succeeded=false despite the exception, plus verifies the exception
propagates. This is the load-bearing path — operators look at the
dashboard precisely when runs are failing.
⚠️ — Fixed:
- IngestionSourceIds.Opdb const replaces magic-string "opdb" in
OpdbSyncService; Phase 3 scrapers add constants here, single source
of truth for the seed-manifest / write-back contract.
- Cancellation comment expanded in OpdbSyncService.cs to make the
trade-off explicit (cancelled run skips write-back; intentional;
remediation noted inline if a future operator wants cancelled runs
recorded as failures).
- docs/observability.md — added a footnote documenting that the
destination Log Analytics table differs (AppMetrics vs customMetrics)
depending on OTLP ingestion path (direct LA vs Application Insights).
⚠️ — Deferred:
- Sibling-drift "DocumentsDiscovered = inserted + updated" math note —
observability.md § "Adding new instruments" already covers per-service
accounting; explicit OPDB-specific note would add noise without
value.
- Drift test for ServiceDefaults literal vs PinballWizardTelemetry
constant — comment already documents the duplication in 3 places
(PinballWizardTelemetry.cs, Extensions.cs, observability.md);
accepted risk.
Tests: 533 / 533 passing (was 524; +9: 5 telemetry name pins + 4
OpdbSyncService instrumentation behavior tests). Build clean, zero
warnings.
| MachineJson("GRBN-MQR4P", manufacturer: "Stern Pinball, Inc.", name: "Stranger Things (Pro)", commonName: "Stranger Things"), | ||
| MachineJson("XYZ", manufacturer: "Jersey Jack Pinball", name: "Wonka", commonName: "Wonka"))); | ||
|
|
||
| _repository.GetByOpdbIdAsync(Arg.Any<string>(), Arg.Any<string>(), Arg.Any<CancellationToken>()).Returns((Machine?)null); |
| _handler.SetResponseFor("/api/machines?page=1&page_size=100", JsonArray( | ||
| MachineJson("GRBN-MQR4P", manufacturer: "Stern Pinball, Inc.", name: "Stranger Things (Pro)", commonName: "Stranger Things"))); | ||
|
|
||
| _repository.GetByOpdbIdAsync(Arg.Any<string>(), Arg.Any<string>(), Arg.Any<CancellationToken>()).Returns((Machine?)null); |
| { | ||
| _handler.SetResponseFor("/api/machines?page=1&page_size=100", JsonArray( | ||
| MachineJson("GRBN-MQR4P", manufacturer: "Stern Pinball, Inc.", name: "Stranger Things (Pro)", commonName: "Stranger Things"))); | ||
| _repository.GetByOpdbIdAsync(Arg.Any<string>(), Arg.Any<string>(), Arg.Any<CancellationToken>()).Returns((Machine?)null); |
Comment on lines
+182
to
+190
| catch (Exception writeBackEx) | ||
| { | ||
| _logger.LogError( | ||
| writeBackEx, | ||
| "OPDB sync completed{State} but recording the run result on " + | ||
| "ingestion_sources failed; the source's lastRunAt / counters may " + | ||
| "lag by one run.", | ||
| failure is null ? string.Empty : " with errors"); | ||
| } |
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes
docs/build-spec.mdPhase 2 § Scope item 5 (Wave 3, last Phase 2 scope item).Establishes the project-wide OpenTelemetry instrumentation pattern that Phase 3 / 4 / 5 services inherit, and applies it to
OpdbSyncServiceas the first real consumer.What ships
PinballWizardTelemetry— single project-wideMeter+ActivitySource(both named"PinballWizard"). New scrapers / services add their counters and activities here under thepinwiz.<domain>.<operation>.<measure>naming convention rather than creating per-domain Meters.fetched,inserted,updated,skipped,failed; histogram forduration_ms. Each carries apinwiz.opdb.sync.modeattribute (apply|dry_run) so dashboards filter operational charts to apply-only and pre-deploy-validation charts to dry-run-only.pinwiz.opdb.sync) with run-summary tags. Trace inspection alone tells the run's full story without joining against the metric stream.IIngestionSourceRepository.RecordRunResultAsync+ newIngestionSourceRunResultDTO — per-run state write-back (LastRunAt,LastSuccessAt, accumulators). No-ops with a logged warning if the source isn't seeded.IngestionSourceIds— single source of truth for the source-id literals at write-back call sites and in the seed manifest. Phase 3 scrapers add constants here.OpdbSyncServiceinstrumented in a try/catch/finally: catch incrementsOpdbSyncFailedand setsActivityStatusCode.Error; finally emits all per-run observations and writes back toingestion_sources(apply-mode only; cancellation skips the write-back intentionally).ServiceDefaults/Extensions.csregistersAddMeter("PinballWizard")+AddSource("PinballWizard"). The string literal is duplicated rather than typed-referenced to avoid a ServiceDefaults → Application project reference (would invert layering); duplication is documented in both files + observability.md.docs/observability.md— full instrument inventory, Activity inventory, write-back semantics, consumer guides (Aspire dashboard / Log Analytics / Application Insights), standard tag conventions, Phase 3/4/5 extension pattern, deferred work (Cosmos RU charge → Phase 6).Test Plan
dotnet test PinballWizard.slnx --nologo→ 533 / 533 passing (was 524; +9 new tests):PinballWizardTelemetryTestspinning instrument names + units (catches dashboard-breaking renames at build time)OpdbSyncServiceTests: apply-mode records run result withDocumentsDiscovered; dry-run skips write-back; write-back failure doesn't mask original sync success; failure-path rethrows withSucceeded=falserecordeddotnet build PinballWizard.slnx --nologo→ clean, zero warningsLocal review
/local-reviewoutcome: 1 🔴 / 5SyncAsync_ApplyMode_RepositoryThrows_RecordsFailedRunResultThenRethrowsusingNSubstitute.ThrowsAsyncto makeGetByOpdbIdAsyncthrow, assertsRecordRunResultAsyncwas called withSucceeded=falsedespite the exception, and verifies the exception propagates. Load-bearing because operators look at the dashboard precisely when runs are failing.IngestionSourceIds.Opdbconst replaces magic-string"opdb"inOpdbSyncService; Phase 3 scrapers add their constants here.docs/observability.mdfootnote: destination Log Analytics table differs (AppMetricsvscustomMetrics) depending on OTLP ingestion path.DocumentsDiscoveredmath: already covered byobservability.md§ "Adding new instruments".7-item self-audit
IngestionSourceSeeder(the closest sibling); both use read-merge-upsert againstIIngestionSourceRepositorywith consistent error semanticscatch { }— all catches are scoped (OperationCanceledException when ...,Exception exwith logging)--source opdbalready dispatches throughIOpdbSyncServicewhich now resolves with the new constructor parameter via DI (verified —AddCosmosPersistenceregistersIIngestionSourceRepository)RecordRunResultAsynccall args; dry-run test verifiesDidNotReceivegit log -1 --format='%an <%ae>'→Jim Keeley <94459922+jkeeley2073@users.noreply.github.com>✅Out of Scope
pinwiz.cosmos.write.ru_charge) — deferred to Phase 6 (operability) perobservability.md§ "Deferred to later phases". Best designed against real production traffic.pinwiz.scrape.<source>.*) — Phase 3+, when manufacturer scrapers gain ACA Job execution.IngestionSourceRepository.RecordRunResultAsyncrepository tests — would require CosmosContainermocking (~40 lines per case); deferred. The OpdbSyncService tests verify the contract is invoked correctly, which exercises the same logic at the integration boundary.🤖 Generated with Claude Code