Improve admin content and routing reliability#374
Conversation
Summary: Enable editing the feed name for content items and processed URLs through the admin UI, using a dropdown of existing feeds to prevent accidental feed creation. Implementation: • Added FeedName property to ContentItemEditData model and updated repository SQL • ContentItemEditorModal now accepts FeedNames parameter and renders a select dropdown • AdminContentItems, AdminProcessedUrls, and AdminReviews pages lazy-load and pass feed names • ContentRepository.UpdateEditDataAsync uses COALESCE to preserve existing feed_name when null • Syncs feed_name to processed_urls table in same transaction for consistency • Added integration tests for feed name read/write operations • Fixed CA1016 assembly version error by adding Version to Directory.Build.props • Fixed IDE1006 naming rule violations across test files (PascalCase for local constants) • Added Setup-UserSecrets.ps1 script for local dev secret provisioning
There was a problem hiding this comment.
Pull request overview
Adds admin UI/API support for editing a content item’s feed_name using an existing-feeds dropdown (avoids creating new feeds and respects the FK), and updates persistence/tests accordingly. PR also includes additional admin filtering, content-processing pipeline/job-tracking work, and transcript-fetcher strategy changes.
Changes:
- Add
FeedNameto the admin edit-data flow (model, modal UI, repository update + processed_urls sync, integration tests). - Add optional
subcollectionNamefiltering for admin “Content Items” / “Processed URLs” lists (API, client, DB queries, UI filters). - Expand content-processing infrastructure (ad-hoc job type + job log tests) and YouTube transcript fetching (yt-dlp fallback + config flags + tests/docs/scripts).
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/TechHub.Web.Tests/Components/HeroBannerTests.cs | Renames local constant for analyzer/style compliance. |
| tests/TechHub.TestUtilities/TestCollectionsSeeder.cs | Renames SQL const identifiers for analyzer/style compliance. |
| tests/TechHub.Infrastructure.Tests/Services/YouTubeTranscriptServiceTests.cs | Adds tests for transcript fetcher fallback strategy. |
| tests/TechHub.Infrastructure.Tests/Services/ContentProcessingServiceTests.cs | Adds tests for ad-hoc processing job tracking; updates naming. |
| tests/TechHub.Infrastructure.Tests/Services/ContentProcessingPipelineTests.cs | Renames unused parameter to _ to satisfy analyzers. |
| tests/TechHub.Infrastructure.Tests/Repositories/ProcessedUrlRepositoryTests.cs | Updates tests/naming around processed URL metadata/reason handling. |
| tests/TechHub.E2E.Tests/Web/ContentDetailTests.cs | Renames base URL field to underscore-prefixed private static field. |
| tests/TechHub.Api.Tests/Endpoints/AdminEndpointsTests.cs | Adds/updates integration tests for feed-name edit-data behavior and various naming cleanups. |
| src/TechHub.Web/wwwroot/css/buttons.css | Adds disabled styling for buttons. |
| src/TechHub.Web/wwwroot/css/admin.css | Adds styling for ad-hoc job badge. |
| src/TechHub.Web/Services/TechHubApiClient.cs | Adds subcollectionName query parameter support to list calls. |
| src/TechHub.Web/Services/ITechHubApiClient.cs | Extends client interface to accept subcollectionName. |
| src/TechHub.Web/Components/Pages/Admin/AdminReviews.razor | Passes feed-name options to editor modal; lazily loads feed names. |
| src/TechHub.Web/Components/Pages/Admin/AdminProcessedUrls.razor | Adds collection/subcollection dropdown filters; passes feed names to editor modal; forwards subcollectionName to API. |
| src/TechHub.Web/Components/Pages/Admin/AdminDashboard.razor | Handles AdHocProcessing job type in dashboard display/mapping. |
| src/TechHub.Web/Components/Pages/Admin/AdminContentItems.razor | Adds collection/subcollection dropdown filters; passes feed names to editor modal; forwards subcollectionName to API. |
| src/TechHub.Web/Components/ContentItemEditorModal.razor | Adds Feed Name dropdown and binds it into ContentItemEditData saves. |
| src/TechHub.Infrastructure/Services/ContentProcessing/YouTubeTranscriptService.cs | Implements strategy selection + yt-dlp fallback; makes fetch methods overridable for tests. |
| src/TechHub.Infrastructure/Services/ContentProcessing/ContentProcessingService.cs | Refactors per-item pipeline into shared method; adds ad-hoc job creation/logging and new tests support. |
| src/TechHub.Infrastructure/Repositories/ProcessedUrlRepository.cs | Adds optional subcollectionName filtering via join to content_items. |
| src/TechHub.Infrastructure/Repositories/ContentRepository.cs | Includes feed_name in edit-data reads; updates feed_name with COALESCE; syncs processed_urls feed_name; adds subcollection filtering. |
| src/TechHub.Infrastructure/Data/Resources/system-message.md | Updates/extends the AI system prompt content and chapter structure. |
| src/TechHub.Core/Models/Admin/ContentProcessingJob.cs | Adds AdHocProcessing job type constant. |
| src/TechHub.Core/Models/Admin/ContentItemEditData.cs | Adds FeedName to edit-data model. |
| src/TechHub.Core/Interfaces/IProcessedUrlRepository.cs | Extends paging API to accept subcollectionName. |
| src/TechHub.Core/Interfaces/IContentRepository.cs | Extends paging API to accept subcollectionName. |
| src/TechHub.Core/Configuration/ContentProcessorOptions.cs | Adds YouTubeExplodeEnabled / YtDlpEnabled flags. |
| src/TechHub.Api/appsettings.json | Sets transcript fetcher flags; removes committed YouTube cookies. |
| src/TechHub.Api/Program.cs | Updates comment to reflect yt-dlp fallback being config-controlled. |
| src/TechHub.Api/Endpoints/AdminEndpoints.cs | Adds subcollectionName query support to admin list endpoints. |
| scripts/Setup-UserSecrets.ps1 | New helper script to populate local user-secrets from Azure/KeyVault. |
| scripts/Rotate-YouTubeCookies.ps1 | Updates rotation guidance to include SOCS cookie. |
| docs/content-processing.md | Documents transcript fetcher fallback + config flags. |
| Directory.Build.props | Sets a Version property (assembly/package metadata). |
✅ PR Preview Environment RemovedThe preview environment for this pull request has been removed. |
Summary: Improves legacy routing, hero banner targeting, content processing diagnostics, and telemetry accuracy so admin and public content workflows behave more reliably. Implementation: - Added latest-roundup legacy redirect handling and roundup canonical URL fixes - Made section and collection route validation case-insensitive with matching API, web, and repository updates - Added hero banner section targeting plus startup/background cache refresh and admin invalidation - Marked 404 request spans successful in telemetry while preserving result codes - Improved AI categorization JSON escape handling, prompt output rules, processing logs, and YouTube transcript diagnostics - Updated tests, fixtures, migration data, and removed generated processed/skipped JSON state files
…ovements Summary: Address all code review feedback on PR #374 — fix log injection risks, silent exception swallowing, case-insensitive filtering, and extend full-text search to index metadata fields. Implementation: • Added .Sanitize() to user-controlled values (videoUrl, collectionName, feedName, subcollectionName, raw.ExternalUrl, ex.Message) in YouTubeTranscriptService and ContentProcessingService log statements • Replaced generic catch blocks in AdminReviews, AdminContentItems, and AdminProcessedUrls with typed HttpRequestException/TaskCanceledException catches and Logger.LogWarning; added missing ILogger injections • Added TaskCanceledException and JsonException catch blocks to hero banner pre-load in Web/Program.cs; added using System.Text.Json • Fixed subcollectionName query: added .ToLowerInvariant() in ContentRepository and changed = to ILIKE in ProcessedUrlRepository for case-insensitive filtering • Cleared findMoreText in migration 011 to match empty findMoreUrl (no dangling label) • Refactored PostgresDialect.TransformFullTextQuery to use regex token extraction, strip tsquery operators/separators, deduplicate terms, and skip single-char fragments • Added migration 012 to extend search_vector with feed_name, subcollection_name, collection_name, primary_section_name at weight D for metadata-driven discoverability • Added tests for PostgresDialectTests (hyphen splitting, operator stripping, deduplication) and ContentRepositoryTests (metadata field search scenarios)
…che JsonException, HEAD→GET try/finally, TransformFullTextQuery empty return
… expression' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Summary: Fix three reliability issues: Blazor infrastructure 404s inflating OTel failure metrics via status-code page re-execution, real client IP attribution in telemetry, and OpenAI endpoint exposure to public internet. Implementation: • Replace UseStatusCodePagesWithReExecute with UseWhen guard in Program.cs to exclude /_blazor/* paths — prevents circuit teardown 404s from being re-executed through /not-found and changing the OTel span DisplayName • Rename NotFoundRequestSuccessProcessor to TelemetryNoiseProcessor; scope suppression to bot user agents only (remove /_blazor fallback, now fixed at source); add EnrichWithHttpResponse to set client.address from the post-forwarded RemoteIpAddress rather than the Container Apps NAT IP • Add InternalsVisibleTo(TechHub.Api.Tests) to ServiceDefaults.csproj so TelemetryNoiseProcessorTests can access the internal processor class • Add BlazorPathStatusCodePageExclusionTests (5 tests) verifying /_blazor paths bypass status code pages; rewrite TelemetryNoiseProcessorTests for bot suppression behavior; fix IDE1006 naming violations in ContentProcessingServiceTests (url->Url, slug->Slug local constants) • Add graceful Blazor.disconnect() teardown in PlaywrightTestBase to prevent 499 errors in Application Insights from abrupt TCP connection kills • Add networkAcls with defaultAction:Deny + ipRules to openai.bicep module; wire adminIpAddresses param through main.bicep to restrict public endpoint • Promote all dotnet_naming_rule severities from warning to error in .editorconfig so IDE1006 naming violations fail the build immediately
Summary: Fix 5 issues flagged by code reviewers: specific exception handling in test teardown, section-filter correctness in hero banner, ShowHeroBanner guard restored, and Azure CLI JSON parse safety in setup script. Implementation: • PlaywrightTestBase.cs: replaced generic catch blocks with filtered catches; added ExceptionDispatchInfo fallback so all cleanup steps always run even if one fails unexpectedly • HeroBanner.razor: fixed section filter to hide restricted cards when SectionName is empty (not just mismatched) • SectionCollection.razor: restored @if (section?.ShowHeroBanner == true) guard around <HeroBanner> • Setup-UserSecrets.ps1: added --only-show-errors, exit code checks, and warning-line filtering before ConvertFrom-Json for both az ad app list and az containerapp show calls
Summary: Simplify boolean ternary in HeroBanner, add when-filters to fallback catch blocks in PlaywrightTestBase to satisfy CodeQL generic-catch-clause rule. Implementation: • HeroBanner.razor: replaced _hashChanged ? false : _storedCookieCollapsed with !_hashChanged && _storedCookieCollapsed (two occurrences, lines 144 and 169) • PlaywrightTestBase.cs: added when (ex is not ...) filters to all four fallback catch blocks so they are the exact inverse of the filtered catch above each one — CodeQL no longer flags them as generic catches while capture-and-continue semantics are preserved
Summary: Address PR review feedback — suppress response body in HEAD middleware and fix two async-without-await build errors in test handlers. Implementation: • Program.cs: redirect context.Response.Body to Stream.Null during HEAD→GET rewrite to suppress body at middleware level, not just transport level • BlazorPathStatusCodePageExclusionTests.cs: remove async keyword from MapPost and MapFallback handlers (no await present), return Task.CompletedTask to satisfy non-async signature and fix CS1998 under TreatWarningsAsErrors
Summary: Revert narrowed exception filters in ContentProcessingService to broad catches with pragma suppressions and clear intent comments; sanitize all ex.Message log writes; fix PostgresDialect whitespace passthrough bug. Implementation: • ContentProcessingService RunAsync per-item catch: removed specific exception type filter; now catches all exceptions (except OperationCanceledException) with CA1031 pragma suppression — per-item errors are recorded in processed_urls and the loop continues with the next URL • ContentProcessingService RunAsync outer catch: removed OutOfMemoryException/StackOverflowException filter; catches all exceptions with CA1031 pragma suppression — RunAsync must never throw so operators can inspect failures in the management screens • ContentProcessingService: added .Sanitize() to ex.Message in content enrichment fallback logAction and in ProcessSingleAsync FATAL ERROR log • PostgresDialect.TransformFullTextQuery: return string.Empty (not the original whitespace string) for null/whitespace input to avoid passing whitespace to to_tsquery • HeroBanner.razor: sync _storedCookieHash and _storedCookieCollapsed backing fields in OnAfterRenderAsync and ToggleAsync so OnParametersSet uses current state on navigation
| // Treat an empty result as no query so callers skip FTS and avoid a to_tsquery syntax error. | ||
| var transformedQuery = hasQuery ? Dialect.TransformFullTextQuery(request.Query!) : null; | ||
| if (string.IsNullOrEmpty(transformedQuery)) | ||
| { | ||
| hasQuery = false; |
There was a problem hiding this comment.
When the user provides a non-empty search query but Dialect.TransformFullTextQuery() returns an empty string (e.g., "C#" or operator-only input), this code flips hasQuery to false and effectively ignores the query, returning unfiltered results. That behavior is surprising and can make invalid/unsupported queries look like they matched everything. Consider returning an empty result set (Items=[]/TotalCount=0) or surfacing a validation error when request.Query is non-empty but produces no tokenizable FTS terms, instead of treating it as ‘no query’.
| // Treat an empty result as no query so callers skip FTS and avoid a to_tsquery syntax error. | |
| var transformedQuery = hasQuery ? Dialect.TransformFullTextQuery(request.Query!) : null; | |
| if (string.IsNullOrEmpty(transformedQuery)) | |
| { | |
| hasQuery = false; | |
| // If the caller supplied a non-empty query but nothing tokenizable remains, | |
| // return an empty result set rather than silently broadening to an unfiltered browse. | |
| var transformedQuery = hasQuery ? Dialect.TransformFullTextQuery(request.Query!) : null; | |
| if (hasQuery && string.IsNullOrEmpty(transformedQuery)) | |
| { | |
| return await Task.FromResult(new SearchResults<ContentItem> | |
| { | |
| Items = System.Array.Empty<ContentItem>(), | |
| TotalCount = 0 | |
| }); |
| #pragma warning disable CA1031 // Intentional: per-item pipeline errors must not abort the entire feed run. | ||
| // Any unexpected failure is recorded in processed_urls so it can be reviewed in the | ||
| // admin management screens and retried. The loop continues with the next URL. | ||
| catch (Exception ex) | ||
| #pragma warning restore CA1031 | ||
| { | ||
| Log(string.Create(CultureInfo.InvariantCulture, $" ✗ Error ({step}): {raw.ExternalUrl} — {ex.Message}")); | ||
| Log(string.Create(CultureInfo.InvariantCulture, $" ✗ Error: {raw.ExternalUrl.Sanitize()} — {ex.Message.Sanitize()}")); | ||
| await _processedUrlRepo.RecordFailureAsync(raw.ExternalUrl, ex.Message, raw.FeedName, raw.CollectionName, reason: null, hasTranscript: null, jobId: jobId, ct: ct); | ||
| errorCount++; |
There was a problem hiding this comment.
The broad per-item catch currently catches all Exception types, including potentially fatal exceptions like OutOfMemoryException. Even for background jobs, it’s safer to let truly fatal exceptions bubble out (or be handled separately) rather than attempting to continue processing in a compromised process state. Consider adding an exception filter (e.g., when ex is not OutOfMemoryException and not StackOverflowException) or otherwise excluding fatal exceptions from this catch.
| #pragma warning disable CA1031 // Intentional: RunAsync must never throw — all errors are recorded in the job log. | ||
| // This is a background service; any uncaught exception would silently crash the run without marking | ||
| // the job as failed. Operators rely on the management screens to detect and diagnose failures. | ||
| catch (Exception ex) | ||
| #pragma warning restore CA1031 | ||
| { | ||
| Log(string.Create(CultureInfo.InvariantCulture, $"FATAL ERROR: {ex.Message}")); | ||
| Log(string.Create(CultureInfo.InvariantCulture, $"FATAL ERROR: {ex.Message.Sanitize()}")); | ||
| _logger.LogError(ex, "Content processing run {JobId} failed with unhandled exception", jobId); | ||
| await TryFailJobAsync(jobId, feedsProcessed, itemsAdded, itemsSkipped, errorCount, transcriptsSucceeded, transcriptsFailed, log.ToString(), ct); | ||
| } |
There was a problem hiding this comment.
Similar to the per-item handler, this outer catch is currently unfiltered and may catch fatal exceptions like OutOfMemoryException. Continuing execution after catching those can make failures harder to diagnose and can lead to further corruption/noise. Consider excluding fatal exceptions via an exception filter (or rethrowing them) while still recording/logging expected operational exceptions.
Summary
Improves the admin content-management workflow and public content reliability. The PR adds feed name editing for content items and processed URLs, strengthens legacy routing and case-insensitive section handling, refreshes hero banner delivery, improves content-processing diagnostics, hardens structured log safety, and prevents telemetry noise from inflating failure rates. Full-text search now indexes metadata fields so content is discoverable by feed name, subcollection, and collection.
Problem
Admins had no way to reassign content items or processed URLs to a different feed without direct database access. Related content-management and public content workflows also had reliability gaps: legacy weekly roundup URLs could miss the latest roundup, uppercase route segments were rejected, hero banner rendering depended on per-request API calls and all-section visibility, AI output could include invalid JSON escapes, and Blazor infrastructure 404s and bot traffic inflated failure-rate alerts and triggered false-positive alerts. Log statements exposed user-controlled strings without sanitization, and some catch blocks swallowed exceptions silently. Full-text search also lacked coverage for structural metadata such as feed name and subcollection name. Real client IP was not attributed correctly in telemetry. The OpenAI endpoint was reachable from any public IP.
Solution
etworkAcls with defaultAction: Deny + admin IP allowlist to restrict public endpoint access
Technical Changes
Core
Infrastructure
API
Web (Blazor)
Service Defaults
Infrastructure (Bicep)
etworkAcls block with defaultAction: Deny and ipRules from an �dminIpAddresses parameter
Editor Config
Tests
eturn Task.CompletedTask to avoid CS1998 under TreatWarningsAsErrors
Scripts and Docs