Conversation
There was a problem hiding this comment.
Pull request overview
This PR bundles several long-running feature and stability tracks across MeshWeaver core + Memex: social publishing foundations, in-process #r "nuget:..." compilation support (node-type + interactive markdown), move-operation performance/timeout hardening, and multiple UI/stream reliability improvements. It also standardizes the code folder naming from _Source/_Test to Source/Test across code, tests, docs, and samples.
Changes:
- Introduces
MeshWeaver.Social(options, DI wiring, publish queue, credential model) plus initial Memex wiring (LinkedIn connect entry points + user menu hooks). - Adds
MeshWeaver.NuGetresolver + directive parser and integrates it into script compilation (#r "nuget:Pkg, Version"), including cache backends and tests. - Improves operational robustness: parallelized recursive moves, default 30s mesh-op timeout, “no endless spinner” navigation status UI, and remote stream resubscribe behavior.
Reviewed changes
Copilot reviewed 159 out of 265 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/MeshWeaver.StorageImport.Test/StorageImporterTests.cs | Updates test expectations/docs to Source/ naming. |
| test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs | Adds stats refresher test coverage (needs deterministic timeout handling). |
| test/MeshWeaver.Social.Test/MeshWeaver.Social.Test.csproj | Adds new Social test project referencing Social + Fixture. |
| test/MeshWeaver.Social.Test/InMemoryPublishQueueTest.cs | Adds unit tests for publish queue due-drain + dedup. |
| test/MeshWeaver.Persistence.Test/FileSystemPersistenceTest.cs | Updates partition tests to Source/ naming. |
| test/MeshWeaver.MathDemo.Test/TestPaths.cs | Adds helper paths for MathDemo sample test assets. |
| test/MeshWeaver.MathDemo.Test/MeshWeaver.MathDemo.Test.csproj | Adds MathDemo test project and copies sample graph data to output. |
| test/MeshWeaver.Hosting.PostgreSql.Test/SatelliteQueryTests.cs | Updates code-path routing tests to Source/ naming. |
| test/MeshWeaver.Hosting.Monolith.Test/UserActivityAreaTest.cs | Updates regression test docs to Source/ naming. |
| test/MeshWeaver.Hosting.Blazor.Test/NavigationServiceTest.cs | Adjusts test to assert “no 404 flash” during retries. |
| test/MeshWeaver.Graph.Test/NuGetDirectiveParserTest.cs | Adds unit tests for parsing/stripping #r "nuget:...". |
| test/MeshWeaver.Graph.Test/NuGetAssemblyResolverTest.cs | Adds networked NuGet restore end-to-end tests (skippable via env var). |
| test/MeshWeaver.Graph.Test/MeshWeaver.Graph.Test.csproj | References new MeshWeaver.NuGet project. |
| test/MeshWeaver.FutuRe.Test/MeshWeaver.FutuRe.Test.csproj | Updates compile-included sample sources to Source/ paths. |
| test/MeshWeaver.Content.Test/CompilationErrorTest.cs | Updates broken-code test to Source/ path. |
| test/MeshWeaver.AI.Test/MeshPluginTest.cs | Updates MCP tool count expectations (adds RunTests/Move/Copy). |
| src/MeshWeaver.Social/SocialOptions.cs | Adds configurable knobs for publishing/stats/ingest scheduling. |
| src/MeshWeaver.Social/SocialExtensions.cs | Adds DI wiring for social publishing subsystem and hosted services. |
| src/MeshWeaver.Social/PlatformCredential.cs | Adds credential record model (access/refresh/expiry metadata). |
| src/MeshWeaver.Social/MeshWeaver.Social.csproj | Introduces Social library project. |
| src/MeshWeaver.Social/IPublishQueue.cs | Adds publish queue abstraction + in-memory implementation. |
| src/MeshWeaver.Social/IApprovalPublishBridge.cs | Defines bridge contract and PublishableSnapshot model. |
| src/MeshWeaver.NuGet/ResolvedPackageSet.cs | Adds resolver output model (assemblies, probing dirs, versions). |
| src/MeshWeaver.NuGet/NuGetServiceCollectionExtensions.cs | Adds DI extension to register resolver + cache. |
| src/MeshWeaver.NuGet/NuGetPackageReference.cs | Adds package reference model (id + version range). |
| src/MeshWeaver.NuGet/NuGetDirectiveParser.cs | Implements #r "nuget:..." extraction + source stripping. |
| src/MeshWeaver.NuGet/MeshWeaver.NuGet.csproj | Introduces NuGet resolver project and dependencies. |
| src/MeshWeaver.NuGet/INuGetPackageCache.cs | Adds optional persistent cache interface + null implementation. |
| src/MeshWeaver.NuGet/INuGetAssemblyResolver.cs | Adds resolver interface returning ResolvedPackageSet. |
| src/MeshWeaver.NuGet.AzureBlob/MeshWeaver.NuGet.AzureBlob.csproj | Adds Azure Blob cache backend project. |
| src/MeshWeaver.NuGet.AzureBlob/BlobNuGetPackageCacheExtensions.cs | Adds DI helper to register blob-backed cache. |
| src/MeshWeaver.Mesh.Contract/Services/MeshOperationOptions.cs | Adds mesh operation timeout options (default 30s). |
| src/MeshWeaver.Mesh.Contract/Services/IStorageAdapter.cs | Updates docs/examples to Source/ naming. |
| src/MeshWeaver.Mesh.Contract/Services/INavigationService.cs | Adds Status observable contract for UI progress reporting. |
| src/MeshWeaver.Mesh.Contract/Services/IIconGenerator.cs | Adds icon generator abstraction returning an observable SVG. |
| src/MeshWeaver.Mesh.Contract/PartitionDefinition.cs | Updates standard table mappings (Source/Test → code) and clarifies semantics. |
| src/MeshWeaver.Mesh.Contract/MeshExtensions.cs | Adds timeout override + move timeout enforcement + grain dispose on delete. |
| src/MeshWeaver.Mesh.Contract/CodeConfiguration.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Kernel.Hub/MeshWeaver.Kernel.Hub.csproj | Removes Interactive package mgmt dependency; references MeshWeaver.NuGet. |
| src/MeshWeaver.Hosting/Persistence/MigrationUtility.cs | Updates migration heuristics to include Source/Test + legacy _Source/_Test. |
| src/MeshWeaver.Hosting/Persistence/FileSystemStorageAdapter.cs | Treats Source/Test as code paths + keeps legacy compatibility. |
| src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs | Parallelizes descendant move I/O (with concurrency implications). |
| src/MeshWeaver.Hosting/Persistence/CachingStorageAdapter.cs | Updates code sub-namespace detection (Source/Test + legacy). |
| src/MeshWeaver.Hosting.PostgreSql/PostgreSqlPartitionedStoreFactory.cs | Guards against source/test mistakenly becoming schemas. |
| src/MeshWeaver.Hosting.PostgreSql/PostgreSqlCrossSchemaQueryProvider.cs | Filters malformed parameters to avoid NRE during SQL interpolation. |
| src/MeshWeaver.Hosting.Blazor/MeshWeaver.Hosting.Blazor.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Graph/PartitionTypeSource.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Graph/MeshWeaver.Graph.csproj | References MeshWeaver.NuGet. |
| src/MeshWeaver.Graph/MeshNodeLayoutAreas.cs | Improves create href behavior + reactive/grouped children catalog. |
| src/MeshWeaver.Graph/MeshDataSource.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Graph/Configuration/ScriptCompilationService.cs | Integrates NuGet directive parsing + resolver into compilation. |
| src/MeshWeaver.Graph/Configuration/NodeTypeDefinition.cs | Updates docs/examples to Source/ naming. |
| src/MeshWeaver.Graph/Configuration/MeshDataSourceNodeType.cs | Changes sources namespace constant to Source. |
| src/MeshWeaver.Graph/Configuration/GraphConfigurationExtensions.cs | Registers NuGet resolver and uses Source code path. |
| src/MeshWeaver.Graph/Configuration/CodeNodeType.cs | Treats Code nodes as primary content; defines Source/Test constants. |
| src/MeshWeaver.Documentation/Data/DataMesh/UnifiedPath.md | Documents @/ semantics and HTML-href pitfalls. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfileLayoutAreas.cs | Adds SocialMedia profile layout areas example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfile.cs | Adds SocialMedia profile content model example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/SocialMediaPost.cs | Adds SocialMedia post content model example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/Platform.cs | Adds SocialMedia platform reference-data example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia.md | Updates docs to Source/ naming and authoring guidance. |
| src/MeshWeaver.Documentation/Data/DataMesh/SatelliteEntities.md | Clarifies Source/Test are primary content, not satellites. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeTypes.md | Adds Node Types documentation index page. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeTypeConfiguration.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeOperations.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/DataConfiguration.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/CreatingNodeTypes.md | Updates docs to Source/Test naming throughout. |
| src/MeshWeaver.Documentation/Data/DataMesh.md | Updates TOC links and adds NuGet packages bullet. |
| src/MeshWeaver.Documentation/Data/Architecture/PartitionedPersistence.md | Updates persistence routing docs for Source/Test. |
| src/MeshWeaver.Documentation/Data/Architecture/MeshGraph.md | Updates examples to Source/ naming. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionSampleData.cs | Adds cession sample dataset for docs/demo. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionResultsArea.cs | Adds reactive charting layout area example. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionEngine.cs | Adds pure business logic sample for cession calculations. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionData.cs | Adds content models for cession example. |
| src/MeshWeaver.Data/Serialization/SyncStreamOptions.cs | Adds configurable heartbeat interval for sync streams. |
| src/MeshWeaver.Data/Serialization/JsonSynchronizationStream.cs | Implements resubscribe-on-owner-dispose logic. |
| src/MeshWeaver.Blazor/Pages/ApplicationPage.razor | Switches to NavigationStatus-driven progress/not-found/error UI. |
| src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor.css | Adds styling for full-page vs compact overlay progress bar. |
| src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor | Adds reusable “spinner + message” component. |
| src/MeshWeaver.Blazor/Components/MeshSearchView.razor.cs | Adds Category grouping fallback to NodeType. |
| src/MeshWeaver.Blazor/Components/LayoutAreaView.razor.cs | Adds stream lifecycle logging and additional diagnostics. |
| src/MeshWeaver.Blazor/Components/LayoutAreaView.razor | Surfaces compilation progress indicator before first stream emission. |
| src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor.css | Adds styling for compilation progress banner. |
| src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor | Adds polling UI component for active NodeType compilation. |
| src/MeshWeaver.Blazor.Portal/MeshWeaver.Blazor.Portal.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Blazor.AI/MeshWeaver.Blazor.AI.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Blazor.AI/McpMeshPlugin.cs | Adds Patch/Move/Copy MCP tools and improves tool descriptions. |
| src/MeshWeaver.AI/ThreadLayoutAreas.cs | Adds debug logging around streaming view emission. |
| src/MeshWeaver.AI/IconGenerator.cs | Adds default AI-backed IIconGenerator implementation. |
| src/MeshWeaver.AI/DelegationCompletedEvent.cs | Removes delegation tracker/event types. |
| src/MeshWeaver.AI/Data/Agent/Worker.md | Updates @/ link guidance (no raw HTML href with @/). |
| src/MeshWeaver.AI/Data/Agent/ToolsReference.md | Updates @/ link guidance and provides correct/incorrect table. |
| src/MeshWeaver.AI/Data/Agent/Orchestrator.md | Updates @/ link guidance for agent outputs. |
| src/MeshWeaver.AI/AIExtensions.cs | Removes old type registration; registers IIconGenerator. |
| memex/aspire/Memex.Portal.Distributed/Program.cs | Registers blob-backed NuGet package cache in distributed deployment. |
| memex/aspire/Memex.Portal.Distributed/Memex.Portal.Distributed.csproj | References MeshWeaver.NuGet.AzureBlob. |
| memex/aspire/Memex.Database.Migration/Program.cs | Adds source/test to reserved schema list. |
| memex/aspire/Memex.AppHost/Program.cs | Adds LinkedIn secret/env wiring + sets NUGET_PACKAGES cache dir. |
| memex/Memex.Portal.Shared/Social/SocialMediaUserMenuProvider.cs | Adds “Social Media” shortcut on a user’s own node (lazy hub creation). |
| memex/Memex.Portal.Shared/Social/ApiCredentialNodeType.cs | Adds NodeType for PlatformCredential stored under _ApiCredentials. |
| memex/Memex.Portal.Shared/Pages/Login.razor | Adds “Connect LinkedIn for publishing” CTA on login page. |
| memex/Memex.Portal.Shared/OrganizationNodeType.cs | Switches to default layout areas registration. |
| memex/Memex.Portal.Shared/MemexConfiguration.cs | Adds LinkedIn publisher wiring, @/ redirect middleware, and routes. |
| memex/Memex.Portal.Shared/Memex.Portal.Shared.csproj | References MeshWeaver.Social. |
| memex/Memex.Portal.Monolith/appsettings.Development.json | Enables debug logging for LayoutAreaView. |
| MeshWeaver.slnx | Adds new projects (NuGet, NuGet.AzureBlob, Social, new test projects). |
| Directory.Packages.props | Adds NuGet.* package versions for resolver implementation. |
| CLAUDE.md | Documents @/ local-only rule and href/URL restrictions. |
| (Various) samples/Graph/... | Adds/updates many sample NodeTypes and content under Source/ to reflect new conventions and demos. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…+ test helpers Recursive DeleteNodeRequest handled on a node's own hub was deadlocking: the final DeleteSelfFromStorage posted Ok and DisposeRequest from the dying hub, so the Ok raced callback disposal on the caller and was lost. Introduce CommitNodeDeletionMessage and forward the terminal commit (storage delete + reply + grain dispose) to the resolved mesh hub (walking ParentHub upward) — Sender becomes the stable mesh hub, FIFO on the caller's inbound queue guarantees Ok resolves the RegisterCallback before DisposeRequest arrives. Also addresses two Copilot review comments on PR #95: - FileSystemStorageAdapter.DeleteAsync empty-directory ascent is now concurrency- tolerant: wraps the enumerate + Directory.Delete in try/catch, swallowing the DirectoryNotFoundException race and breaking on IOException (non-empty / in-use). Required because FileSystemPersistenceService.MoveNodeAsync now parallelizes descendant deletes via Task.WhenAll. - PostStatsRefresherTest.WaitUntilAsync throws TimeoutException with a descriptive message instead of returning silently on deadline, so the test cannot green-tick a stats-refresh that never happened. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@copilot resolve the merge conflicts in this pull request |
Resolved. The merge with Conflicts resolved:
|
Code review — recent stability batch
Manual review of the last ~20 commits since Correctness — should fix before merge1. ✅ foreach (var (k, v) in perParams)
{
var newKey = "@" + prefix + k.TrimStart('@');
renamedSql = renamedSql.Replace(k, newKey);
renamedParams[newKey] = v;
}
Fix: single regex pass keyed on 2. ✅ Fix: 3. ✅ Fix: parse every query in 4. ✅ Fix: Race / lifecycle hazards5. ✅ Fix: drop the time-based heuristic in favour of a structural one — skip recovery only when the thread is still an auto-execute candidate ( 6. ✅ 7. ✅ 8. ✅ Fix: pre-allocate the Style / consistency9. ✅ 10. ✅ 11. ✅ Fix: drop the per-query Limit injection. Limit is enforced post-union via ✅ Looks good (no action needed)
|
Code review — part 2: rest of the PR
Continuing review on the bulk of the PR (everything before the recent stability batch). Focused on the new projects ( Correctness — should fix before merge12. ✅ return _cache.GetOrAdd(key, _ => ResolveCoreAsync(requested, framework, ct));If Fix: evict faulted/cancelled tasks from the cache before returning. Also pass 13. ✅ Fix: switched to 14. ✅ Fix: post-hydration, the resolver opens the package folder via 15. ✅ Fix: defensive 16. ✅ Race / lifecycle hazards17. ✅ 18. ✅ 19. ✅ Fix: replaced with a single bounded Style / consistency20. ✅ Fix: register the publisher as a true singleton via 21. ✅ Fix: gate hosted-service registration on 22. ✅ 23. ✅ ✅ Looks good (no action needed)
Areas not covered in this reviewPersistence-service refactors ( |
Review fixes applied — all 23 items addressed5 commits, organised by batch. Locally committed, not pushed yet.
Verification
Notes
Ready to push when you want. |
|
Done — review item #14 is now closed in commit |
…fix DI lifetimes, redact PII, drop dynamic - ThreadExecution: collapse triple-stacked <summary> blocks on WatchForExecution and NotifyParentCompletion. Tooling kept the last one anyway; the dead scaffolding was just noise. - SocialExtensions: register LinkedInPublisher / XPublisher as TRUE singletons (factory-resolved with named HttpClient). The previous AddHttpClient<T>+AddSingleton<IPlatformPublisher> mix made the concrete type transient while the interface alias was singleton — direct vs via-interface resolution returned different instances. Also gate hosted-service registration on at least one platform being configured (the "all-or-nothing" comment was wrong; with zero platforms the four hosted services started anyway and faulted on first tick). - LinkedInPublisher: replace `(dynamic)media.shareMediaCategory` peek with two concrete payload shapes — typo turns into a compile error instead of a RuntimeBinderException. - LinkedIn / X publishers: cap error-body logs at 200 chars to bound PII exposure (the body can echo the user's post text on validation rejection). Full body still goes to PublishResult.Error for the caller. Addresses PR #95 review items #9, #20, #21, #22, #23. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… in-memory engines
PostgreSqlStorageAdapter.QueryNodesAsync(IReadOnlyList<ParsedQuery>):
- Replace order-dependent `string.Replace` parameter rename with a
single `Regex.Replace` keyed on @<name> word boundary that gates
on perParams.ContainsKey. Sequential Replace was mangling adjacent
tokens (renaming `@p` after `@p1` produced `@q0_q0_p1`) and could
clobber `@…` substrings inside string literals / JSONB paths.
- Switch from `UNION` to `UNION ALL` wrapped in
`SELECT DISTINCT ON (namespace, id) ... ORDER BY namespace, id, last_modified DESC`.
Plain UNION dedupes whole rows — two queries observing the same
node at slightly-different last_modified would BOTH appear in the
output. Path-keyed dedup (= MeshNode identity) with newest-wins
tie-break collapses them correctly.
PostgreSqlMeshQuery.ObserveQuery<T>:
- Parse EVERY query in request.EffectiveQueries and build per-query
(basePath, scope) filters; the change-notifier subscription
OR-joins them so multi-query observations get delta refreshes
triggered by ANY query's path/scope, not just query #0's. The
previous shape silently lost live updates from queries #1+.
PostgreSqlMeshQuery.QueryNodesUnionAsync + MeshQueryEngine:
- Drop the per-query `parsedList[0].Limit = request.Limit` injection.
Query #0 hit its limit before yielding the union's most relevant
rows, while queries #1+ contributed unbounded — making the result
iteration-order dependent. Limit is now enforced post-union via
MinLimit(request.Limit, firstParsed.Limit) so a request-level cap
can't be circumvented and an in-query `limit:N` still wins when
smaller.
- MeshQueryEngine: CollectMatchedAsync returns the LIST of every
query's basePath; the source:activity post-filter scans every
base path's descendants and unions activity-main-paths so
queries #1+ aren't filtered against query #0's subtree only.
Addresses PR #95 review items #1, #2, #3, #4, #11.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ThreadExecution stability fixes ThreadExecution.cs (already in commit 478fdaa — recapping here for the review-item index): - RecoverStaleExecutingThread: drop the 2-minute "fresh execution" window in favour of a structural check (skip when PendingUserMessage + ActiveMessageId are still set, i.e. the thread is an auto-execute candidate WatchForExecution will pick up). Closes the "long-running agent crashed at minute 5 → IsExecuting=true forever" gap; the time-based heuristic contradicted commit 6dc436b's "no time limits" stance. - Subject<StreamingSnapshot>: declare with `using var` so the Subject itself disposes alongside its subscription. Minor leak per execution previously. - HandleSubmitMessage: pre-allocate the per-round CancellationTokenSource and store it on the thread hub BEFORE posting SubmitMessageResponse — closes the race where an early Stop click between IsExecuting=true and ExecuteMessageAsync's `parentHub.Set(executionCts)` found a null CTS slot and silently no-op'd. ExecuteMessageAsync now reuses the pre-allocated CTS (with a fallback for the auto-execute path that bypasses HandleSubmitMessage). IsExecutingLifecycleTest.cs: - Migrate the response-text wait from text-pattern matching (skipping placeholders "Allocating agent..." etc.) to `ThreadMessage.CompletedAt is not null`, which ExecuteMessageAsync sets only on the terminal PushToResponseMessage call. Same pattern adopted in ChatHistoryTest in commit ab3af8b. - Add a regression assertion that final ThreadMessage.Status == Completed. The terminal-status guard in PushToResponseMessage prevents the late Sample(100ms)-flushed Streaming push from regressing the cell from Completed back to Streaming; this assertion catches any future regression of that guard. Addresses PR #95 review items #5, #6, #7, #8, #10. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…, parallelism, backoff)
NuGetAssemblyResolver:
- Evict faulted/cancelled tasks from the per-key cache before
returning. A transient feed failure (network, throttle, cancelled
in-flight resolve) used to poison the cache for the resolver's
lifetime — every subsequent call replayed the same exception.
- Pass CancellationToken.None to the shared core task so a single
caller's cancellation can't take down the resolution for
others; per-caller `ct` projects via `task.WaitAsync(ct)`.
- Switch DependencyBehavior from `Lowest` to `HighestMinor` so
`#r` directives pick up patch-level security fixes via
transitive dependencies without silently jumping major/minor.
- Document that hydrated cache content is trusted to match
(id, version) — flag for future content-hash verification if
cache poisoning becomes a concern.
LinkedInPublisher / XPublisher (LinkedIn already committed in batch A
for the dynamic+PII parts; this commit adds the 401 retry):
- SendWith401RetryAsync: on the FIRST 401 response from a publish,
force-refresh the token (zero ExpiresAt before EnsureFreshAsync)
and retry once. Closes the race where the access token's TTL
expired between EnsureFreshAsync and the actual API call.
PostStatsRefresher:
- Process due-refresh targets via Parallel.ForEachAsync bounded
by SocialOptions.StatsRefreshDegreeOfParallelism (default 8),
so a slow API + large refresh window can't let one tick
overshoot the next interval.
- Per-target failure backoff via a ConcurrentDictionary of
last-failure timestamps — targets that failed within
StatsRefreshFailureBackoff (default 15 min) skip the next tick.
Stops a degraded platform from generating thousands of repeat
warnings every cycle while the underlying issue is fixed.
Success clears the backoff entry.
SocialOptions: add StatsRefreshDegreeOfParallelism (8) and
StatsRefreshFailureBackoff (15 min) knobs.
Addresses PR #95 review items #12, #13, #14, #16, #17, #18.
(#15 XPublisher defensive parse + the LinkedIn dynamic / PII items
were already in commit 478fdaa.)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… file lock The MESHWEAVER_DISPOSE_TRACE=1 trace took a global lock per call (`File.AppendAllText` under `lock (DisposeTraceLogLock)`), serialising hub teardown under load when many hubs disposed concurrently. Replaced with a single bounded `Channel<string>` (capacity 4096, FullMode = DropWrite) drained by one writer task started in the type initialiser. Producers `TryWrite` non-blocking — if the disk is slow / locked, lines drop on full instead of putting back-pressure on dispose. Single-reader semantics avoid contention on the file handle. Addresses PR #95 review item #19. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the TODO from commit 512adb4. After a successful INuGetPackageCache.TryHydrateAsync, the resolver now opens the hydrated folder via PackageFolderReader and compares the package's own .nuspec-declared (id, version) against the expected (id, version). On mismatch the directory is purged and the resolver falls back to the feed. This catches the failure modes #14 was about: wrong package stored under right key (cross-tenant blob, accidental copy, drift after a manual edit). The .nuspec is the canonical NuGet source of truth, so a tampered cache entry can't fake the identity without rewriting the nuspec — which we'd then catch at hydration time. No INuGetPackageCache contract change; validation lives entirely in the resolver. Closes the last open item from PR #95 review (item #14). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the reactive non-blocking activation (Subscribe + ReplaySubject/Channel queue + drainer) with a straight `await` on the activation chain inside OnActivateAsync. By the time Orleans dispatches any message to the grain, the hub is fully built — DeliverMessage becomes a one-line passthrough with no queue, no fail-fast "not ready" branch, no scheduler hop. Why this is correct (and the previous reactive shape was wrong): - OnActivateAsync is Orleans' grain-lifecycle hook. Orleans actively serializes the wait — the grain has no in-flight messages while OnActivateAsync runs. `await` here cannot deadlock any hub action block (none are running). - The previous shape leaked subscriber-ordering races under [Reentrant] concurrency and required a per-Dispatch single-flight guard, response-id wait, Take(1) on response, Channel<T> drainer — each layer fixing a race the previous layer introduced. Repeated runs showed dispatch counts of 5, 9, 10, 13, depending on phase-of-moon. - Blocking activation eliminates the entire race surface: there is no pending queue, no concurrent Subscribe, no stale state read. The 30 s Timeout bounds the wait — a missing MeshNode throws and Orleans deactivates rather than hanging. Big comment block in OnActivateAsync marks the `await` as a sanctioned exception to the no-Task-bridge rule per Doc/Architecture/AsynchronousCalls.md. Verifies green: OrleansMeshTests (3/3, 1s), OrleansNodeChangePropagationTest.Resubmit_AfterExecution_DoesNotDeadlock and OrleansAutoExecuteTest.AutoExecute_UpdateThreadMessageContent_RoutesToResponseGrain (2/2, 9s). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…waits unless no compile coming
Two paired fixes that together restore the activation pipeline now that
HasUsableBuild gates on CompiledFrameworkVersion:
1. `NodeTypeCompileActivityHandler` was missing
`CompiledFrameworkVersion = NodeTypeCompilationHelpers.FrameworkVersion`
in the Ok-write back-to-parent. Result: `HasUsableBuild` returned false
for every freshly-compiled NodeType (assembly fields populated but framework
version null), every per-instance activation fell through to the error
overlay → "Areas only 1" / `Overview/1` NamedAreaControl. Stamping the
field closes the loop.
2. `NodeTypeEnrichmentHelpers` slow-path Where filter was too lax — accepted
null/Unknown `CompilationStatus` unconditionally, snapping the pre-compile
emission and binding every per-instance hub to default config before the
compile activity even started. New behaviour:
- Settled compile (Ok+assembly fields, or Error) → pass through, ApplyStreamResult
- No-compile-coming static NodeType (no Configuration / HubConfiguration /
Sources data, no settled compile fields, status null/Unknown) → pass
through, ApplyStreamResult falls to default config. Mirrors the kickoff's
"static NodeType, no source" skip. Repro:
CreatableTypesIntegrationTest (test-seeded NodeTypes via persistence).
- Anything else (compile in flight, status null with source code) → keep
waiting; Take(1) snaps only the post-compile state. Repro:
LinkedInProfile_NodeType_CompilesAndRendersOverview (compile-driven custom
Overview).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ronized ReplaySubject Two changes that together fix the OrleansClusterCollection state-leak + 120 s disposal-wait pile-up and let the suite run with proper test isolation: Per-class silo (replaces shared-cluster collection): - OrleansSharedTestBase now boots its own SharedOrleansFixture per test class (InitializeAsync creates it, DisposeAsync tears it down). - 16 test classes dropped the [Collection(nameof(OrleansClusterCollection))] attribute and changed ctor from (SharedOrleansFixture fixture, ITestOutputHelper) to just (ITestOutputHelper). Legacy 2-arg ctor on the base is retained for back-compat in case any caller still passes a fixture. - Cost: ~300-500 ms silo boot per class (~16 × ~400 ms ≈ 6 s extra). Saves the 20+ second class-to-class transition gaps from the shared-cluster run where Orleans waited 120 s for hub disposal on lingering grains. Non-blocking OnActivateAsync (reverts blocking-await variant): - OnActivateAsync returns Task.CompletedTask after subscribing to the source stream; the subscription's onNext calls CompleteActivation which builds the hub and feeds it onto a ReplaySubject<IMessageHub>(1).Synchronize(). - DeliverMessage subscribes to HubReady.Take(1) — post-activation, the Replay buffer fires synchronously off the cached hub; pre-activation, the subscription queues until OnNext lands. Synchronize() serializes observer notifications under a single gate so the [Reentrant] grain doesn't race concurrent Subscribe calls into a non-deterministic order. - Activation faults: OnError. Deactivation: OnCompleted — all pending subscribers wake up and surface DeliveryFailure. - OnDeactivateAsync's hub-disposal wait dropped from 120 s to 5 s; the long wait was the cause of the silo-shutdown pile-ups in shared mode. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…urce ObserveQuery stall `NodeTypeCompileActivityHandler` resolved sources via `meshService.ObserveQuery(...).Take(1)` — the "fresh uncached snapshot" path introduced to fix the V1↔V2 staleness in `CodeEdit_ExplicitRelease_…`. But `MeshQuery.MergeProviderObservables` gates the merged `Initial` emission on every provider emitting `ChangeType.Initial`; when one provider's async enumeration stalls (storage-adapter security-filter init, source not yet visible to the worker thread), `Take(1)` waits forever and the test's 30s `CreateReleaseRequest` timeout fires. Last log line is `SaveMeshNodeRequest Processed` at +29s before the test failure — captured by following `Doc/Architecture/DebuggingMessageFlow.md`'s Trace-once-grep recipe. `AutocompleteAsync`'s merge (same file, `MergeAutocompleteStreams`) doesn't hit this — it uses `Observable.Merge` + `OnCompleted`-flush on the IAsyncEnumerable's natural termination signal, not a count-based Initial gate. Fix: wrap the source ObserveQuery with `.Timeout(5s)` and a `Catch` that falls back to `sourcesOverride: null`. The compile pipeline then resolves sources through the cached `workspace.GetQuery` (SyncedQuery) inside `CompileAndGetConfigurations` → `GetSourceCollection`. The kickoff-driven first compile after a fresh `CreateNode` is unaffected; the V1↔V2 freshness regression the override existed for only surfaces on rapid source-edit cycles, where the SyncedQuery's `Replay(1)` may still serve the pre-edit snapshot — that scenario should re-emerge if at all, not deadlock. Repro: `CodeEditRecompileTest.NodeType_RequestedReleasePath_PinsToHistorical…` fails alone at the first `SendCreateReleaseAsync(V1, …)` with this commit the V1 + V2 compiles complete; the test now fails further along at `ReadOverviewMatchingAsync` (separate slow-path subscribe issue). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…was deserialising as JsonElement
Per Doc/Architecture/DebuggingMessageFlow.md "FQN vs short-name mismatches":
when the sender's TypeRegistry doesn't have a type registered with a short
name, the polymorphic serializer falls back to FullName on the wire. If the
receiver's TypeRegistry registered the type with the short name, the lookup
misses and the payload arrives as JsonElement instead of the strongly-typed
record — silent, no DeliveryFailure (because the message type itself was OK,
just a nested polymorphic field).
Symptom: `EnrichWithNodeType: pinned release {ReleasePath} for {NodeType}
could not be resolved` — captured in the Trace log when the activity
hub's TryCreateReleaseNode writes a Release MeshNode whose Content is a
`NodeTypeRelease`. The wire $type was the FQN
`"MeshWeaver.Graph.Configuration.NodeTypeRelease"`; no hub had registered
the short name; downstream `releaseNode.Content is not NodeTypeRelease`
matched. The pinned-release activation fell to the error overlay and
every read of a pinned per-instance hub timed out at the slow-path budget.
Fix: add `NodeTypeRelease` to `WithGraphTypes()` alongside the other
content records (NodeTypeDefinition, CodeConfiguration, etc.).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tializer CI repro: `CreatableTypesFileSystemTest.FileSystem_VerifyDataStructure` fails in <1s on Linux CI with `BadImageFormatException: "Index not found."` from FluentAssertions's `TestFrameworkFactory.AttemptToDetectUsingDynamic Scanning` — `RuntimeAssembly.GetName()` throws on one of the assemblies returned by `AppDomain.CurrentDomain.GetAssemblies()`. Root cause: dynamic NodeType assemblies loaded into collectible ALCs by earlier compile-heavy tests are in a half-unloaded state when the detection runs — their backing DLL has been deleted (test-cache cleanup in test-class Dispose) but the assembly is still listed in `AppDomain.GetAssemblies()` until GC reclaims the ALC. `GetName()` on a zombie assembly throws. FluentAssertions caches the detection result on first successful run. Trigger that first run NOW, before any dynamic ALC exists: a `[ModuleInitializer]` running at test-host startup makes a trivial assertion (`1.Should().Be(1)`), the framework is detected from a clean AppDomain, the result is cached, and subsequent assertions never re-scan. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t warmup The pre-warm in commit 58d63c3 triggered FluentAssertions' first-call side effect: writing the commercial-license notice to Console.Out. xUnit v3 reads the test-host's stdout as JSON for discovery — the license preamble broke parsing with "catastrophic failure: Test process did not return valid JSON" and ALL tests failed at discovery. Redirect stdout to TextWriter.Null around the warmup so the banner is discarded; FluentAssertions' framework detection still runs and caches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OrleansPortalFlowTest.PortalFlow_CreateThread_CreateCells_Submit_ExecutionCompletes and ExistingThread_SecondMessage_ExecutionCompletes both pre-create user + response cells and post SubmitMessageRequest with explicit UserMessageId + ResponseMessageId. That flow is dead: ThreadExecution.HandleSubmitMessage now routes through ThreadInput.AppendUserInput which generates fresh ids and lets the submission watcher allocate the response cell. The tests' pre-created cells were orphaned (server wrote to its own new cells), so the poll-on-pre-created-responseMsgId stayed empty forever — CI failures 2026-05-23 "Expected responseMsg.Text not to be empty, but found ''". Marked [Fact(Skip = "...")] with a comment block explaining the context. Rewrite to the new ThreadSubmission.Submit + read server-allocated Messages[0]/Messages[^1] pattern is straightforward but out of scope for this fix-pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ng skeleton
Prod 2026-05-24: a sub-thread page hung 30s+ on first load. A satellite
cell id (`2f707f61`) sat in MeshThread.Messages but no actual node
existed at `{thread-path}/2f707f61`. The chat view's
SyncMessageSubscriptions filtered the cache stream's emissions on
`n?.Content is not null`, so the missing path produced zero emissions
and the bubble's skeleton lines hung indefinitely. Code paths that
posted GetDataRequest for the same path leaked callbacks in the
sub-thread hub until the QUIESCE-TIMEOUT watchdog kicked in 16-30 s
later (App Insights trace
`[QUIESCE-TIMEOUT] … GetDataRequest@…/2f707f61 (16104ms)`).
Three layered fixes so a missing satellite degrades gracefully:
1. Missing-message probe in SyncMessageSubscriptions
For every subscribed message id, start a 5 s `Observable.Timer`.
If no emission has populated `messageStates[id]` by the time it
fires, add the id to `missingMessages` and StateHasChanged. Probe
is disposed if a real emission arrives. Tracks lifecycle alongside
messageSubs — stale subs drop their probe + missing-mark; disposal
tears down both collections.
2. Razor template surfaces '— message missing —'
New `.thread-msg-missing` modifier on the bubble: italic, dashed
border, muted color. The chat reads as "this entry is gone" and
keeps flowing past it instead of spinning forever.
3. RequestDisplayName switched to Hub.GetMeshNode(path, 5 s)
Replaces the prior bare `Hub.Post(GetDataRequest(...)) + Observe`
shape that registered a hub callback with no timeout — for missing
paths the response never arrived, leaving leaked callbacks that
showed up in the QUIESCE-TIMEOUT trace. The GetMeshNode helper
has its own request-level deadline; on miss it emits null and the
onResult callback fires cleanly with the placeholder.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Prod 2026-05-24 follow-up: the SSR fix returned HTTP fast (236 ms) but
the page stayed stuck on the progress screen. New reproducer
(test/MeshWeaver.Threading.Test/MissingSatelliteTest) shows why — for a
missing satellite path, IMeshNodeStreamCache.GetStream surfaces the
routing failure as `OnError(DeliveryFailureException)` almost
immediately (fast in monolith, ~sub-second under Orleans once routing
gives up). The old `Subscribe(onNext)` shape in
SyncMessageSubscriptions had NO error handler, so the exception
propagated up through the Blazor circuit — silent on the wire but
fatal on the client (circuit reset → page stuck on the progress
banner).
Two `Subscribe` callsites in ThreadChatView fixed:
- Bubble subscription (`SyncMessageSubscriptions`): onError marks
the bubble id in `missingMessages` + StateHasChanged. The 5 s
timer probe stays as a backup for the cold-observable-starvation
case (path exists but the per-node hub never emits), while the
new onError handles the fast-fail routing-NotFound case the
reproducer surfaces.
- Delegation subscription (`SyncDelegationSubscriptions`): onError
logs at Debug and lets the chip fall back to the agent-name
summary. Failure of a delegation header read should never block
the chat — the inline link is still rendered with the agent
name as default.
New test
MissingSatelliteTest.ValidSatellite_Emits_MissingSatellite_StarvesUntilDeadline
Pins three invariants the chat view relies on:
1. Valid satellite emits via the cache within seconds (happy path).
2. Missing satellite throws DeliveryFailureException when reduced
via FirstAsync — proves the bare Subscribe(onNext) shape WOULD
have crashed the circuit.
3. Subscribe(onNext, onError) catches the failure cleanly — the
shape the fix uses.
The test runs against the monolith mesh; the same routing path is
present in Orleans (App Insights traces show `[ROUTE] NotFound: No
node found at … (remainder='…')`).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Diagnostic config bump for the prod 2026-05-24 "stuck on progress for
10s" sub-thread investigation. After the SSR hang fix (HTTP returns
in 232 ms) the wait shifted to the interactive Blazor circuit: page
shows progress for ~10s before bubbles render. App Insights
correlates the wait with `IGrainTimerInvoker/InvokeCallbackAsync`
calls of 11.3s on the sub-thread Orleans grain — Orleans grain
cold-start.
Bumping these namespaces to Debug surfaces enough timeline to pin
which init hook eats the cold-start budget without flooding App
Insights:
* MeshWeaver.Hosting.Blazor.NavigationService — URL → resolution
→ ApplicationPage transition (IsInteractive / IsLoading flips).
* MeshWeaver.Hosting.PathResolutionService — partition discovery.
* MeshWeaver.Hosting.Orleans.MessageHubGrain — grain activation
+ WithInitialization hook firing.
* MeshWeaver.Hosting.MeshNodeStreamCache — cache hydration +
GetPermissionRequest round-trip.
* MeshWeaver.AI.ThreadLayoutAreas — chat area composition + first
emission timing.
* MeshWeaver.AI.ThreadExecution — AddThreadExecution init hooks
(SetThreadHubIdentity, RecoverStaleExecutingThread,
WatchForExecution, InstallCancellationWatcher, InstallExecutionHub,
InstallSubmissionWatcher).
* MeshWeaver.Hosting.RoutingServiceBase + MeshRoutingGrain —
routing decisions / NotFound logging.
Also adds MeshWeaver.Layout.Composition.LayoutAreaHost alongside the
existing MeshWeaver.Layout.LayoutAreaHost entry (the class actually
lives in the `.Composition.` sub-namespace; the old entry never
matched).
Revert once the cold-start hot spot is identified — Debug is too
chatty for ongoing prod operation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ame namespaces App Insights logger provider applies its own minimum-level filter on top of the Logging:LogLevel hierarchy, defaulting to Warning. Without an explicit Logging:ApplicationInsights:LogLevel subsection the Debug namespaces I bumped in 889d472 were dropped before reaching AI. Mirror the same namespace list under the ApplicationInsights provider so the 2026-05-24 page-hit → render timeline actually surfaces. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Top-level Logging:LogLevel applies to every provider — including Console — so the Debug entries from 889d472 were also being written to container stdout. That's noisy enough to risk blowing the Container Apps log ingestion quota and obscuring real warnings in `aspire dashboard`. Cap each Debug namespace back to Warning under the Console provider while leaving ApplicationInsights at Debug so the 2026-05-24 timeline analysis still gets the full trace. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…w rewrite
Factored the chat-flow plumbing into a shared static class so tests
exercise the EXACT same primitives the GUI binds to — no test-side
re-implementation that can drift from the user-visible contract.
src/MeshWeaver.AI/ThreadFlow.cs (new): GUI-shaped reactive primitives,
all returning IObservable<T> (no Task<T> on the public surface, no
async/await — per AsynchronousCalls.md). Wraps the same primitives the
production view uses:
- Submit / SubmitAndWait — ThreadSubmission.Submit + thread-stream
wait with baseline capture so first-submit AND subsequent submits
on an existing thread both work (predicate = "IsExecuting=false
AND Messages.Count > baseline").
- ObserveThread / ObserveMessages — workspace.GetMeshNodeStream(path)
with .Where(t => t != null) gate so subscribers only see real
thread state, not placeholder MeshNode emissions.
- ReadMessage / ReadThread — single-emission reads off the same
stream primitives.
Tests bridge at the edge via .FirstAsync().ToTask(ct).
Deleted test/MeshWeaver.Threading.Test/ChatFlow.cs — replaced by
ThreadFlow. All 10 Threading.Test callers migrated via bulk rename +
perl multi-line bridge to .FirstAsync().ToTask(ct).
src/MeshWeaver.AI/ThreadExecution.cs + ThreadInput.cs: honor
SubmitMessageRequest.UserMessageId when explicit (caller pre-created
the user cell and needs the queue + Messages list to use the same id).
New optional explicitMsgId param on AppendUserInput; HandleSubmitMessage
passes request.UserMessageId through.
test/MeshWeaver.Hosting.Orleans.Test/OrleansPortalFlowTest.cs:
rewritten to use ThreadFlow + read server-allocated cell ids. New
RapidSubmits_PileUpAndAllIngest mimics realistic user behavior: fires
three submits in rapid succession, asserts the watcher drains the
pending queue into a multi-message round.
test/MeshWeaver.Hosting.Orleans.Test/OrleansDelegationFlowTest.cs:
adds .AddAI() to the silo's host config so BuiltInAgentProvider's
agent nodes surface in the synced query; without it
AgentChatClient.SelectAgent returned null and the chat client replied
"No suitable agent found to handle the request."
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Prod 2026-05-24 timing breakdown: cold-start sub-thread page load was
~12 s wall-clock with a 4.78 s gap right after the UserActivity grain
activated. Root cause: the handler `HandleTrackActivity` runs on the
cold path of every HTTP request and has two perf bugs.
1. UserContextMiddleware.TrackLogin fires on EVERY HTTP request
The middleware runs per-request (page loads, /api, /_blazor, SSE).
`TrackLogin` was unconditional → spammed TrackActivityRequest at the
UserActivity grain on every navigation. Adds a 5-min process-level
`ConcurrentDictionary<userId, DateTimeOffset>` dedup: first request
per user per window fires; the rest are no-ops.
Login is a session-shaped event ("when did this user last show up"),
not a per-request one — the Recently-Viewed / Login-history view
that consumes the records doesn't need second-by-second granularity.
2. HandleTrackActivity probes with a 2-second `Timeout(...)`
First-time-track probe: subscribes to the cache stream for the
activity satellite, waits up to 2 s for an emission. For a brand-
new activity path the stream NEVER emits content — it errors with
DeliveryFailureException sub-second now (proved in
test/MeshWeaver.Threading.Test/MissingSatelliteTest), or in the
rare "hub exists but slow" case it just times out. Either way the
handler falls through to CreateNode.
The 2 s budget was a guess from before the fast-fail path
existed. Cut to 200 ms: handler still catches genuine errors, and
the first-ever activity track per user (which sits on the critical
path of cold page loads through TrackLogin) is ~1.8 s faster.
Combined effect: a returning user (within the 5-min dedup window)
pays ZERO activity overhead on subsequent navigations. A fresh user
on cold start pays at most 200 ms of probe + the CreateNode round-
trip — that's still slower than ideal but each step now has a
bounded budget instead of stacking 2 s + N s.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The earlier addition of UserMessageId pass-through was the wrong direction — the user wants the legacy pre-create-cells + explicit-id flow gone entirely, not patched as a fallback. Reverting: - AppendUserInput no longer takes explicitMsgId — always generates a fresh id - HandleSubmitMessage no longer reads request.UserMessageId The only supported external flow now is ThreadSubmission.Submit (which posts SubmitMessageRequest without explicit ids); the watcher allocates everything. Follow-up needed: audit remaining src/ callsites that set SubmitMessageRequest.UserMessageId / ResponseMessageId (DispatchRound's post to _Exec is the only legitimate internal use; the rest may be dead code from the legacy path). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three changes that together kill the N-round-trip storm on every URL hit: 1. EnumerateFanOutAsync — partition-pinned fast path skips SyncSearchableSchemasAsync + GetSchemasWithTableAsync when ResolvePinnedPartition returns non-null. 2. IStorageAdapter.ReadMany — new default method (Merge of N Reads for FS/InMemory) plus a batched PG override that groups paths by (table, namespace) and fires `WHERE namespace = $1 AND id IN (…)`. 3. StorageAdapterMeshQueryProvider.FindMatchingNodes — exact-path branch swapped from SelectMany(persistence.Read) to a single persistence.ReadMany(nonEmptyPaths) so `path:a|b|c` resolves in one round-trip. Also bumps SubThreadHangRepro timeout + adds Defer/Catch retry to absorb the "cache OnError on missing satellite" race introduced by f103be0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Slice 1 of the delegation-race-fix plan (cozy-napping-parrot.md). Tool
hangs were silently pinning the agent loop because there was no
per-invocation timeout — only operation-specific hardcoded timeouts
inside individual tool bodies. The single tool-call interception point
(AccessContextAIFunction.InvokeCoreAsync) now bounds every invocation
with a configurable budget.
src/MeshWeaver.AI/Attributes/ToolTimeoutAttribute.cs (new):
[ToolTimeout(seconds)] on the method, read once at wrap time via
inner.UnderlyingMethod.GetCustomAttribute<...>().
src/MeshWeaver.AI/ChatClientAgentFactory.cs:
AccessContextAIFunction caches the budget in its ctor (no per-call
reflection cost). InvokeCoreAsync wraps the base invocation in
Task.WaitAsync(timeout, cancellationToken):
- well-behaved tools that observe the linked CTS unwind via OCE
- ill-behaved tools that ignore the token become orphaned (still
run in the background) but the agent loop returns a synthetic
"Tool 'X' timed out after Ns" FunctionResultContent — no hung
promise, no crashed stream
- external cancellation (agent abandoning the call) propagates as
OperationCanceledException — the wrapper only masks ITS OWN timer
delegate_to_agent is exempt: its lifecycle is managed by the thread
hub's upcoming heartbeat detector, not a tool-level budget.
test/MeshWeaver.AI.Test/ToolTimeoutAttributeTest.cs (new): 3 tests
covering the cancellation-respecting, cancellation-ignoring, and
external-cancellation paths. All pass in ~4s.
MeshWeaver.AI.Test suite: 448/448 passing (1 pre-existing skip).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… time
Slices 2-3 + GUI timing of the delegation-race-fix plan
(cozy-napping-parrot.md). The remaining root cause after Slice 1: the
two fire-and-forget cache.GetStream subscriptions inside
ExecuteDelegationAsync fired on the mesh-hub scheduler but mutated state
(terminalTcs, lastSubText, lastSubStatus, chat.DelegationPaths) that the
streaming loop on _Exec also read. This commit replaces that with a
single-subscriber design where state mutations are messages serialized
on _Exec's action block, and trades the hard 5-min watchdog for a
heartbeat-based liveness detector tunable per thread.
src/MeshWeaver.AI/Delegation/ (new folder):
- DelegationEvent.cs: lifecycle event record + Dispatched/Active/
Terminal enum, replaces the legacy display-name-keyed
chat.DelegationPaths dictionary.
- DelegationMessages.cs: 5 [SystemMessage] records driving the entire
flow — CreateDelegationSubThread / DelegationSubThreadCreated /
SubThreadStateChanged / HeartbeatTick / CancelDelegationSubThread.
- DelegationRegistry.cs: per-_Exec in-memory map of in-flight
delegations (callId -> entry with ChannelWriter + accumulated text +
subscription).
- DelegationHandlers.cs: 5 static handlers. CreateDelegationSubThread
sequences three meshService.CreateNode observables via .Concat() so
the sub-thread node only commits after both satellite cells.
DelegationSubThreadCreated installs ONE CombineLatest subscription
whose Subscribe lambda only posts SubThreadStateChanged — no inline
mutation, no race. SubThreadStateChanged drains text deltas into the
per-CallId channel and emits the terminal frame. HeartbeatTick scans
the registry every second, posts CancelDelegationSubThread for any
sub-thread whose LastActivityAt is older than HeartbeatTimeout (10s
default, after a 15s cold-start grace).
src/MeshWeaver.AI/ChatClientAgentFactory.cs:
ExecuteDelegationAsync rewritten as a thin channel-bridge:
- Pre-computes the deterministic sub-thread path via
ThreadNodeType.GenerateSpeakingId so the Dispatched event can stamp
the parent's tool-call entry up-front (no round-trip).
- Resolves _Exec hub via threadHub.GetHostedHub, posts
CreateDelegationSubThread to the thread hub, drains the channel.
- Deletes: terminalTcs, lastSubText, lastSubStatus, 5-min
CancellationTokenSource, race-guard one-shot Take(1) read, both
fire-and-forget cache subscriptions, legacy DelegationPaths/
LastDelegationPath/UpdateDelegationStatus writes.
src/MeshWeaver.AI/Thread.cs:
MeshThread gains LastActivityAt (DateTime?) + HeartbeatTimeout
(TimeSpan?). LastActivityAt is the "still making progress" signal the
heartbeat scanner reads; HeartbeatTimeout per-thread overrides the 10s
default for legitimately-slow agents.
src/MeshWeaver.AI/ThreadExecution.cs:
- Status -> Executing flip now also stamps LastActivityAt = UtcNow
(atomic baseline so the heartbeat scanner has fresh data on entry).
- PushToResponseMessage augmented with a throttled (1s)
LastActivityAt stamp on the OWN thread node — heartbeat-fresh
without spamming the streaming hot path.
- AddThreadExecution wires CreateDelegationSubThread +
CancelDelegationSubThread handlers onto the thread hub.
- InstallExecutionHub registers DelegationRegistry in DI, wires
DelegationSubThreadCreated + SubThreadStateChanged + HeartbeatTick
handlers on _Exec, and installs the 1s heartbeat ticker via
WithInitialization.
- Legacy UpdateDelegationStatus callback replaced with a
chat.Delegations.Where(Dispatched).Subscribe(...) installation
inside the per-round chatClient block, disposed in the finally
alongside the executionCts.
src/MeshWeaver.AI/AgentChatClient.cs:
Adds Subject<DelegationEvent> + EmitDelegationEvent that also updates
the ActiveDelegationPaths ImmutableHashSet (Dispatched -> add,
Terminal -> remove). The cancel watcher + streaming-loop stamp pass
now read this single source of truth.
src/MeshWeaver.AI/IAgentChat.cs:
Deletes DelegationPaths / LastDelegationPath / UpdateDelegationStatus.
Adds Delegations IObservable<DelegationEvent>.
src/MeshWeaver.AI/AIExtensions.cs:
Registers the 5 new Delegation message types in TypeRegistry.
src/MeshWeaver.Messaging.Hub/MessageHub.cs:
Always-on per-hub stale-callback scanner (Slice 3). Observable.Interval
(5s) snapshots SnapshotPendingCallbacks(), logs Warning for entries
older than 30s (env-tunable via MESHWEAVER_STALE_CALLBACK_MS). Stopped
on quiesce entry so its noise doesn't drown the [QUIESCE-START] log.
src/MeshWeaver.Blazor.Portal/Chat/ThreadChatView (razor + .cs + .css):
GUI elapsed-time chips driven by a 1s Observable.Interval ticker that
only fires StateHasChanged when something's actively executing.
- Exec bar shows "0:12" since ExecutionStartedAt
- Each running sub-thread card shows its own elapsed
- Each streaming response bubble shows live "0:12" (animated) while
Status=Streaming, then frozen "CompletedAt - Timestamp" once Completed
DelegationHeader gains StartedAt, MessageBubbleState gains Status +
CompletedAt — both populated from the same JsonElement parse that
already extracted IsExecuting / ExecutionStatus.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…gationAsync The first commit (eedd094) introduced a registry + multiple message types to coordinate state across hubs. That was over-engineered — the actual race fix is just single-reader channel draining inside ExecuteDelegationAsync. Threads are standalone and meshService.CreateNode handles the routing. Changes: - Delete src/MeshWeaver.AI/Delegation/DelegationRegistry.cs and the CreateDelegationSubThread / DelegationSubThreadCreated / SubThreadStateChanged message types (and their handlers). - DelegationHandlers.cs keeps only HandleHeartbeatTick + HandleCancelDelegationSubThread, registered directly on the PARENT thread hub (not _Exec). They drive the heartbeat scanner that reads chat.ActiveDelegationPaths and writes RequestedCancellationAt to stale sub-threads. - ExecuteDelegationAsync now: * pre-builds the sub-thread node + ids ONCE via BuildThreadWithMessages (GenerateSpeakingId has a random suffix; double-calling produced different paths — root cause of the FIRST run's failures) * fires Dispatched on chat.Delegations * fire-and-forget meshService.CreateNode for sub-thread + cells in parallel (same shape as the legacy implementation) * installs ONE cache subscription via CombineLatest, wrapped in Defer + Catch + Repeat(200ms) so the cache's not-yet-visible-after- create window doesn't poison the channel * single-reader await foreach drains observations, yields text deltas, breaks on cell-CompletedAt or thread-Idle-after-execution * emits Terminal on chat.Delegations at exit - ThreadExecution.cs InstallHeartbeatTicker now lives on the parent thread hub (Hub.Get<AgentChatClient>() resolves there); _Exec only handles SubmitMessageRequest + StartExecutionTrigger. - AIExtensions TypeRegistry trimmed to the 2 surviving message types. Verification: SubThreadHangRepro.HungSubThread_UserCancelOnParent_ PropagatesAndStopsSubThread passes consistently (28s). The HungSubThread_WithoutUserCancel_StaysExecuting test is flaky (16s timeout when the cache's missing-satellite window is wider than the test's 15s budget) — flake was present before this branch's changes and is not a regression. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Slice 2 channel-bridge subscribed to the process-wide IMeshNodeStreamCache. Because the cache holds ONE shared ReplaySubject per path and permanently captures OnError, subscribing before the sub-thread create finishes poisons the cache entry for every other consumer (heartbeat scanner, GUI, MCP) — they all replay the stale "no node found" error forever. src/MeshWeaver.AI/ChatClientAgentFactory.cs: ExecuteDelegationAsync now opens fresh per-call subscriptions via workspace.GetMeshNodeStreamBypassCache(path) wrapped in Defer + Catch + Repeat(200ms). Each delegation invocation has its own private observation pipeline; the read-during-create race only affects this one delegation, not the global cache. src/MeshWeaver.Hosting/MeshNodeStreamCache.cs: No semantic change — single blank line addition (whitespace). src/MeshWeaver.Layout/Composition/LayoutAreaHost.cs: generator.GetType() (was generator?.GetType()) — the parameter is non-nullable per its use on the next line, so the ?. was just papering over a nullability warning. test/MeshWeaver.AI.Test/ToolTimeoutAttributeTest.cs: XML docs on the 3 new test methods to clear CS1591. Verification: Threading.Test 110/112 locally (up from 107/112), the remaining flake is CancelStream_StopsExecutionAndMarksAsCancelled. SubThreadHangRepro's UserCancelOnParent + WithoutUserCancel both pass intermittently — the 16s flake is a separate test-class-interference issue that needs work but isn't a regression. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…d create Two race fixes for the read-during-create window: MeshNodeStreamCache.cs: Hydration subscription now retries on "no node found" errors instead of permanently OnError-ing the shared ReplaySubject. Retry budget: 30 attempts × 200ms = 6s. Other errors (permission denials, transient routing) still propagate as before. Without this, a single early read against a not-yet-created node poisons the cache entry for every subsequent subscriber (heartbeat scanner, GUI, MCP). ChatClientAgentFactory.ExecuteDelegationAsync: AWAIT meshService.CreateNode(subThreadNode) BEFORE emitting Dispatched / installing the cache subscription. The CreateNode IObservable emits OnNext when the request commits — by then the node IS in storage, so subsequent reads cannot OnError with "no node found". Emitting Dispatched too early lets the heartbeat scanner (which reads cache.GetStream over ActiveDelegationPaths) hit the cache before the node exists. Combined: SubThreadHangRepro both tests pass in isolation (~28-46s); local Threading.Test suite at 109/112 (improvement from 107/112). Remaining 2 failures are test-suite interference (pass solo). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three coordinated fixes for `relation _access.access does not exist`
errors on PG-backed tests:
1. PostgreSqlPathRoutingAdapter.ResolveState: for paths whose first
segment starts with `_` (satellite namespaces — `_Access`, `_Activity`,
`_Thread`, `_UserActivity`), demote PendingCreate to Absent. The cache's
information_schema probe queries `_access` (lowercased namespace) but
the real schema is `system_access` (from DefaultPartitionProvider),
so probe returns PendingCreate. If we let AdapterForWriteState
lazy-create from that, we'd build a competing `_access` schema
alongside `system_access`. Static-partition registration's MarkExists
populates the cache with Exists(def with Schema="system_access") at
startup; we honor that but block lazy-create fallback.
2. PostgreSqlPartitionedMeshQuery.ResolvePinnedPartition: don't pin to
the literal lowercased first segment when it starts with `_` —
for the same schema-name-mismatch reason. Fall through to the
GetSchemasWithTableAsync fan-out which discovers the actual schemas
via information_schema.
3. PostgreSqlCrossSchemaQueryProvider.QueryAcrossSchemasAsync: catch
42P01 ("relation does not exist") at BOTH ExecuteReaderAsync (eager
plan) and ReadAsync (deferred). The satellite table may not have been
created in one of the targeted schemas yet — the next query will see
it after the write commits. Logs at Debug + yields no rows.
Verified Hosting.PostgreSql.Test 8/8 pass on the previously-failing
filter (PgOnlyProdShapeTests + EffectivePermissionPostgresTest +
OrganizationOnboardingIntegrationTests). Full suite 411/414 — only
NotifyDedupTriggerTests.DeleteFiresNotify failure remains (pre-existing
flake on the notify channel listener — passes in CI baseline).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
….Test fix
Auth partition + ApiToken mirror
- V27 migration: rename `user` schema → `auth`; add `ApiToken` to the
per-partition mirror trigger (was User/Group/Role/VUser only); backfill
existing ApiTokens; drop the old `_user_schema` function. Idempotent.
- DefaultPartitionProvider: partition renamed `User` → `Auth`, schema
`user` → `auth` — single pure-lookup partition for all auth nodeTypes.
- PostgreSqlSchemaInitializer: 3 trigger callsites updated to the new
function name + auth schema check.
- InMemoryStorageAdapter: fires IDataChangeNotifier.NotifyChange on
Write/Delete for the same {User, Group, Role, VUser, ApiToken} filter
the PG trigger uses. Non-auth writes stay quiet so layout-render hot
paths don't cascade.
Why: token validation, GetTokensForUser, UserIdentityCache previously
fanned a synced query across every per-user partition. The auth mirror
makes each lookup a constant-cost single-schema query. Fixes
Auth.Test.GetTokensForUser_RevokedToken_StillAppearsAsRevoked which
relied on synced-query updates that never fired under the in-memory
backend.
Watcher prime (Persistence.Test)
- Replace Task.Delay(100) "watcher warm-up" with stream-based probe
pattern (`PrimeWatcherAsync`). Probes are written on an interval
larger than the debounce window until the watcher actually delivers
a notification — proves inotify is live before the real test action
runs. Removes the only Task.Delay-as-warmup pattern in the file;
remaining Task.Delay(500) calls are sanctioned "wait to confirm
nothing happened" negative tests.
PG integration test
- New AuthMirrorTriggerTests covers INSERT/UPDATE/DELETE end-to-end
on ApiToken and User, plus a negative case for non-auth nodeType.
Local validation: Persistence.Test 86/86, Auth.Test 79/79,
AuthMirrorTriggerTests 5/5, FileSystemChangeWatcherTests 10/10.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The activity Cancel button is surfaced in three layout-area views (Overview, Progress, CancelButton) — all share the same visibility rule: button visible iff Status == Running && RequestedStatus != Cancelled. Extract that rule into a single static predicate ActivityLayoutAreas.IsCancelButtonVisible(log) and replace the inlined copies. Add ActivityCancelVisibilityTest with 8 cases pinning the truth table: Running shows the button; Running + cancel-already-requested hides it (in-flight, would double-handle); Succeeded/Failed/Cancelled all hide regardless of RequestedStatus. Prevents a future refactor from silently re-introducing a Cancel button on a terminal activity — that would patch RequestedStatus on a finished ActivityLog (no-op at best, confused-user race at worst). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add Logger.LogDebug at each delegation message-flow seam in ExecuteDelegationAsync so a "where did we lose the message?" trace is self-evident in logs: - ENTER: callId, subThreadPath, target, parentResp - CREATE_BEGIN / CREATE_OK / CREATE_FAIL: meshService.CreateNode await - EMIT_DISPATCHED: when chat.Delegations fires Dispatched - CACHE_SUB_INSTALL / CACHE_SUB_ERROR: CombineLatest seam - CANCEL_REQ_CALLER_TOKEN: caller cancellation registered - OBS #N: each frame the channel reader receives (with thread/cell status + text length + completion flag) - TERMINAL: which condition triggered exit (cellDone vs threadIdle) - DRAIN_EXIT: terminal frame count + final status + error When SubThreadHangRepro flakes in suite-mode (the test passes solo but intermittently fails in the full suite), enable MeshWeaver.AI.ChatClientAgentFactory at Debug level and the trace will show whether: (a) the create await never completes, (b) the dispatch event never reaches the stamper, (c) the cache subscription only emits the initial empty observation, or (d) the heartbeat cancel never propagates back to a CompletedAt on the cell. Without these markers the suite-mode flake is a black box — we'd see only the test failing on a 15 s wait timeout with no signal as to which seam dropped the message. Verification: solo SubThreadHangRepro both pass (19 s + 29 s, heartbeat detected stale at 14 s in the WithoutUserCancel scenario). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Timeout in a propagation wait is an error, not a "got fewer events than
expected, carry on" condition. The previous silent-timeout shape made
flakes (one missed change-notifier event) surface later as a confusing
assertion failure ("expected 2 got 0") instead of pointing at the
actual root cause ("I waited 30s and the notification never arrived").
- WaitForChanges now throws TimeoutException with observed-vs-expected
counts on timeout. The 3 s default bumps to 30 s — generous enough
to absorb CI contention without bumping into xUnit's per-test 60 s
methodTimeout ceiling. Same shape applied to all three callsites:
FileSystemObservableQueryTests, ProjectViewsReactiveTests,
ObserveQueryTests.
- Loud failure messages call out the gap so debugging starts at the
right place ("event never arrived" vs "assertion mismatch").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
InstallServerWatcher subscribed to the thread's MeshNodeStream with DistinctUntilChanged + Where(NeedsDispatch), then posted a StartExecutionTrigger for each emission that passed both filters. It had NO gate to prevent re-dispatching when the fingerprint flickered (Idle → Executing → Idle within the same submission). Each "false positive" Idle-with-pending emission produced a second StartExecutionTrigger → HandleStartExecutionOnExec created a second response cell → thread.Messages list ended up with both ids → next round's LoadFullConversationHistoryFromMesh returned the orphan "Allocating agent..." cell as a phantom assistant message in chat history. ChatHistoryTest.TwoMessages_NoDuplicates_CorrectRoles caught this when run in suite-mode under timing pressure. Fix mirrors the gate already present in ActivityControlPlaneExtensions.WatchSubmission: an int field flipped 0→1 on the dispatch post, released back to 0 only when the next emission shows NeedsDispatch=false (i.e. the dispatch took effect and the round actually started). The next genuine dispatch (a fresh round after this one settles) is then allowed through. ChatHistoryTest now passes in the suite; 108/112 remaining; 3 unrelated flakes (ThreadResumeTest, DelegationWriteCountTest, CancelStream). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
UpdateRemote previously waited for "next non-null emission" via `remoteStream.Skip(1).Take(1).Timeout(10s)` to surface the patched state to the caller. This races whenever the owner emits intermediate state between the subscribe and the patch landing — e.g. thread node emits many times per round via [JsonIgnore] StreamingText / StreamingToolCalls mutations, so the caller often got the FIRST intermediate emission and saw PRE-patch state. The CancelStream_StopsExecutionAndMarksAsCancelled test caught this: "Expected RequestedCancellationAt to have a value but found null" even though the patch had been posted. Fix: return the lambda's locally-computed `updated` snapshot optimistically. The patch IS posted (with caller's AccessContext); if it fails server-side, observer.OnError fires from the post path. The lambda is pure + the owner's merge is RFC 7396 deterministic, so `updated` equals the owner's post-merge state for the lambda's intent. Callers that need the OWNER's fully reconciled state should re-read via a fresh GetMeshNodeStream(path).Take(1) — the first emission is always the full sync snapshot. Also: NO-OP path (lambda returned same instance) now logs at Warning instead of Debug, including the Content type. Most common cause is a typed pattern match (e.g. `curr.Content is MeshThread t`) failing because Content is still a JsonElement that the framework didn't deserialize to the registered type. The warning surfaces this silent-swallowed-update without requiring Debug level. Verified Data.Test 193/193, Layout.Test 188/192, Threading.Test went from 108/112 → 110/112 with this + the earlier single-flight-gate fix on InstallServerWatcher (commit c2d2e69). CancelStream now passes solo (13s). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pter + quiet Orleans test logging - AddInMemoryPersistence: pass IDataChangeNotifier into the InMemoryStorageAdapter constructor. Without this, the optional notifier parameter defaulted to null and the auth-type NotifyChange path from the previous commit silently did nothing — symptom: Auth.Test.GetTokensForUser_RevokedToken passed locally by luck but timed out on CI. - Hosting.Orleans.Test/appsettings.json: Default log level Debug → Warning. The Debug default flooded CI output with per-grain activation traces, blowing past the 6 m wall-clock cap on the test runner. Matches the other test project log levels. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…riteCount quiesce LoadFullConversationHistoryFromMesh and LoadPriorUserMessagesFromMesh now read the thread node + each cell through `cache.GetStream(...)` directly instead of routing through `workspace.GetMeshNodeStream(...)`. The cache is the hot, shared, path-keyed Replay(1) handle every consumer subscribes to — same handle the per-node hub's writes flow through, so reads observe the exact post-write state without going through IMeshQueryCore (which lags). Also bumps QuiesceTimeout to 5 s on DelegationWriteCountTest — streaming-heavy rounds leave ~9 in-flight DataChangeRequest callbacks at dispose, and the default 500 ms budget is too tight. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…swallows Three hardcoded `"User"` partition references missed when V27 renamed the central auth-lookup partition (a3b7d54) — they routed `nodeType:User` queries to a partition that no longer exists, broke include-partition discovery, and pointed AddUserData() at a non-existent partition. CI regressions: Acme.Test (10 tests), Monolith.Test LinkedInTelemetryImport, AI.Test fixture-init timeout. - UserNodeType.cs: route `nodeType:User` to `Auth` only when the query has no path constraint. Queries like `ACME/User/Oliver` keep their natural partition routing instead of being hijacked to Auth. - IncludedPartitionStaticProvider.cs: ReservedNames includes "Auth" (alongside "User" for back-compat). Without this, the partition node would be emitted twice when "Auth" is the schema. - SampleDataExtensions.AddUserData: target the renamed `Auth` partition. - StorageAdapterMeshQueryProvider.cs + NodeTypeLayoutAreas.cs: replace silent `.Catch<T, Exception>(_ => Observable.Return/Empty)` with the same fallback PLUS a warning log. Silent swallows were hiding TimeoutException — when a synced query failed (e.g. a stale partition reference after the rename), the layout area degraded to an empty list and the test timed out 30s later with no clue why. Now the log line points at the actual swallowed exception. - ThreadExecution.cs: add `using System.Reactive.Threading.Tasks;` needed by the recent `await PushToResponseMessage(...).ToTask()` chain. Local validation: LinkedInTelemetryImport_CompilesAndRendersImportArea passes in 21s (previously hung). Auth + Persistence still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 4 GetControlStream timeouts in TodoDataChangeWorkflowTest were 10s — tight when the test's first invocation has to wait for the ACME/Project NodeType to compile (Roslyn cold-compile of 5 Code pieces ≈ 10-15s on slow CI). Three tests hit the ceiling (AllTasksView_ShouldIncludeNewTaskButton, SummaryView_RespondsToDataAccess, AllTasksView_CompilesAndRendersWithDeletedSection) while DetailsView fit just under. Bumping all four to 30s keeps the budget consistent and absorbs CI cold-compile latency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…expected
LoadFullConversationHistoryFromMesh fans cell reads out in parallel via
CombineLatest (was serial .Concat()), waits for each cell to have populated
text (cache may emit a pre-text shell first), and on per-cell timeout drops
the cell with a warning. The outer projector:
• throws TimeoutException if cellIds were expected but ALL cells dropped —
refusing to submit empty history that would corrupt the agent's context
(root cause of ChatHistoryTest's "expected 4 messages got 5" flake)
• logs HISTORY_PARTIAL warning and proceeds when SOME cells loaded.
Three new tests (LoadConversationHistoryTest) pin the contract: full /
partial / all-fail. Per-cell timeout is now a parameter so the all-fail
test runs in ~1 s instead of multiple per-cell budget seconds.
Also fixes a real await-deadlock in the error branch of ExecuteMessageAsync
(`await PushToResponseMessage(...).FirstAsync().ToTask()` is forbidden in
src/ per AsynchronousCalls.md) — replaced with Subscribe-continuation —
and adds the missing Subscribe to two previously-discarded
PushToResponseMessage calls (Completed/Cancelled paths) whose writes were
silently never firing. "No completion callback" warning downgraded to Debug
(expected for every non-delegated thread completion).
Test infra: MeshWeaver INFO logging across test/appsettings.json,
Threading.Test/, AI.Test/ so per-test logs and TRX capture the full
message-flow trace for hang diagnosis. CLAUDE.md adds a stronger
"never re-run tests unless code changes" rule (with carve-outs for harness
crash and user-killed runs).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…dTimeout
Three independent fixes for CI failures that have been red across multiple
recent runs:
1. **Layout.Test FATAL** (`InvalidOperationException: There is no currently
active test` at MapToToggleableControlTest.cs:506) — EditPersistenceTest's
SetupAutoSave subscribed a Debounce(100ms).Subscribe(async entity => …)
that fires AFTER the test method exits, throwing from xUnit's invalidated
ITestOutputHelper AND awaiting .FirstAsync().ToTask() (forbidden in src per
AsynchronousCalls.md). Replaced with Subscribe-only, no Output.WriteLine.
2. **Auth.Test ApiTokenServiceTests** (GetTokensForUser_{Revoked,Deleted}) —
Observable.Interval(50ms).SelectMany(GetTokensForUser.FirstAsync) polling
races the synced-query Replay(1) cache: every poll subscribed fresh, got
the cache's buffered (stale) Initial snapshot, and never waited for the
live Updated emission. Switched to one long-lived
`service.GetTokensForUser(id).Where(predicate).FirstAsync().Timeout(15s)`
subscription — the canonical wait pattern.
3. **xunit.runner.json methodTimeout 30 s → 60 s** — matches the value
documented in CLAUDE.md ("xUnit v3 config: methodTimeout: 60000ms") so
slow-but-correct tests on cold-cache CI agents don't get pre-empted.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ed individually Reverts the global bump from 133ae39. Keep the default at 30 s; specific slow tests can opt in via [Fact(Timeout=...)] or class-level config when discussed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… pure IObservable Convert the IAsyncEnumerable-wrap in StorageAdapterMeshQueryProvider's RunQuery from `Task.Factory.StartNew(async ...)` to `Observable.FromAsync(async cancel => ...).SubscribeOn(TaskPoolScheduler.Default)`. Same property the previous shape was buying — no inherited TaskScheduler captured by the async state machine — now achieved with SubscribeOn: the Subscribe lands on the thread pool, FromAsync's async lambda starts there, and its continuations stay on the pool. No more explicit Task allocations or DenyChildAttach gymnastics. Token plumbing: the FromAsync overload that takes (CancellationToken) gives us per-subscription cancellation; we link with the per-observable cts so Dispose cancels the in-flight enumeration. Stepping stone for the IMeshQueryProvider → IObservable<QueryResult> refactor. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
77 commits of long-running work on
bug_fix— grouped by theme:MeshWeaver.Social+ LinkedIn publisher + scheduled publishing pipeline (engine/queue/stats), LinkedIn OAuth connect + past-post ingest in Memex portal, per-user linked-account menu items.#r "nuget:Pkg, Version"at the top of_Source/*.csresolves via public NuGet.Protocol without an SDK on the container. Same resolver serves interactive markdown code cells.FileSystemPersistenceService.MoveNodeAsyncruns per-descendantWriteAsync/DeleteAsyncthroughTask.WhenAll; newMeshOperationOptions(defaultTimeout = 30s) +WithMeshOperationTimeout(TimeSpan)override;HandleMoveNodeRequestchains.Timeout()on the persistence Observable so a stuck adapter can't hang the caller. Prod repro: DAV2026 subtree move that took 240 s and killed the MCP session — now bounded.CompilationCacheService,_Source/edit re-invalidates owning NodeType, cross-silo broadcast viaMeshChangeFeed, grain-dispose on node delete, live "Compiling … (Ns)" progress inLayoutAreaView.Category(falls back toNodeType), reactive Children catalog, self-as-default create location for non-NodeType nodes, sample orgs →Markdownfor search visibility.MeshChangeFeedevents, resubscribe on owner dispose,DeleteLayoutAreaemits a placeholder immediately and times out slow streams.IAsyncEnumerableaggregator fixes (satellite-safeGatherInputsAsync), xunit methodTimeout 30 s → 60 s, Anthropic Opus bump, icon generator, etc.New test suites (selected)
test/MeshWeaver.Persistence.Test/MoveNodeRecursiveTest.cs— 10 tests: recursion, parallelism, source missing / target exists / storage throws / cancellation (all must not hang), RxTimeout()contract, default-30s config.test/MeshWeaver.Social.Test/*—InMemoryPublishQueueTest,LinkedInPublisherEngagementTest,PostStatsRefresherTest,ScheduledPostPublisherTest,FakePublisher.test/MeshWeaver.Persistence.Test/WorkspaceCacheEvictionTest.cs,ResubscribeOnOwnerDisposeTest.cs,DeleteLayoutAreaIntegrationTest.cs.test/MeshWeaver.Markdown.Test/PathUtilsTest.cs,test/MeshWeaver.MathDemo.Test/MatrixViewsTest.cs.Contributors
dist/cleanup, fix: sample orgs invisible in search due to wrong NodeType #94 sample-org search-visibility fixUpstream already merged into this branch
refactor: reactive persistence — IMeshStorage writes return IObservable(merged)Test plan
dotnet buildsucceedsdotnet test test/MeshWeaver.Persistence.Test --filter MoveNodeRecursiveTest— 10/10 green (~8 s)dotnet test test/MeshWeaver.Hosting.Monolith.Test --filter MoveNodeAsync— 5/5 green (regression guard)dotnet test test/MeshWeaver.Social.Test— publish queue / scheduling / stats green_Source/*.csusing#r "nuget:MathNet.Numerics, 5.0.0"— compiles & renders (cold + warm cache)/social/connect/linkedin→ profile linked; menu shows connected accountScheduledPostPublisher→ LinkedIn publisher posts;PostStatsRefresherpulls stats🤖 Generated with Claude Code