workiq mcp server crashes repeatedly in msalruntime.DLL during WAM token acquisition #113

@theonezozo

Summary

The workiq mcp server process crashes repeatedly with an unhandled stowed exception in msalruntime.DLL during WAM (Web Account Manager) token acquisition. All 7 crashes over 3 days fault at exactly the same offset (msalruntime.DLL+0x52214, exception 0xc000027b), indicating a deterministic bug in the MSAL native runtime's headless auth path.

Impact

  • The MCP stdio server becomes unresponsive after a token expires, requiring manual process kill and restart.
  • Auth popups appear repeatedly because the process crashes before it can cache a refreshed token.
  • Downstream MCP clients (e.g., GitHub Copilot CLI) receive Authentication failed or TypeError: fetch failed errors.

Reproduction

  1. Start the MCP server: workiq --account user@microsoft.com mcp
  2. Wait for the initial MSAL token to expire (~1 hour), or run several queries to accelerate token refresh
  3. The process crashes during the next token acquisition attempt
  4. A crash dump is written to %LOCALAPPDATA%\CrashDumps\workiq.exe.<pid>.dmp

Frequency: 7 crashes in 3 days (Apr 28–30, 2026). 100% identical crash signature.
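
For anyone trying to confirm the repro on their own machine, a small script can list the dumps from step 4. This is a minimal sketch, assuming the default dump folder named above; `find_workiq_dumps` is a hypothetical helper name, not part of WorkIQ.

```python
import os
from pathlib import Path


def find_workiq_dumps(dump_dir=None):
    """Return (file name, size in bytes) for each workiq dump, oldest first."""
    if dump_dir is None:
        # Default location from the report; LOCALAPPDATA is only set on Windows.
        dump_dir = Path(os.environ.get("LOCALAPPDATA", "")) / "CrashDumps"
    dump_dir = Path(dump_dir)
    if not dump_dir.is_dir():
        return []
    # Dumps are named workiq.exe.<pid>.dmp; sort by modification time.
    dumps = sorted(dump_dir.glob("workiq.exe.*.dmp"), key=lambda p: p.stat().st_mtime)
    return [(p.name, p.stat().st_size) for p in dumps]


if __name__ == "__main__":
    for name, size in find_workiq_dumps():
        print(f"{name}  {size / 1_000_000:.1f} MB")
```

An empty result after a crash would suggest dumps are being written somewhere other than the default folder.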

Crash Details (from Windows Event Log — Application Error)

All 7 events are identical except for PID and timestamp:

Faulting application name: workiq.exe, version: 0.4.1.19742
Faulting module name:      msalruntime.DLL, version: 0.0.0.0
Exception code:            0xc000027b (STATUS_STOWED_EXCEPTION)
Fault offset:              0x0000000000052214
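
To check that a set of Application Error events really share one signature, the fields above can be reduced to a `module+offset` string. The sketch below assumes the event text has the four-line shape shown above; `parse_event` and `crash_signature` are hypothetical helper names.

```python
import re

# "Key:   value" lines; the key is everything before the first colon.
FIELD_RE = re.compile(r"^(?P<key>[^:]+):\s+(?P<value>.+)$")

SAMPLE_EVENT = """
Faulting application name: workiq.exe, version: 0.4.1.19742
Faulting module name:      msalruntime.DLL, version: 0.0.0.0
Exception code:            0xc000027b (STATUS_STOWED_EXCEPTION)
Fault offset:              0x0000000000052214
"""


def parse_event(text):
    """Split an Application Error message into a {field name: raw value} dict."""
    fields = {}
    for line in text.strip().splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match["key"].strip()] = match["value"].strip()
    return fields


def crash_signature(fields):
    """Return ('module+0xoffset', 'exception code') for one parsed event."""
    module = fields["Faulting module name"].split(",")[0]
    code = fields["Exception code"].split()[0]
    offset = int(fields["Fault offset"], 16)  # drops the zero padding
    return f"{module}+{offset:#x}", code


if __name__ == "__main__":
    print(crash_signature(parse_event(SAMPLE_EVENT)))
```

Collecting the signatures of all events into a `set` and checking that it has exactly one element is a quick way to verify the "identical signature" claim.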

Crash Timeline

Date      Time        PID
Apr 28    10:53 AM    19240
Apr 28    11:01 AM    30332
Apr 28     2:54 PM    37744
Apr 28     3:18 PM     9796
Apr 29     9:19 AM    26476
Apr 30     4:05 PM    18176
Apr 30     4:33 PM    16692
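
The timeline can be tallied per day and checked for how closely crashes follow one another. This is only a sketch over the rows above, with the year taken from the "Apr 28–30, 2026" note; it assumes an English locale for the month and AM/PM parsing.

```python
from collections import Counter
from datetime import datetime

# Rows copied from the crash timeline in this report.
CRASHES = [
    ("Apr 28", "10:53 AM", 19240),
    ("Apr 28", "11:01 AM", 30332),
    ("Apr 28", "2:54 PM", 37744),
    ("Apr 28", "3:18 PM", 9796),
    ("Apr 29", "9:19 AM", 26476),
    ("Apr 30", "4:05 PM", 18176),
    ("Apr 30", "4:33 PM", 16692),
]


def parse(date, time, year=2026):
    """Turn a (date, time) pair from the table into a datetime."""
    return datetime.strptime(f"{year} {date} {time}", "%Y %b %d %I:%M %p")


stamps = [parse(date, time) for date, time, _pid in CRASHES]
per_day = Counter(s.strftime("%b %d") for s in stamps)
# Time between each crash and the next one.
gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]

print("crashes per day:", dict(per_day))
print("shortest gap between crashes:", min(gaps))
```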

Observations

  • msalruntime.DLL is versioned 0.0.0.0 — this may indicate a debug, pre-release, or improperly versioned build of the MSAL native runtime bundled with WorkIQ.
  • The exception code 0xc000027b (STATUS_STOWED_EXCEPTION) is a COM/WinRT async exception, suggesting the WAM broker throws an error that is not caught by the calling code.
  • Hypothesis: When running as a headless stdio server (no interactive window), msalruntime.DLL cannot fall back to interactive browser auth and instead throws an unhandled stowed exception. The WorkIQ process does not catch this and crashes.
  • After a crash, the process sometimes restarts but remains in a degraded state (3 MB working set vs. normal 28 MB), serving as a zombie that returns auth errors.

Environment

  • WorkIQ version: 0.4.1.19742
  • OS: Windows 11 (NT 10.0.26200.0)
  • Node.js: v22.18.0
  • Install method: npm install -g @microsoft/workiq
  • Binary path: ~\.agency\WorkIQ.Cli.win-x64\0.4.1.19742\tools\workiq.exe
  • msalruntime.DLL path: ~\.agency\WorkIQ.Cli.win-x64\0.4.1.19742\tools\msalruntime.DLL

Crash Dumps

7 dump files are available at %LOCALAPPDATA%\CrashDumps\workiq.exe.*.dmp (each ~5 MB). Happy to share if needed.

Suggested Fix

  • Catch the stowed exception from msalruntime.DLL during token acquisition and fall back gracefully (e.g., return an auth error to the MCP client rather than crashing the process).
  • Consider adding a --no-wam or --use-device-code flag for headless/MCP scenarios where interactive WAM auth is not possible.
  • Version the bundled msalruntime.DLL properly (currently reports 0.0.0.0).
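
One possible shape for the first suggestion, sketched in Python rather than in WorkIQ's own code: run token acquisition in a separate helper process so that a native crash surfaces as a non-zero exit code, then turn that into a JSON-RPC error for the MCP client. Process isolation is a swapped-in technique here, not something the report confirms WorkIQ does. The helper command and the function names are hypothetical, and -32000 is just a value from the JSON-RPC implementation-defined server error range.

```python
import subprocess
import sys

STATUS_STOWED_EXCEPTION = 0xC000027B  # exception code from this report


def auth_error(request_id, reason):
    """Build a JSON-RPC 2.0 error response for a failed token acquisition."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": -32000, "message": f"Authentication failed: {reason}"},
    }


def acquire_token(helper_cmd, request_id, timeout=60):
    """Run a (hypothetical) token helper; return (token, None) or (None, error)."""
    try:
        proc = subprocess.run(helper_cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, auth_error(request_id, "token helper timed out")
    if proc.returncode != 0:
        # Mask to 32 bits so the exit code can be compared with an NTSTATUS value.
        code = proc.returncode & 0xFFFFFFFF
        if code == STATUS_STOWED_EXCEPTION:
            reason = "token helper crashed with a stowed exception"
        else:
            reason = f"token helper exited with code {code:#x}"
        return None, auth_error(request_id, reason)
    return proc.stdout.strip(), None


if __name__ == "__main__":
    # Stand-in helper that simply fails; a real helper would call into MSAL.
    failing_helper = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(acquire_token(failing_helper, request_id=1))
```

With this shape the server process stays alive after a failed refresh, and the client gets an explicit error instead of a dropped connection.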
