
NPU models fail during Agent Framework fan‑out (parallel inference) while CPU models succeed #506

@leestott

Issue reported via Foundry Discord https://aka.ms/foundry/discord
Foundry Local

Description
When using Foundry Local with the Microsoft Agent Framework, CPU models run successfully, but NPU‑backed models fail at the fan‑out stage when multiple parallel LLM calls are made.
This occurs even on systems with sufficient shared memory (32 GB). Single, serialized inference works on NPU; failures only appear when fan‑out / parallel orchestration is used.
This makes NPU models unusable today for multi‑agent or fan‑out workflows.

Repro steps
1. Install Foundry Local on a Copilot+ PC with NPU support (Windows 11 24H2).
2. Clone and run: https://github.com/leestott/agentframework--foundrylocal
3. Configure the agent to use one of:
   CPU model → ✅ succeeds
   NPU model → ❌ fails during fan‑out
4. Observe the failure when multiple concurrent LLM calls are triggered.
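
To make the two call patterns concrete, here is a minimal sketch that contrasts serialized and fan-out execution. It does not call Foundry Local or the Agent Framework: `FakeModel` and its `infer` method are hypothetical stand-ins that only record how many calls are in flight at once, so the sketch shows the shape of the workload rather than reproducing the failure itself.

```python
import asyncio


class FakeModel:
    """Hypothetical stand-in for a model endpoint; records peak concurrency."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def infer(self, prompt: str) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        try:
            await asyncio.sleep(0.01)  # simulate inference latency
            return f"reply to {prompt}"
        finally:
            self.in_flight -= 1


async def run_serialized(model, prompts):
    # One call at a time: the pattern reported to work on NPU.
    return [await model.infer(p) for p in prompts]


async def run_fan_out(model, prompts):
    # All calls started together: the pattern reported to fail on NPU.
    return await asyncio.gather(*(model.infer(p) for p in prompts))
```

With three prompts, `run_serialized` never has more than one call in flight, while `run_fan_out` has all three in flight at the same time. The second pattern is the one that would hit any per-session concurrency limit.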

Expected behavior
NPU models should either:
Support fan‑out / parallel inference, or
Fail gracefully with a clear error indicating that parallel execution is not supported
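
As an illustration of the second option, a graceful failure could look roughly like the sketch below. This is not how Foundry Local or any execution provider behaves today. `SingleSessionGuard` and `ParallelExecutionNotSupported` are hypothetical names, and the guard assumes the backend can serve only one request at a time.

```python
import asyncio


class ParallelExecutionNotSupported(RuntimeError):
    """Hypothetical error type for a backend that cannot run calls in parallel."""


class SingleSessionGuard:
    """Rejects a second concurrent call with a clear, actionable message."""

    def __init__(self, device: str):
        self.device = device
        self._busy = False

    async def run(self, call):
        # `call` is a zero-argument coroutine function wrapping one inference.
        if self._busy:
            raise ParallelExecutionNotSupported(
                f"{self.device} model is already serving a request; "
                "parallel execution is not supported on this device"
            )
        self._busy = True
        try:
            return await call()
        finally:
            self._busy = False
```

A caller that fans out through this guard gets an explicit exception naming the device and the cause, instead of an opaque session or memory failure.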

Actual behavior
CPU models complete successfully
NPU models fail at fan‑out / parallel execution stage
Failure appears related to NPU execution provider session / memory constraints rather than total system memory

Environment
Foundry Local: latest 0.8.119 (as of March 2026)
OS: Windows 11 24H2
Hardware: Copilot+ PC (ARM) with NPU (QNN)
Memory: 32 GB shared system memory
Agent Framework: used via agentframework--foundrylocal sample

Notes
NPU works correctly for single-agent, serialized inference
Issue only occurs with parallel / fan‑out agent execution
This appears to be a limitation of current NPU execution providers (session concurrency, buffer/KV cache duplication)

Clarifying whether this is a known limitation (and documenting it) would help developers avoid confusion when targeting NPUs with Agent Framework.
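
Until the limitation is confirmed, one possible client-side workaround is to keep the fan-out structure but cap how many calls reach the NPU at once. The sketch below uses only `asyncio` from the standard library. `fan_out_limited` is a hypothetical helper, not part of the Agent Framework or Foundry Local, and the default of one concurrent call is an assumption based on the observation that serialized inference works.

```python
import asyncio


async def fan_out_limited(calls, max_concurrency: int = 1):
    """Run zero-argument coroutine functions with bounded concurrency.

    Results are returned in the same order as `calls`.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def _run(call):
        # Only `max_concurrency` calls may hold the semaphore at a time.
        async with sem:
            return await call()

    return await asyncio.gather(*(_run(c) for c in calls))
```

With `max_concurrency=1` this is effectively serialized inference, so it gives up the latency benefit of fan-out. What it buys is that the orchestration code stays unchanged between CPU and NPU models.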

The user has been asked to validate the issue by confirming it with the NPU model Qwen2.5-7b, and to upload logs and a screenshot of the demo and the failure.
