Skip to content

fix: improve register command reliability and error handling#395

Merged
ragurubhaarath merged 15 commits intomainfrom
u/bhraguru/public-clients-redirect-uris
Apr 30, 2026
Merged

fix: improve register command reliability and error handling#395
ragurubhaarath merged 15 commits intomainfrom
u/bhraguru/public-clients-redirect-uris

Conversation

@ragurubhaarath
Copy link
Copy Markdown
Contributor

@ragurubhaarath ragurubhaarath commented Apr 30, 2026

Summary

  • Add retry with exponential backoff (3 attempts, 2s/4s/8s) for all post-registration Graph API calls (redirect URIs, API permissions)
  • Escalate Graph API failures from warnings to errors — no more silent misconfiguration
  • Demote ValidateResponseAsync body-level errors to debug so callers control the user-facing message

Test plan

  • Build passes with 0 warnings
  • All 1344 tests pass
  • E2E: fresh registration on preprod succeeds (ext_WikiSearch)
  • E2E: duplicate name registration shows friendly red error
  • E2E: registration against local MCP Platform service succeeds

🤖 Generated with Claude Code

ragurubhaarath and others added 8 commits April 29, 2026 16:03
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detect database constraint violation from the API and display a clean
user-facing message instead of raw error details. Raw details are still
available with --verbose. Also demote ValidateResponseAsync body-level
errors to debug so callers control the user-facing message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… error, correlation ID at warning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ate failures to errors

All post-registration Graph calls (redirect URIs, API permissions) now
retry up to 3 times with exponential backoff (2s, 4s, 8s) to handle
transient failures like throttling, replication lag, and network blips.
Failures are now logged at Error level instead of Warning to prevent
silent misconfiguration of Entra apps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 18:18
@ragurubhaarath ragurubhaarath requested review from a team as code owners April 30, 2026 18:18
…ents-redirect-uris

# Conflicts:
#	src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Improves reliability of the register flow by adding retry/backoff and promoting Graph API misconfiguration from warnings to errors, especially around redirect URIs and API permissions.

Changes:

  • Introduce a RetryHelper and wrap post-registration Graph API mutations with retries.
  • Add additional redirect URIs for the Public Clients Entra app and improve failure handling/log levels.
  • Escalate several Graph API failures from warning to error during redirect URI + permission configuration.

Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs Outdated
Post-registration Entra app configuration (redirect URIs + API permissions)
now runs all independent Graph calls concurrently via Task.WhenAll with
ConcurrentBag for thread-safe warning collection. Retry increased to
5 attempts with 3s base delay (max 93s) to match Entra replication patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 30, 2026

⚠️ Deprecation Warning: The deny-licenses option is deprecated for possible removal in the next major release. For more information, see issue 997.

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@github-actions github-actions Bot added the bug Something isn't working label Apr 30, 2026
- Log full redirect URI count and list instead of singular URI
- Add scope name and resource app ID to RemoteProxy permission error message

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 18:52
@ragurubhaarath ragurubhaarath enabled auto-merge (squash) April 30, 2026 18:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.

Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs Outdated
… lookups

- Remove unnecessary Task.Run wrappers around async I/O operations;
  extracted to dedicated async methods called directly for parallelism
- Split RemoteProxy else-if warning into separate messages for missing
  redirect URI vs missing ObjectId
- Wrap GetOAuth2PermissionScopeIdAsync calls in RetryHelper for transient
  failure resilience
- Extract AddPpmiPermissionAsync for reuse across A365 and PublicClients apps

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pronak657
pronak657 previously approved these changes Apr 30, 2026
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs Outdated
Comment thread src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs Outdated
@sellakumaran
Copy link
Copy Markdown
Contributor

  • id: 1
    file: src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs
    line: 51
    title: "PR summary says '3 attempts, 2s/4s/8s' but code uses maxRetries: 5, baseDelaySeconds: 3"
    body: |
    The PR description states:
    "Add retry with exponential backoff (3 attempts, 2s/4s/8s)"

    The actual constructor call in the diff is:
      ```csharp
      _retryHelper = new RetryHelper(logger, maxRetries: 5, baseDelaySeconds: 3);
      ```
    
    With `maxRetries: 5, baseDelaySeconds: 3`, the effective wait schedule is:
      - Attempt 1: 3s delay
      - Attempt 2: 6s delay
      - Attempt 3: 12s delay
      - Attempt 4: 24s delay
      - Attempt 5: 48s delay (capped at 60s per `CalculateDelay`)
    
    Total maximum wait: 93 seconds. The old behavior was immediate failure (no retry at all).
    
    This is a significant latency increase in the failure case. For a user watching
    `a365 develop register-external-mcp-server` fail on a transient issue, 93 seconds of
    silent retries with only Debug-level messages is a poor UX.
    
    RECOMMENDATION: Either:
    1. Align the code with the PR description (maxRetries: 3, baseDelaySeconds: 2), or
    2. Update the PR description to reflect the actual values, and add at least one
       `LogInformation`-level message on each retry so users can see what is happening
       rather than waiting in silence.
    
    NOTE: The `RetryHelper` already logs at Debug level on each retry. Since `-v` enables
    debug output, verbose mode will show retries. However, the default (non-verbose) output
    is silent during retries, which may look like a hang to the user.
    
    • id: 2
      file: src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs
      line: 777
      title: "Concurrent task fan-out uses abandoned tasks on early exception"
      body: |
      The new ConfigureEntraAppsAsync builds a list of tasks and calls await Task.WhenAll(tasks)
      at the end. However, between the task list being populated and WhenAll being called, there
      is a sequential await for ppmiScopeId:

      ```csharp
      var tasks = new List<Task>();
      // ... tasks.Add(UpdateA365RedirectUrisAsync(...)) ...
      // ... tasks.Add(UpdateRemoteProxyRedirectUrisAsync(...)) ...
      // ... tasks.Add(AddRemoteProxyScopePermissionAsync(...)) ...
      
      ppmiScopeId = await _retryHelper.ExecuteWithRetryAsync(   // <-- sequential await
          async ct => await _graphApiService!.GetOAuth2PermissionScopeIdAsync(...));
      
      // ... tasks.Add(AddPpmiPermissionAsync(...)) ...
      await Task.WhenAll(tasks);   // <-- tasks added BEFORE the sequential await are now running
      ```
      

      The tasks added before the ppmiScopeId sequential await start running immediately
      when Task.Add is called with hot tasks. If the sequential await throws (see CR-001),
      those tasks are abandoned with no await, leaving background Graph API calls whose results
      are never observed and whose exceptions are silently swallowed.

      FIX: Either:

      1. Fix CR-001 (wrap in try/catch) so the exception can't escape before WhenAll.
      2. Restructure so all tasks are added first, then awaited together:
        // Add all tasks without any intermediate sequential awaits, then:
        await Task.WhenAll(tasks);
        // Post-WhenAll: apply ppmi if scope was found
      3. Pre-fetch ppmiScopeId before populating the tasks list, or move the PPMI
        fetch entirely outside the parallel region.
    • id: 3
      enabled: true
      severity: medium
      file: src/Microsoft.Agents.A365.DevTools.Cli/Commands/RegisterCommandExecutor.cs
      line: 640
      title: "LogError with string-interpolated message variable — breaks structured logging"
      body: |
      Multiple new log calls pass a pre-built string variable as the message template:

      ```csharp
      var msg = $"Failed to set redirect URIs on Public Clients app '{publicClientsAppName}' after retries.";
      _logger.LogError(msg);    // <-- string variable as template
      warnings.Add(msg);
      ```
      

      This pattern is present throughout the PR's new code (UpdateA365RedirectUrisAsync,
      UpdateRemoteProxyRedirectUrisAsync, AddRemoteProxyScopePermissionAsync, AddPpmiPermissionAsync,
      and the public-clients section). While it is a pre-existing pattern in some parts of
      this file, the PR introduces ~12 new instances.

      The issue: passing a variable as the first argument to LogError defeats the structured
      logging pipeline — the message template is not a constant, so log sinks cannot index or
      aggregate by template. It also triggers the CA2254 analyzer warning ("The logging message
      template should not vary between calls").

      FIX: Use named placeholders:
      csharp _logger.LogError( "Failed to set redirect URIs on Public Clients app '{AppName}' after retries.", publicClientsAppName); warnings.Add($"Failed to set redirect URIs on Public Clients app '{publicClientsAppName}' after retries.");

    • id: 4
      file: CHANGELOG.md
      line: 7
      title: "Missing CHANGELOG entry for user-visible behavior change"
      body: |
      This PR makes two user-visible behavior changes that meet the ### Fixed threshold:

      1. Post-registration Graph API failures are now escalated from warnings to errors.
        Users who previously saw yellow warning output for failed redirect URI / permission
        configuration will now see red error output. This changes observable CLI behavior.

      2. The register-external-mcp-server command now retries Graph API calls instead of
        failing immediately. This is observable: a registration that previously failed in
        under 1 second now takes up to 90+ seconds to produce an error.

      Per .github/copilot-instructions.md:
      "For user-facing changes (features, bug fixes, behavioral changes): verify
      CHANGELOG.md has an entry in the [Unreleased] section"

      ADD to CHANGELOG.md under [Unreleased] > ### Fixed:
      markdown ### Fixed - `develop register-external-mcp-server`: post-registration Graph API calls for redirect URI and API permission configuration now retry with exponential backoff (up to 5 attempts) to handle AAD replication lag. Failures are now surfaced as errors rather than warnings.

RetryHelper throws RetryExhaustedException when shouldRetry keeps
returning true and the final result is null. For Guid? scope lookups
this caused an unhandled crash instead of a graceful warning. Replaced
shouldRetry-based retry with exception-only overload wrapped in
try/catch for both PPMI and remote scope lookups.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 21:28
Pass context.GetCancellationToken() from DevelopMcpCommand SetHandler
through ExecuteAsync, ConfigureEntraAppsAsync, and all helper methods
to RetryHelper. Ctrl+C now cancels in-flight retries instead of
burning through all 5 attempts (up to 93s).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Scope lookups can return null due to Entra replication lag after app
creation. Re-added shouldRetry predicate (result => !result.HasValue)
while keeping try/catch for RetryExhaustedException when scope truly
does not exist after all retries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ragurubhaarath ragurubhaarath merged commit 52742d4 into main Apr 30, 2026
9 checks passed
@ragurubhaarath ragurubhaarath deleted the u/bhraguru/public-clients-redirect-uris branch April 30, 2026 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants