From 9946d5a4d7b76976f82f3f07466f878bd5067b19 Mon Sep 17 00:00:00 2001
From: Rolf Bjarne Kvinge <rolf@xamarin.com>
Date: Mon, 27 Apr 2026 19:12:19 +0200
Subject: [PATCH 1/4] Add fix-random-ci-test-failure skill for diagnosing flaky
 tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../fix-random-ci-test-failure/SKILL.md       | 108 ++++++++++++++++++
 1 file changed, 108 insertions(+)
 create mode 100644 .github/skills/fix-random-ci-test-failure/SKILL.md

diff --git a/.github/skills/fix-random-ci-test-failure/SKILL.md b/.github/skills/fix-random-ci-test-failure/SKILL.md
new file mode 100644
index 00000000000..ce55a66650d
--- /dev/null
+++ b/.github/skills/fix-random-ci-test-failure/SKILL.md
@@ -0,0 +1,108 @@
+---
+name: fix-random-ci-test-failure
+description: >-
+  Investigate and fix flaky/random CI test failures in dotnet/macios. Trigger on
+  GitHub issues describing intermittent test failures, CI postmortem issues, or when
+  asked to fix a flaky test. Analyzes test code, identifies root causes
+  (shared state, environment dependencies, race conditions), and applies fixes.
+---
+
+# Fix Random CI Test Failure
+
+Investigate and fix flaky CI test failures reported in GitHub issues for dotnet/macios.
+
+## When to Use
+
+- User shares a GitHub issue about a flaky/intermittent test failure
+- Issue has the `ci-postmortem` label
+- User asks to fix a randomly failing test
+- Test failures appear across unrelated PRs (indicating environment-dependent flakiness)
+
+## Workflow
+
+### 1. Gather Failure Context
+
+Read the GitHub issue to extract:
+- **Test name** and **test suite** (e.g., monotouch-test)
+- **Error message** and **assertion failure details**
+- **Platform** (iOS, macOS, tvOS, Mac Catalyst)
+- **Affected builds** — check if failures span unrelated PRs (confirms flakiness vs. regression)
+
+If build URLs are provided, inspect logs for additional context (stack traces, timing info).
+
+### 2. Locate and Analyze the Test
+
+Find the test source file:
+```
+grep -r "TestMethodName" tests/
+```
+
+Read the **full test method** and all helper methods it calls. Understand:
+- What external resources does it use? (keychain, file system, network, simulators)
+- Does it use shared/hardcoded identifiers that could collide across parallel runs?
+- Does it properly clean up before and after execution?
+- Are there error paths that silently swallow failures?
+
+### 3. Identify Root Cause Category
+
+Common flaky test root causes in this repo:
+
+#### Shared State / Resource Conflicts
+- **Symptom**: Test uses hardcoded identifiers (e.g., fixed keychain entries, file paths, port numbers)
+- **Fix**: Use process-unique identifiers (PID, GUID, bundle ID + test name)
+- **Reference**: `tests/monotouch-test/Security/KeyChainTest.cs` uses per-process unique IDs
+
+#### Environment-Dependent State
+- **Symptom**: Test depends on OS-level state (keychain, permissions, network availability)
+- **Fix**: Add robust setup/teardown, handle unexpected initial states, add retry logic for transient errors
+
+#### Unhandled Error Codes
+- **Symptom**: Code only handles expected success/failure codes, silently fails on unexpected ones
+- **Fix**: Add fallback handling for unexpected status codes, log diagnostic info
+
+#### Race Conditions / Timing
+- **Symptom**: Test passes locally but fails intermittently in CI
+- **Fix**: Add proper synchronization, increase timeouts, avoid timing-dependent assertions
+
+#### LAContext / Authentication Issues (Security tests)
+- **Symptom**: `InvalidRecord` or authentication-related errors on keychain operations
+- **Fix**: Only attach `LAContext` where actually needed (not on plain query/delete operations)
+
+### 4. Apply the Fix
+
+When fixing:
+
+1. **Prefer unique identifiers over shared ones.** Use `Environment.ProcessId`, `Guid.NewGuid()`, or `{bundleId}-{testName}-{pid}` patterns for resource identifiers.
+
+2. **Create minimal query records.** For search/delete operations, don't attach unnecessary attributes (like `LAContext`) that can cause intermittent errors.
+
+3. **Handle all status codes.** Never silently return `false` for unexpected error codes. Either handle them with a fallback path or fail with a descriptive assertion message.
+
+4. **Add diagnostic logging.** Use `TestContext.Out.WriteLine` to log operation results that would help diagnose future failures.
+
+5. **Clean up legacy state.** If renaming identifiers, also clean up entries from old names that may linger on CI agents.
+
+6. **Always clean up in `finally` blocks.** Ensure test resources are released even on failure.
+
+### 5. If the Fix is Unclear
+
+If you cannot determine the root cause with available information:
+
+1. **Add diagnostic logging** to capture operation results (status codes, error details) in future failures.
+2. **Log the initial state** at test start (e.g., does the resource already exist before the test runs?).
+3. **Include all relevant status codes** in assertion failure messages.
+4. Explain to the user what additional information the logging will provide and what hypotheses it will help test.
+
+### 6. Create the PR
+
+- Branch naming: `dev/rolf/fix-{test-name}` or similar
+- Commit message should explain all root causes addressed
+- Reference the GitHub issue with `Fixes #NNNN`
+- Add the `copilot` label to the PR
+
+## Key Patterns in This Repo
+
+- Keychain tests in `tests/monotouch-test/Security/` are particularly prone to flakiness due to shared macOS keychain state on CI agents
+- `InitSecRecord` in `RecordTest.cs` attaches `LAContext` to all `SecRecord` instances — this is sometimes unnecessary and can cause `InvalidRecord` errors
+- `KeyChainTest.cs` demonstrates the recommended pattern: per-process unique identifiers using bundle ID + test type + PID
+- Tests run on shared CI agent machines where leftover state from previous runs can cause interference

From 1ead3fa5b3388ada3bf5890d413bd6227d07958c Mon Sep 17 00:00:00 2001
From: Rolf Bjarne Kvinge <rolf@xamarin.com>
Date: Mon, 27 Apr 2026 19:16:13 +0200
Subject: [PATCH 2/4] Use 'Ref' instead of 'Fixes' for logging-only changes in
 skill

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/skills/fix-random-ci-test-failure/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/skills/fix-random-ci-test-failure/SKILL.md b/.github/skills/fix-random-ci-test-failure/SKILL.md
index ce55a66650d..8d5b2b38318 100644
--- a/.github/skills/fix-random-ci-test-failure/SKILL.md
+++ b/.github/skills/fix-random-ci-test-failure/SKILL.md
@@ -97,7 +97,7 @@ If you cannot determine the root cause with available information:
 
 - Branch naming: `dev/rolf/fix-{test-name}` or similar
 - Commit message should explain all root causes addressed
-- Reference the GitHub issue with `Fixes #NNNN`
+- Reference the GitHub issue with `Fixes #NNNN` if the change actually fixes the problem, or `Ref #NNNN` if it only adds logging/diagnostics
 - Add the `copilot` label to the PR
 
 ## Key Patterns in This Repo

From 5c5d6132286b82af28b03aac27d93f3559fa5702 Mon Sep 17 00:00:00 2001
From: Rolf Bjarne Kvinge <rokvin@microsoft.com>
Date: Wed, 29 Apr 2026 10:51:17 +0200
Subject: [PATCH 3/4] Update .github/skills/fix-random-ci-test-failure/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 .github/skills/fix-random-ci-test-failure/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/skills/fix-random-ci-test-failure/SKILL.md b/.github/skills/fix-random-ci-test-failure/SKILL.md
index 8d5b2b38318..6e5caddb0b5 100644
--- a/.github/skills/fix-random-ci-test-failure/SKILL.md
+++ b/.github/skills/fix-random-ci-test-failure/SKILL.md
@@ -72,7 +72,7 @@ Common flaky test root causes in this repo:
 
 When fixing:
 
-1. **Prefer unique identifiers over shared ones.** Use `Environment.ProcessId`, `Guid.NewGuid()`, or `{bundleId}-{testName}-{pid}` patterns for resource identifiers.
+1. **Prefer unique identifiers over shared ones.** Use `Process.GetCurrentProcess ().Id`, `Guid.NewGuid ()`, or `{bundleId}-{testType}-{pid}` patterns for resource identifiers.
 
 2. **Create minimal query records.** For search/delete operations, don't attach unnecessary attributes (like `LAContext`) that can cause intermittent errors.
 

From 4e33fa76723d4da4b98df1dfe41aabb9d099ddbd Mon Sep 17 00:00:00 2001
From: Rolf Bjarne Kvinge <rokvin@microsoft.com>
Date: Wed, 29 Apr 2026 10:51:59 +0200
Subject: [PATCH 4/4] Apply suggestion from @rolfbjarne

---
 .github/skills/fix-random-ci-test-failure/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/skills/fix-random-ci-test-failure/SKILL.md b/.github/skills/fix-random-ci-test-failure/SKILL.md
index 6e5caddb0b5..2d9335b0bb2 100644
--- a/.github/skills/fix-random-ci-test-failure/SKILL.md
+++ b/.github/skills/fix-random-ci-test-failure/SKILL.md
@@ -95,7 +95,7 @@ If you cannot determine the root cause with available information:
 
 ### 6. Create the PR
 
-- Branch naming: `dev/rolf/fix-{test-name}` or similar
+- Branch naming: `dev/{username}/fix-{test-name}` or similar
 - Commit message should explain all root causes addressed
 - Reference the GitHub issue with `Fixes #NNNN` if the change actually fixes the problem, or `Ref #NNNN` if it only adds logging/diagnostics
 - Add the `copilot` label to the PR