
Add initial intelligence implementation with OpenAI integration #24

Merged
kevinjtwu merged 2 commits into main from openai-integration on Feb 5, 2026

Conversation


@kevinjtwu kevinjtwu commented Feb 5, 2026

Changes

  • OpenAI Provider Support: Added OpenAI API key and optional organization ID configuration
  • Evidence Extraction: Implemented LLM-based evidence extraction that evaluates policy criteria against clinical data using structured prompts
  • Form Generation: Implemented LLM-based form generation that calculates recommendations (APPROVE/NEED_INFO/MANUAL_REVIEW) and generates clinical summaries
  • Configuration: Updated setup scripts, AppHost configuration, and README to support OpenAI provider setup

Testing

  • Updated tests in apps/intelligence/src/tests
  • Manual testing via command line to verify OpenAI integration
    • Example request:
      curl -X POST "http://localhost:8000/analyze" \
        -H "Content-Type: application/json" \
        -d '{
          "patient_id": "test-patient-123",
          "procedure_code": "72148",
          "clinical_data": {
            "patient": {
              "name": "John Doe",
              "birth_date": "1980-01-15",
              "gender": "male",
              "member_id": "INS123456"
            },
            "conditions": [
              {
                "code": "M54.5",
                "display": "Low back pain",
                "system": "http://hl7.org/fhir/sid/icd-10-cm",
                "clinical_status": "active"
              }
            ],
            "observations": [
              {
                "code": "pain-severity",
                "display": "Pain severity",
                "value": "7/10",
                "unit": "scale"
              }
            ],
            "procedures": []
          }
        }'
    • Example response:
      {
        "patient_name": "John Doe",
        "patient_dob": "1980-01-15",
        "member_id": "INS123456",
        "diagnosis_codes": ["M54.5"],
        "procedure_code": "72148",
        "clinical_summary": "John Doe has a documented diagnosis of low back pain (ICD-10: M54.5) for which lumbar spine MRI (CPT 72148) is being requested. However, there is no documentation of prior conservative therapy, failed treatment, or neurological symptoms to support the medical necessity of advanced imaging at this time. Additional clinical information is recommended to demonstrate medical necessity per standard guidelines.",
        "supporting_evidence": [
          {
            "criterion_id": "conservative_therapy",
            "status": "NOT_MET",
            "evidence": "Criterion Evaluation: NOT_MET\n\nExplanation: There is no documentation provided to demonstrate that the patient has completed 6 or more weeks of conservative therapy (such as physical therapy or medication). The clinical data only notes the diagnosis and pain severity, but lacks evidence of the required duration or type of conservative treatment.",
            "source": "LLM analysis",
            "confidence": 0.8
          },
          {
            "criterion_id": "failed_treatment",
            "status": "NOT_MET",
            "evidence": "NOT_MET\n\nExplanation: There is no documentation provided regarding treatment failure or inadequate response. The only information available is the diagnosis and pain severity, but no records or evidence of previous treatments or their outcomes are included. Therefore, the criterion is not met.",
            "source": "LLM analysis",
            "confidence": 0.8
          },
          {
            "criterion_id": "neurological_symptoms",
            "status": "NOT_MET",
            "evidence": "Criterion Evaluation: NOT_MET\n\nExplanation: There is no documentation or evidence provided indicating the presence of red flag neurological symptoms. The only information available is the diagnosis of low back pain and pain severity, which does not meet the criterion for red flag neurological symptoms.",
            "source": "LLM analysis",
            "confidence": 0.8
          },
          {
            "criterion_id": "diagnosis_present",
            "status": "MET",
            "evidence": "MET\n\nExplanation: The clinical data lists an ICD-10 diagnosis code (M54.5 - Low back pain). This satisfies the criterion requiring a valid ICD-10 diagnosis code to be present.",
            "source": "LLM analysis",
            "confidence": 0.8
          }
        ],
        "recommendation": "NEED_INFO",
        "confidence_score": 0.6,
        "field_mappings": {
          "PatientName": "John Doe",
          "PatientDOB": "1980-01-15",
          "MemberID": "INS123456",
          "PrimaryDiagnosis": "M54.5",
          "ProcedureCode": "72148",
          "ClinicalJustification": "John Doe has a documented diagnosis of low back pain (ICD-10: M54.5) for which lumbar spine MRI (CPT 72148) is being requested. However, there is no documentation of prior conservative therapy, failed treatment, or neurological symptoms to support the medical necessity of advanced imaging at this time. Additional clinical information is recommended to demonstrate medical necessity per standard guidelines."
        }
      }


coderabbitai bot commented Feb 5, 2026

📝 Walkthrough

Adds OpenAI as a supported LLM provider (API key + org ID) across docs, config, setup, and deployment wiring; replaces intelligence service stubs with an LLM-driven pipeline that parses PDFs, extracts evidence per policy, and generates PA form data; updates tests to mock the LLM.

Changes

  • Docs & Quickstart (README.md, scripts/setup.sh): Adds OpenAI to provider lists and examples; introduces OPENAI_API_KEY / OPENAI_ORG_ID, persists them to dotnet user-secrets, and updates setup guidance and secret listings.
  • Configuration & Wiring (apps/intelligence/src/config.py, apps/intelligence/src/llm_client.py, orchestration/AuthScript.AppHost/AppHost.cs): Adds an openai_org_id setting, conditionally passes the org to the AsyncOpenAI client, and wires OPENAI env vars into container/host config.
  • API / Ingest (apps/intelligence/src/api/analyze.py): Replaces stub endpoints with an LLM-driven flow: validates inputs, parses PDFs into document texts, builds a ClinicalBundle, runs evidence extraction and form generation, and returns a dynamic PAFormResponse.
  • Reasoning / LLM logic (apps/intelligence/src/reasoning/evidence_extractor.py, apps/intelligence/src/reasoning/form_generator.py): Implements LLM-based per-criterion evidence extraction (status + confidence) and PA form generation (clinical summary, recommendation, confidence, field mappings) using chat_completion calls.
  • Tests (apps/intelligence/src/tests/test_analyze.py, .../test_evidence_extractor.py, .../test_form_generator.py): Mocks chat_completion with AsyncMock via patches; updates expected confidence values and asserts per-criterion ids where applicable.
  • OpenAPI / Specs (apps/intelligence/openapi.json, shared/schemas/intelligence.openapi.json): Updates endpoint descriptions to reflect LLM-driven analysis and PDF parsing (descriptive text only).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant API as /analyze<br/>Endpoint
    participant Parser as PDF/Text<br/>Parser
    participant Extractor as Evidence<br/>Extractor
    participant Generator as Form<br/>Generator
    participant LLM as OpenAI<br/>LLM

    Client->>API: POST /analyze (clinical_data, policy[, documents])
    API->>Parser: parse_pdf / extract text
    Parser-->>API: document text
    API->>API: Build ClinicalBundle (patient, diagnoses, observations, docs)
    API->>Extractor: extract_evidence(bundle, policy)

    loop per policy criterion
        Extractor->>LLM: chat_completion(system + user prompt)
        LLM-->>Extractor: MET/NOT_MET/UNCLEAR + confidence
        Extractor->>Extractor: produce EvidenceItem
    end

    Extractor-->>API: list[EvidenceItem]
    API->>Generator: generate_form_data(bundle, evidence, policy)
    Generator->>LLM: chat_completion(summary / justification prompt)
    LLM-->>Generator: clinical summary / recommendation
    Generator-->>API: PAFormResponse (fields, recommendation, confidence)
    API-->>Client: PAFormResponse
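The per-criterion loop in the diagram can be sketched in Python. This is a minimal illustration, not the PR's actual code: the `EvidenceItem` shape and policy-dict keys are assumed from the example response above, and `chat_completion` is passed in as a parameter purely so the sketch is self-contained (the real module imports it directly).

```python
import asyncio
from dataclasses import dataclass


@dataclass
class EvidenceItem:
    criterion_id: str
    status: str          # "MET" / "NOT_MET" / "UNCLEAR"
    evidence: str
    source: str = "LLM analysis"
    confidence: float = 0.8


async def extract_evidence(bundle_text, policy, chat_completion):
    # Evaluate each policy criterion independently with its own LLM call.
    items = []
    for criterion in policy.get("criteria", []):
        response = await chat_completion(
            f"Evaluate criterion '{criterion['description']}' against:\n{bundle_text}"
        )
        upper = response.upper()
        if "NOT_MET" in upper or "NOT MET" in upper:
            status = "NOT_MET"
        elif "MET" in upper:
            status = "MET"
        else:
            status = "UNCLEAR"
        items.append(EvidenceItem(criterion["id"], status, response))
    return items
```

One LLM call per criterion keeps each prompt focused, at the cost of latency that grows linearly with the number of criteria.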

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Poem

OpenAI keys turned docs to light,
PDFs whispered context through the night,
Criteria weighed by careful rhyme,
Forms assembled, one last time,
A pipeline hums and answers right.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title 'Add initial intelligence implementation with OpenAI integration' accurately reflects the main changes, clearly summarizing the addition of OpenAI support and intelligence implementation across the codebase.
  • Description check: ✅ Passed. The description is directly related to the changeset, detailing OpenAI provider support, evidence extraction implementation, form generation logic, configuration updates, and testing approach with concrete examples.
  • Docstring coverage: ✅ Passed. Docstring coverage is 90.00%, which is sufficient; the required threshold is 80.00%.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/intelligence/src/api/analyze.py (1)

72-124: ⚠️ Potential issue | 🟠 Major

Validate procedure_code in /with-documents too.
Unlike analyze (Line 39), this endpoint never checks SUPPORTED_PROCEDURE_CODES, so unsupported codes still hit the LLM. Add the same guard early to keep behavior consistent and avoid wasted work.

Proposed fix
 async def analyze_with_documents(
     patient_id: str,
     procedure_code: str,
     clinical_data: str,  # JSON string
     documents: list[UploadFile] = File(default=[]),
 ) -> PAFormResponse:
     """
     Analyze clinical data with attached PDF documents.

     Parses PDF documents and includes extracted text in the analysis.
     """
     import json

+    if procedure_code not in SUPPORTED_PROCEDURE_CODES:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Procedure code {procedure_code} not supported",
+        )
+
     # Parse clinical data
     try:
         clinical_data_dict = json.loads(clinical_data)
     except json.JSONDecodeError as e:
         raise HTTPException(status_code=400, detail=f"Invalid clinical data JSON: {e}")
🤖 Fix all issues with AI agents
In `@apps/intelligence/src/api/analyze.py`:
- Around line 57-68: The procedure_code is being overwritten after LLM output,
which can desync mappings and summary; instead inject the requested code into
the policy or pass it into generate_form_data so the LLM prompt and
field_mappings align with the requested value—specifically, before calling
generate_form_data (and similarly before the later call around lines 125-126),
set policy["procedure_codes"] to include request.procedure_code or modify the
generate_form_data signature to accept a procedure_code param and use that
inside the function so form_response.procedure_code, field_mappings, and the
clinical summary remain consistent with request.procedure_code.
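The injection step described above can be sketched as a small helper. This is an illustrative sketch under assumed shapes, not the PR's code: the policy is treated as a plain dict with a `procedure_codes` list, and the helper name `prepare_policy` is hypothetical.

```python
def prepare_policy(policy: dict, requested_procedure_code: str) -> dict:
    """Return a copy of the policy with the requested procedure code injected,
    so the LLM prompt and field_mappings align with the request instead of
    being overwritten after the LLM call."""
    adjusted = dict(policy)
    codes = list(adjusted.get("procedure_codes", []))
    if requested_procedure_code not in codes:
        codes.append(requested_procedure_code)
    adjusted["procedure_codes"] = codes
    return adjusted
```

Copying the dict and list keeps the caller's policy object unmodified, so the same policy can be reused across requests.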

In `@apps/intelligence/src/reasoning/evidence_extractor.py`:
- Around line 48-71: The long prompt and summary strings (document_text,
clinical_summary, system_prompt, user_prompt) exceed line-length limits and must
be wrapped; update construction of clinical_summary and user_prompt to build
multi-line text without overlong lines (e.g., use textwrap.dedent and explicit
concatenation or join smaller strings/lines) and reflow the Criterion and
Clinical Data sections so no single literal exceeds the line-length rule; ensure
system_prompt likewise is short or split, and preserve the same variable names
(document_text, clinical_summary, system_prompt, user_prompt) and semantics so
the LLM prompt logic and loop over policy.get("criteria", []) remain unchanged.
- Around line 81-97: The status parsing logic using llm_response and
response_upper is fragile because "NOT MET" (with a space) won't match "NOT_MET"
and may be misclassified as MET; update the parsing in the block that sets
status, evidence_text, and confidence to normalize response_upper (e.g.,
collapse whitespace and underscores or remove spaces/underscores) and then check
for NOT_MET variants before checking MET, or use a regex word-boundary check for
NOT\s*_?\s*MET to detect both "NOT MET" and "NOT_MET", then set status =
"NOT_MET" and confidence = 0.8 accordingly.
- Around line 30-35: The code currently includes patient identifiers when
building patient_info (using clinical_bundle.patient.name and
clinical_bundle.patient.birth_date); change this to redact or replace direct
identifiers before any prompt is sent to external LLMs by substituting names/DOB
with placeholders (e.g., "[REDACTED_NAME]", "[REDACTED_DOB]" or age-only), and
add a simple feature flag/config (e.g., REDACT_PHI or SKIP_EXTERNAL_LLM) checked
where patient_info is used so that external-provider calls never include raw PHI
unless explicitly enabled and compliant. Ensure the changes touch the
patient_info construction in evidence_extractor.py and any prompt assembly paths
that consume patient_info so no raw clinical_bundle.patient fields are
forwarded.
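The status-parsing fix above can be sketched with a word-boundary regex that tolerates both "NOT MET" and "NOT_MET". The function name and the 0.5 confidence for UNCLEAR are assumptions for illustration; only the 0.8 confidence for definite statuses comes from the example response.

```python
import re


def parse_status(llm_response: str) -> tuple[str, float]:
    """Classify an LLM criterion evaluation, tolerating spacing variants."""
    response_upper = llm_response.upper()
    # Check the negative form first: \bNOT[\s_]*MET\b matches "NOT MET",
    # "NOT_MET", and "NOT  MET" before the bare "MET" check can misfire.
    if re.search(r"\bNOT[\s_]*MET\b", response_upper):
        return "NOT_MET", 0.8
    if re.search(r"\bMET\b", response_upper):
        return "MET", 0.8
    return "UNCLEAR", 0.5
```

Ordering matters: a plain substring check for "MET" would also match inside "NOT MET", which is exactly the misclassification the review flagged.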

In `@apps/intelligence/src/reasoning/form_generator.py`:
- Around line 49-70: The approval logic can wrongly approve when there are no
required criteria and evidence is UNCLEAR and also defines an unused variable;
remove the unused required_criteria and compute required_criterion_ids directly,
then set met_required to require at least one required criterion (e.g.,
met_required = bool(required_criterion_ids) and all(... for e in evidence if
e.criterion_id in required_criterion_ids)), and change the APPROVE branch to
require not has_not_met and not has_unclear (i.e., if met_required and not
has_not_met and not has_unclear: recommendation = "APPROVE"...).
- Around line 80-92: The prompt strings (system_prompt and user_prompt) exceed
the project's line-length limits causing E501 failures; break long literal lines
into shorter concatenated pieces or use textwrap.dedent on a separately defined
short-line triple-quoted template and then format/ interpolate variables. Update
system_prompt and user_prompt so you either (a) build them from multiple short
string literals inside parentheses to allow implicit concatenation, or (b)
create a dedented template like prompt_template = textwrap.dedent("""...""")
with safe placeholders and then set user_prompt =
prompt_template.format(evidence_summary=evidence_summary,
patient_name=patient_name, diagnosis_codes=', '.join(diagnosis_codes),
procedure_code=procedure_code) or use an f-string on a wrapped template—ensuring
no single source line exceeds the length limit while preserving the same content
and variable substitutions for system_prompt and user_prompt.

In `@scripts/setup.sh`:
- Around line 104-124: The script currently uses the sentinel "not-configured"
for OPENAI_KEY and OPENAI_ORG which gets persisted and passed to the OpenAI
client; change the logic around OPENAI_KEY and OPENAI_ORG so the sentinel is an
empty string instead of "not-configured" (e.g. OPENAI_KEY="${OPENAI_API_KEY:-}"
and OPENAI_ORG="${OPENAI_ORG_ID:-}") and only call dotnet user-secrets set
"Parameters:openai-api-key" and "Parameters:openai-org-id" when the
corresponding variables are non-empty; update the info messages accordingly so
you don't persist or pass the literal "not-configured" value to the OpenAI
client.
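The setup.sh change above can be sketched as a shell function. The `dotnet user-secrets` calls are replaced with echo stand-ins here so the sketch is self-contained; the function name is illustrative, not the script's actual structure.

```shell
persist_openai_secrets() {
  # Empty-string sentinels: nothing is persisted unless the env vars are set,
  # so the literal "not-configured" never reaches the OpenAI client.
  OPENAI_KEY="${OPENAI_API_KEY:-}"
  OPENAI_ORG="${OPENAI_ORG_ID:-}"
  if [ -n "$OPENAI_KEY" ]; then
    # Real script would run:
    # dotnet user-secrets set "Parameters:openai-api-key" "$OPENAI_KEY"
    echo "set openai-api-key"
  else
    echo "skip openai-api-key"
  fi
  if [ -n "$OPENAI_ORG" ]; then
    echo "set openai-org-id"
  else
    echo "skip openai-org-id"
  fi
}
```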

@kevinjtwu kevinjtwu closed this Feb 5, 2026
@github-project-automation github-project-automation bot moved this from Todo to Done in Authscript Demo Feb 5, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@apps/intelligence/src/reasoning/evidence_extractor.py`:
- Around line 1-14: The file has duplicated top-level docstring lines and a
duplicated import of chat_completion causing syntax errors; remove the redundant
duplicate lines so there is a single triple-quoted module docstring and only one
import of chat_completion, leaving the imports for ClinicalBundle and
EvidenceItem intact (look for the module-level docstring and the imports
referencing chat_completion, ClinicalBundle, and EvidenceItem in
evidence_extractor.py and delete the repeated entries).
- Around line 21-22: The docstring in evidence_extractor.py contains a
duplicated sentence "Extract evidence from clinical bundle using LLM to evaluate
policy criteria." Remove the redundant duplicate so the function/module
docstring contains that sentence only once (i.e., keep a single occurrence of
the exact string in the docstring for clarity).

In `@apps/intelligence/src/reasoning/form_generator.py`:
- Around line 22-26: Remove the duplicated docstring lines in the form_generator
function in apps/intelligence/src/reasoning/form_generator.py: the sentences
"Generate PA form data from extracted evidence using LLM." and "Calculates
recommendation based on evidence and generates clinical summary." appear
twice—keep each sentence only once in the function/module docstring so the
description is concise and non-repetitive; update the docstring in the function
(e.g., generate_pa_form or the top-level function/class docstring) to contain a
single instance of each sentence.
- Around line 127-142: The PAFormResponse return is passing duplicate keyword
args (clinical_summary, recommendation, confidence_score, field_mappings)
causing a syntax error; in the function that returns PAFormResponse remove the
repeated assignments so each field appears only once (keep patient_name,
patient_dob, member_id, diagnosis_codes, procedure_code, clinical_summary,
supporting_evidence/evidence, recommendation, confidence_score, field_mappings
and ensure supporting_evidence uses the correct variable name `evidence`) —
update the return in form_generator.py (the PAFormResponse constructor call) to
a single, non-duplicated set of keyword arguments.
- Around line 1-14: The module has duplicated docstring lines and a duplicated
import causing syntax errors; remove the redundant docstring lines so there is a
single module docstring at the top and delete the duplicate import of
chat_completion so only one "from src.llm_client import chat_completion"
remains; ensure the remaining imports for ClinicalBundle, EvidenceItem, and
PAFormResponse are unchanged and the file starts with a single valid
triple-quoted docstring followed by the imports.

@kevinjtwu kevinjtwu reopened this Feb 5, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@apps/intelligence/src/reasoning/evidence_extractor.py`:
- Around line 88-110: The variable `status` in evidence_extractor.py is inferred
as plain str but must be annotated as Literal["MET","NOT_MET","UNCLEAR"] to
match EvidenceItem.status; update the declaration of `status` (the local
variable used before building EvidenceItem) to include the type annotation
`Literal["MET","NOT_MET","UNCLEAR"]` and ensure imports include typing.Literal
if not present, keeping the same assignment and conditional branches that set
"MET", "NOT_MET", or "UNCLEAR" so the type checker recognizes the correct
Literal type for EvidenceItem(...) construction.
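The annotation fix can be sketched like this (the function name `classify_status` is illustrative; only the `Literal` values come from the review comment):

```python
from typing import Literal

CriterionStatus = Literal["MET", "NOT_MET", "UNCLEAR"]


def classify_status(response_upper: str) -> CriterionStatus:
    # Annotating the local variable narrows it to the Literal union that
    # EvidenceItem.status expects, instead of plain str.
    status: CriterionStatus
    if "NOT_MET" in response_upper or "NOT MET" in response_upper:
        status = "NOT_MET"
    elif "MET" in response_upper:
        status = "MET"
    else:
        status = "UNCLEAR"
    return status
```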

In `@apps/intelligence/src/reasoning/form_generator.py`:
- Around line 63-71: The local variable recommendation should be narrowed to a
Literal type so it matches PAFormResponse's typed argument; update the
assignments for recommendation in the block handling met_required/has_not_met
(and the similar block at 122-132) by declaring recommendation with a Literal
union (e.g., from typing import Literal) or using a cast to Literal for the
values "APPROVE", "NEED_INFO", and "MANUAL_REVIEW" before passing it to
PAFormResponse so the static type checker no longer reports an arg-type
mismatch.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@apps/intelligence/src/reasoning/form_generator.py`:
- Around line 93-95: The prompt template currently injects patient_name directly
(the Patient: {patient_name} line), which can leak PHI; update the code that
builds the prompt (where patient_name is passed to the template in
form_generator.py—e.g., the function that constructs/generates the prompt) to
either redact or omit identifiers by default and only include them when an
explicit allow_identifiers (or include_identifiers) flag is true; implement a
small helper (e.g., redact_identifier or format_patient_field) to replace the
name with a pseudonym or "[REDACTED]" when the flag is false, and wire that
helper into the prompt construction so Patient: uses the sanitized value instead
of raw patient_name.

In `@apps/intelligence/src/tests/test_analyze.py`:
- Around line 33-36: The test patches src.llm_client.chat_completion which
doesn't override the local imports used by extract_evidence and
generate_form_data; update the test to patch chat_completion at the import sites
used by those functions (patch the names where they are imported into the
modules that define extract_evidence and generate_form_data) using AsyncMock and
set its return_value before calling analyze(valid_request) so the locally
imported chat_completion is mocked during the test.

In `@apps/intelligence/src/tests/test_evidence_extractor.py`:
- Around line 39-41: The test is patching chat_completion in its original module
instead of where extract_evidence imports it; update the AsyncMock patch target
from "src.llm_client.chat_completion" to
"src.reasoning.evidence_extractor.chat_completion" (and likewise for the other
occurrence around lines 66-68) so that extract_evidence's direct import is
properly mocked; ensure the tests still set mock_llm.return_value and call
extract_evidence(sample_bundle, sample_policy) as before.

In `@apps/intelligence/src/tests/test_form_generator.py`:
- Around line 57-59: The test is patching src.llm_client.chat_completion but
generate_form_data imported chat_completion into src.reasoning.form_generator,
so change the patch target to the binding used by generate_form_data (patch
"src.reasoning.form_generator.chat_completion") and use AsyncMock as before; set
mock.return_value (or mock.return_value.__await__/return_value as appropriate
for async) to the expected string and then call
generate_form_data(sample_bundle, sample_evidence, sample_policy) so the
function uses the patched chat_completion in form_generator's namespace.
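The patch-target issue behind these three test fixes can be demonstrated with two toy modules (the module names here are illustrative stand-ins, not the real `src.*` package paths). A `from x import f` creates a binding in the importing module's namespace, so `patch` must target that binding:

```python
import asyncio
import sys
import types
from unittest.mock import AsyncMock, patch

# Toy "defining" module, standing in for src.llm_client.
llm_client = types.ModuleType("llm_client_demo")

async def _real_chat_completion(prompt):
    raise RuntimeError("would call a real LLM")

llm_client.chat_completion = _real_chat_completion
sys.modules["llm_client_demo"] = llm_client

# Toy "consumer" module, standing in for src.reasoning.form_generator.
form_generator = types.ModuleType("form_generator_demo")
# Equivalent of `from src.llm_client import chat_completion`:
# the name is copied into the consumer's own namespace.
form_generator.chat_completion = llm_client.chat_completion

async def _generate_summary():
    return await form_generator.chat_completion("summarize evidence")

form_generator.generate_summary = _generate_summary
sys.modules["form_generator_demo"] = form_generator


def run_with_mock():
    mock_llm = AsyncMock(return_value="mocked summary")
    # Patch the *consumer's* binding, not the defining module's.
    with patch("form_generator_demo.chat_completion", mock_llm):
        return asyncio.run(form_generator.generate_summary())
```

Patching `"llm_client_demo.chat_completion"` instead would leave the consumer's copied binding untouched and the real function would still run, which is exactly the failure mode the review flagged.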

- Inject procedure code into policy instead of overriding after LLM call
- Add procedure code validation to /with-documents endpoint
- Redact PHI (patient name, DOB) before sending to external LLMs
- Fix status parsing to handle "NOT MET" with space variant
- Fix approval logic to reject when evidence is UNCLEAR
- Fix setup.sh to use empty strings instead of "not-configured" sentinel
- Run schema sync to generate TypeScript types

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/intelligence/src/tests/test_analyze.py (1)

44-51: ⚠️ Potential issue | 🟡 Minor

Tests missing LLM mocks will behave inconsistently.

test_analyze_extracts_patient_info and test_analyze_builds_field_mappings call analyze() without mocking chat_completion. This means:

  1. If no LLM provider is configured, chat_completion returns None, and form generation uses fallback values.
  2. If an LLM is configured, real API calls are made, causing slow/flaky tests.

Apply consistent mocking across all tests that invoke analyze():

🧪 Suggested fix
 @pytest.mark.asyncio
 async def test_analyze_extracts_patient_info(valid_request: AnalyzeRequest) -> None:
     """Stub should extract patient information."""
-    result = await analyze(valid_request)
+    mock_llm = AsyncMock(return_value="The criterion is MET based on the evidence.")
+    with (
+        patch("src.reasoning.evidence_extractor.chat_completion", mock_llm),
+        patch("src.reasoning.form_generator.chat_completion", mock_llm),
+    ):
+        result = await analyze(valid_request)

     assert result.patient_name == "John Doe"

Apply similar changes to test_analyze_builds_field_mappings.

Also applies to: 86-94

🧹 Nitpick comments (4)
apps/intelligence/src/reasoning/form_generator.py (1)

118-122: Verify policy field mapping logic.

The current logic adds a new key field_mappings[value] when key exists in field_mappings, which copies the value to a new key name. This appears intentional for aliasing fields (e.g., mapping "PatientName" to a policy-specific field name), but could be confusing.

If the intent is to rename fields rather than alias them, consider deleting the original key. If aliasing is correct, consider adding a brief comment clarifying the behavior.

apps/intelligence/src/api/analyze.py (1)

132-156: Remove unused _build_field_mappings function.

The function is no longer called anywhere in the codebase. Both analyze and analyze_with_documents now use generate_form_data for field mapping construction, making this function dead code. Removing it will reduce maintenance burden.

apps/intelligence/src/tests/test_evidence_extractor.py (2)

1-1: Update module docstring to reflect actual implementation.

The docstring references "stub implementation" but the tests now verify the LLM-driven evidence extraction behavior with proper mocking.

📝 Suggested fix
-"""Tests for evidence extractor stub implementation."""
+"""Tests for evidence extractor LLM implementation."""

65-70: Update docstring; consider using pytest.approx for float comparison.

The docstring still references "Stub" but should reflect the LLM implementation. Additionally, comparing floats with == works here since 0.8 is exactly representable, but using pytest.approx is a defensive practice for float comparisons.

📝 Suggested fix
-    """Stub should return 0.80 confidence for all items."""
+    """LLM-based extraction returns 0.80 confidence for MET/NOT_MET items."""
     mock_llm = AsyncMock(return_value="The criterion is MET based on the evidence.")
     with patch("src.reasoning.evidence_extractor.chat_completion", mock_llm):
         evidence = await extract_evidence(sample_bundle, sample_policy)

-    assert all(e.confidence == 0.80 for e in evidence)
+    assert all(e.confidence == pytest.approx(0.80) for e in evidence)


rsalus commented Feb 5, 2026

@coderabbitai approve this pr.


coderabbitai bot commented Feb 5, 2026

@rsalus: I'll approve this PR as requested.

✅ Actions performed

Comments resolved and changes approved.

@kevinjtwu kevinjtwu merged commit 75873da into main Feb 5, 2026
6 checks passed
