Skip to content

fix: display file names and fix ingestion for onedrive#1609

Open
lucaseduoli wants to merge 9 commits into
mainfrom
fix/onedrive
Open

fix: display file names and fix ingestion for onedrive#1609
lucaseduoli wants to merge 9 commits into
mainfrom
fix/onedrive

Conversation

@lucaseduoli
Copy link
Copy Markdown
Collaborator

@lucaseduoli lucaseduoli commented May 14, 2026

This pull request introduces several improvements to the OneDrive integration in both the frontend and backend, focusing on more robust file metadata retrieval, better MIME type labeling, and enhanced picker state handling. The most important changes are grouped below:

Frontend: OneDrive Integration and File Picker Improvements

  • The OneDrive file picker now enriches selected file stubs by fetching full metadata from the Microsoft Graph API, ensuring accurate file information (such as MIME type, size, and URLs) is available for each selected item. If the enrichment fails, the picker falls back to stub data.
  • The file picker component and handler now support an optional onPickerStateChange callback to notify when the picker opens or closes, improving UI responsiveness and state management. [1] [2]
  • The mapping of MIME types to user-friendly labels in file-item.tsx has been expanded to cover more file types, including CSV, HTML, XML, JSON, Word, Excel, PowerPoint, and common image formats.

Backend: OneDrive Connector Robustness

  • The backend OneDrive connector now logs more detailed warnings when Graph API requests fail and uses the correct file ID format for drive-based metadata and content fetches, improving error handling and compatibility with various OneDrive item ID formats. [1] [2] [3]
  • The default parameters for Graph API requests in the OneDrive connector have been cleared, likely to ensure compatibility with item enrichment and custom queries.

Summary by CodeRabbit

  • New Features

    • Extended file type recognition to include CSV, HTML, XML, JSON, common images (JPEG/PNG/GIF/SVG), and additional Office formats.
  • Improvements

    • Cloud picker reports open/close state and handles cancel/error flows more predictably.
    • OneDrive/SharePoint picker returns richer file metadata with robust fallbacks when info is missing.
    • OneDrive sync, webhook, and auth flows improved for more reliable syncing and leaner permission requests.
  • Refactor

    • Internal typing and logging streamlined for clearer diagnostics.

Review Change Stack

@lucaseduoli lucaseduoli requested a review from mfortman11 May 14, 2026 20:37
@lucaseduoli lucaseduoli self-assigned this May 14, 2026
@github-actions github-actions Bot added frontend 🟨 Issues related to the UI/UX backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) labels May 14, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Walkthrough

Frontend: richer MIME labels, picker-state callback, and per-item Graph metadata enrichment. Backend: OneDrive connector typing/formatting refactor, centralized metadata probing with multi-endpoint fallbacks, listing/pagination, download/ACL adjustments, and OAuth scope/logging changes.

Changes

OneDrive Picker Frontend

Layer / File(s) Summary
MIME Type Label Support
frontend/components/cloud-picker/file-item.tsx
Extended getMimeTypeLabel mapping with CSV/HTML/XML/JSON, common image types, and OpenXML entries.
Picker State Callback Wiring
frontend/components/cloud-picker/provider-handlers.ts
Added optional onPickerStateChange to OneDriveHandler constructor and threaded it into createProviderHandler.
Picker Success: Graph Enrichment
frontend/components/cloud-picker/provider-handlers.ts
Reworked picker success handler to async-enrich returned items via Graph (derive driveId/itemId, infer mime from filename when missing), map to CloudFile, and fallback to stub data; ensures onPickerStateChange(false) on empty/cancel/error.
Typed search query hook
frontend/app/api/queries/useGetSearchQuery.ts
Tightened useGetSearchQuery options type to UseQueryOptions<SearchResult, Error, SearchResult, any[]>.

OneDrive Connector Refactoring

Layer / File(s) Summary
Imports & Initialization
src/connectors/onedrive/connector.py
Updated module typing/imports, refactored __init__, _default_params, and selective-sync cfg construction.
Method Signatures & Basic Methods
src/connectors/onedrive/connector.py
Updated various type hints and reformatted authenticate, handle_oauth_callback, and basic method signatures.
URL Detection & Sync Error Handling
src/connectors/onedrive/connector.py
Rewrote _detect_onedrive_url with granular logging; sync_once logs per-file failures and continues.
Subscription Setup & Listing
src/connectors/onedrive/connector.py
Reformatted subscription POST/webhook URL assignment; list_files builds explicit item dicts and continues pagination via @odata.nextLink/$skiptoken.
ACL Extraction & Download Fallbacks
src/connectors/onedrive/connector.py
Changed ACL token callsite, inlined permission headers, simplified cached-download DocumentACL return and preserved shares fallback.
Metadata Fetching & Download Probing
src/connectors/onedrive/connector.py
Refactored _fetch_item_metadata to centralize params and probe multiple endpoints (shares encoding attempts, drives/... variants, me/drive/items/...) with detailed logging.
Helpers, Date Parsing & Recursion
src/connectors/onedrive/connector.py
Reformatted _parse_graph_date, _make_graph_request, selective-sync expansion, folder recursion, and webhook helpers; behavior preserved.

OneDrive OAuth

Layer / File(s) Summary
Imports & Scopes
src/connectors/onedrive/oauth.py
Removed Sites.Read.All from AUTH_SCOPES/RESOURCE_SCOPES; simplified typing imports.
Credential Load & Token Flows
src/connectors/onedrive/oauth.py
Reorganized logging/error-message derivation across load_credentials, refresh, silent acquire, and access-token flows; preserved control flow.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • langflow-ai/openrag#1557: The frontend file-item component now uses isFolder metadata, which connects to #1557's addition of isFolder to the selected-item metadata sent from the upload page.

Suggested labels

enhancement

Suggested reviewers

  • edwinjosechittilappilly
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: display file names and fix ingestion for onedrive' accurately captures the main changes—improving file display (via MIME type mapping and metadata enrichment) and fixing OneDrive ingestion (via Graph API improvements and logging).
Docstring Coverage ✅ Passed Docstring coverage is 95.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/onedrive

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the bug 🔴 Something isn't working. label May 14, 2026
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 14, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/connectors/onedrive/connector.py (1)

1-11: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Switch to get_logger and fix Ruff I001.

Two related issues at the top of the file:

  1. import logging + logger = logging.getLogger(__name__) violates the coding guideline that mandates get_logger from utils.logging_config for everything under src/**. This is also what enables the runtime bug flagged at line 251 — the project's structured logger likely accepts arbitrary kwargs like status_code, while stdlib does not.
  2. CI is currently red on ruff check with I001 Import block is un-sorted or un-formatted, so the import block needs reordering regardless.
🛠️ Suggested fix
-import logging
-from pathlib import Path
-from typing import List, Dict, Any, Optional
-from datetime import datetime
-from urllib.parse import urlparse
-import httpx
-
-from ..base import BaseConnector, ConnectorDocument, DocumentACL
-from .oauth import OneDriveOAuth
-
-logger = logging.getLogger(__name__)
+from datetime import datetime
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+from urllib.parse import urlparse
+
+import httpx
+
+from utils.logging_config import get_logger
+
+from ..base import BaseConnector, ConnectorDocument, DocumentACL
+from .oauth import OneDriveOAuth
+
+logger = get_logger(__name__)

As per coding guidelines: "Use get_logger from utils.logging_config — never import stdlib logging directly."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 1 - 11, Replace the stdlib
logging usage and fix the import ordering: remove "import logging" and instead
import get_logger from utils.logging_config and set logger =
get_logger(__name__) (replace the existing logger =
logging.getLogger(__name__)); also reorder the import block to satisfy Ruff I001
(standard library first, then third-party, then local imports) so the import
lines are sorted and grouped correctly around symbols like Path, datetime,
urlparse, httpx, and the local imports
BaseConnector/ConnectorDocument/DocumentACL and OneDriveOAuth.
🧹 Nitpick comments (2)
frontend/components/cloud-picker/provider-handlers.ts (1)

209-209: 💤 Low value

Consider removing or conditionally enabling diagnostic console.log statements.

The console.log statements appear to be for debugging purposes. Consider removing them or wrapping them in a development-only condition for production builds.

Example: Conditional logging
-        console.log("OneDrive picker success callback:", response);
+        if (process.env.NODE_ENV === 'development') {
+          console.log("OneDrive picker success callback:", response);
+        }

Also applies to: 233-237

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/components/cloud-picker/provider-handlers.ts` at line 209, Remove
the debug console.log("OneDrive picker success callback:", response) in
provider-handlers.ts (and the other console.log calls around lines 233-237) or
guard them behind a development-only condition; specifically, in the OneDrive
picker success callback handler replace direct console.log usage with either a
call to the app's logger or wrap the console.log in a check like NODE_ENV !==
'production' (or an existing isDev flag) so these diagnostics are not emitted in
production builds.
src/connectors/onedrive/connector.py (1)

102-113: 💤 Low value

Prefer SimpleNamespace over dynamic type(...) for cfg.

The dynamic-class trick works but stores file_ids/folder_ids as class attributes on an anonymous type, which is unidiomatic and slightly surprising (mutating self.cfg.file_ids = ... would later create a different instance attribute). types.SimpleNamespace is the standard way to build a small attribute-bag from a dict.

♻️ Suggested refactor

Add the import at the top of the file:

+from types import SimpleNamespace

Then replace the cfg construction:

-        # Selective sync support (similar to Google Drive)
-        self.cfg = type(
-            "OneDriveConfig",
-            (),
-            {
-                "file_ids": config.get("file_ids")
-                or config.get("selected_files")
-                or config.get("selected_file_ids"),
-                "folder_ids": config.get("folder_ids")
-                or config.get("selected_folders")
-                or config.get("selected_folder_ids"),
-            },
-        )()
+        # Selective sync support (similar to Google Drive)
+        self.cfg = SimpleNamespace(
+            file_ids=(
+                config.get("file_ids")
+                or config.get("selected_files")
+                or config.get("selected_file_ids")
+            ),
+            folder_ids=(
+                config.get("folder_ids")
+                or config.get("selected_folders")
+                or config.get("selected_folder_ids")
+            ),
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 102 - 113, Replace the
dynamic anonymous class used to build self.cfg with a types.SimpleNamespace to
avoid creating class-level attributes; import SimpleNamespace from types, then
construct self.cfg = SimpleNamespace(file_ids=..., folder_ids=...) using the
same lookups (config.get("file_ids") or config.get("selected_files") or
config.get("selected_file_ids") and similarly for folder_ids) so code
referencing self.cfg.file_ids / self.cfg.folder_ids behaves as a normal instance
attribute bag.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/connectors/onedrive/connector.py`:
- Around line 250-252: The module currently imports and uses the stdlib logger
and calls logger.warning("[CONNECTOR] OneDrive detect URL failed",
status_code=response.status_code) which raises a TypeError; replace the stdlib
logging import and logger instantiation with the project's structured logger by
importing get_logger from utils.logging_config and creating logger =
get_logger(__name__) so logger.warning can accept the status_code kwarg; update
the top-level import (remove any "import logging" and logging.getLogger usage)
and ensure the existing call sites like the OneDrive URL-detection warning still
pass status_code=response.status_code to the new BoundLogger.

---

Outside diff comments:
In `@src/connectors/onedrive/connector.py`:
- Around line 1-11: Replace the stdlib logging usage and fix the import
ordering: remove "import logging" and instead import get_logger from
utils.logging_config and set logger = get_logger(__name__) (replace the existing
logger = logging.getLogger(__name__)); also reorder the import block to satisfy
Ruff I001 (standard library first, then third-party, then local imports) so the
import lines are sorted and grouped correctly around symbols like Path,
datetime, urlparse, httpx, and the local imports
BaseConnector/ConnectorDocument/DocumentACL and OneDriveOAuth.

---

Nitpick comments:
In `@frontend/components/cloud-picker/provider-handlers.ts`:
- Line 209: Remove the debug console.log("OneDrive picker success callback:",
response) in provider-handlers.ts (and the other console.log calls around lines
233-237) or guard them behind a development-only condition; specifically, in the
OneDrive picker success callback handler replace direct console.log usage with
either a call to the app's logger or wrap the console.log in a check like
NODE_ENV !== 'production' (or an existing isDev flag) so these diagnostics are
not emitted in production builds.

In `@src/connectors/onedrive/connector.py`:
- Around line 102-113: Replace the dynamic anonymous class used to build
self.cfg with a types.SimpleNamespace to avoid creating class-level attributes;
import SimpleNamespace from types, then construct self.cfg =
SimpleNamespace(file_ids=..., folder_ids=...) using the same lookups
(config.get("file_ids") or config.get("selected_files") or
config.get("selected_file_ids") and similarly for folder_ids) so code
referencing self.cfg.file_ids / self.cfg.folder_ids behaves as a normal instance
attribute bag.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b654baaf-dc07-4c37-8c4f-2ffa0c514712

📥 Commits

Reviewing files that changed from the base of the PR and between 4b2d4cf and 312ae00.

📒 Files selected for processing (3)
  • frontend/components/cloud-picker/file-item.tsx
  • frontend/components/cloud-picker/provider-handlers.ts
  • src/connectors/onedrive/connector.py

Comment on lines +250 to +252
logger.warning(
"[CONNECTOR] OneDrive detect URL failed", status_code=response.status_code
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does Python's stdlib logging.Logger.warning accept arbitrary keyword arguments like status_code, or only exc_info/stack_info/stacklevel/extra?

💡 Result:

No. Python’s standard-library logging.Logger.warning (and the other level methods) only treat a small, documented subset of keyword arguments as special: exc_info, stack_info, stacklevel, and extra. Arbitrary keyword arguments like status_code are not recognized for inclusion in the LogRecord unless you pass them via extra (as entries in the extra dict). [1][2] What happens with arbitrary kwargs: - The logging call signature is warning(msg, *args, kwargs), and the logging docs state that only those four kwargs are inspected. [1][2] - Internally, Logger._log ultimately passes “extra” as a separate parameter and builds the LogRecord; the implementation uses extra to populate LogRecord.dict. [3] So if you want status_code in the log record, do: logger.warning("...",..., extra={"status_code": 400}) (Where the keys in extra become attributes on the LogRecord.) [1][2]

Citations:


🏁 Script executed:

# Check the imports at the top of the file
head -20 src/connectors/onedrive/connector.py | cat -n

Repository: langflow-ai/openrag

Length of output: 816


🏁 Script executed:

# Check the problematic code around lines 250-252
sed -n '245,255p' src/connectors/onedrive/connector.py | cat -n

Repository: langflow-ai/openrag

Length of output: 687


🏁 Script executed:

# Verify get_logger exists in utils.logging_config
head -30 src/utils/logging_config.py | cat -n

Repository: langflow-ai/openrag

Length of output: 1020


🏁 Script executed:

# Check if get_logger is exported/defined in logging_config
rg "def get_logger|^get_logger" src/utils/logging_config.py

Repository: langflow-ai/openrag

Length of output: 122


Fix critical TypeError and import guideline violation: replace stdlib logging with structured logger.

The file imports stdlib logging (line 1) and uses logger = logging.getLogger(__name__) (line 11). At lines 250-252, the code calls logger.warning(..., status_code=response.status_code). Python's stdlib logging.Logger.warning() only accepts exc_info, stack_info, stacklevel, and extra as keyword arguments — passing status_code raises TypeError: _log() got an unexpected keyword argument 'status_code', crashing the URL-detection fallback path.

This violates the coding guideline: "Use get_logger from utils.logging_config — never import stdlib logging directly." The project's get_logger returns a structlog.BoundLogger, which does accept arbitrary kwargs. Replace lines 1 and 11 with:

from ..utils.logging_config import get_logger

logger = get_logger(__name__)

This fixes the runtime error and brings the file into compliance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 250 - 252, The module
currently imports and uses the stdlib logger and calls
logger.warning("[CONNECTOR] OneDrive detect URL failed",
status_code=response.status_code) which raises a TypeError; replace the stdlib
logging import and logger instantiation with the project's structured logger by
importing get_logger from utils.logging_config and creating logger =
get_logger(__name__) so logger.warning can accept the status_code kwarg; update
the top-level import (remove any "import logging" and logging.getLogger usage)
and ensure the existing call sites like the OneDrive URL-detection warning still
pass status_code=response.status_code to the new BoundLogger.

@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 15, 2026
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 15, 2026
Copy link
Copy Markdown
Contributor

@mfortman11 mfortman11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions Bot added the lgtm label May 15, 2026
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 15, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/connectors/onedrive/oauth.py (1)

21-22: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale comment about Sites.Read.All.

The Sites.Read.All scope was removed from both AUTH_SCOPES (line 28) and RESOURCE_SCOPES (line 30), but the surrounding NOTE still advertises that the integration requests it for SharePoint / OneDrive for Business access and requires admin consent. Readers will be misled about what consent is actually requested.

📝 Proposed fix
-    # NOTE: Including Sites.Read.All for SharePoint/OneDrive for Business access.
-    # Sites.Read.All requires admin consent for work/school accounts.
+    # NOTE: Sites.Read.All is intentionally NOT requested. Only user/file read scopes
+    # are used, plus offline_access for interactive authorization (refresh token issuance).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/oauth.py` around lines 21 - 22, The NOTE above the
scope lists is stale: update or remove the sentence about Sites.Read.All and
admin consent so it matches the current scope constants (AUTH_SCOPES and
RESOURCE_SCOPES) — either delete the reference to Sites.Read.All or replace it
with an accurate note stating that Sites.Read.All is no longer requested and
therefore admin consent is not implied; confirm the comment near AUTH_SCOPES and
RESOURCE_SCOPES reflects the actual scopes being requested for
SharePoint/OneDrive access.
src/connectors/onedrive/connector.py (1)

429-434: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: get_access_token() returns a string but is incorrectly treated as a dict.

Lines 429–430 treat the return value as a dict:

token_data = self.oauth.get_access_token()
access_token = token_data.get("access_token")

However, OneDriveOAuth.get_access_token() is explicitly typed to return str (see oauth.py:331), and every other call site in this file uses it as a plain bearer token:

  • Line 215: access_token = self.oauth.get_access_token()len(access_token)
  • Line 305, 537, 733, 831, 953: token = self.oauth.get_access_token()f"Bearer {token}"

Calling .get() on a string raises AttributeError, which is silently caught by the except Exception block at line 493 and degrades to an empty DocumentACL(). ACL extraction fails on every file without any indication of the root cause.

Fix: treat the return value as a string
-            token_data = self.oauth.get_access_token()
-            access_token = token_data.get("access_token")
-
-            if not access_token:
+            access_token = self.oauth.get_access_token()
+            if not access_token:
                 logger.warning(f"No access token available for ACL extraction: {file_id}")
                 return DocumentACL()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 429 - 434, The code treats
the return of OneDriveOAuth.get_access_token() as a dict
(token_data.get("access_token")) but that method returns a string; change the
logic in the ACL extraction path to assign access_token =
self.oauth.get_access_token() and use that string directly (e.g., check if not
access_token then log and return DocumentACL()), and update the warning to
include the file_id and that the bearer token was missing; adjust any subsequent
use in the surrounding method (the ACL extraction function) to build the
Authorization header as f"Bearer {access_token}" when calling Graph API.
🧹 Nitpick comments (1)
src/connectors/onedrive/oauth.py (1)

3-9: ⚡ Quick win

Use get_logger from utils.logging_config per repo coding guideline.

This module imports the stdlib logging directly and instantiates logger = logging.getLogger(__name__). The get_logger function is available in utils.logging_config and is already used throughout the codebase in other connectors (aws_s3, ibm_cos, google_drive, etc.). Switching to the standard pattern brings this module into compliance with coding guidelines.

♻️ Proposed refactor
-import os
-import json
-import logging
-from typing import Optional, Dict, Any
-
-
-import msal
-
-logger = logging.getLogger(__name__)
+import os
+import json
+from typing import Optional, Dict, Any
+
+import msal
+
+from utils.logging_config import get_logger
+
+logger = get_logger(__name__)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/oauth.py` around lines 3 - 9, Replace the direct
stdlib logging usage in this module: remove the import of logging and the line
that sets logger = logging.getLogger(__name__) and instead import get_logger
from utils.logging_config and call get_logger(__name__) to instantiate logger;
update any references to the module-level logger (logger) in
src/connectors/onedrive/oauth.py to use the new logger created via get_logger so
the file follows the repo logging convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/connectors/onedrive/connector.py`:
- Around line 429-434: The code treats the return of
OneDriveOAuth.get_access_token() as a dict (token_data.get("access_token")) but
that method returns a string; change the logic in the ACL extraction path to
assign access_token = self.oauth.get_access_token() and use that string directly
(e.g., check if not access_token then log and return DocumentACL()), and update
the warning to include the file_id and that the bearer token was missing; adjust
any subsequent use in the surrounding method (the ACL extraction function) to
build the Authorization header as f"Bearer {access_token}" when calling Graph
API.

In `@src/connectors/onedrive/oauth.py`:
- Around line 21-22: The NOTE above the scope lists is stale: update or remove
the sentence about Sites.Read.All and admin consent so it matches the current
scope constants (AUTH_SCOPES and RESOURCE_SCOPES) — either delete the reference
to Sites.Read.All or replace it with an accurate note stating that
Sites.Read.All is no longer requested and therefore admin consent is not
implied; confirm the comment near AUTH_SCOPES and RESOURCE_SCOPES reflects the
actual scopes being requested for SharePoint/OneDrive access.

---

Nitpick comments:
In `@src/connectors/onedrive/oauth.py`:
- Around line 3-9: Replace the direct stdlib logging usage in this module:
remove the import of logging and the line that sets logger =
logging.getLogger(__name__) and instead import get_logger from
utils.logging_config and call get_logger(__name__) to instantiate logger; update
any references to the module-level logger (logger) in
src/connectors/onedrive/oauth.py to use the new logger created via get_logger so
the file follows the repo logging convention.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 95c47841-5823-40a0-ad88-56f355d0791f

📥 Commits

Reviewing files that changed from the base of the PR and between 312ae00 and 74e2ab3.

📒 Files selected for processing (2)
  • src/connectors/onedrive/connector.py
  • src/connectors/onedrive/oauth.py

@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 15, 2026
@github-actions github-actions Bot removed the bug 🔴 Something isn't working. label May 15, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/connectors/onedrive/oauth.py (2)

2-8: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Replace stdlib logging with get_logger from utils.logging_config.

The PR adds substantial new logging across this file (debug/info/warning calls in load_credentials, is_authenticated, get_access_token, etc.), all routed through the stdlib logging module. This violates the project's logging convention.

♻️ Proposed fix
 import json
-import logging
 import os
 from typing import Any

 import msal

-logger = logging.getLogger(__name__)
+from utils.logging_config import get_logger
+
+logger = get_logger(__name__)

As per coding guidelines: "Use get_logger from utils.logging_config — never import stdlib logging directly."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/oauth.py` around lines 2 - 8, Replace direct stdlib
logging usage with the project's logger factory: remove the "import logging" and
instead import get_logger from utils.logging_config, then instantiate logger via
get_logger(__name__) so all calls within this module (e.g., load_credentials,
is_authenticated, get_access_token and any debug/info/warning calls) use the
project logger; update the top of src/connectors/onedrive/oauth.py to reflect
this import change and ensure existing logger variable references remain
unchanged.

20-30: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale comment after Sites.Read.All removal.

The block-comment on lines 20–21 still claims the configuration is "Including Sites.Read.All for SharePoint/OneDrive for Business access" and warns about admin consent, but Sites.Read.All was removed from both AUTH_SCOPES and RESOURCE_SCOPES in this PR. Leaving this stale rationale in place will mislead anyone debugging SharePoint access failures into thinking the scope is still requested.

📝 Proposed fix
     # For Microsoft Accounts (OneDrive consumer & work/school):
     # - Use AUTH_SCOPES for interactive auth (consent + refresh token issuance)
     # - Use RESOURCE_SCOPES for acquire_token_silent / refresh paths
-    # NOTE: Including Sites.Read.All for SharePoint/OneDrive for Business access.
-    # Sites.Read.All requires admin consent for work/school accounts.
+    # NOTE: SharePoint sites (Sites.Read.All) are intentionally not requested to
+    # avoid admin-consent requirements for work/school tenants. Add it back if
+    # SharePoint-only items must be accessed.
     AUTH_SCOPES = [
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/oauth.py` around lines 20 - 30, The top block comment
incorrectly states "Including Sites.Read.All..." even though Sites.Read.All was
removed; update or remove that comment to reflect the current AUTH_SCOPES and
RESOURCE_SCOPES values (AUTH_SCOPES, RESOURCE_SCOPES, SCOPES) — either delete
the Sites.Read.All mention and admin-consent warning or replace it with a short
note that SharePoint/OneDrive for Business access requires adding Sites.Read.All
(and admin consent) if desired, so the comment matches the actual scope
configuration.
src/connectors/onedrive/connector.py (3)

27-39: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Initialize BaseConnector only once.

super().__init__(config) runs at Line 28 and again at Line 39. That double-initializes base state, and the first call happens before the config is None fallback, so a None value still reaches the base class.

Suggested fix
-    def __init__(self, config: dict[str, Any]):
-        super().__init__(config)
+    def __init__(self, config: dict[str, Any] | None):
+        if config is None:
+            logger.debug("Config was None, using empty dict")
+            config = {}

         logger.debug(f"OneDrive connector __init__ called with config type: {type(config)}")
         logger.debug(f"OneDrive connector __init__ config value: {config}")
-
-        if config is None:
-            logger.debug("Config was None, using empty dict")
-            config = {}

         try:
             logger.debug("Calling super().__init__")
             super().__init__(config)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 27 - 39, The constructor
currently calls super().__init__(config) twice and calls it before handling a
possible None config, causing double-initialization and passing None into
BaseConnector; update OneDriveConnector.__init__ to first normalize config (if
config is None set config = {}), keep the debug logs, and then call
super().__init__(config) exactly once (remove the earlier duplicate call),
ensuring only BaseConnector.__init__ is invoked after config normalization.

887-905: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Paginate folder children during selective sync.

_list_folder_contents() only processes the first /children page. Large folders will silently miss files and subfolders once Graph returns @odata.nextLink, so selective sync becomes incomplete.

Suggested fix
     async def _list_folder_contents(self, folder_id: str) -> list[dict[str, Any]]:
         """List all files in a folder recursively."""
         files: list[dict[str, Any]] = []

         try:
             url = f"{self._graph_base_url}/me/drive/items/{folder_id}/children"
             params = dict(self._default_params)
-
-            response = await self._make_graph_request(url, params=params)
-            data = response.json()
-
-            items = data.get("value", [])
-            for item in items:
-                if item.get("file"):  # It's a file
-                    file_meta = await self._get_file_metadata_by_id(item.get("id"))
-                    if file_meta:
-                        files.append(file_meta)
-                elif item.get("folder"):  # It's a subfolder, recurse
-                    subfolder_files = await self._list_folder_contents(item.get("id"))
-                    files.extend(subfolder_files)
+            while url:
+                response = await self._make_graph_request(url, params=params)
+                data = response.json()
+
+                for item in data.get("value", []):
+                    if item.get("file"):
+                        file_meta = await self._get_file_metadata_by_id(item.get("id"))
+                        if file_meta:
+                            files.append(file_meta)
+                    elif item.get("folder"):
+                        subfolder_files = await self._list_folder_contents(item.get("id"))
+                        files.extend(subfolder_files)
+
+                url = data.get("@odata.nextLink")
+                params = None
         except Exception as e:
             logger.error(f"Failed to list folder contents for {folder_id}: {e}")

As per coding guidelines, src/connectors/**/*.py: "Verify credentials are never logged, errors surface cleanly, and pagination is handled correctly."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 887 - 905,
_list_folder_contents currently only processes the first /children page and will
miss entries when Microsoft Graph returns `@odata.nextLink`; update
_list_folder_contents to loop over pages: call await
self._make_graph_request(url, params=params) for the initial url, then while
response.json().get("@odata.nextLink") is present set url = that nextLink and
call _make_graph_request again (params can be None for nextLink requests), each
time parse response.json().get("value", []) and handle files (await
_get_file_metadata_by_id(item.get("id"))) and folders (await
_list_folder_contents(item.get("id"))) accumulating results until no nextLink
remains; keep error logging free of credentials and surface the error (don’t
swallow exceptions) — e.g., log a non-sensitive message referencing folder_id
and re-raise the exception.

30-31: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not log the raw connector config.

Dumping the full config dict here can leak OAuth-related values and tenant-specific identifiers into logs. Log a redacted summary instead.

As per coding guidelines, src/connectors/**/*.py: "Verify credentials are never logged, errors surface cleanly, and pagination is handled correctly."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 30 - 31, The initializer
is currently logging the raw connector config (logger.debug calls in OneDrive
connector __init__), which may leak sensitive OAuth and tenant data; replace
these with a redacted summary: log the config type and a sanitized dict built
from config keys but with sensitive keys (e.g., "client_secret", "client_id",
"access_token", "refresh_token", "tenant_id", "refresh_token_expires_at", etc.)
removed or masked (e.g., "<REDACTED>") and only non-sensitive flags or counts
included, or log a list of keys present rather than values; update the
logger.debug lines in __init__ to use that sanitized summary and ensure no raw
config is written to logs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/connectors/onedrive/connector.py`:
- Around line 429-431: The code calls self.oauth.get_access_token() and then
uses token_data.get("access_token"), but elsewhere get_access_token() returns a
string; change this branch to treat self.oauth.get_access_token() as the access
token string directly (e.g., assign access_token =
self.oauth.get_access_token()) so ACL extraction uses the token correctly and
avoids the exception path that drops owner/user/group metadata; update any
variable names (token_data -> access_token) near the ACL extraction logic in
connector.py where ACLs are built.
- Around line 99-100: The attribute _default_params is missing a mypy-friendly
type hint; add an explicit annotation like _default_params: Dict[str, str] = {}
on the class (and import Dict from typing or use dict[str, str] for Python
3.9+). Ensure the chosen annotation matches other uses (values always strings:
e.g., "$top", "$skiptoken") and update the class attribute declaration where
_default_params and _graph_api_version are defined.

In `@src/connectors/onedrive/oauth.py`:
- Around line 108-110: Add a proper class-level type annotation for the
attribute _current_account (e.g., Optional[Dict[str, Any]]) so mypy knows its
intended type instead of inferring None, and in methods that read it (the
logger.debug call referencing self._current_account.get(...) and the other read
around line 169) first assign self._current_account to a local variable or check
it is not None before calling .get; update references to use that local var
(e.g., cur_account) or guard with an if to avoid calling .get on None. Ensure
you import typing types used (Optional, Dict, Any) and update both the debug
line and the other usage to use the local/guarded variable.
- Line 305: The logger.warning call that prints the entire MSAL response dict
(logger.warning(f"OneDrive is_authenticated: Full result: {result}")) must be
changed to avoid leaking PII/credentials; locate the occurrences of
logger.warning that reference the variable result (and related places in the
same function/method that print the full MSAL response) and replace them with a
log that only emits the derived error information (use the existing error_msg or
explicitly log error and error_description) or a redacted summary, ensuring no
tokens, id_token_claims, correlation_id, or raw dict contents are ever logged.

---

Outside diff comments:
In `@src/connectors/onedrive/connector.py`:
- Around line 27-39: The constructor currently calls super().__init__(config)
twice and calls it before handling a possible None config, causing
double-initialization and passing None into BaseConnector; update
OneDriveConnector.__init__ to first normalize config (if config is None set
config = {}), keep the debug logs, and then call super().__init__(config)
exactly once (remove the earlier duplicate call), ensuring only
BaseConnector.__init__ is invoked after config normalization.
- Around line 887-905: _list_folder_contents currently only processes the first
/children page and will miss entries when Microsoft Graph returns
`@odata.nextLink`; update _list_folder_contents to loop over pages: call await
self._make_graph_request(url, params=params) for the initial url, then while
response.json().get("@odata.nextLink") is present set url = that nextLink and
call _make_graph_request again (params can be None for nextLink requests), each
time parse response.json().get("value", []) and handle files (await
_get_file_metadata_by_id(item.get("id"))) and folders (await
_list_folder_contents(item.get("id"))) accumulating results until no nextLink
remains; keep error logging free of credentials and surface the error (don’t
swallow exceptions) — e.g., log a non-sensitive message referencing folder_id
and re-raise the exception.
- Around line 30-31: The initializer is currently logging the raw connector
config (logger.debug calls in OneDrive connector __init__), which may leak
sensitive OAuth and tenant data; replace these with a redacted summary: log the
config type and a sanitized dict built from config keys but with sensitive keys
(e.g., "client_secret", "client_id", "access_token", "refresh_token",
"tenant_id", "refresh_token_expires_at", etc.) removed or masked (e.g.,
"<REDACTED>") and only non-sensitive flags or counts included, or log a list of
keys present rather than values; update the logger.debug lines in __init__ to
use that sanitized summary and ensure no raw config is written to logs.

In `@src/connectors/onedrive/oauth.py`:
- Around line 2-8: Replace direct stdlib logging usage with the project's logger
factory: remove the "import logging" and instead import get_logger from
utils.logging_config, then instantiate logger via get_logger(__name__) so all
calls within this module (e.g., load_credentials, is_authenticated,
get_access_token and any debug/info/warning calls) use the project logger;
update the top of src/connectors/onedrive/oauth.py to reflect this import change
and ensure existing logger variable references remain unchanged.
- Around line 20-30: The top block comment incorrectly states "Including
Sites.Read.All..." even though Sites.Read.All was removed; update or remove that
comment to reflect the current AUTH_SCOPES and RESOURCE_SCOPES values
(AUTH_SCOPES, RESOURCE_SCOPES, SCOPES) — either delete the Sites.Read.All
mention and admin-consent warning or replace it with a short note that
SharePoint/OneDrive for Business access requires adding Sites.Read.All (and
admin consent) if desired, so the comment matches the actual scope
configuration.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f25d6e05-6971-4f1b-828e-3125a02da62d

📥 Commits

Reviewing files that changed from the base of the PR and between 74e2ab3 and ee54b37.

📒 Files selected for processing (3)
  • frontend/app/api/queries/useGetSearchQuery.ts
  • src/connectors/onedrive/connector.py
  • src/connectors/onedrive/oauth.py
✅ Files skipped from review due to trivial changes (1)
  • frontend/app/api/queries/useGetSearchQuery.ts

Comment thread src/connectors/onedrive/connector.py Outdated
Comment thread src/connectors/onedrive/connector.py Outdated
Comment thread src/connectors/onedrive/oauth.py Outdated
logger.warning(
f"OneDrive is_authenticated: Token acquisition failed for current account: {error_msg}"
)
logger.warning(f"OneDrive is_authenticated: Full result: {result}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid logging full MSAL result dicts (PII/credential leakage risk).

logger.warning(f"... Full result: {result}") here and the equivalent statements at lines 323 and 349 dump the entire MSAL response into logs. Even on failure paths these dicts can carry correlation_id, account hints, id_token_claims, or other PII, and a misconfigured success-path regression could begin emitting access_token/refresh_token to the same sink. Per the connector guideline that credentials must never be logged, log only the derived error/error_description (already captured in error_msg) and drop the raw dict.

🔒 Proposed fix
-                else:
-                    error_msg = (
-                        (result or {}).get("error")
-                        or (result or {}).get("error_description")
-                        or "No result returned"
-                    )
-                    logger.warning(
-                        f"OneDrive is_authenticated: Token acquisition failed for current account: {error_msg}"
-                    )
-                    logger.warning(f"OneDrive is_authenticated: Full result: {result}")
+                else:
+                    error_msg = (
+                        (result or {}).get("error")
+                        or (result or {}).get("error_description")
+                        or "No result returned"
+                    )
+                    logger.warning(
+                        "OneDrive is_authenticated: Token acquisition failed for "
+                        f"current account: {error_msg}"
+                    )
-            logger.warning(f"OneDrive is_authenticated: Fallback also failed, result: {result}")
+            fallback_err = (
+                (result or {}).get("error")
+                or (result or {}).get("error_description")
+                or "No result returned"
+            )
+            logger.warning(
+                f"OneDrive is_authenticated: Fallback also failed: {fallback_err}"
+            )
-                else:
-                    logger.warning(
-                        f"OneDrive get_access_token: Failed with account, result: {result}"
-                    )
+                else:
+                    fail_err = (
+                        (result or {}).get("error_description")
+                        or (result or {}).get("error")
+                        or "No result returned"
+                    )
+                    logger.warning(
+                        f"OneDrive get_access_token: Failed with account: {fail_err}"
+                    )

As per coding guidelines: "Verify credentials are never logged, errors surface cleanly..."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logger.warning(f"OneDrive is_authenticated: Full result: {result}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/oauth.py` at line 305, The logger.warning call that
prints the entire MSAL response dict (logger.warning(f"OneDrive
is_authenticated: Full result: {result}")) must be changed to avoid leaking
PII/credentials; locate the occurrences of logger.warning that reference the
variable result (and related places in the same function/method that print the
full MSAL response) and replace them with a log that only emits the derived
error information (use the existing error_msg or explicitly log error and
error_description) or a redacted summary, ensuring no tokens, id_token_claims,
correlation_id, or raw dict contents are ever logged.

@github-actions github-actions Bot added the bug 🔴 Something isn't working. label May 15, 2026
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 15, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/connectors/onedrive/oauth.py (1)

305-305: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Full MSAL result dict is still logged on failure paths (PII/credential leak risk).

Lines 305, 323, and 349 continue to emit result verbatim. MSAL response dicts can carry correlation_id, account hints, id_token_claims, and on success-path regressions even access_token/refresh_token. The derived error/error_description is already captured in error_msg — drop the raw-dict logs. This was previously flagged and remains unresolved.

🔒 Proposed fix
                 else:
                     error_msg = (
                         (result or {}).get("error")
                         or (result or {}).get("error_description")
                         or "No result returned"
                     )
                     logger.warning(
                         f"OneDrive is_authenticated: Token acquisition failed for current account: {error_msg}"
                     )
-                    logger.warning(f"OneDrive is_authenticated: Full result: {result}")
-            logger.warning(f"OneDrive is_authenticated: Fallback also failed, result: {result}")
+            fallback_err = (
+                (result or {}).get("error_description")
+                or (result or {}).get("error")
+                or "No result returned"
+            )
+            logger.warning(
+                f"OneDrive is_authenticated: Fallback also failed: {fallback_err}"
+            )
             return False
                 else:
-                    logger.warning(
-                        f"OneDrive get_access_token: Failed with account, result: {result}"
-                    )
+                    fail_err = (
+                        (result or {}).get("error_description")
+                        or (result or {}).get("error")
+                        or "No result returned"
+                    )
+                    logger.warning(
+                        f"OneDrive get_access_token: Failed with account: {fail_err}"
+                    )

As per coding guidelines: "Verify credentials are never logged, errors surface cleanly, and pagination is handled correctly."

Also applies to: 323-323, 349-350

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/oauth.py` at line 305, Remove logging of the full
MSAL response dict `result` (avoid logger.warning(f"OneDrive is_authenticated:
Full result: {result}") and the similar logger calls at the other locations) and
instead log only the safe, derived error information already built in
`error_msg` (or specific non-PII fields if absolutely necessary). Locate the
`logger.warning`/`logger.debug` calls that interpolate `result` in the
`is_authenticated`/token handling flow and replace them so they no longer emit
`result`; keep existing `error_msg` usage to surface error details without
leaking tokens, claims, or correlation IDs.
🧹 Nitpick comments (1)
src/connectors/onedrive/oauth.py (1)

20-21: ⚡ Quick win

Stale comment: Sites.Read.All was removed from both scope lists.

The note at lines 20-21 still advertises Sites.Read.All and its admin-consent requirement, but the adjacent changes at lines 27 and 29 dropped it from both AUTH_SCOPES and RESOURCE_SCOPES. Update the comment so future readers don't expect SharePoint/OneDrive-for-Business access from this OAuth flow.

📝 Proposed fix
     # For Microsoft Accounts (OneDrive consumer & work/school):
     # - Use AUTH_SCOPES for interactive auth (consent + refresh token issuance)
     # - Use RESOURCE_SCOPES for acquire_token_silent / refresh paths
-    # NOTE: Including Sites.Read.All for SharePoint/OneDrive for Business access.
-    # Sites.Read.All requires admin consent for work/school accounts.
+    # NOTE: Only user/file read scopes are requested. SharePoint access via
+    # Sites.Read.All was removed because it requires admin consent for
+    # work/school accounts and is not needed for the picker/connector flows.
     AUTH_SCOPES = [
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/oauth.py` around lines 20 - 21, The comment
mentioning Sites.Read.All is stale because Sites.Read.All was removed from the
AUTH_SCOPES and RESOURCE_SCOPES lists; update the comment above those lists
(near AUTH_SCOPES and RESOURCE_SCOPES in oauth.py) to remove the reference to
Sites.Read.All and the admin-consent note so it no longer implies
SharePoint/OneDrive-for-Business access is granted by this OAuth flow—replace it
with a short note stating that Business/SharePoint scopes are not included in
these scopes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/connectors/onedrive/connector.py`:
- Around line 277-281: In the per-file sync except block inside the OneDrive
sync loop (the except Exception as e: that logs failures for file_info), replace
the logger.error(...) call with logger.exception(...) so the exception traceback
is preserved; keep the same formatted message (including file_info.get('name',
'unknown')) but call logger.exception to attach the stack trace and aid
debugging of intermittent sync failures.

---

Duplicate comments:
In `@src/connectors/onedrive/oauth.py`:
- Line 305: Remove logging of the full MSAL response dict `result` (avoid
logger.warning(f"OneDrive is_authenticated: Full result: {result}") and the
similar logger calls at the other locations) and instead log only the safe,
derived error information already built in `error_msg` (or specific non-PII
fields if absolutely necessary). Locate the `logger.warning`/`logger.debug`
calls that interpolate `result` in the `is_authenticated`/token handling flow
and replace them so they no longer emit `result`; keep existing `error_msg`
usage to surface error details without leaking tokens, claims, or correlation
IDs.

---

Nitpick comments:
In `@src/connectors/onedrive/oauth.py`:
- Around line 20-21: The comment mentioning Sites.Read.All is stale because
Sites.Read.All was removed from the AUTH_SCOPES and RESOURCE_SCOPES lists;
update the comment above those lists (near AUTH_SCOPES and RESOURCE_SCOPES in
oauth.py) to remove the reference to Sites.Read.All and the admin-consent note
so it no longer implies SharePoint/OneDrive-for-Business access is granted by
this OAuth flow—replace it with a short note stating that Business/SharePoint
scopes are not included in these scopes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 238a4698-c12f-4bc6-b447-69de308a1708

📥 Commits

Reviewing files that changed from the base of the PR and between ee54b37 and 6e62625.

📒 Files selected for processing (2)
  • src/connectors/onedrive/connector.py
  • src/connectors/onedrive/oauth.py

Comment on lines 277 to 281
except Exception as e:
logger.error(f"Failed to sync OneDrive file {file_info.get('name', 'unknown')}: {e}")
logger.error(
f"Failed to sync OneDrive file {file_info.get('name', 'unknown')}: {e}"
)
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use logger.exception to preserve the traceback for per-file sync failures.

Inside an except Exception as e: block, logger.error(f"...: {e}") discards the stack trace, which is exactly what you need to debug intermittent sync failures across hundreds of files. Switch to logger.exception(...) so the traceback is attached.

Suggested fix
                     except Exception as e:
-                        logger.error(
-                            f"Failed to sync OneDrive file {file_info.get('name', 'unknown')}: {e}"
-                        )
+                        logger.exception(
+                            f"Failed to sync OneDrive file {file_info.get('name', 'unknown')} "
+                            f"(id={file_info.get('id', 'unknown')}): {e}"
+                        )
                         continue
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 277-277: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 277 - 281, In the per-file
sync except block inside the OneDrive sync loop (the except Exception as e: that
logs failures for file_info), replace the logger.error(...) call with
logger.exception(...) so the exception traceback is preserved; keep the same
formatted message (including file_info.get('name', 'unknown')) but call
logger.exception to attach the stack trace and aid debugging of intermittent
sync failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) bug 🔴 Something isn't working. frontend 🟨 Issues related to the UI/UX lgtm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants