feat(composio): incremental sync with per-item persistence for Gmail and Notion by senamakel · Pull Request #519 · tinyhumansai/openhuman

senamakel · 2026-04-13T00:49:09Z

Summary

- Incremental sync: Gmail and Notion providers now track what has been synced via a persistent cursor and a synced-ID set in the local KV store, fetching only new or updated items instead of re-downloading everything each cycle.
- Per-item memory documents: Each email and Notion page is stored as its own memory document (e.g. composio-gmail-msg-{id}, composio-notion-page-{id}) instead of one giant snapshot blob, which gives the agent finer-grained recall.
- Daily request budget: Each provider connection is capped at 500 execute_tool API calls per calendar day (the counter resets automatically), preventing runaway backfills during initial sync.
- Pagination support: Both providers now follow pagination tokens (nextPageToken for Gmail, next_cursor for Notion) across multiple pages, as long as budget remains.
- Edit detection (Notion): Uses composite {page_id}@{edited_time} keys so that pages edited after their last sync are re-persisted.
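
To illustrate the document IDs and the composite edit-detection key described above, here is a minimal sketch. The helper names (`gmail_doc_id`, `notion_doc_id`, `notion_sync_key`, `mark_if_new`) are hypothetical and not taken from the PR; only the ID and key patterns come from the summary.

```rust
use std::collections::HashSet;

/// Hypothetical helper: per-message document ID, following the
/// `composio-gmail-msg-{id}` pattern from the summary.
fn gmail_doc_id(message_id: &str) -> String {
    format!("composio-gmail-msg-{message_id}")
}

/// Hypothetical helper: per-page document ID, following the
/// `composio-notion-page-{id}` pattern from the summary.
fn notion_doc_id(page_id: &str) -> String {
    format!("composio-notion-page-{page_id}")
}

/// Composite dedup key `{page_id}@{edited_time}`: a later edit changes the
/// key, so the page no longer matches anything in the synced-ID set.
fn notion_sync_key(page_id: &str, edited_time: &str) -> String {
    format!("{page_id}@{edited_time}")
}

/// Records `key` in the set and returns `true` only when it was not present,
/// i.e. when the item still needs to be persisted.
fn mark_if_new(synced_ids: &mut HashSet<String>, key: String) -> bool {
    synced_ids.insert(key)
}
```

Under this scheme the document ID stays stable across edits (so a re-persisted page overwrites its earlier document, assuming the store upserts by ID), while the sync key changes with every edit.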

Changes

| File | What |
| --- | --- |
| `src/openhuman/composio/providers/sync_state.rs` | New: `SyncState`, `DailyBudget`, KV-backed load/save, dedup helpers, per-item persist helper |
| `src/openhuman/composio/providers/gmail.rs` | Rewritten sync loop: cursor-based date filtering, pagination, per-message persistence |
| `src/openhuman/composio/providers/notion.rs` | Rewritten sync loop: cursor boundary detection, pagination, per-page persistence with title extraction |
| `src/openhuman/composio/providers/mod.rs` | Added `pub mod sync_state` |

Test plan

- cargo check passes
- All 33 provider unit tests pass (sync_state, gmail, notion, registry)
- Full cargo test suite passes (2464 passed, 1 pre-existing flaky failure unrelated to this PR)
- Pre-push hooks pass (format, lint, typecheck, rust check)
- Manual test: connect Gmail via OAuth → verify incremental sync persists individual message docs
- Manual test: connect Notion via OAuth → verify per-page docs and edit re-sync
- Manual test: verify daily budget caps at 500 requests and resets next day
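
The budget check in the last manual test could also be exercised in isolation. The sketch below is an assumption about how such a counter might behave, not the PR's `DailyBudget` implementation: `BudgetSketch`, its fields, and its method names are hypothetical, and the calendar day is passed in as a plain string to avoid any date-library dependency.

```rust
/// Daily cap taken from the PR summary (500 calls per calendar day).
const DAILY_LIMIT: u32 = 500;

/// Hypothetical stand-in for a per-connection daily request counter.
#[derive(Debug, Clone, PartialEq)]
struct BudgetSketch {
    /// Calendar day the counter belongs to, e.g. "2026-04-13".
    day: String,
    /// Requests recorded on `day`.
    used: u32,
}

impl BudgetSketch {
    fn new(today: &str) -> Self {
        Self { day: today.to_string(), used: 0 }
    }

    /// Resets the counter when the calendar day has changed.
    fn roll_over(&mut self, today: &str) {
        if self.day != today {
            self.day = today.to_string();
            self.used = 0;
        }
    }

    /// Requests still available today (after applying any day rollover).
    fn remaining(&mut self, today: &str) -> u32 {
        self.roll_over(today);
        DAILY_LIMIT.saturating_sub(self.used)
    }

    /// Reserves one request; returns `false` once the cap is reached.
    fn try_record(&mut self, today: &str) -> bool {
        if self.remaining(today) == 0 {
            return false;
        }
        self.used += 1;
        true
    }
}
```

With this shape, the 501st `try_record` call on the same day is refused, and the first call on the following day succeeds again.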

Summary by CodeRabbit

New Features
- Gmail and Notion providers now support incremental syncing, fetching only new and updated items instead of complete snapshots.
- Added pagination support for handling large datasets efficiently.
- Implemented daily request budget tracking to manage API usage limits.
- Duplicate messages and pages are automatically deduplicated during sync.

…mail and Notion providers
- Enhanced the Gmail and Notion providers to support incremental synchronization with per-item persistence, improving data handling efficiency.
- Introduced a new `SyncState` module to manage persistent sync state, including cursor tracking, synced IDs, and daily request budget management.
- Updated sync logic to load state from a KV store, check daily budget limits, and handle paginated API requests, ensuring robust data retrieval and deduplication.
- Refactored existing sync methods to utilize the new state management, enhancing overall reliability and performance of the providers.
- Improved documentation for the sync process and state management, clarifying the operational flow and usage of the new features.

…otion providers
- Enhanced error handling in the Gmail provider to format error messages more clearly during email fetching.
- Streamlined debug logging in both Gmail and Notion providers to improve readability by consolidating multiline statements into single lines.
- Refactored the `extract_page_title` function in the Notion provider for better clarity in property extraction logic.
- Overall, these changes aim to enhance maintainability and improve the clarity of error reporting and logging across the providers.

coderabbitai · 2026-04-13T00:49:23Z

📝 Walkthrough

Replaces snapshot-based syncing with incremental, stateful synchronization for Gmail and Notion providers. Introduces a new sync_state.rs module managing persistent per-connection state (cursor, deduplication set, daily request budget), pagination logic, and per-item persistence. Adds helper functions for token extraction and filter conversion.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Sync State Foundation `src/openhuman/composio/providers/sync_state.rs` | New module implementing `SyncState` and `DailyBudget` for persistent synchronization tracking, request budgeting, deduplication, and per-item persistence helpers; includes KV storage integration and unit tests. |
| Incremental Provider Syncing `src/openhuman/composio/providers/gmail.rs`, `src/openhuman/composio/providers/notion.rs` | Refactored sync flows from snapshot-based to incremental pagination with state loading/saving, daily budget enforcement, per-item deduplication and persistence, cursor advancement, and updated reporting; removed batch persistence helpers. |
| Module Exports `src/openhuman/composio/providers/mod.rs` | Exposed new `sync_state` public module. |

Sequence Diagram

sequenceDiagram
    participant Provider as Gmail/Notion<br/>Provider
    participant Memory as Memory Store<br/>(KV)
    participant API as External API<br/>(Gmail/Notion)
    participant Persist as Memory Persistence
    
    Provider->>Memory: Load SyncState<br/>(cursor, synced_ids, budget)
    Memory-->>Provider: Return persisted state
    
    alt Budget Exhausted
        Provider-->>Provider: Early exit with<br/>budget_exhausted
    else Budget Available
        loop Paginate while budget available
            Provider->>API: Fetch items (paginated)<br/>with optional cursor filter
            API-->>Provider: Return page of items
            
            loop Process each item
                Provider->>Provider: Extract ID, compute sync_key
                alt Item already synced
                    Provider->>Provider: Skip (dedup)
                else New/Updated item
                    Provider->>Persist: Persist single item<br/>as memory document
                    Persist-->>Provider: Document persisted
                    Provider->>Provider: Mark in synced_ids,<br/>record request
                end
            end
            
            Provider->>Provider: Advance cursor to<br/>newest item timestamp
        end
    end
    
    Provider->>Memory: Save updated SyncState<br/>(cursor, synced_ids, budget)
    Memory-->>Provider: State persisted
    Provider-->>Provider: Return sync outcome
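
The flow in the diagram can be sketched as a single loop. This is a simplified illustration under stated assumptions, not the code from `gmail.rs` or `notion.rs`: all type and function names (`Item`, `Page`, `State`, `Outcome`, `run_sync`) are hypothetical, the API and the memory store are replaced by closures, timestamps are plain integers, and one budget unit is charged per page fetch.

```rust
use std::collections::HashSet;

/// One fetched item: `sync_key` is the dedup key, `timestamp` feeds the cursor.
struct Item { sync_key: String, timestamp: u64 }

/// One page of results plus the token for the next page, if any.
struct Page { items: Vec<Item>, next_token: Option<String> }

/// Persisted per-connection state (cursor, dedup set, requests used today).
#[derive(Default)]
struct State { cursor: Option<u64>, synced_ids: HashSet<String>, requests_used: u32 }

#[derive(Debug, PartialEq)]
struct Outcome { persisted: usize, skipped: usize, budget_exhausted: bool }

/// Paginates while budget remains, persisting only items not seen before.
fn run_sync<F, P>(state: &mut State, budget: u32, mut fetch: F, mut persist: P) -> Outcome
where
    F: FnMut(Option<u64>, Option<&str>) -> Page,
    P: FnMut(&Item),
{
    let mut outcome = Outcome { persisted: 0, skipped: 0, budget_exhausted: false };
    let since = state.cursor; // filter for this run: only items after the last cursor
    let mut token: Option<String> = None;
    loop {
        if state.requests_used >= budget {
            outcome.budget_exhausted = true; // stop early; the next run resumes
            break;
        }
        state.requests_used += 1;
        let page = fetch(since, token.as_deref());
        for item in &page.items {
            if state.synced_ids.insert(item.sync_key.clone()) {
                persist(item); // new or updated: store as its own document
                outcome.persisted += 1;
            } else {
                outcome.skipped += 1; // already synced: dedup
            }
            // Advance the cursor to the newest timestamp seen so far.
            state.cursor = Some(state.cursor.map_or(item.timestamp, |c| c.max(item.timestamp)));
        }
        match page.next_token {
            Some(next) => token = Some(next),
            None => break,
        }
    }
    outcome
}
```

In a real provider, `State` would be loaded from the KV store before the call and saved afterwards, as the first and last steps of the diagram show.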

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

openhuman#509: Introduces provider-based Composio sync architecture that this PR builds upon with incremental state-tracked syncing and per-item persistence patterns applied to Gmail and Notion providers.

Poem

🐰 Hop, hop—the sync state hops along,
With cursor marks and budgets strong,
Each email, page, de-duped with care,
Incremental blessings everywhere! 📬

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and concisely summarizes the main change: adding incremental sync with per-item persistence for Gmail and Notion providers, which is the primary focus of the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

coderabbitai

🧹 Nitpick comments (1)

src/openhuman/composio/providers/sync_state.rs (1)
53-57: Potential unbounded growth of synced_ids set.

For high-volume providers like Gmail, synced_ids could grow to tens of thousands of entries over time. Consider adding a pruning mechanism or a size cap with LRU eviction to prevent excessive memory/storage consumption.

This isn't blocking for the initial implementation, but should be tracked for follow-up.

Would you like me to open an issue to track implementing a pruning strategy for the synced_ids set?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/composio/providers/sync_state.rs` around lines 53 - 57, The
synced_ids HashSet<String> can grow unbounded for high-volume providers; replace
or wrap it with a bounded, eviction-capable structure (e.g., an LRU cache) and
perform insertions via that API instead of direct HashSet inserts so old IDs are
pruned; specifically change the synced_ids field (or add a new field in the same
struct) to use a capacity-limited container (for example lru::LruCache<String,
()> or a custom RingBuffer+HashSet combo), update any methods that modify/access
synced_ids to use the new API (insert/check/evict), and ensure serde
(de)serialization is handled (use a serializable wrapper or convert on
save/load) so persistence semantics remain intact.
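
As one possible shape for the bounded container suggested above, here is a minimal sketch using only the standard library. It is a FIFO-evicting set (a `VecDeque` plus a `HashSet`), not a true LRU cache and not code from this PR; `BoundedIdSet` and its methods are hypothetical names, and serde support is left out.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical bounded replacement for a plain `HashSet<String>`:
/// remembers at most `capacity` IDs and evicts the oldest insertion first.
struct BoundedIdSet {
    capacity: usize,
    /// Insertion order, oldest at the front.
    order: VecDeque<String>,
    /// Fast membership checks.
    members: HashSet<String>,
}

impl BoundedIdSet {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            order: VecDeque::with_capacity(capacity),
            members: HashSet::with_capacity(capacity),
        }
    }

    fn contains(&self, id: &str) -> bool {
        self.members.contains(id)
    }

    fn len(&self) -> usize {
        self.members.len()
    }

    /// Inserts `id` and returns `true` if it was not already present.
    /// When the set is full, the oldest ID is dropped to make room.
    fn insert(&mut self, id: &str) -> bool {
        if self.members.contains(id) {
            return false;
        }
        if self.order.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.members.remove(&oldest);
            }
        }
        self.order.push_back(id.to_string());
        self.members.insert(id.to_string());
        true
    }
}
```

The trade-off is that an evicted ID would be treated as new if the API returned it again, so this approach presumably relies on the cursor filter to keep old items out of later fetches.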

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f2fe14d9-e289-43bd-bc82-c2debb5bff98

📥 Commits

Reviewing files that changed from the base of the PR and between a115758 and b34388a.

📒 Files selected for processing (4)

src/openhuman/composio/providers/gmail.rs
src/openhuman/composio/providers/mod.rs
src/openhuman/composio/providers/notion.rs
src/openhuman/composio/providers/sync_state.rs

…and Notion (tinyhumansai#519)

* feat(composio): implement incremental sync and state management for Gmail and Notion providers
  - Enhanced the Gmail and Notion providers to support incremental synchronization with per-item persistence, improving data handling efficiency.
  - Introduced a new `SyncState` module to manage persistent sync state, including cursor tracking, synced IDs, and daily request budget management.
  - Updated sync logic to load state from a KV store, check daily budget limits, and handle paginated API requests, ensuring robust data retrieval and deduplication.
  - Refactored existing sync methods to utilize the new state management, enhancing overall reliability and performance of the providers.
  - Improved documentation for the sync process and state management, clarifying the operational flow and usage of the new features.
* refactor(composio): improve error handling and logging in Gmail and Notion providers
  - Enhanced error handling in the Gmail provider to format error messages more clearly during email fetching.
  - Streamlined debug logging in both Gmail and Notion providers to improve readability by consolidating multiline statements into single lines.
  - Refactored the `extract_page_title` function in the Notion provider for better clarity in property extraction logic.
  - Overall, these changes aim to enhance maintainability and improve the clarity of error reporting and logging across the providers.

senamakel added 2 commits April 12, 2026 17:23

coderabbitai Bot reviewed Apr 13, 2026

View reviewed changes

senamakel merged commit 08178b5 into main Apr 13, 2026
13 of 14 checks passed

This was referenced Apr 13, 2026

feat(composio): provider folder modules + user profile persistence #523

Merged

Refactor: rust modules #633

Merged

senamakel deleted the feat/composeio-sync branch May 2, 2026 13:05

senamakel restored the feat/composeio-sync branch May 2, 2026 13:05

senamakel deleted the feat/composeio-sync branch May 2, 2026 13:05

coderabbitai Bot mentioned this pull request May 11, 2026

perf(composio/gmail): cut redundant fetches on incremental sync (#1404) #1474

Merged

7 tasks

senamakel merged 2 commits into main from feat/composeio-sync
