Conversation

@abnegate
Member

@abnegate abnegate commented Jul 22, 2025

Reverts #89

Summary by CodeRabbit

  • Improvements
    • Streamlined and simplified resource reporting for authentication, storage, and functions, resulting in more accurate counts and reduced complexity.
    • Enhanced reporting of database tables, columns, and indexes by switching to direct API calls for more precise data.
    • Adjusted data fetching to improve reliability and consistency across resources.

@coderabbitai
Contributor

coderabbitai bot commented Jul 22, 2025

Walkthrough

The changes refactor internal logic for resource reporting and data fetching in the Appwrite migration source. Pagination and caching are removed in favor of direct API calls with default limits. Resource counts, such as users, teams, buckets, files, functions, and database tables, are now retrieved through simplified queries and explicit per-item API calls where needed.
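
As a rough illustration of that pattern, a minimal sketch assuming the Appwrite PHP SDK's Functions service (the actual code in this PR may differ in detail, and the resource-type constants here are illustrative):

    // Direct list call with the API's default limit; the response
    // 'total' field is used as the count instead of paginating.
    $report[Resource::TYPE_FUNCTION] = $this->functions->list()['total'];

    // Explicit per-item calls where a nested count is needed,
    // e.g. environment variables per function.
    $variableCount = 0;
    foreach ($this->functions->list()['functions'] as $function) {
        $variableCount += $this->functions->listVariables($function['$id'])['total'];
    }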

Changes

File(s) and change summary:

  • src/Migration/Sources/Appwrite.php: Simplified resource reporting for auth, storage, and functions by removing pagination/caching and using direct API calls; adjusted logic for counting files, file sizes, and environment variables.
  • src/Migration/Sources/Appwrite/Reader/API.php: Refactored table/resource counting; removed pagination and now uses single API calls plus explicit per-table attribute/index queries.

Sequence Diagram(s)

sequenceDiagram
    participant MigrationSource
    participant AppwriteAPI

    MigrationSource->>AppwriteAPI: listUsers()
    MigrationSource->>AppwriteAPI: listTeams()
    loop for each team
        MigrationSource->>AppwriteAPI: listMemberships(teamId)
    end
    MigrationSource->>AppwriteAPI: listBuckets()
    loop for each bucket (paginated)
        MigrationSource->>AppwriteAPI: listFiles(bucketId, limit)
        loop for each file
            MigrationSource->>AppwriteAPI: getFile(bucketId, fileId)
        end
    end
    MigrationSource->>AppwriteAPI: listFunctions()
    loop for each function
        MigrationSource->>AppwriteAPI: listDeployments(functionId)
        MigrationSource->>AppwriteAPI: listVariables(functionId)
    end
    MigrationSource->>AppwriteAPI: listDatabases()
    loop for each database
        MigrationSource->>AppwriteAPI: listCollections(databaseId)
        loop for each collection
            MigrationSource->>AppwriteAPI: listDocuments(databaseId, collectionId)
            MigrationSource->>AppwriteAPI: listAttributes(databaseId, collectionId)
            MigrationSource->>AppwriteAPI: listIndexes(databaseId, collectionId)
        end
    end

Estimated code review effort

3 (~45 minutes)

Possibly related PRs

  • Optimize reports fetching #89: Introduces pagination and caching optimizations in the same resource reporting methods that this PR now simplifies by removing such logic.

Poem

In the warren of code, we hop and we bound,
Old loops and caches no longer are found.
With simpler calls and logic anew,
We count every file, each table, each queue.
The migration is lighter, the journey more bright—
A rabbit’s delight in the refactored night! 🐇✨



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🔭 Outside diff range comments (1)
src/Migration/Sources/Appwrite.php (1)

212-218: Performance concern: Fetching all teams to count memberships

This approach fetches all teams first, then makes individual API calls for each team to count memberships. This is inefficient and won't scale with many teams.

Consider implementing pagination for both teams and memberships to handle large datasets efficiently.
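
A minimal sketch of that suggestion, assuming the Appwrite PHP SDK's Teams service and its cursor queries (variable names are illustrative):

    $membershipCount = 0;
    $lastTeam = null;

    do {
        // Page through teams with cursor-based pagination.
        $queries = $lastTeam !== null
            ? [Query::cursorAfter($lastTeam)]
            : [Query::limit(25)];

        $teams = $this->teams->list($queries)['teams'];

        foreach ($teams as $team) {
            // Read the membership count from 'total' of a limit-1 query,
            // so no membership payloads need to be transferred.
            $membershipCount += $this->teams->listMemberships(
                $team['$id'],
                [Query::limit(1)]
            )['total'];
        }

        if (empty($teams)) {
            break;
        }

        $lastTeam = \end($teams)['$id'];
    } while (true);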

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3c51e3 and a2203ae.

📒 Files selected for processing (2)
  • src/Migration/Sources/Appwrite.php (5 hunks)
  • src/Migration/Sources/Appwrite/Reader/API.php (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: ItzNotABug
PR: utopia-php/migration#80
File: src/Migration/Sources/Appwrite/Reader/API.php:8-8
Timestamp: 2025-06-28T09:47:11.436Z
Learning: In the Appwrite migration codebase, commented-out Tables service references (import statements and constructor parameters) are intentionally kept for future implementation when the Tables service becomes available in the Appwrite SDK, rather than being dead code that should be removed.
Learnt from: ItzNotABug
PR: utopia-php/migration#89
File: src/Migration/Sources/Appwrite/Reader/API.php:64-84
Timestamp: 2025-07-19T08:29:22.290Z
Learning: In the Appwrite API, the default page limit for listing collections is 25 records, so when using cursor-based pagination with Query::cursorAfter(), there's no need to explicitly specify Query::limit(25) as the API will default to this limit.
Learnt from: ItzNotABug
PR: utopia-php/migration#80
File: src/Migration/Sources/Supabase.php:300-308
Timestamp: 2025-06-28T09:47:58.757Z
Learning: In the utopia-php/migration codebase, during the terminology swap from Collection/Attribute/Document to Table/Column/Row, the user ItzNotABug prefers to keep the existing query logic unchanged even if it becomes semantically incorrect with the new naming. The focus is purely on resource type renaming, not on fixing logical issues that become apparent after the terminology change.
Learnt from: ItzNotABug
PR: utopia-php/migration#80
File: src/Migration/Sources/Appwrite.php:843-851
Timestamp: 2025-06-28T09:47:08.333Z
Learning: In the utopia-php/migration codebase, during the terminology swap from Collection/Attribute/Document to Table/Column/Row, the class constructors and method parameters use the new terminology (like "relatedTable"), but the underlying data structures and API responses still use the legacy keys (like "relatedCollection"). This is an intentional design pattern to allow gradual migration while maintaining compatibility with existing data sources.
src/Migration/Sources/Appwrite.php (3)

Learnt from: ItzNotABug
PR: #80
File: src/Migration/Sources/Supabase.php:300-308
Timestamp: 2025-06-28T09:47:58.757Z
Learning: In the utopia-php/migration codebase, during the terminology swap from Collection/Attribute/Document to Table/Column/Row, the user ItzNotABug prefers to keep the existing query logic unchanged even if it becomes semantically incorrect with the new naming. The focus is purely on resource type renaming, not on fixing logical issues that become apparent after the terminology change.

Learnt from: ItzNotABug
PR: #80
File: src/Migration/Sources/Appwrite/Reader/API.php:8-8
Timestamp: 2025-06-28T09:47:11.436Z
Learning: In the Appwrite migration codebase, commented-out Tables service references (import statements and constructor parameters) are intentionally kept for future implementation when the Tables service becomes available in the Appwrite SDK, rather than being dead code that should be removed.

Learnt from: ItzNotABug
PR: #80
File: src/Migration/Sources/Appwrite.php:843-851
Timestamp: 2025-06-28T09:47:08.333Z
Learning: In the utopia-php/migration codebase, during the terminology swap from Collection/Attribute/Document to Table/Column/Row, the class constructors and method parameters use the new terminology (like "relatedTable"), but the underlying data structures and API responses still use the legacy keys (like "relatedCollection"). This is an intentional design pattern to allow gradual migration while maintaining compatibility with existing data sources.

src/Migration/Sources/Appwrite/Reader/API.php (5)

Learnt from: ItzNotABug
PR: #80
File: src/Migration/Sources/Appwrite.php:843-851
Timestamp: 2025-06-28T09:47:08.333Z
Learning: In the utopia-php/migration codebase, during the terminology swap from Collection/Attribute/Document to Table/Column/Row, the class constructors and method parameters use the new terminology (like "relatedTable"), but the underlying data structures and API responses still use the legacy keys (like "relatedCollection"). This is an intentional design pattern to allow gradual migration while maintaining compatibility with existing data sources.

Learnt from: ItzNotABug
PR: #80
File: src/Migration/Sources/Supabase.php:300-308
Timestamp: 2025-06-28T09:47:58.757Z
Learning: In the utopia-php/migration codebase, during the terminology swap from Collection/Attribute/Document to Table/Column/Row, the user ItzNotABug prefers to keep the existing query logic unchanged even if it becomes semantically incorrect with the new naming. The focus is purely on resource type renaming, not on fixing logical issues that become apparent after the terminology change.

Learnt from: ItzNotABug
PR: #80
File: src/Migration/Sources/Appwrite/Reader/API.php:8-8
Timestamp: 2025-06-28T09:47:11.436Z
Learning: In the Appwrite migration codebase, commented-out Tables service references (import statements and constructor parameters) are intentionally kept for future implementation when the Tables service becomes available in the Appwrite SDK, rather than being dead code that should be removed.

Learnt from: ItzNotABug
PR: #89
File: src/Migration/Sources/Appwrite/Reader/API.php:64-84
Timestamp: 2025-07-19T08:29:22.290Z
Learning: In the Appwrite API, the default page limit for listing collections is 25 records, so when using cursor-based pagination with Query::cursorAfter(), there's no need to explicitly specify Query::limit(25) as the API will default to this limit.

Learnt from: ItzNotABug
PR: #80
File: src/Migration/Resources/Database/Columns/Relationship.php:86-89
Timestamp: 2025-06-28T09:45:57.650Z
Learning: In the utopia-php/migration codebase Relationship column class, the getRelatedTable() method intentionally returns $this->options['relatedCollection'] (not relatedTable) because the underlying API still uses "collection" terminology, even though the internal codebase has been refactored to use "table" terminology.

🔇 Additional comments (1)
src/Migration/Sources/Appwrite.php (1)

1300-1300: Please confirm the correct field name for function deployments

I wasn’t able to find any other references to either deployment or deploymentId in the SDK or service definitions—this migration file is the only place that uses deploymentId. Before merging, double-check the Appwrite Functions API response to ensure it truly returns a deploymentId key (and not deployment). If the API still uses deployment, update this line back to:

    $function['deployment'],

so the migration won’t break at runtime.
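
Until the field name is confirmed, a hedged defensive option (a sketch, not code from this PR) is to coalesce across both keys:

    // Prefer the new key, fall back to the legacy one if absent.
    $function['deploymentId'] ?? $function['deployment'] ?? '',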

$teamList = $this->teams->list([Query::limit(1)]);
$teams = ['total' => $teamList['total'], 'teams' => []];
}
$report[Resource::TYPE_USER] = $this->users->list()['total'];

⚠️ Potential issue

Critical: Removing pagination for users will fail with large user bases

This change removes pagination when fetching users. If there are thousands or millions of users, this single API call will likely timeout or exceed memory limits.

Consider keeping the pagination logic or at least document the limitations of this approach.

🤖 Prompt for AI Agents
In src/Migration/Sources/Appwrite.php at line 203, the code fetches the total
user count without pagination, which can cause timeouts or memory issues with
large user bases. To fix this, restore the pagination logic to fetch users in
manageable batches or, if only the total count is needed, use an API method that
returns just the user count without retrieving all user data. Alternatively, add
documentation clearly stating the limitations of this approach when handling
large datasets.
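
If only the total is needed, one hedged alternative, assuming Appwrite list responses report the full server-side count in 'total' regardless of page size (the teams snippet above already relies on this), is to cap the page size so almost no user data is transferred:

    // Fetch at most one user; 'total' still reflects the complete count.
    $report[Resource::TYPE_USER] = $this->users->list([Query::limit(1)])['total'];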

$lastBucket
    ? [Query::cursorAfter($lastBucket)]
-   : [Query::limit($pageLimit)]
+   : [Query::limit(20)]

🛠️ Refactor suggestion

Hardcoded limit of 20 seems arbitrary

The pagination limit is hardcoded to 20, which is smaller than the Appwrite API default of 25. This will result in more API calls than necessary.

Consider using the API's default limit or making this configurable:

-                        : [Query::limit(20)]
+                        : [Query::limit(25)]

Also applies to: 272-272

🤖 Prompt for AI Agents
In src/Migration/Sources/Appwrite.php at lines 252 and 272, the pagination limit
is hardcoded to 20, which is below the Appwrite API default of 25 and causes
unnecessary extra API calls. Update the code to use the API's default limit of
25 instead of 20, or better, make the limit configurable via a variable or
constant so it can be adjusted without code changes.
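
A sketch of the configurable variant (the constant name is hypothetical, not taken from this codebase):

    // Matches the Appwrite API default page size for list endpoints.
    private const PAGE_LIMIT = 25;

    // ...

    $queries = $lastBucket
        ? [Query::cursorAfter($lastBucket)]
        : [Query::limit(self::PAGE_LIMIT)];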

Comment on lines +284 to +289
foreach ($files as $file) {
$report['size'] += $this->storage->getFile(
$bucket['$id'],
$file['$id']
)['sizeOriginal'];
}

⚠️ Potential issue

Severe performance issue: Individual API calls for each file

Making a separate API call to get the size of each file is extremely inefficient. For buckets with thousands of files, this will result in thousands of API calls and likely hit rate limits.

This approach will not scale. Consider:

  1. Using the file size from the list response if available
  2. Implementing batch operations
  3. Adding progress tracking and error handling for large datasets
🤖 Prompt for AI Agents
In src/Migration/Sources/Appwrite.php around lines 284 to 289, the current code
makes an individual API call for each file to get its size, causing severe
performance issues and potential rate limiting. To fix this, modify the code to
use the file size information directly from the initial list response if it
includes size data, or implement batch API calls to retrieve sizes for multiple
files at once. Additionally, add progress tracking and error handling to manage
large datasets efficiently and avoid failures during migration.
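
For option 1, a minimal sketch assuming listFiles already embeds 'sizeOriginal' in each file entry (current Appwrite file models do, but this is worth verifying against the server version in use):

    foreach ($files as $file) {
        // Use the size already present in the list response instead of
        // issuing one getFile call per file.
        $report['size'] += $file['sizeOriginal'] ?? 0;
    }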

Comment on lines +64 to 70
/* $tablesResponse = $this->tables->list(...); */
$tablesResponse = $this->database->listCollections($databaseId);
$tables = $tablesResponse['collections'];

if (Resource::isSupported(Resource::TYPE_TABLE, $resources)) {
- $report[Resource::TYPE_TABLE] += count($tables);
+ $report[Resource::TYPE_TABLE] += $tablesResponse['total'];
}

⚠️ Potential issue

Performance concern: Removing pagination may cause issues with large datasets

The removal of pagination for table fetching could potentially cause memory issues or API timeouts when dealing with databases that have many tables. The previous implementation used cursor-based pagination which is more scalable.

Without pagination, if a database has hundreds or thousands of tables, this single API call might fail or consume excessive memory.

🤖 Prompt for AI Agents
In src/Migration/Sources/Appwrite/Reader/API.php around lines 64 to 70, the
current code fetches all tables without pagination, which can cause memory or
timeout issues with large datasets. To fix this, reintroduce cursor-based
pagination by repeatedly calling listCollections with a cursor parameter until
all tables are retrieved, accumulating results incrementally instead of fetching
all at once.
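
A sketch of that cursor loop; per the PR #89 learning above, the API's default page limit of 25 applies when no explicit Query::limit is passed:

    $tables = [];
    $lastTable = null;

    do {
        $queries = $lastTable !== null ? [Query::cursorAfter($lastTable)] : [];

        // Accumulate one page of collections at a time.
        $page = $this->database->listCollections($databaseId, $queries)['collections'];
        $tables = \array_merge($tables, $page);

        if (empty($page)) {
            break;
        }

        $lastTable = \end($page)['$id'];
    } while (true);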

Comment on lines +86 to +96
if (Resource::isSupported(Resource::TYPE_COLUMN, $resources)) {
/* $columnsResponse = $this->tables->listColumns(...); */
$columnsResponse = $this->database->listAttributes($databaseId, $tableId);
$report[Resource::TYPE_COLUMN] += $columnsResponse['total'];
}

if (in_array(Resource::TYPE_INDEX, $resources)) {
/* $indexesResponse = $this->tables->listIndexes(...); */
$indexesResponse = $this->database->listIndexes($databaseId, $tableId);
$report[Resource::TYPE_INDEX] += $indexesResponse['total'];
}

⚠️ Potential issue

Performance impact: Multiple API calls instead of using embedded metadata

This change shifts from using embedded table metadata to making separate API calls for each table to get column and index counts. For a database with N tables, this results in 2N additional API calls (one for columns, one for indexes per table).

This could significantly impact performance and API rate limits when dealing with many tables.

Consider implementing a caching mechanism or batch API calls if the embedded metadata approach had accuracy issues that necessitated this change.

🤖 Prompt for AI Agents
In src/Migration/Sources/Appwrite/Reader/API.php around lines 86 to 96, the
current code makes separate API calls for each table to fetch column and index
counts, causing 2N additional calls for N tables and impacting performance. To
fix this, revert to using embedded table metadata for columns and indexes counts
if it was accurate, or implement a caching mechanism or batch API calls to
reduce the number of requests. This will minimize API calls and improve
efficiency when processing many tables.
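
If the embedded-metadata route is taken, a hedged sketch assuming Appwrite's collection payloads still embed 'attributes' and 'indexes' arrays (worth verifying, since accuracy concerns may have motivated the per-table calls in the first place):

    foreach ($tables as $table) {
        if (Resource::isSupported(Resource::TYPE_COLUMN, $resources)) {
            // Count columns from the metadata embedded in the list response.
            $report[Resource::TYPE_COLUMN] += \count($table['attributes'] ?? []);
        }

        if (\in_array(Resource::TYPE_INDEX, $resources)) {
            // Likewise for indexes; no extra API round-trips per table.
            $report[Resource::TYPE_INDEX] += \count($table['indexes'] ?? []);
        }
    }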

@abnegate abnegate closed this Jul 22, 2025