@abnegate (Member) commented Oct 22, 2025

Summary by CodeRabbit

  • New Features

    • Enhanced CSV export: per-value serialization for arrays/objects and improved header/column selection.
    • Source export accepts additional query fragments to augment row retrieval.
  • Bug Fixes

    • Fixed malformed CSV when fields contained quotes, delimiters, or newlines.
  • Behavior Change

    • CSV escaping now uses the standard double-quote method for improved compatibility.

@coderabbitai (bot, Contributor) commented Oct 22, 2025

Walkthrough

The default escape character in the src/Migration/Destinations/CSV.php constructor changed from backslash (\) to double-quote ("). Direct fputcsv calls were replaced by a protected writeCSVLine($handle, array $fields) method that implements RFC 4180 escaping and writes each line with a single fwrite. resourceToCSVData now builds a local $rowData from $resource->getData(), extracts and unsets createdAt/updatedAt, filters columns by iterating $rowData when allowedColumns is set (or uses all of $rowData when it is not), and converts each value using convertValueToCSV/convertArrayToCSV/convertObjectToCSV. In src/Migration/Sources/Appwrite.php, the constructor now promotes a new array $queries property, and exportRows spreads ...$this->queries into the retrieval query.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

  • Title Check: ❓ Inconclusive
    Explanation: The title "Fix array export" is vague and generic, failing to convey meaningful information about the actual changeset. While the PR does involve array handling (a new $queries property in Appwrite and array value serialization methods in CSV), the title doesn't clearly describe what is being fixed or changed. The non-descriptive language doesn't help teammates quickly understand the primary changes when scanning commit history, as it's unclear whether the PR addresses CSV array serialization, query handling, or something else entirely.
    Resolution: Consider revising the title to be more specific and descriptive. For example, "Add CSV value serialization helpers and query parameter support" or "Refactor CSV data handling with custom serialization and expandable query arrays" would better capture the core changes. The new title should clearly identify both the CSV serialization improvements and the Appwrite query array enhancements that are the main focus of this changeset.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed
    Explanation: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage: ✅ Passed
    Explanation: Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9226db2 and f45c403.

📒 Files selected for processing (1)
  • src/Migration/Destinations/CSV.php (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Migration/Destinations/CSV.php (3)
src/Migration/Resource.php (2)
  • Resource (5-236)
  • setStatus (168-174)
src/Migration/Sources/CSV.php (1)
  • delimiter (476-582)
src/Migration/Resources/Database/Row.php (1)
  • getData (94-97)
🔇 Additional comments (5)
src/Migration/Destinations/CSV.php (5)

10-10: LGTM: Clean aliasing of Resource class.

The introduction of UtopiaResource alias and its consistent usage throughout the file improves clarity and avoids potential naming conflicts.

Also applies to: 64-64, 141-141


193-214: LGTM: Solid RFC 4180-compliant CSV writing implementation.

The writeCSVLine method correctly implements RFC 4180 escaping:

  • Fields containing special characters (delimiter, newlines, enclosure) are properly wrapped
  • Enclosure characters within fields are escaped by doubling them
  • Uses efficient single fwrite call per line

The absence of a type hint on $handle correctly avoids the conflict with the imported Resource class that was flagged in previous reviews.
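The quoting rules listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's writeCSVLine(): the function name formatCsvLine and its default parameters are assumptions made for the example, and it returns the formatted line instead of writing to a handle.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not from the PR): formats one CSV record using
 * RFC 4180-style quoting, where the enclosure is escaped by doubling it.
 */
function formatCsvLine(array $fields, string $delimiter = ',', string $enclosure = '"'): string
{
    $parts = [];
    foreach ($fields as $field) {
        $field = (string) $field;
        // Quote only when the field contains the delimiter, the enclosure, or a line break.
        $needsQuoting = \strpbrk($field, $delimiter . $enclosure . "\r\n") !== false;
        if ($needsQuoting) {
            // Double every enclosure character, then wrap the whole field.
            $field = $enclosure
                . \str_replace($enclosure, $enclosure . $enclosure, $field)
                . $enclosure;
        }
        $parts[] = $field;
    }
    // LF line ending, matching what the review says the PR uses.
    return \implode($delimiter, $parts) . "\n";
}

$line = formatCsvLine(['plain', 'a,b', 'say "hi"']);
```

Returning a string keeps the quoting logic testable without a file handle; the caller can then pass the result to a single fwrite.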


246-273: LGTM: Clean refactoring of CSV data preparation.

The changes improve the code by:

  • Extracting getData() once into $rowData to reduce method calls
  • Properly unsetting timestamp fields from $rowData after extraction to prevent duplication in the output
  • Clearer data flow: extract → process timestamps → filter/merge → convert

The logic correctly handles both filtered (allowedColumns) and unfiltered cases.


301-310: LGTM: Appropriate array serialization strategy.

The method correctly handles different array types:

  • Empty arrays → empty string
  • Relational references (arrays with $id) → ID value only
  • Complex arrays → JSON-encoded

This approach addresses the array delimiter ambiguity concerns from previous reviews.


281-296: LGTM: Well-structured value conversion logic.

The convertValueToCSV and convertObjectToCSV methods provide clean type-specific handling:

  • Explicit null/boolean to string conversion for CSV clarity
  • Row objects correctly extract ID to maintain relational references
  • Fallback to JSON encoding for complex objects prevents data loss

The implementation handles edge cases appropriately.

Also applies to: 315-321


@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Migration/Destinations/CSV.php (1)

1-1: Address PSR-12 linting violations.

The pipeline reports PSR-12 violations. Please run the linter locally and fix the formatting issues before merging.

Run the following command to check and auto-fix PSR-12 violations:

#!/bin/bash
# Description: Run Pint to identify and fix PSR-12 violations

# Check for Pint violations
vendor/bin/pint --test src/Migration/Destinations/CSV.php

# Auto-fix violations
vendor/bin/pint src/Migration/Destinations/CSV.php
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between b6985b2 and abafba0.

📒 Files selected for processing (1)
  • src/Migration/Destinations/CSV.php (3 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linter
src/Migration/Destinations/CSV.php

[error] 1-1: PSR-12 violation detected by Pint: function_declaration, no_whitespace_in...

🔇 Additional comments (4)
src/Migration/Destinations/CSV.php (4)

40-40: Good: RFC 4180 compliant escape character.

The change from backslash to double-quote as the default escape character aligns with RFC 4180 CSV standards and is consistent with the quote-doubling escape logic added throughout the file.

Note: The constructor signature is unchanged, but the new default is a behavior change that may affect existing callers relying on the previous backslash default.


297-298: LGTM: JSON encoding with proper CSV escaping.

The JSON encoding with quote-doubling for complex arrays is appropriate for CSV format.


309-311: LGTM: Consistent quote escaping for JSON-encoded objects.

The quote-doubling escape logic ensures JSON-encoded objects are properly formatted for CSV output, consistent with the array handling.


279-281: $id is a reserved field—the short-circuit is correct.

The search results confirm $id is not an arbitrary array key. It's explicitly defined as a reserved internal field in the codebase (see src/Migration/Sources/CSV.php:409 where it's listed alongside $permissions, $createdAt, and $updatedAt). This field is consistently used across all Appwrite resources (databases, tables, users, files, functions, etc.) to represent resource identifiers. The source code properly separates this field from user data before processing, making the short-circuit behavior in lines 279-281 both intentional and safe.

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Migration/Destinations/CSV.php (1)

296-305: Add explicit false check or use JSON_THROW_ON_ERROR in convertArrayToCSV() and convertObjectToCSV()

Both methods at lines 296–305 and 310–316 declare : string but call json_encode() which can return false (on invalid UTF-8, circular references, or depth limit), causing a runtime TypeError. The suggested casting approach silently converts failures to empty strings. A more explicit approach would be:

-        return \json_encode($value);
+        $encoded = \json_encode($value);
+        if ($encoded === false) {
+            throw new \JsonException('Failed to encode value');
+        }
+        return $encoded;

Alternatively, use JSON_THROW_ON_ERROR flag (available since PHP 7.3) to throw immediately. Decide on error strategy consistently: the import() method docblock declares @throws \JsonException, but these conversion methods lack error handling that would trigger such exceptions. Either adopt throwing (and update docblocks), or drop the @throws \JsonException annotation from import().
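The difference between the two failure modes can be seen in a small standalone sketch. It assumes PHP 8.0+ (for the mixed type), and the helper name encodeOrThrow is hypothetical, not part of the PR.

```php
<?php
declare(strict_types=1);

// "\xB1" is a stray continuation byte, so this string is not valid UTF-8
// and json_encode() cannot represent it.
$bad = ['name' => "\xB1\x31"];

// Without the flag, failure is signalled only by a false return value.
$result = \json_encode($bad);

// Hypothetical wrapper: with JSON_THROW_ON_ERROR the failure raises
// \JsonException, so the declared string return type is always honoured.
function encodeOrThrow(mixed $value): string
{
    return \json_encode($value, \JSON_THROW_ON_ERROR);
}

try {
    encodeOrThrow($bad);
    $threw = false;
} catch (\JsonException $e) {
    $threw = true;
}
```

Either strategy works as long as it is applied consistently; the flag-based version avoids a separate false check at every call site.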

🧹 Nitpick comments (2)
src/Migration/Destinations/CSV.php (2)

39-42: Unused $escape param/property; default change introduces noisy API surface

$escape is no longer used (manual RFC4180 quoting). Keeping it—and changing its default—adds BC noise without effect. Either remove it or mark it deprecated to avoid confusion.

Apply one of:

Option A — remove param (breaking for callers passing it explicitly):

-        private readonly string $delimiter = ',',
-        private readonly string $enclosure = '"',
-        private readonly string $escape = '"',
+        private readonly string $delimiter = ',',
+        private readonly string $enclosure = '"',

Option B — keep but deprecate (non‑breaking now, removable later):

     public function __construct(
@@
-        private readonly string $escape = '"',
+        /** @deprecated Unused since manual CSV writing; will be removed in a future release. */
+        private readonly string $escape = '"',

85-101: Batch write optimization (optional): format then single fwrite per flush

You currently call writeCSVLine per row (one syscall per line). For large exports, join formatted lines into one buffer and write once per flush for fewer syscalls.

Sketch:

-                foreach ($buffer['lines'] as $line) {
-                    if (!$this->writeCSVLine($handle, $line)) {
-                        throw new \Exception("Failed to write CSV line to file: $log");
-                    }
-                }
+                $chunk = '';
+                foreach ($buffer['lines'] as $line) {
+                    $chunk .= $this->formatCSVLine($line); // new helper: returns string
+                }
+                if (!self::writeAll($handle, $chunk)) { // reuse full-write loop
+                    throw new \Exception("Failed to write CSV chunk to file: $log");
+                }

Where formatCSVLine(array $fields): string shares the quoting logic, and writeAll() is the full-write helper from writeCSVLine. Keeps logic DRY and minimizes syscalls.

Also applies to: 137-151

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between abafba0 and 4b72f60.

📒 Files selected for processing (1)
  • src/Migration/Destinations/CSV.php (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Migration/Destinations/CSV.php (1)
src/Migration/Sources/CSV.php (1)
  • delimiter (476-582)
🔇 Additional comments (2)
src/Migration/Destinations/CSV.php (2)

296-305: Arrays now JSON-encoded — resolves prior delimiter ambiguity

Switching arrays to JSON within a quoted CSV field avoids the “comma-inside-cell” ambiguity raised earlier. Good change.


195-216: Current implementation is correct; the concerns don't apply to this use case.

The fwrite partial-write issue cited applies primarily to network streams and pipes. This code opens a local file in append mode ('a'), where fwrite() normally writes the whole buffer in a single call, although PHP does not document this as a guarantee. Existing tests, particularly testCSVExportImportCompatibility, parse the exported CSV output completely with fgetcsv() and reconstruct full records without truncation, which supports the current implementation.

Regarding RFC 4180 compliance: the code uses "\n" instead of "\r\n". RFC 4180 specifies CRLF line breaks but notes that some implementations use other values, and the tests verify that the output parses successfully with PHP's standard functions.

No code changes are necessary.

Likely an incorrect or invalid review comment.
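The kind of round trip described above can be reproduced in isolation. The following is only a sketch with illustrative data, not the project's testCSVExportImportCompatibility test; it assumes PHP 8.0+ (null length argument) and uses an in-memory stream instead of a file.

```php
<?php
declare(strict_types=1);

// One record terminated by LF only, with quoted fields containing a comma,
// doubled quotes, and an embedded newline (illustrative data).
$csv = '"a,b","say ""hi""","two' . "\n" . 'lines"' . "\n";

$handle = \fopen('php://memory', 'w+');
\fwrite($handle, $csv);
\rewind($handle);

// An empty escape string (supported since PHP 7.4) disables PHP's
// proprietary backslash escaping, so only quote doubling is interpreted.
$fields = \fgetcsv($handle, null, ',', '"', '');
\fclose($handle);
```

Here $fields holds the three original values, including the embedded newline, which shows that LF-terminated records with doubled quotes are readable by PHP's own parser.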

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Migration/Destinations/CSV.php (1)

301-321: json_encode may return false → TypeError against declared string return; add robust handling.

convertArrayToCSV() and convertObjectToCSV() declare string but return json_encode($value), which can be false (invalid UTF‑8, recursion, depth). That triggers a TypeError. Also, failures are silent.

Harden with JSON_THROW_ON_ERROR and a safe fallback:

     protected function convertArrayToCSV(array $value): string
     {
         if (empty($value)) {
             return '';
         }
         if (isset($value['$id'])) {
             return $value['$id'];
         }
-        return \json_encode($value);
+        try {
+            return \json_encode($value, \JSON_UNESCAPED_UNICODE | \JSON_UNESCAPED_SLASHES | \JSON_THROW_ON_ERROR);
+        } catch (\JsonException $e) {
+            Console::error('JSON encode (array) failed: ' . $e->getMessage());
+            return '';
+        }
     }

     protected function convertObjectToCSV($value): string
     {
         if ($value instanceof Row) {
             return $value->getId();
         }
-        return \json_encode($value);
+        try {
+            return \json_encode($value, \JSON_UNESCAPED_UNICODE | \JSON_UNESCAPED_SLASHES | \JSON_THROW_ON_ERROR);
+        } catch (\JsonException $e) {
+            Console::error('JSON encode (object) failed: ' . $e->getMessage());
+            return '';
+        }
     }
🧹 Nitpick comments (4)
src/Migration/Destinations/CSV.php (4)

40-40: $escape default changed and appears unused; consider deprecating or removing.

The constructor’s $escape now defaults to ", but it isn’t referenced anywhere (custom writer ignores it). Keeping an unused param risks confusion and is a silent BC change in defaults. Either wire it into the writer, or deprecate it (docblock + release note) and plan removal in the next major. Please confirm intended path.


97-101: Switch to writeCSVLine is good; ensure short/partial writes are detected.

The check fwrite(...) !== false treats 0/short writes as success. After adopting the custom writer, prefer verifying the exact byte count (see next comment’s diff).


195-216: Strengthen CSV write: detect partial writes and use CRLF for RFC 4180.

  • Detect short writes by comparing bytes written vs line length.
  • RFC 4180 recommends CRLF line endings; currently using LF only.

Apply:

-        return \fwrite($handle, \implode($this->delimiter, $parts) . "\n") !== false;
+        $line = \implode($this->delimiter, $parts) . "\r\n";
+        $written = \fwrite($handle, $line);
+        return $written === \strlen($line);
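If partial writes are considered a real risk for the target stream, a stricter alternative to comparing byte counts once is to retry until the whole line is written. The helper below is a hypothetical sketch (the name writeAll is borrowed from the earlier suggestion, but this body is an assumption, not code from the PR).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: keeps calling fwrite() until every byte is written.
 *
 * @param resource $handle An open, writable stream.
 */
function writeAll($handle, string $data): bool
{
    $total = \strlen($data);
    $offset = 0;
    while ($offset < $total) {
        $written = \fwrite($handle, \substr($data, $offset));
        // Treat both an error (false) and no progress (0) as failure.
        if ($written === false || $written === 0) {
            return false;
        }
        $offset += $written;
    }
    return true;
}

// Usage against an in-memory stream, so the example has no side effects.
$handle = \fopen('php://memory', 'w+');
$ok = writeAll($handle, "id,name\r\n1,Ada\r\n");
\rewind($handle);
$contents = \stream_get_contents($handle);
\fclose($handle);
```

The $handle parameter is left untyped on purpose: PHP has no native type for stream resources, and the file imports a class named Resource.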

248-269: Prevent system fields from being overridden; clarify allowedColumns behavior.

  • $rowData can contain $id/$permissions. array_merge lets $rowData override values set from the Resource.
  • When allowedColumns is set, $id and $permissions are still always included; confirm if that’s intended.

Suggested changes:

         // Remove internal fields from rowData before merging to avoid duplication
-        unset($rowData['$createdAt'], $rowData['$updatedAt']);
+        unset($rowData['$createdAt'], $rowData['$updatedAt'], $rowData['$id'], $rowData['$permissions']);

         // Add all attributes if no filter specified, otherwise only allowed ones
         if (empty($this->allowedColumns)) {
-            $data = \array_merge($data, $rowData);
+            // Preserve left-hand (system) fields if duplicated in $rowData
+            $data = $data + $rowData;
         } else {
             foreach ($rowData as $key => $value) {
                 if (isset($this->allowedColumns[$key])) {
                     $data[$key] = $value;
                 }
             }
         }

If the intention is for allowedColumns to be authoritative for all fields (including system ones), we can gate $id/$permissions/$createdAt/$updatedAt behind the allow‑list as well. Confirm preference.
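The difference between array_merge and the array union operator for duplicate string keys is the crux of this suggestion. A minimal sketch with made-up values (the keys mirror the reserved fields discussed here; the data is purely illustrative):

```php
<?php
declare(strict_types=1);

// System fields taken from the resource itself (illustrative values).
$data = ['$id' => 'row-1', '$permissions' => 'read("any")'];

// Row data that happens to carry a conflicting '$id' key.
$rowData = ['$id' => 'spoofed', 'title' => 'Hello'];

// array_merge(): for duplicate string keys, the later array wins.
$merged = \array_merge($data, $rowData);

// Union operator: for duplicate keys, the left-hand array wins.
$union = $data + $rowData;
```

With these inputs, $merged['$id'] is 'spoofed' while $union['$id'] stays 'row-1'; both results still contain the 'title' column.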

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 4b72f60 and 526c3c2.

📒 Files selected for processing (1)
  • src/Migration/Destinations/CSV.php (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Migration/Destinations/CSV.php (3)
src/Migration/Sources/CSV.php (1)
  • delimiter (476-582)
src/Migration/Resources/Database/Row.php (1)
  • getData (94-97)
src/Migration/Resource.php (2)
  • getId (127-130)
  • getPermissions (191-194)

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/Migration/Sources/Appwrite.php (1)

75-76: Document the expected structure of the $queries parameter.

The $queries parameter accepts any array but is later spread into query arrays that expect Query objects. Consider adding a PHPDoc type hint or runtime validation to ensure type safety.

Apply this diff to add type documentation:

+    /**
+     * @param array<Query> $queries Additional query constraints to apply during row export
+     */
     public function __construct(
         protected string $project,
         protected string $endpoint,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 526c3c2 and 60a3a6f.

📒 Files selected for processing (2)
  • src/Migration/Destinations/CSV.php (5 hunks)
  • src/Migration/Sources/Appwrite.php (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/Migration/Destinations/CSV.php (3)
src/Migration/Sources/CSV.php (1)
  • delimiter (476-582)
src/Migration/Resources/Database/Row.php (1)
  • getData (94-97)
src/Migration/Resource.php (2)
  • getId (127-130)
  • getPermissions (191-194)
src/Migration/Sources/Appwrite.php (3)
src/Migration/Sources/Appwrite/Reader/API.php (1)
  • queryLimit (239-242)
src/Migration/Sources/Appwrite/Reader/Database.php (1)
  • queryLimit (411-414)
src/Migration/Sources/Appwrite/Reader.php (1)
  • queryLimit (108-108)
🪛 GitHub Actions: CodeQL
src/Migration/Destinations/CSV.php

[error] 6-6: Cannot use Utopia\Console as Console because the name is already in use. (PHPStan error) [Command: ./vendor/bin/phpstan analyse --level 3 src tests --memory-limit 2G]

🪛 GitHub Actions: Tests
src/Migration/Destinations/CSV.php

[error] 6-6: Fatal error: Cannot use Utopia\Console as Console because the name is already in use.

🔇 Additional comments (4)
src/Migration/Destinations/CSV.php (3)

194-215: LGTM: RFC 4180 compliant escaping implementation.

The custom writeCSVLine method correctly implements RFC 4180 double-quote escaping with efficient field detection and a single write operation.


247-259: LGTM: Clearer data extraction logic.

Extracting $rowData once and removing timestamp fields before merging prevents duplication and improves readability.


279-322: LGTM: Standardized value conversion helpers.

The new conversion helpers provide consistent serialization, addressing the previous comma-ambiguity concern by using JSON encoding for complex arrays while handling document references and Row objects appropriately.

src/Migration/Sources/Appwrite.php (1)

1104-1107: Remove this review comment—the concern is unfounded.

The $this->queries property is initialized to an empty array in the constructor and is never assigned anywhere in the codebase. The Appwrite instantiation at bin/MigrationCLI.php:213 omits the $queries parameter, so it always defaults to []. The spread operator produces no effect, and no conflicting queries can be introduced.

Even theoretically, Appwrite Query semantics treat multiple queries additively (AND operations), where filter queries and pagination queries coexist without override behavior. The codebase consistently uses this pattern successfully throughout (e.g., combining Query::limit() and Query::cursorAfter() in multiple methods).

Likely an incorrect or invalid review comment.

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/Migration/Destinations/CSV.php (1)

40-40: The $escape parameter remains unused.

This issue was previously flagged: the constructor parameter $escape is never referenced in the code. The writeCSVLine method (lines 200-214) hardcodes escaping using $this->enclosure instead.

Apply this diff to remove the unused parameter:

         array $allowedColumns = [],
         private readonly string $delimiter = ',',
         private readonly string $enclosure = '"',
-        private readonly string $escape = '"',
         private readonly bool $includeHeaders = true,
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 60a3a6f and 9226db2.

⛔ Files ignored due to path filters (1)
  • composer.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • src/Migration/Destinations/CSV.php (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Migration/Destinations/CSV.php (3)
src/Migration/Sources/CSV.php (1)
  • delimiter (476-582)
src/Migration/Resources/Database/Row.php (1)
  • getData (94-97)
src/Migration/Resource.php (2)
  • getId (127-130)
  • getPermissions (191-194)
🪛 GitHub Actions: Tests
src/Migration/Destinations/CSV.php

[error] 98-98: TypeError: CSV::writeCSVLine(): Argument #1 ($handle) must be of type Utopia\Migration\Resource, resource given.


[error] 149-149: TypeError: CSV::writeCSVLine(): Argument #1 ($handle) must be of type Utopia\Migration\Resource, resource given.


[error] 200-200: TypeError: CSV::writeCSVLine(): Argument #1 ($handle) must be of type Utopia\Migration\Resource, resource given.


🔇 Additional comments (2)
src/Migration/Destinations/CSV.php (2)

244-276: LGTM: Resource data extraction logic is correct.

The method properly:

  • Extracts system fields from both Resource methods and data array
  • Removes duplicate timestamps from rowData before merging
  • Applies column filtering correctly for both filtered and unfiltered cases
  • Converts all values consistently using the helper methods

298-310: LGTM: Array serialization strategy is clean.

The simplified approach avoids parsing ambiguity by consistently using JSON encoding for complex arrays while handling special cases (empty arrays, nested documents with $id) appropriately.

@abnegate abnegate merged commit f5c1d2c into main Oct 22, 2025
4 checks passed
@abnegate abnegate deleted the fix-array-export branch October 22, 2025 12:30