-
Notifications
You must be signed in to change notification settings - Fork 3
Fix array export #119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix array export #119
Conversation
WalkthroughConstructor default escape in src/Migration/Destinations/CSV.php changed from backslash () to double-quote ("). Direct fputcsv calls were replaced by a protected writeCSVLine($handle, array $fields) that implements RFC 4180 escaping and writes via a single fwrite. resourceToCSVData now builds a local $rowData from $resource->getData(), extracts and unsets createdAt/updatedAt, filters columns by iterating $rowData when allowedColumns is set (or uses $rowData when none specified), and converts each value using convertValueToCSV/convertArrayToCSV/convertObjectToCSV. In src/Migration/Sources/Appwrite.php the constructor now promotes a new array $queries and exportRows merges ...$this->queries into the retrieval query. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)src/Migration/Destinations/CSV.php (3)
🔇 Additional comments (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Migration/Destinations/CSV.php (1)
1-1: Address PSR-12 linting violations.The pipeline reports PSR-12 violations. Please run the linter locally and fix the formatting issues before merging.
Run the following command to check and auto-fix PSR-12 violations:
#!/bin/bash # Description: Run Pint to identify and fix PSR-12 violations # Check for Pint violations vendor/bin/pint --test src/Migration/Destinations/CSV.php # Auto-fix violations vendor/bin/pint src/Migration/Destinations/CSV.php
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Migration/Destinations/CSV.php(3 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linter
src/Migration/Destinations/CSV.php
[error] 1-1: PSR-12 violation detected by Pint: function_declaration, no_whitespace_in...
🔇 Additional comments (4)
src/Migration/Destinations/CSV.php (4)
40-40: Good: RFC 4180 compliant escape character.The change from backslash to double-quote as the default escape character aligns with RFC 4180 CSV standards and is consistent with the quote-doubling escape logic added throughout the file.
Note: This is a breaking change to the constructor signature that may affect existing callers.
297-298: LGTM: JSON encoding with proper CSV escaping.The JSON encoding with quote-doubling for complex arrays is appropriate for CSV format.
309-311: LGTM: Consistent quote escaping for JSON-encoded objects.The quote-doubling escape logic ensures JSON-encoded objects are properly formatted for CSV output, consistent with the array handling.
279-281: $id is a reserved field—the short-circuit is correct.The search results confirm
$idis not an arbitrary array key. It's explicitly defined as a reserved internal field in the codebase (seesrc/Migration/Sources/CSV.php:409where it's listed alongside$permissions,$createdAt, and$updatedAt). This field is consistently used across all Appwrite resources (databases, tables, users, files, functions, etc.) to represent resource identifiers. The source code properly separates this field from user data before processing, making the short-circuit behavior in lines 279-281 both intentional and safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Migration/Destinations/CSV.php (1)
296-305: Add explicit false check or use JSON_THROW_ON_ERROR in convertArrayToCSV() and convertObjectToCSV()Both methods at lines 296–305 and 310–316 declare
: stringbut calljson_encode()which can returnfalse(on invalid UTF-8, circular references, or depth limit), causing a runtime TypeError. The suggested casting approach silently converts failures to empty strings. A more explicit approach would be:- return \json_encode($value); + $encoded = \json_encode($value); + if ($encoded === false) { + throw new \JsonException('Failed to encode value'); + } + return $encoded;Alternatively, use
JSON_THROW_ON_ERRORflag (available since PHP 7.3) to throw immediately. Decide on error strategy consistently: theimport()method docblock declares@throws \JsonException, but these conversion methods lack error handling that would trigger such exceptions. Either adopt throwing (and update docblocks), or drop the@throws \JsonExceptionannotation fromimport().
🧹 Nitpick comments (2)
src/Migration/Destinations/CSV.php (2)
39-42: Unused$escapeparam/property; default change introduces noisy API surface
$escapeis no longer used (manual RFC4180 quoting). Keeping it—and changing its default—adds BC noise without effect. Either remove it or mark it deprecated to avoid confusion.Apply one of:
Option A — remove param (breaking for callers passing it explicitly):
- private readonly string $delimiter = ',', - private readonly string $enclosure = '"', - private readonly string $escape = '"', + private readonly string $delimiter = ',', + private readonly string $enclosure = '"',Option B — keep but deprecate (non‑breaking now, removable later):
public function __construct( @@ - private readonly string $escape = '"', + /** @deprecated Unused since manual CSV writing; will be removed in a future release. */ + private readonly string $escape = '"',
85-101: Batch write optimization (optional): format then single fwrite per flushYou currently call
writeCSVLineper row (one syscall per line). For large exports, join formatted lines into one buffer and write once per flush for fewer syscalls.Sketch:
- foreach ($buffer['lines'] as $line) { - if (!$this->writeCSVLine($handle, $line)) { - throw new \Exception("Failed to write CSV line to file: $log"); - } - } + $chunk = ''; + foreach ($buffer['lines'] as $line) { + $chunk .= $this->formatCSVLine($line); // new helper: returns string + } + if (!self::writeAll($handle, $chunk)) { // reuse full-write loop + throw new \Exception("Failed to write CSV chunk to file: $log"); + }Where
formatCSVLine(array $fields): stringshares the quoting logic, andwriteAll()is the full-write helper fromwriteCSVLine. Keeps logic DRY and minimizes syscalls.Also applies to: 137-151
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Migration/Destinations/CSV.php(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Migration/Destinations/CSV.php (1)
src/Migration/Sources/CSV.php (1)
delimiter(476-582)
🔇 Additional comments (2)
src/Migration/Destinations/CSV.php (2)
296-305: Arrays now JSON-encoded — resolves prior delimiter ambiguitySwitching arrays to JSON within a quoted CSV field avoids the “comma-inside-cell” ambiguity raised earlier. Good change.
195-216: Current implementation is correct; the concerns don't apply to this use case.The
fwritepartial write issue cited applies primarily to network streams. This code opens a local file in append mode ('a'), where PHP guarantees atomic writes. Comprehensive existing tests—particularlytestCSVExportImportCompatibility—parse the exported CSV output completely withfgetcsv(), reconstructing full records without truncation, confirming the current implementation works correctly.Regarding RFC 4180 compliance: while the code uses
"\n"instead of"\r\n", RFC 4180 explicitly allows implementations to deviate from CRLF. The tests verify the CSV parses successfully with PHP's standard functions, confirming compatibility.No code changes are necessary.
Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Migration/Destinations/CSV.php (1)
301-321: json_encode may return false → TypeError against declared string return; add robust handling.
convertArrayToCSV()andconvertObjectToCSV()declarestringbut returnjson_encode($value), which can befalse(invalid UTF‑8, recursion, depth). That triggers a TypeError. Also, failures are silent.Harden with JSON_THROW_ON_ERROR and a safe fallback:
protected function convertArrayToCSV(array $value): string { if (empty($value)) { return ''; } if (isset($value['$id'])) { return $value['$id']; } - return \json_encode($value); + try { + return \json_encode($value, \JSON_UNESCAPED_UNICODE | \JSON_UNESCAPED_SLASHES | \JSON_THROW_ON_ERROR); + } catch (\JsonException $e) { + Console::error('JSON encode (array) failed: ' . $e->getMessage()); + return ''; + } } protected function convertObjectToCSV($value): string { if ($value instanceof Row) { return $value->getId(); } - return \json_encode($value); + try { + return \json_encode($value, \JSON_UNESCAPED_UNICODE | \JSON_UNESCAPED_SLASHES | \JSON_THROW_ON_ERROR); + } catch (\JsonException $e) { + Console::error('JSON encode (object) failed: ' . $e->getMessage()); + return ''; + } }
🧹 Nitpick comments (4)
src/Migration/Destinations/CSV.php (4)
40-40: $escape default changed and appears unused; consider deprecating or removing.The constructor’s
$escapenow defaults to", but it isn’t referenced anywhere (custom writer ignores it). Keeping an unused param risks confusion and is a silent BC change in defaults. Either wire it into the writer, or deprecate it (docblock + release note) and plan removal in the next major. Please confirm intended path.
97-101: Switch to writeCSVLine is good; ensure short/partial writes are detected.The check
fwrite(...) !== falsetreats 0/short writes as success. After adopting the custom writer, prefer verifying the exact byte count (see next comment’s diff).
195-216: Strengthen CSV write: detect partial writes and use CRLF for RFC 4180.
- Detect short writes by comparing bytes written vs line length.
- RFC 4180 recommends CRLF line endings; currently using LF only.
Apply:
- return \fwrite($handle, \implode($this->delimiter, $parts) . "\n") !== false; + $line = \implode($this->delimiter, $parts) . "\r\n"; + $written = \fwrite($handle, $line); + return $written === \strlen($line);
248-269: Prevent system fields from being overridden; clarify allowedColumns behavior.
$rowDatacan contain$id/$permissions.array_mergelets$rowDataoverride values set from theResource.- When
allowedColumnsis set,$idand$permissionsare still always included; confirm if that’s intended.Suggested changes:
// Remove internal fields from rowData before merging to avoid duplication - unset($rowData['$createdAt'], $rowData['$updatedAt']); + unset($rowData['$createdAt'], $rowData['$updatedAt'], $rowData['$id'], $rowData['$permissions']); // Add all attributes if no filter specified, otherwise only allowed ones if (empty($this->allowedColumns)) { - $data = \array_merge($data, $rowData); + // Preserve left-hand (system) fields if duplicated in $rowData + $data = $data + $rowData; } else { foreach ($rowData as $key => $value) { if (isset($this->allowedColumns[$key])) { $data[$key] = $value; } } }If the intention is for
allowedColumnsto be authoritative for all fields (including system ones), we can gate$id/$permissions/$createdAt/$updatedAtbehind the allow‑list as well. Confirm preference.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Migration/Destinations/CSV.php(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Migration/Destinations/CSV.php (3)
src/Migration/Sources/CSV.php (1)
delimiter(476-582)src/Migration/Resources/Database/Row.php (1)
getData(94-97)src/Migration/Resource.php (2)
getId(127-130)getPermissions(191-194)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/Migration/Sources/Appwrite.php (1)
75-76: Document the expected structure of the$queriesparameter.The
$queriesparameter accepts any array but is later spread into query arrays that expect Query objects. Consider adding a PHPDoc type hint or runtime validation to ensure type safety.Apply this diff to add type documentation:
+ /** + * @param array<Query|mixed> $queries Additional query constraints to apply during row export + */ public function __construct( protected string $project, protected string $endpoint,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/Migration/Destinations/CSV.php(5 hunks)src/Migration/Sources/Appwrite.php(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/Migration/Destinations/CSV.php (3)
src/Migration/Sources/CSV.php (1)
delimiter(476-582)src/Migration/Resources/Database/Row.php (1)
getData(94-97)src/Migration/Resource.php (2)
getId(127-130)getPermissions(191-194)
src/Migration/Sources/Appwrite.php (3)
src/Migration/Sources/Appwrite/Reader/API.php (1)
queryLimit(239-242)src/Migration/Sources/Appwrite/Reader/Database.php (1)
queryLimit(411-414)src/Migration/Sources/Appwrite/Reader.php (1)
queryLimit(108-108)
🪛 GitHub Actions: CodeQL
src/Migration/Destinations/CSV.php
[error] 6-6: Cannot use Utopia\Console as Console because the name is already in use. (PHPStan error) [Command: ./vendor/bin/phpstan analyse --level 3 src tests --memory-limit 2G]
🪛 GitHub Actions: Tests
src/Migration/Destinations/CSV.php
[error] 6-6: Fatal error: Cannot use Utopia\Console as Console because the name is already in use.
🔇 Additional comments (4)
src/Migration/Destinations/CSV.php (3)
194-215: LGTM: RFC 4180 compliant escaping implementation.The custom
writeCSVLinemethod correctly implements RFC 4180 double-quote escaping with efficient field detection and a single write operation.
247-259: LGTM: Clearer data extraction logic.Extracting
$rowDataonce and removing timestamp fields before merging prevents duplication and improves readability.
279-322: LGTM: Standardized value conversion helpers.The new conversion helpers provide consistent serialization, addressing the previous comma-ambiguity concern by using JSON encoding for complex arrays while handling document references and Row objects appropriately.
src/Migration/Sources/Appwrite.php (1)
1104-1107: Remove this review comment—the concern is unfounded.The
$this->queriesproperty is initialized to an empty array in the constructor and is never assigned anywhere in the codebase. The Appwrite instantiation atbin/MigrationCLI.php:213omits the$queriesparameter, so it always defaults to[]. The spread operator produces no effect, and no conflicting queries can be introduced.Even theoretically, Appwrite Query semantics treat multiple queries additively (AND operations), where filter queries and pagination queries coexist without override behavior. The codebase consistently uses this pattern successfully throughout (e.g., combining
Query::limit()andQuery::cursorAfter()in multiple methods).Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/Migration/Destinations/CSV.php (1)
40-40: The$escapeparameter remains unused.This issue was previously flagged: the constructor parameter
$escapeis never referenced in the code. ThewriteCSVLinemethod (lines 200-214) hardcodes escaping using$this->enclosureinstead.Apply this diff to remove the unused parameter:
array $allowedColumns = [], private readonly string $delimiter = ',', private readonly string $enclosure = '"', - private readonly string $escape = '"', private readonly bool $includeHeaders = true,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
composer.lockis excluded by!**/*.lock
📒 Files selected for processing (1)
src/Migration/Destinations/CSV.php(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Migration/Destinations/CSV.php (3)
src/Migration/Sources/CSV.php (1)
delimiter(476-582)src/Migration/Resources/Database/Row.php (1)
getData(94-97)src/Migration/Resource.php (2)
getId(127-130)getPermissions(191-194)
🪛 GitHub Actions: Tests
src/Migration/Destinations/CSV.php
[error] 98-98: TypeError: CSV::writeCSVLine(): Argument #1 ($handle) must be of type Utopia\Migration\Resource, resource given.
[error] 98-98: TypeError: CSV::writeCSVLine(): Argument #1 ($handle) must be of type Utopia\Migration\Resource, resource given.
[error] 149-149: TypeError: CSV::writeCSVLine(): Argument #1 ($handle) must be of type Utopia\Migration\Resource, resource given.
[error] 149-149: TypeError: CSV::writeCSVLine(): Argument #1 ($handle) must be of type Utopia\Migration\Resource, resource given.
[error] 200-200: TypeError: CSV::writeCSVLine(): Argument #1 ($handle) must be of type Utopia\Migration\Resource, resource given.
[error] 200-200: TypeError: CSV::writeCSVLine(): Argument #1 ($handle) must be of type Utopia\Migration\Resource, resource given.
🔇 Additional comments (2)
src/Migration/Destinations/CSV.php (2)
244-276: LGTM: Resource data extraction logic is correct.The method properly:
- Extracts system fields from both Resource methods and data array
- Removes duplicate timestamps from rowData before merging
- Applies column filtering correctly for both filtered and unfiltered cases
- Converts all values consistently using the helper methods
298-310: LGTM: Array serialization strategy is clean.The simplified approach avoids parsing ambiguity by consistently using JSON encoding for complex arrays while handling special cases (empty arrays, nested documents with
$id) appropriately.
Summary by CodeRabbit
New Features
Bug Fixes
Behavior Change