-
Notifications
You must be signed in to change notification settings - Fork 3
Feat: auto-detect delimiters #90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughDelimiter detection for CSV files is introduced in the CSV source class. The Changes
Sequence Diagram(s)sequenceDiagram
participant Test as CSVTest
participant CSV as CSV
participant File as CSV File Stream
Test->>File: Open test CSV file
Test->>CSV: detectDelimiter(File stream)
CSV->>File: Read sample lines
CSV->>CSV: Analyze lines for delimiter
CSV-->>Test: Return detected delimiter
Test->>Test: Assert detected delimiter matches expected
Test->>File: Close file
sequenceDiagram
participant App as Application
participant CSV as CSV
participant File as CSV File Stream
App->>CSV: withCsvStream(callback)
CSV->>File: Open CSV file
CSV->>File: Sample lines for delimiter
CSV->>CSV: Detect delimiter
CSV->>App: Invoke callback(File stream, detected delimiter)
App->>File: Parse rows using detected delimiter
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/Migration/Sources/CSV.php (1)
412-518: Algorithm is well-designed but could benefit from refactoring.The delimiter detection algorithm is mathematically sound and handles edge cases well. The scoring system combining consistency and quality metrics is effective. However, the method is quite complex and could be improved.
Consider breaking down this method into smaller, focused functions:
+ private function sampleLines($stream, int $maxLines = 5): array + { + $sampleLines = []; + for ($i = 0; $i < $maxLines && !feof($stream); $i++) { + $line = fgets($stream); + if ($line === false) break; + + $line = trim($line); + if (empty($line)) { + $i--; + continue; + } + $sampleLines[] = $line; + } + return $sampleLines; + } + + private function calculateDelimiterScore(string $delimiter, array $sampleLines): float + { + // Extract the scoring logic here + }This would improve readability and testability of individual components.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
tests/Migration/resources/csv/comma.csvis excluded by!**/*.csvtests/Migration/resources/csv/empty.csvis excluded by!**/*.csvtests/Migration/resources/csv/pipe.csvis excluded by!**/*.csvtests/Migration/resources/csv/quoted_fields.csvis excluded by!**/*.csvtests/Migration/resources/csv/semicolon.csvis excluded by!**/*.csvtests/Migration/resources/csv/single_column.csvis excluded by!**/*.csvtests/Migration/resources/csv/tab.csvis excluded by!**/*.csv
📒 Files selected for processing (2)
src/Migration/Sources/CSV.php(5 hunks)tests/Migration/Unit/General/CSVTest.php(1 hunks)
🔇 Additional comments (4)
src/Migration/Sources/CSV.php (2)
183-183: LGTM: Callback signature and fgetcsv usage correctly updated.The callback signature properly accepts the delimiter parameter, and the
fgetcsvcall correctly uses the detected delimiter with theseparatorparameter. This ensures consistent parsing behavior across different CSV formats.Also applies to: 193-193
317-317: LGTM: Method signature and delimiter detection integration.The PHPDoc correctly documents the new callback signature, and the delimiter detection is properly integrated before invoking the callback. The stream handling remains consistent.
Also applies to: 339-342
tests/Migration/Unit/General/CSVTest.php (2)
15-26: LGTM: Appropriate use of reflection for testing private method.Using reflection to test the private
delimitermethod is the right approach here. It avoids code duplication while ensuring the actual implementation is tested. ThenewInstanceWithoutConstructor()method properly bypasses constructor dependencies.
28-48: All CSV test resources validated
All required.csvfiles are present undertests/resources/csv/and their contents match the expected delimiters (comma, semicolon, tab, pipe) with correct fallback behavior for single-column and empty files. No further action needed.
,,;,\t,|and decides the best delimiter for further CSV process.Summary by CodeRabbit
New Features
Tests