Adds a new JSON multi-sheet extractor plugin #697

amarbakir wants to merge 5 commits into FlatFilers:main
Conversation
…actor Init JSON multi-sheet extractor plugin
Walkthrough

This pull request introduces updates to two Flatfile plugins.

Changes
Possibly related PRs
Suggested reviewers
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (18)
plugins/json-multisheet-extractor/src/index.ts (2)
4-10: Add JSDoc documentation and improve type safety.

Consider adding documentation and type improvements:

- Add JSDoc to describe the purpose and parameters
- Consider creating an interface for the options object
- Add explicit return type

```diff
+/**
+ * Creates a JSON multi-sheet extractor that processes JSON files into multiple sheets
+ * @param options Configuration options
+ * @param options.chunkSize Number of records to process in each chunk
+ * @param options.parallel Number of parallel processing threads
+ * @param options.debug Enable debug logging
+ * @returns Configured extractor instance
+ */
+interface ExtractorOptions {
+  chunkSize?: number;
+  parallel?: number;
+  debug?: boolean;
+}

-export const JSONMultiSheetExtractor = (options?: {
-  chunkSize?: number
-  parallel?: number
-  debug?: boolean
-}) => {
+export const JSONMultiSheetExtractor = (options?: ExtractorOptions): ReturnType<typeof Extractor> => {
```
12-12: Add documentation for the jsonParser export.

Consider adding JSDoc to explain why this function is exported separately and its multi-sheet parsing capabilities.

```diff
+/**
+ * Direct access to the multi-sheet JSON parsing functionality
+ * @see parseBuffer
+ */
 export const jsonParser = parseBuffer
```

plugins/json-multisheet-extractor/src/parser.ts (2)
2-3: Remove extra empty line.

A single empty line after imports is sufficient for readability.

```diff
 import { WorkbookCapture, parseSheet } from '@flatfile/util-extractor'
-
 export function parseBuffer(buffer: Buffer): WorkbookCapture {
```
38-41: Enhance error context for debugging.

The error handling should provide more context about where the failure occurred.

```diff
 } catch (error) {
-  console.error('An error occurred:', error)
-  throw error
+  const wrappedError = new Error(
+    `Failed to parse JSON multi-sheet data: ${error.message}`
+  )
+  wrappedError.cause = error
+  throw wrappedError
 }
```

plugins/json-multisheet-extractor/src/parser.spec.ts (3)
6-8: Consider using a more robust way to handle test data paths.

The hardcoded path to the test data file could break if the directory structure changes. Consider using environment variables or a test configuration file to manage test data paths.

```diff
-const buffer: Buffer = fs.readFileSync(
-  path.join(__dirname, '../ref/test-multisheet.json')
-)
+const TEST_DATA_DIR = process.env.TEST_DATA_DIR || path.join(__dirname, '../ref')
+const buffer: Buffer = fs.readFileSync(
+  path.join(TEST_DATA_DIR, 'test-multisheet.json')
+)
```
10-25: Add type definitions for header constants.

Consider adding type definitions to improve maintainability and ensure consistency with the actual implementation.

```diff
+type HeaderType = readonly string[]
+
-const CONTACT_HEADERS = [
+const CONTACT_HEADERS: HeaderType = [
   'First Name',
   // ...
 ]
-const ORDER_HEADERS = [
+const ORDER_HEADERS: HeaderType = [
   'ID',
   'Amount',
 ]
```
85-92: Consider removing redundant header tests.

These header validation tests are redundant, as they're already covered in the main 'JSON to WorkbookCapture' test. Consider removing them or testing specific header-related edge cases instead. If you want to keep header-specific tests, consider testing edge cases like:

```typescript
test('handles headers with special characters', () => {
  // Test with headers containing spaces, dots, etc.
})

test('maintains header order', () => {
  // Verify that header order matches the input JSON structure
})
```

plugins/json-multisheet-extractor/README.md (6)
1-35: Replace hard tabs with spaces in the JSON example.

The JSON example effectively illustrates the plugin's functionality. However, for better markdown consistency, replace hard tabs with spaces.

```diff
 {
-	"contacts": [ // sheet 1
-		{ // record 1
-			"firstName": "John", // header 1, value
+  "contacts": [ // sheet 1
+    { // record 1
+      "firstName": "John", // header 1, value
```

🧰 Tools
🪛 Markdownlint — lines 8-10, 13-15, 18-22, 24-25 (column 1): Hard tabs (MD010, no-hard-tabs)
41-51: Enhance parameter documentation with specific recommendations.

Consider adding:

- Recommended ranges for `chunkSize` based on typical JSON file sizes
- Guidelines for selecting `parallel` values based on server resources
- Memory usage estimates for different parameter combinations
54-66: Add descriptions and sequence information for API calls.

The API calls list would be more helpful with:

- Brief descriptions of each API's purpose
- The sequence in which these APIs are called during the extraction process
- Example responses or potential error scenarios
89-89: Fix grammatical error in the description.

Add the missing preposition "in":

```diff
-When a JSON file is uploaded, the plugin will extract the structured data and process it the extractor's parser.
+When a JSON file is uploaded, the plugin will extract the structured data and process it in the extractor's parser.
```

🧰 Tools

🪛 LanguageTool — ~89: Possible missing preposition found: "...ract the structured data and process it the extractor's parser." (AI_HYDRA_LEO_MISSING_IN)
91-101: Enhance the code example with error handling.

The example would be more complete with:

- Error handling for file processing failures
- Configuration examples for the optional parameters
- Comments explaining the expected outcomes
1-104: Overall documentation provides good coverage but could be enhanced.

The README effectively documents the new plugin's functionality and aligns well with the PR objectives. However, consider adding:

- A troubleshooting section for common issues
- Performance benchmarks or recommendations
- Limitations or constraints of the plugin
utils/extractor/src/index.spec.ts (2)
1-8: LGTM! Consider adding error handling for file reading.

The test setup is well-structured. However, it would be beneficial to handle potential file reading errors, especially since this is a critical setup step for the tests.

```diff
 const buffer: Buffer = fs.readFileSync(
   path.join(__dirname, '../ref/test-array-to-sheetcapture.json')
 )
+
+// Ensure the file exists and can be parsed
+expect(buffer).toBeTruthy()
```
13-97: Improve test data maintainability.

The test data is comprehensive but contains repetitive structures. Consider:

- Extracting common data into constants
- Using test data factories
- Adding specific assertions for different aspects (structure, types, undefined handling)

Example refactor:

```typescript
const COMMON_FATHER_HIERARCHY = {
  'Father.First Name': { value: 'Father_First_1' },
  'Father.Last Name': { value: 'Father_Last_1' },
  'Father.Father.First Name': { value: 'Father_First_2' },
  'Father.Father.Last Name': { value: 'Father_Last_2' },
  'Father.Father.Father.First Name': { value: 'Father_First_3' },
  'Father.Father.Father.Last Name': { value: 'Father_Last_3' },
}

const COMMON_COORDINATES = {
  'Address.Coordinates.Latitude': { value: '40.7128° N' },
  'Address.Coordinates.Longitude': { value: '74.0060° W' },
}

// Then use the spread operator in test data
{
  'First Name': { value: 'Tony' },
  ...COMMON_FATHER_HIERARCHY,
  ...COMMON_COORDINATES,
  // ... other unique fields
}
```

utils/extractor/src/index.ts (2)
332-334: Consider preserving array structure instead of stringifying.

Converting arrays to strings loses the data structure and makes it harder for users to work with array data. Consider preserving arrays or providing configuration options:

```typescript
filteredRow[header] = {
  value: cell, // keep arrays intact rather than stringifying them
  type: Array.isArray(cell) ? 'array' : typeof cell,
}
```
297-347: Consider optimizing for large datasets and improving type safety.

While the function integrates well with the existing codebase, there are a few suggestions for improvement:

- For large datasets, consider implementing streaming or chunked processing to avoid memory issues
- Add input validation for the workbook structure

Consider these improvements:

1. Add batch processing:

```typescript
const BATCH_SIZE = 1000

export async function* parseSheetInBatches(
  jsonArray: any[],
  batchSize: number = BATCH_SIZE
): AsyncGenerator<Partial<SheetCapture>> {
  // Process headers from first item
  const firstBatch = jsonArray.slice(0, 1)
  const { headers } = parseSheet(firstBatch)

  // Process data in batches
  for (let i = 0; i < jsonArray.length; i += batchSize) {
    const batch = jsonArray.slice(i, i + batchSize)
    const { data } = parseSheet(batch)
    yield { headers, data }
  }
}
```

2. Add input validation:

```typescript
function validateWorkbookStructure(data: any[]): void {
  if (!Array.isArray(data)) {
    throw new Error('Input must be an array')
  }
  if (data.some((item) => typeof item !== 'object' || item === null)) {
    throw new Error('All items must be non-null objects')
  }
}
```

plugins/json-extractor/src/parser.ts (1)
31-39: Simplify the isEmpty function using Object.keys.

The custom `isEmpty` function can be simplified by utilizing `Object.keys(obj).length === 0`, which is a more idiomatic and concise way to check whether an object is empty in JavaScript. Consider updating the code as follows:

```typescript
function isEmpty(obj) {
  return Object.keys(obj).length === 0
}
```

Alternatively, you can remove the `isEmpty` function altogether and directly modify the condition at line 18:

```diff
-if (isEmpty(sheetCapture)) {
+if (Object.keys(sheetCapture).length === 0) {
```

This approach reduces code complexity and leverages built-in JavaScript capabilities.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (7)

- package-lock.json is excluded by !**/package-lock.json, !**/*.json
- plugins/json-extractor/package.json is excluded by !**/*.json
- plugins/json-extractor/ref/test-basic.json is excluded by !**/*.json
- plugins/json-multisheet-extractor/package.json is excluded by !**/*.json
- plugins/json-multisheet-extractor/ref/test-multisheet.json is excluded by !**/*.json
- utils/extractor/package.json is excluded by !**/*.json
- utils/extractor/ref/test-array-to-sheetcapture.json is excluded by !**/*.json

📒 Files selected for processing (12)

- .changeset/slow-bottles-roll.md (1 hunks)
- plugins/json-extractor/src/parser.spec.ts (0 hunks)
- plugins/json-extractor/src/parser.ts (2 hunks)
- plugins/json-multisheet-extractor/CHANGELOG.md (1 hunks)
- plugins/json-multisheet-extractor/README.md (1 hunks)
- plugins/json-multisheet-extractor/jest.config.cjs (1 hunks)
- plugins/json-multisheet-extractor/src/index.ts (1 hunks)
- plugins/json-multisheet-extractor/src/parser.spec.ts (1 hunks)
- plugins/json-multisheet-extractor/src/parser.ts (1 hunks)
- plugins/json-multisheet-extractor/tsup.config.mjs (1 hunks)
- utils/extractor/src/index.spec.ts (1 hunks)
- utils/extractor/src/index.ts (1 hunks)
💤 Files with no reviewable changes (1)
- plugins/json-extractor/src/parser.spec.ts
✅ Files skipped from review due to trivial changes (4)
- .changeset/slow-bottles-roll.md
- plugins/json-multisheet-extractor/CHANGELOG.md
- plugins/json-multisheet-extractor/jest.config.cjs
- plugins/json-multisheet-extractor/tsup.config.mjs
🔇 Additional comments (4)
plugins/json-multisheet-extractor/src/index.ts (2)
1-2: LGTM! Clean and well-organized imports.
The imports demonstrate good separation of concerns by utilizing the shared extraction utility while keeping parsing logic local to the plugin.
4-10: Verify the usage of configuration parameters.
Let's verify how these configuration parameters affect the extraction process and ensure they're properly documented in tests.
✅ Verification successful
Let me gather more specific information about how these configuration parameters are used.
Let me check one more thing to verify the actual implementation of these parameters.
Let me check one final thing to understand how these parameters are used in the extractor utility.
Configuration parameters are properly implemented and documented
The configuration parameters are well-implemented and documented:
- `chunkSize` (default: 5,000): Controls the number of records processed in each chunk
- `parallel` (default: 1): Controls parallel processing of chunks
- `debug` (default: false): Enables debug logging
The parameters are:
- Properly documented in the README.md
- Correctly implemented in the Extractor utility (utils/extractor/src/index.ts)
- Used for performance optimization and debugging purposes
- Have sensible default values
The implementation shows these parameters are effectively used during the extraction process, particularly during job execution and record processing. No issues were found with their implementation or documentation.
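As a generic illustration of what these two knobs control (a hypothetical sketch, not the Extractor's internal code — `processInChunks` and its signature are invented here):

```typescript
// Hypothetical sketch of chunkSize/parallel semantics: split records into
// chunks of `chunkSize`, then run at most `parallel` chunk handlers at a time.
async function processInChunks<T, R>(
  records: T[],
  handle: (chunk: T[]) => Promise<R>,
  chunkSize = 5000,
  parallel = 1
): Promise<R[]> {
  // Split records into fixed-size chunks
  const chunks: T[][] = []
  for (let i = 0; i < records.length; i += chunkSize) {
    chunks.push(records.slice(i, i + chunkSize))
  }
  // Process at most `parallel` chunks concurrently
  const results: R[] = []
  for (let i = 0; i < chunks.length; i += parallel) {
    const batch = chunks.slice(i, i + parallel)
    results.push(...(await Promise.all(batch.map(handle))))
  }
  return results
}

// 12 records with chunkSize 5 yields chunks of 5, 5, and 2
const sizes = processInChunks(
  Array.from({ length: 12 }, (_, i) => i),
  async (chunk) => chunk.length,
  5,
  2
)
sizes.then((s) => console.log(s)) // [ 5, 5, 2 ]
```

With `parallel: 1` (the default described above), chunks are processed strictly one at a time.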
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Check test coverage for configuration parameters

# Test 1: Look for tests covering different configuration options
echo "Searching for test cases with different configurations..."
rg -l "JSONMultiSheetExtractor.*chunkSize|parallel|debug" --type ts

# Test 2: Check if there are examples or documentation
echo "Searching for documentation or examples..."
fd -e md -e txt . | xargs rg -l "JSONMultiSheetExtractor.*configuration|options"
```
Length of output: 2975
Script:

```shell
#!/bin/bash
# Check implementation details of the configuration parameters

# Look for the actual implementation of parseBuffer function
echo "Checking parseBuffer implementation..."
ast-grep --pattern 'function parseBuffer($$$)'

# Check the Extractor implementation to see how options are used
echo "Checking Extractor implementation..."
rg -A 10 "function Extractor.*options" utils/extractor/src/index.ts

# Check specific tests for json-multisheet-extractor
echo "Checking specific tests..."
fd ".*\.spec\.ts|.*\.test\.ts" plugins/json-multisheet-extractor/

# Check documentation content
echo "Checking documentation content..."
cat plugins/json-multisheet-extractor/README.md
```
Length of output: 3508
Script:

```shell
#!/bin/bash
# Check the parser implementation and tests

# Check parser implementation
echo "Checking parser implementation..."
cat plugins/json-multisheet-extractor/src/parser.spec.ts

# Check if parameters are used in the actual parser
echo "Checking parser usage of parameters..."
rg -A 5 "chunkSize|parallel|debug" plugins/json-multisheet-extractor/src/parser.ts
```
Length of output: 3180
Script:

```shell
#!/bin/bash
# Check the extractor utility implementation
echo "Checking extractor utility implementation..."
cat utils/extractor/src/index.ts

echo "Checking for any related test files..."
fd -e spec.ts -e test.ts . utils/extractor/
```
Length of output: 10147
plugins/json-multisheet-extractor/src/parser.spec.ts (1)
6-8: Verify the existence of test data file.
Let's ensure the reference test file exists in the correct location.
✅ Verification successful
Test data file exists and has the expected structure
The verification confirms that:
- The test file exists at the correct path: plugins/json-multisheet-extractor/ref/test-multisheet.json
- The JSON file contains the required `contacts` and `orders` properties
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify the existence of test data file

# Test: Check if the test data file exists
fd -t f "test-multisheet.json" plugins/json-multisheet-extractor/ref/

# Test: Verify the JSON structure in the test file
if [ -f "plugins/json-multisheet-extractor/ref/test-multisheet.json" ]; then
  echo "Validating JSON structure..."
  cat plugins/json-multisheet-extractor/ref/test-multisheet.json | jq 'has("contacts") and has("orders")'
fi
```
Length of output: 378
utils/extractor/src/index.spec.ts (1)
6-8: Verify test file existence and contents.
Let's ensure the test file exists and contains the expected structure.
✅ Verification successful
Test file verification successful
The test file exists and contains valid JSON data with the expected array structure. The sample data shows a well-formed object with nested properties, suitable for testing array-to-sheet capture functionality.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Check if the test file exists and validate its structure
if [ -f "utils/extractor/ref/test-array-to-sheetcapture.json" ]; then
  echo "Test file exists"
  # Validate JSON structure
  cat "utils/extractor/ref/test-array-to-sheetcapture.json" | jq 'type'
  # Show sample of contents
  cat "utils/extractor/ref/test-array-to-sheetcapture.json" | jq '.[0]'
else
  echo "Test file not found!"
fi
```
Length of output: 893
```typescript
export function parseBuffer(buffer: Buffer): WorkbookCapture {
  try {
    const fileContents = buffer.toString('utf8')
```

🛠️ Refactor suggestion

Add input validation for the buffer parameter.

The function should validate that the buffer parameter is not null or undefined.

```diff
 export function parseBuffer(buffer: Buffer): WorkbookCapture {
   try {
+    if (!Buffer.isBuffer(buffer)) {
+      throw new Error('Input must be a valid Buffer')
+    }
     const fileContents = buffer.toString('utf8')
```
```typescript
if (!fileContents) {
  console.error('Invalid file contents')
  return {} as WorkbookCapture
}
```

🛠️ Refactor suggestion

Improve empty content validation.

The current error handling could be more informative and type-safe.

```diff
 if (!fileContents) {
-  console.error('Invalid file contents')
-  return {} as WorkbookCapture
+  console.error('File contents are empty or invalid')
+  return Object.create(null) as WorkbookCapture
 }
```
```typescript
try {
  const result = parseSheet(value);
  results[sheet] = result; // Store result in the results object
} catch (error) {
  console.error(`Error processing sheet "${sheet}":`, error);
}
} else {
```

🛠️ Refactor suggestion

Improve error handling in sheet processing.

The current error handling swallows errors silently. Consider:

- Collecting errors for all sheets
- Providing more context about the failure

```diff
-try {
-  const result = parseSheet(value);
-  results[sheet] = result; // Store result in the results object
-} catch (error) {
-  console.error(`Error processing sheet "${sheet}":`, error);
-}
+try {
+  if (value.length === 0) {
+    console.warn(`Sheet "${sheet}" is empty`);
+    results[sheet] = { data: [], fields: [] };
+    return;
+  }
+  const result = parseSheet(value);
+  results[sheet] = result;
+} catch (error) {
+  const errorMessage = `Failed to process sheet "${sheet}": ${error.message}`;
+  console.error(errorMessage);
+  throw new Error(errorMessage);
+}
```

Committable suggestion skipped: line range outside the PR's diff.
```typescript
  return {} as WorkbookCapture
}

const parsedData = JSON.parse(fileContents)
```
💡 Codebase verification
Add input validation and size limits to prevent JSON-based attacks
The current implementation lacks crucial security measures:
- No file size limits before parsing
- No JSON depth validation
- Basic type checking but no structural validation
- No limits on number of sheets or records
Specific locations requiring attention:

- plugins/json-multisheet-extractor/src/parser.ts:13: Add size check before parsing
- plugins/json-multisheet-extractor/src/parser.ts:22: Add depth validation and sheet count limits
- plugins/json-multisheet-extractor/src/parser.ts:25: Add array size limits for each sheet
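A minimal sketch of the size and depth checks called for above — `safeParse`, `jsonDepth`, `MAX_BYTES`, and `MAX_DEPTH` are hypothetical names with assumed limits, not part of the PR:

```typescript
const MAX_BYTES = 10 * 1024 * 1024 // assumed 10 MB cap
const MAX_DEPTH = 32 // assumed nesting cap

// Depth of a parsed JSON value: primitives are 0, each object/array level adds 1
function jsonDepth(value: unknown, depth = 0): number {
  if (typeof value !== 'object' || value === null) return depth
  let max = depth
  for (const child of Object.values(value)) {
    max = Math.max(max, jsonDepth(child, depth + 1))
  }
  return max
}

// Guarded parse: reject oversized input before JSON.parse, and overly
// deep structures after parsing
function safeParse(buffer: Buffer): unknown {
  if (buffer.length > MAX_BYTES) {
    throw new Error(`JSON input exceeds ${MAX_BYTES} bytes`)
  }
  const parsed = JSON.parse(buffer.toString('utf8'))
  if (jsonDepth(parsed) > MAX_DEPTH) {
    throw new Error(`JSON nesting exceeds depth ${MAX_DEPTH}`)
  }
  return parsed
}

const parsed = safeParse(Buffer.from('{"contacts": [{"firstName": "John"}]}')) as any
console.log(parsed.contacts[0].firstName) // John
```

A sheet-count or per-sheet record limit could be layered on the same way after parsing.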
🔗 Analysis chain
Add safeguards against malicious JSON input.
Consider implementing:
- JSON size limits to prevent memory exhaustion
- Depth limits to prevent stack overflow attacks
- Validation of sheet names to prevent injection
Let's check if there are any size limits implemented elsewhere:
Let me gather more information about any existing validation or security measures in the JSON plugin:
Let me check one more thing to see if there are any size limits in the extractor utility or parent classes:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Search for size limit implementations in the codebase
rg -i "size.*limit|maxSize|max.*length" --type ts
```
Length of output: 887
Script:

```shell
#!/bin/bash
# Check for validation or security measures in the JSON plugin
rg -A 5 "parse|validate|check" plugins/json-multisheet-extractor/

# Look for any configuration or options that might include limits
ast-grep --pattern 'interface $_Config {
  $$$
}' plugins/json-multisheet-extractor/

# Check if there's any error handling around the JSON parsing
rg -B 2 -A 2 "JSON.parse" plugins/json-multisheet-extractor/
```
Length of output: 7726
Script:

```shell
#!/bin/bash
# Check the extractor utility for any size limits
rg -B 2 -A 5 "Extractor|Buffer" "@flatfile/util-extractor"

# Also check the direct file contents
cat plugins/json-multisheet-extractor/src/parser.ts
```
Length of output: 1424
```typescript
test('JSON to WorkbookCapture', () => {
  const capture = parseBuffer(buffer)
  expect(capture).toEqual({
    contacts: {
      headers: CONTACT_HEADERS,
      data: [
        {
          'First Name': { value: 'Tony' },
          'Last Name': { value: 'Lamb' },
          Email: { value: 'me@opbaj.tp' },
          'Address.Street': { value: '123 Main Street' },
          'Address.City': { value: 'Springfield' },
          'Address.State': { value: 'ST' },
          'Address.Zip': { value: '12345' },
          'Address.Coordinates.Latitude': { value: '40.7128° N' },
          'Address.Coordinates.Longitude': { value: '74.0060° W' },
        },
        {
          'First Name': { value: 'Christian' },
          'Last Name': { value: 'Ramos' },
          Email: { value: 'uw@ag.tg' },
          'Address.Street': { value: '456 Elm Street' },
          'Address.City': { value: 'Greenville' },
          'Address.State': { value: 'GT' },
          'Address.Zip': { value: '67890' },
          'Address.Coordinates.Latitude': { value: '40.7128° N' },
          'Address.Coordinates.Longitude': { value: '74.0060° W' },
        },
        {
          'First Name': { value: 'Frederick' },
          'Last Name': { value: 'Boyd' },
          Email: { value: 'kempur@ascebec.gs' },
          'Address.Street': { value: '789 Oak Street' },
          'Address.City': { value: 'Rivertown' },
          'Address.State': { value: 'RT' },
          'Address.Zip': { value: '10112' },
          'Address.Coordinates.Latitude': { value: '40.7128° N' },
          'Address.Coordinates.Longitude': { value: '74.0060° W' },
        },
      ],
      metadata: undefined,
    },
    orders: {
      headers: ORDER_HEADERS,
      data: [
        {
          ID: { value: '1234' },
          Amount: { value: 5678 },
        },
        {
          ID: { value: '9876' },
          Amount: { value: 5432 },
        },
      ],
      metadata: undefined,
    },
  })
})
```
🛠️ Refactor suggestion

Enhance test coverage with edge cases and type validation.

The current test case could be improved in several ways:

- Add test cases for empty arrays and null values
- Validate that numeric values (Amount) are properly typed
- Use more diverse test data (coordinates are identical for all contacts)

Also, consider splitting this large test into smaller, focused test cases:

```typescript
test('handles empty arrays', () => {
  const emptyData = { contacts: [], orders: [] }
  const capture = parseBuffer(Buffer.from(JSON.stringify(emptyData)))
  expect(capture.contacts.data).toHaveLength(0)
  expect(capture.orders.data).toHaveLength(0)
})

test('preserves numeric types', () => {
  const capture = parseBuffer(buffer)
  expect(typeof capture.orders.data[0].Amount.value).toBe('number')
})
```

```typescript
test('JSON to SheetCapture', () => {
  const capture = parseSheet(jsonObj)
  expect(capture).toEqual({
```
🛠️ Refactor suggestion
Add test cases for edge cases and error scenarios.
While the happy path is covered, consider adding test cases for:
- Empty arrays
- Malformed JSON
- Deeply nested objects beyond current depth
- Arrays with mixed data types
Would you like me to help generate additional test cases to improve coverage?
```typescript
const flattenObject = (obj: any, parent: string = '', res: any = {}) => {
  for (let key in obj) {
    const propName = parent ? `${parent}.${key}` : key
    if (typeof obj[key] === 'object') {
      flattenObject(obj[key], propName, res)
    } else {
      res[propName] = obj[key]
    }
  }
  return res
}
```
🛠️ Refactor suggestion

Add protection against circular references and improve performance.

The flattenObject function has several potential issues:

- No protection against circular references
- Mutates the `res` parameter, which could cause side effects
- Doesn't handle arrays properly in nested structures

Consider this improved implementation:
```typescript
const flattenObject = (
  obj: any,
  parent: string = '',
  seen: WeakSet<object> = new WeakSet()
): Record<string, any> => {
  const res: Record<string, any> = {};
  if (typeof obj !== 'object' || obj === null || seen.has(obj)) {
    return res;
  }
  seen.add(obj);
  for (const [key, value] of Object.entries(obj)) {
    const propName = parent ? `${parent}.${key}` : key;
    if (Array.isArray(value)) {
      res[propName] = value;
    } else if (typeof value === 'object' && value !== null) {
      Object.assign(res, flattenObject(value, propName, seen));
    } else {
      res[propName] = value;
    }
  }
  return res;
};
```

```typescript
if (filteredResults.length === 0) {
  return {} as SheetCapture
}
```
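The suggested implementation can be exercised on its own; the sketch below (`flatten` is a stand-in name, not the plugin's export) shows the dot-notation flattening and cycle protection in action:

```typescript
// Standalone sketch mirroring the suggested flattenObject behavior:
// nested objects become dot-notation keys, arrays are kept intact,
// and a WeakSet guards against circular references.
const flatten = (
  obj: any,
  parent = '',
  seen: WeakSet<object> = new WeakSet()
): Record<string, any> => {
  const res: Record<string, any> = {}
  if (typeof obj !== 'object' || obj === null || seen.has(obj)) return res
  seen.add(obj)
  for (const [key, value] of Object.entries(obj)) {
    const propName = parent ? `${parent}.${key}` : key
    if (Array.isArray(value)) {
      res[propName] = value // arrays kept intact, not recursed into
    } else if (typeof value === 'object' && value !== null) {
      Object.assign(res, flatten(value, propName, seen))
    } else {
      res[propName] = value
    }
  }
  return res
}

const record: any = {
  firstName: 'John',
  address: { city: 'Springfield', coordinates: { lat: '40.7128° N' } },
}
record.self = record // a cycle no longer causes infinite recursion

const flat = flatten(record)
console.log(flat['address.coordinates.lat']) // 40.7128° N
```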
Return a properly structured empty SheetCapture.

The empty object return doesn't match the SheetCapture type structure. This could cause runtime errors.

```diff
-return {} as SheetCapture
+return {
+  headers: [],
+  data: [],
+} as SheetCapture
```
```typescript
} catch (error) {
  console.error('An error occurred:', error)
  throw error
}
```

Improve error handling to avoid leaking sensitive data.

The current error handling logs the entire error object, which might contain sensitive information.

```diff
 } catch (error) {
-  console.error('An error occurred:', error)
-  throw error
+  console.error('Error parsing sheet:', error instanceof Error ? error.message : 'Unknown error')
+  throw new Error('Failed to parse sheet data')
 }
```
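If the original error still needs to be reachable for debugging, the sanitized rethrow can attach it via the `cause` convention; a standalone sketch with a hypothetical `parseOrWrap` helper (not the plugin's code):

```typescript
// Hypothetical helper: sanitized message for logs and callers, with the
// original error preserved on `cause` for debugging.
function parseOrWrap(text: string): unknown {
  try {
    return JSON.parse(text)
  } catch (error) {
    console.error(
      'Error parsing sheet:',
      error instanceof Error ? error.message : 'Unknown error'
    )
    const wrapped = new Error('Failed to parse sheet data')
    ;(wrapped as { cause?: unknown }).cause = error
    throw wrapped
  }
}

try {
  parseOrWrap('{not valid json')
} catch (err) {
  const e = err as Error & { cause?: unknown }
  console.log(e.message) // Failed to parse sheet data
  console.log(e.cause instanceof SyntaxError) // true
}
```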
@carlbrugger I think we're ready for a review whenever you are! Let us know how you prefer to handle the AI review bot's suggestions. Some are already addressed here: aprimetechnology#1 Thanks!

@amarbakir thanks for the PR! I will look it over.

@amarbakir I really like the feature you're adding here. 🙌🏼 I integrated this functionality into the existing JSON extractor plugin in #698. Would you take a look at it?

Also, are you on Flatfile Developers Slack?

Hey @carlbrugger! Just seeing your comments. I took a look now and it's super straightforward. Overall I'm a fan of having this as an extra function of the existing extractor. Working on Slack access now! I should be able to get that through APrime soon.

Awesome! We'll get #698 released in short order so you can start using it. Thanks for your contribution!

Functionality added through #698
Please explain how to summarize this PR for the Changelog:

Tell code reviewer how and what to test:

The plugins/json-extractor and plugins/json-multisheet-extractor plugins; existing behavior should not have been changed.

Requirements:

```json
{
  "contacts": [ // sheet 1
    { // record 1
      "firstName": "John", // header 1, value
      "lastName": "Doe", // header 2, value
      "email": "john.doe@example.com"
    },
    { // record 2
      "firstName": "Jane", // header 1, value
      "lastName": "Smith",
      "email": "jane.smith@example.com"
    }
  ],
  "orders": [ // sheet 2
    { // record 1
      "id": "ORD123456", // header 1, value
      "amount": 250 // header 2, value
    }
  ]
}
```

The resulting UI somewhat supports this, but a nice next step would be to allow for a multi-sheet import in one step through the UI, or at the very least not require one to go back to the files list to re-import the second sheet (load data from sheet contacts -> import data to contacts & load data from sheet orders -> import data to orders, all in one step):



As part of this effort, we surfaced this issue that has now been resolved: #695
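To make the expected behavior concrete, here is a self-contained sketch of the sheet-per-top-level-key mapping described above. `toWorkbookCapture` and `SimpleSheet` are simplified illustrative stand-ins, not the plugin's actual types (the real extractor also flattens nested objects into dot-notation headers):

```typescript
// Illustrative only: each top-level array key becomes a sheet; each cell is
// wrapped as { value }, mirroring the capture shape used in the tests above.
type SimpleSheet = { headers: string[]; data: Record<string, { value: unknown }>[] }

function toWorkbookCapture(json: Record<string, unknown>): Record<string, SimpleSheet> {
  const workbook: Record<string, SimpleSheet> = {}
  for (const [sheet, records] of Object.entries(json)) {
    if (!Array.isArray(records)) continue // only top-level arrays become sheets
    const headers = records.length ? Object.keys(records[0]) : []
    const data = records.map((record) =>
      Object.fromEntries(headers.map((h) => [h, { value: record[h] }]))
    )
    workbook[sheet] = { headers, data }
  }
  return workbook
}

const workbook = toWorkbookCapture({
  contacts: [
    { firstName: 'John', lastName: 'Doe', email: 'john.doe@example.com' },
    { firstName: 'Jane', lastName: 'Smith', email: 'jane.smith@example.com' },
  ],
  orders: [{ id: 'ORD123456', amount: 250 }],
})
console.log(workbook.orders.headers) // [ 'id', 'amount' ]
```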