Skip to content

importer: rss feed#641

Merged
bangarang merged 9 commits intomainfrom
feat/RSSFeedImporter
Oct 10, 2024
Merged

importer: rss feed#641
bangarang merged 9 commits intomainfrom
feat/RSSFeedImporter

Conversation

@bangarang
Copy link
Collaborator

  • chore: add /validators folder
  • koala: initial commit

Please explain how to summarize this PR for the Changelog:

Tell code reviewer how and what to test:

@carlbrugger carlbrugger force-pushed the feat/RSSFeedImporter branch from c63314b to 222b4ec Compare October 2, 2024 15:54
@carlbrugger carlbrugger force-pushed the feat/RSSFeedImporter branch from 723ffb0 to 1486a7c Compare October 9, 2024 19:46
@bangarang bangarang marked this pull request as ready for review October 9, 2024 20:40
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 9, 2024

Walkthrough

The changes in this pull request introduce new functionality to the index.ts file in the flatfilers/sandbox/src directory, incorporating RSS feed imports and configuring a workbook with two sheets. The rssImport function is added to facilitate the import of RSS feed data, while the configureSpace function sets up the workbook structure. Additionally, new documentation is provided for the @flatfile/plugin-import-rss plugin, along with Jest and Rollup configuration files for testing and building. Overall, the changes enhance the listener's capabilities and provide necessary setup for importing RSS feeds.

Changes

File Change Summary
flatfilers/sandbox/src/index.ts Added imports for rssImport and configureSpace, configured two RSS feeds, and set up a workbook with two sheets. Exported rssImport.
import/rss/README.MD Introduced documentation for the @flatfile/plugin-import-rss plugin, detailing its functionality and usage.
import/rss/jest.config.js Created a Jest configuration file for testing, specifying environment and transformation rules.
import/rss/rollup.config.mjs Added Rollup configuration file to manage the build process.
import/rss/src/import.rss.plugin.ts Introduced functionality for importing RSS feed data, including RSSImportConfig interface and rssImport function.
import/rss/src/import.rss.utils.ts Added functions for parsing RSS feeds and mapping data to Flatfile sheets.
plugins/markdown-extractor/jest.config.cjs Created a Jest configuration file for the markdown extractor plugin, specifying test environment and setup files.

Possibly related PRs

  • validate: isbn #618: The main PR introduces new functionality for importing RSS feeds, which may relate to the event handling improvements in this PR that enhance the Flatfile API's event management and listener capabilities.
  • validator: phone #619: The changes in the main PR regarding the configuration of sheets and fields for RSS feeds could connect with the modifications made in this PR that involve updating the structure of sheets and fields for contact information.
  • validator: sentiment analyzer #626: The main PR's enhancements to the listener's capabilities for importing structured data could be related to the sentiment analysis functionality introduced in this PR, which also processes and analyzes data within a Flatfile context.
  • validator: string #650: The main PR's focus on structured data handling and validation aligns with the string validation functionalities introduced in this PR, which also emphasizes data integrity and processing.
  • feat: run unit tests in parallel and e2e tests in sequence #664: The main PR's improvements to the testing framework may relate to the changes in this PR that enhance the testing process by allowing unit tests to run in parallel, which could improve the overall testing strategy for the new functionalities introduced in the main PR.

Suggested reviewers

  • carlbrugger

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (8)
import/rss/jest.config.js (2)

7-11: Consider using absolute paths for setup files.

The setup configuration looks good, with dotenv config loaded before the environment and other setups after. However, the relative paths might be fragile if the project structure changes.

Consider using path.resolve(__dirname, '..', '..') to create absolute paths, which would be more robust:

const path = require('path');
const rootDir = path.resolve(__dirname, '..', '..');

module.exports = {
  // ...
  setupFiles: [path.join(rootDir, 'test', 'dotenv-config.js')],
  setupFilesAfterEnv: [
    path.join(rootDir, 'test', 'betterConsoleLog.js'),
    path.join(rootDir, 'test', 'unit.cleanup.js'),
  ],
  // ...
}

1-16: Summary of Jest configuration review

The Jest configuration file is generally well-structured for testing an RSS import plugin. However, there are a few areas that could be improved:

  1. Consider using absolute paths for setup files and global setup to improve robustness against project structure changes.
  2. The 60-second test timeout seems long for unit tests. Clarify if this is necessary or if it can be reduced.
  3. Reconsider the use of forceExit: true and passWithNoTests: true as they may mask potential issues in the test suite.

Addressing these points will lead to a more robust and maintainable test configuration.

import/rss/README.MD (3)

13-24: Consider improving the formatting of the feeds array structure.

The introduction and parameters are well-explained. However, the structure of the feeds array could be formatted more clearly for better readability. Consider using a nested list format to highlight the properties of each feed configuration.

Here's a suggested improvement for the feeds array description:

- `feeds` (array): An array of feed configurations, each containing:
  - `sheetSlug` (string): The slug of the sheet to import RSS data into.
  - `rssFeedUrl` (string): The URL of the RSS feed to import data from.

This format makes it easier for users to understand the structure of each feed configuration object.


26-100: LGTM: Usage and examples are clear and comprehensive.

The usage section provides clear installation instructions and well-structured code examples in both JavaScript and TypeScript. The additional example with multiple RSS feeds is particularly helpful in demonstrating the plugin's flexibility.

Consider adding a brief comment in the multiple RSS feeds example to explain the purpose of each feed configuration. This would provide more context for users who might be adapting the example to their specific use case. For instance:

feeds: [
  {
    sheetSlug: "tech_news",
    rssFeedUrl: "https://techcrunch.com/feed/"
    // Import technology news from TechCrunch
  },
  {
    sheetSlug: "world_news",
    rssFeedUrl: "https://rss.nytimes.com/services/xml/rss/nyt/World.xml"
    // Import world news from New York Times
  }
]

112-122: LGTM: Contributing guidelines are clear and encourage community involvement.

The Contributing section provides clear steps for contributing to the plugin and follows standard open-source practices. This encourages community involvement in the plugin's development.

Consider adding a link to a more detailed contribution guide or code of conduct if one exists for the project. If not, you might want to create one. This could include more specific information about coding standards, commit message formats, and the review process. For example:

For more detailed information on contributing, please refer to our [Contribution Guidelines](link-to-contribution-guide.md).

This addition would provide contributors with a more comprehensive resource for participating in the project's development.

import/rss/src/import.rss.plugin.ts (2)

29-32: Incrementally update job progress to reflect the current state

Currently, the job progress is set to 10% at the start but isn't updated throughout the job's execution. Incrementally updating the progress can provide better visibility into the job's status, especially for longer-running tasks.

Consider updating the progress after significant steps:

 await api.jobs.ack(jobId, {
   info: 'Starting job to import RSS feed data',
   progress: 10,
 })

+// After parsing the RSS feed
+await api.jobs.update(jobId, {
+  info: 'RSS feed parsed successfully',
+  progress: 50,
+})
 
 const records = await parseRSSFeed(sheetConfig!.rssFeedUrl)
 await mapToSheetColumns(sheetId, records)

+// After mapping records to sheet columns
+await api.jobs.update(jobId, {
+  info: 'Data mapped to sheet columns',
+  progress: 90,
+})

20-20: Specify event type for better type safety

Provide a specific type for the event parameter to enhance type safety and code clarity.

 async (event: FlatfileEvent) => {
+  // If there's a more specific event type available, use it here

Consider using a more specific event type if available, such as JobReadyEvent.

import/rss/src/import.rss.utils.ts (1)

18-22: Avoid exposing sensitive information in error logs.

When logging errors, including error.message might expose sensitive information. Consider sanitizing error messages or logging only necessary details to prevent potential information leakage.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between a8e7ca9 and 72bcdad.

⛔ Files ignored due to path filters (4)
  • import/rss/metadata.json is excluded by !**/*.json
  • import/rss/package.json is excluded by !**/*.json
  • package-lock.json is excluded by !**/package-lock.json, !**/*.json
  • package.json is excluded by !**/*.json
📒 Files selected for processing (7)
  • flatfilers/sandbox/src/index.ts (1 hunks)
  • import/rss/README.MD (1 hunks)
  • import/rss/jest.config.js (1 hunks)
  • import/rss/rollup.config.mjs (1 hunks)
  • import/rss/src/import.rss.plugin.ts (1 hunks)
  • import/rss/src/import.rss.utils.ts (1 hunks)
  • import/rss/src/index.ts (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • import/rss/rollup.config.mjs
  • import/rss/src/index.ts
🧰 Additional context used
🔇 Additional comments (10)
import/rss/jest.config.js (3)

2-2: LGTM: Appropriate test environment.

Setting testEnvironment to 'node' is correct for a server-side JavaScript project like an RSS importer.


4-6: LGTM: Correct TypeScript transformation setup.

The transform configuration correctly uses 'ts-jest' for TypeScript files, which is essential for testing TypeScript projects.


12-12: Clarify the need for a 60-second test timeout.

The 60-second test timeout is quite long for unit tests. While it might be necessary for time-consuming RSS import operations, it could potentially hide performance issues.

Could you provide more context on why this extended timeout is needed? If it's for specific long-running tests, consider using Jest's test.timeout() for those tests instead of a global setting.

import/rss/README.MD (3)

1-11: LGTM: Info card is concise and informative.

The info card provides a clear and concise overview of the plugin's functionality, event type, and supported RSS feed items. It effectively communicates the key features of the plugin to potential users.


102-110: LGTM: API Calls section is informative and concise.

The API Calls section provides a clear and concise list of the Flatfile API calls used by the plugin. This information is valuable for developers who need to understand the plugin's interactions with the Flatfile API.


1-122: Overall, excellent documentation for the RSS import plugin.

This README provides comprehensive and well-structured documentation for the @flatfile/plugin-import-rss plugin. It covers all necessary aspects including an overview, installation, usage examples, API calls, and contribution guidelines. The content is informative and user-friendly, making it easy for developers to understand and implement the plugin.

Great job on creating this documentation! It will greatly assist users in understanding and utilizing the RSS import functionality in their Flatfile projects.

import/rss/src/import.rss.plugin.ts (3)

34-34: Handle potential errors from parseRSSFeed

If parseRSSFeed throws an error, it will be caught by the catch block, but any specific information about parsing errors will be lost. Consider adding error handling around parseRSSFeed to provide more detailed error messages.

 const records = await parseRSSFeed(sheetConfig!.rssFeedUrl)
+if (!records || records.length === 0) {
+  throw new Error('No records found in the RSS feed')
+}

Alternatively, ensure that parseRSSFeed provides meaningful error messages.


18-19: Clarify the event filter criteria

The event filter { job: \sheet:${config.operation}` }` assumes that the job name follows this specific pattern. Ensure that this aligns with how jobs are named elsewhere in the application.

Confirm that jobs are named following the sheet:operation convention.


3-3: Ensure utility functions are correctly imported

Verify that the utility functions mapToSheetColumns and parseRSSFeed are correctly implemented and imported from './import.rss.utils'.

Run the following script to confirm the existence of these functions:

flatfilers/sandbox/src/index.ts (1)

1-116: Overall implementation looks good

The code correctly integrates RSS feed imports and configures the workbook and sheets as intended.

'../../test/unit.cleanup.js',
],
testTimeout: 60_000,
globalSetup: '../../test/setup-global.js',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use absolute path for globalSetup.

For consistency with the earlier suggestion and to improve robustness, consider using an absolute path for the globalSetup file.

Update the globalSetup configuration to use an absolute path:

const path = require('path');
const rootDir = path.resolve(__dirname, '..', '..');

module.exports = {
  // ...
  globalSetup: path.join(rootDir, 'test', 'setup-global.js'),
  // ...
}

Comment on lines 14 to 15
forceExit: true,
passWithNoTests: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Reconsider forceExit and passWithNoTests settings.

Both forceExit: true and passWithNoTests: true can potentially mask issues in your test suite:

  1. forceExit: true forces Jest to exit after all tests complete. While sometimes necessary, it can hide cleanup issues in your tests.
  2. passWithNoTests: true allows the test suite to pass when no tests are found, which might hide accidental test skipping or non-detection.

Consider the following alternatives:

  1. Instead of forceExit: true, identify and fix any hanging handles in your tests. This ensures proper cleanup and can reveal hidden issues.
  2. For passWithNoTests, it's often better to fail when no tests are found. This catches configuration issues early. If you need this during initial setup, consider making it a CLI flag instead of a permanent configuration.

Example implementation:

module.exports = {
  // ...
  forceExit: false, // Remove this line entirely if not needed
  passWithNoTests: process.env.CI !== 'true', // Only pass with no tests in non-CI environments
  // ...
}

This setup will fail in CI if no tests are found, helping to catch configuration issues early.

import type { FlatfileEvent, FlatfileListener } from '@flatfile/listener'
import { mapToSheetColumns, parseRSSFeed } from './import.rss.utils'

const api = new FlatfileClient()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider instantiating FlatfileClient within the function for better encapsulation

While creating a single instance of FlatfileClient at the module level works, instantiating it within the rssImport function can enhance encapsulation and prevent potential side effects if multiple instances of this plugin are used concurrently.

-const api = new FlatfileClient()
 
 export function rssImport(config: RSSImportConfig) {
+  const api = new FlatfileClient()
   return (listener: FlatfileListener) => {

Committable suggestion was skipped due to low confidence.

Comment on lines +25 to +27
const sheetConfig = config.feeds.find(
(c) => c.sheetSlug === sheet.slug
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle the case when sheetConfig is undefined

Currently, the code assumes that a matching sheetConfig will always be found. If sheetConfig is undefined, accessing sheetConfig!.rssFeedUrl will cause a runtime error. It's safer to add a check to handle the scenario when no matching configuration is found.

Apply this diff to handle the undefined sheetConfig:

 const sheetConfig = config.feeds.find(
   (c) => c.sheetSlug === sheet.slug
 )

+if (!sheetConfig) {
+  throw new Error(`No RSS feed configuration found for sheet slug: ${sheet.slug}`)
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const sheetConfig = config.feeds.find(
(c) => c.sheetSlug === sheet.slug
)
const sheetConfig = config.feeds.find(
(c) => c.sheetSlug === sheet.slug
)
if (!sheetConfig) {
throw new Error(`No RSS feed configuration found for sheet slug: ${sheet.slug}`)
}

Comment on lines +41 to +45
await api.jobs.fail(jobId, {
outcome: {
message: `Failed to import RSS feed data: ${error.message}`,
},
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure error.message is available

The usage of error.message assumes that the caught error has a message property, which may not always be the case. To ensure that the error information is correctly conveyed, you might want to handle different types of errors or use String(error) as a fallback.

Apply this diff to improve error handling:

 await api.jobs.fail(jobId, {
   outcome: {
-    message: `Failed to import RSS feed data: ${error.message}`,
+    message: `Failed to import RSS feed data: ${error instanceof Error ? error.message : String(error)}`,
   },
 })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await api.jobs.fail(jobId, {
outcome: {
message: `Failed to import RSS feed data: ${error.message}`,
},
})
await api.jobs.fail(jobId, {
outcome: {
message: `Failed to import RSS feed data: ${error instanceof Error ? error.message : String(error)}`,
},
})

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
plugins/markdown-extractor/jest.config.cjs (1)

7-11: LGTM: Comprehensive setup files configuration.

The setup files configuration looks good, including both pre-environment and post-environment setups. The use of shared configuration files (../../) is a good practice for maintaining consistency across plugins.

Consider using path aliases or a helper function to import these shared configurations, which could make the paths more maintainable if the directory structure changes in the future.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 72bcdad and 889a407.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json, !**/*.json
📒 Files selected for processing (1)
  • plugins/markdown-extractor/jest.config.cjs (1 hunks)
🧰 Additional context used
🔇 Additional comments (6)
plugins/markdown-extractor/jest.config.cjs (6)

2-2: LGTM: Appropriate test environment setting.

Setting the test environment to 'node' is correct for a Node.js project.


4-6: LGTM: Correct TypeScript transformation setup.

The transform configuration correctly uses ts-jest for TypeScript files, which is the standard approach for Jest with TypeScript projects.


13-13: LGTM: Global setup configuration.

The global setup file is correctly specified. This is a good practice for performing one-time setup operations that are needed across all tests.


14-15: Use forceExit and passWithNoTests settings judiciously.

The forceExit: true and passWithNoTests: true configurations are set. While these can be useful in certain scenarios, they should be used cautiously:

  1. forceExit: true can mask issues with asynchronous operations not being properly cleaned up.
  2. passWithNoTests: true might hide the fact that expected tests are missing.

To ensure these settings are necessary and not masking any issues:

  1. For forceExit, try running tests without this option and address any hanging handles:

  2. For passWithNoTests, verify that tests exist for all expected functionality:

If these commands reveal any issues, consider addressing them rather than relying on these configuration options.


12-12: Verify the necessity of the 60-second test timeout.

The 60-second test timeout is set, which is quite long for typical unit tests. While this may be necessary for some complex operations, it's worth verifying if all tests require this extended timeout.

To help determine if this timeout is necessary, you can run the following script to check the duration of your tests:

This will show the time taken for each test. If most tests complete well under 60 seconds, consider reducing the timeout or applying longer timeouts only to specific tests that need them.


1-16: Overall, a well-structured Jest configuration with some points to verify.

This Jest configuration file is comprehensive and follows good practices for a TypeScript project. The use of shared configuration files promotes consistency across plugins. However, there are a few points that would benefit from verification:

  1. The necessity of the 60-second test timeout.
  2. The use of forceExit: true, which might mask cleanup issues.
  3. The use of passWithNoTests: true, which could hide missing tests.

Consider running the verification scripts provided in the previous comments to ensure these settings are necessary for your specific use case. Additionally, using path aliases for shared configurations could improve maintainability.

To get an overview of how this configuration is used, you can run:

This will help you understand the structure of your tests and verify if the current configuration aligns well with your testing needs.

@bangarang bangarang enabled auto-merge (squash) October 10, 2024 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants