Merged
44 changes: 2 additions & 42 deletions flatfilers/sandbox/src/index.ts
@@ -1,46 +1,6 @@
 import type { FlatfileListener } from '@flatfile/listener'
-import { summarize } from '@flatfile/plugin-enrich-summarize'
-import { configureSpace } from '@flatfile/plugin-space-configure'
+import { MarkdownExtractor } from '@flatfile/plugin-markdown-extractor'

 export default async function (listener: FlatfileListener) {
-  listener.use(
-    summarize({
-      sheetSlug: 'summarization',
-      contentField: 'content',
-      summaryField: 'summary',
-      keyPhrasesField: 'keyPhrases',
-    })
-  )
-  listener.use(
-    configureSpace({
-      workbooks: [
-        {
-          name: 'Sandbox',
-          sheets: [
-            {
-              name: 'Summarization',
-              slug: 'summarization',
-              fields: [
-                {
-                  key: 'content',
-                  type: 'string',
-                  label: 'Content',
-                },
-                {
-                  key: 'summary',
-                  type: 'string',
-                  label: 'Summary',
-                },
-                {
-                  key: 'keyPhrases',
-                  type: 'string',
-                  label: 'Key Phrases',
-                },
-              ],
-            },
-          ],
-        },
-      ],
-    })
-  )
+  listener.use(MarkdownExtractor())
 }
19 changes: 19 additions & 0 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

69 changes: 69 additions & 0 deletions plugins/markdown-extractor/README.md
@@ -0,0 +1,69 @@
<!-- START_INFOCARD -->

The `@flatfile/plugin-markdown-extractor` plugin parses Markdown files and extracts their tabular data, creating one sheet in Flatfile for each table found.

**Event Type:**
`listener.on('file:created')`

**Supported file types:**
`.md`

<!-- END_INFOCARD -->

> When embedding Flatfile, this plugin should be deployed in a server-side listener. [Learn more](/docs/orchestration/listeners#listener-types)

## Parameters



#### `options.maxTables` - `default: Infinity` - `number` - (optional)
The `maxTables` parameter allows you to limit the number of tables extracted from a single Markdown file.

#### `options.errorHandling` - `default: "lenient"` - `"strict" | "lenient"` - (optional)
The `errorHandling` parameter determines how the plugin handles parsing errors. In `"strict"` mode it throws an error; in `"lenient"` mode it logs a warning and skips the problematic table.
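
The difference between the two modes can be sketched as a small loop. This is only an illustration of the policy described above, not the plugin's actual code: `parseTables` and its `parse` callback are hypothetical names introduced for the example.

```typescript
type ErrorHandling = 'strict' | 'lenient'

// Hypothetical helper: applies `parse` to each raw table and follows the
// chosen error-handling policy when a table fails to parse.
function parseTables<T>(
  rawTables: string[],
  parse: (raw: string) => T,
  errorHandling: ErrorHandling = 'lenient'
): T[] {
  const parsed: T[] = []
  for (const raw of rawTables) {
    try {
      parsed.push(parse(raw))
    } catch (error) {
      // Strict: surface the failure to the caller.
      if (errorHandling === 'strict') throw error
      // Lenient: warn, skip this table, and continue with the next one.
      console.warn('Skipping table that failed to parse:', error)
    }
  }
  return parsed
}
```

In this sketch a single bad table aborts the whole run in strict mode, while lenient mode returns whatever tables parsed cleanly.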

#### `options.debug` - `default: false` - `boolean` - (optional)
The `debug` parameter enables additional logging for troubleshooting.
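
Taken together, the documented defaults amount to the following merge. This is a minimal sketch: `DEFAULTS` and `resolveOptions` are hypothetical names used for illustration, and the plugin may resolve its options differently.

```typescript
// Same shape as the options interface exported by the plugin.
interface MarkdownExtractorOptions {
  maxTables?: number
  errorHandling?: 'strict' | 'lenient'
  debug?: boolean
}

// Hypothetical: the defaults listed in this section, gathered in one place.
const DEFAULTS: Required<MarkdownExtractorOptions> = {
  maxTables: Infinity,
  errorHandling: 'lenient',
  debug: false,
}

// Hypothetical helper: any option the caller omits falls back to its default.
function resolveOptions(
  options: MarkdownExtractorOptions = {}
): Required<MarkdownExtractorOptions> {
  return { ...DEFAULTS, ...options }
}
```

Under these assumptions, `resolveOptions({ maxTables: 5 })` keeps `errorHandling` at `"lenient"` and `debug` at `false`.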

## Usage

Register the extractor on your listener. When a Markdown (`.md`) file is uploaded to Flatfile, the plugin extracts its tables automatically. Once extraction completes, the file is ready for import in the Files area.

```bash Install
npm i @flatfile/plugin-markdown-extractor
```

```js import
import { MarkdownExtractor } from "@flatfile/plugin-markdown-extractor";
```

```js listener.js
listener.use(MarkdownExtractor());
```

### Full Example

In this example, `MarkdownExtractor` is initialized with custom options and then registered with the Flatfile listener via `listener.use`. When a Markdown file is uploaded, the plugin extracts its tabular data using the extractor's parser.

```javascript
import { MarkdownExtractor } from "@flatfile/plugin-markdown-extractor";

export default async function (listener) {
  // Optional settings for the extractor
  const options = {
    maxTables: 5,
    errorHandling: 'strict',
    debug: true
  };

  // Initialize the Markdown extractor
  const markdownExtractor = MarkdownExtractor(options);

  // Register the extractor with the Flatfile listener;
  // uploaded Markdown files are then parsed and their tables extracted
  listener.use(markdownExtractor);
}
```

This plugin will create a new sheet for each table found in the Markdown file, with the table headers becoming field names and the rows becoming records.
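
The header-to-field and row-to-record mapping can be pictured with the sketch below. It is an illustration only, assuming a well-formed pipe table with no escaped `|` characters. `splitRow` and `tableToRecords` are hypothetical helpers, not the plugin's parser, and the handling of mismatched rows (governed by `errorHandling`) is not modeled.

```typescript
type TableRecord = Record<string, string>

// Split one table row such as "| a | b |" into trimmed cell values.
function splitRow(line: string): string[] {
  return line
    .trim()
    .replace(/^\|/, '')
    .replace(/\|$/, '')
    .split('|')
    .map((cell) => cell.trim())
}

// Hypothetical helper: the first line is the header row, the second is the
// separator row (ignored), and every remaining line is a data row.
function tableToRecords(lines: string[]): {
  headers: string[]
  records: TableRecord[]
} {
  const [headerLine = '', , ...rowLines] = lines
  const headers = splitRow(headerLine)
  const records = rowLines.map((line) => {
    const cells = splitRow(line)
    const record: TableRecord = {}
    headers.forEach((header, i) => {
      // In this sketch only, a missing cell becomes an empty string.
      record[header] = cells[i] ?? ''
    })
    return record
  })
  return { headers, records }
}

// The table from samples/simple_table.md.
const simpleTable = [
  '| Name | Age | City |',
  '|------|-----|------|',
  '| John | 30 | New York |',
  '| Alice | 25 | London |',
  '| Bob | 35 | Paris |',
]
const { headers, records } = tableToRecords(simpleTable)
```

Here `headers` holds the three field names and `records` holds one object per data row, with every value kept as a string.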
16 changes: 16 additions & 0 deletions plugins/markdown-extractor/jest.config.js
@@ -0,0 +1,16 @@
module.exports = {
  testEnvironment: 'node',

  transform: {
    '^.+\\.tsx?$': 'ts-jest',
  },
  setupFiles: ['../../test/dotenv-config.js'],
  setupFilesAfterEnv: [
    '../../test/betterConsoleLog.js',
    '../../test/unit.cleanup.js',
  ],
  testTimeout: 60_000,
  globalSetup: '../../test/setup-global.js',
  forceExit: true,
  passWithNoTests: true,
}
62 changes: 62 additions & 0 deletions plugins/markdown-extractor/package.json
@@ -0,0 +1,62 @@
{
  "name": "@flatfile/plugin-markdown-extractor",
  "version": "0.0.1",
  "url": "https://github.com/FlatFilers/flatfile-plugins/tree/main/plugins/markdown-extractor",
  "description": "A plugin for parsing markdown files in Flatfile.",
  "registryMetadata": {
    "category": "extractors"
  },
  "engines": {
    "node": ">= 16"
  },
  "type": "module",
  "browser": {
    "./dist/index.cjs": "./dist/index.browser.cjs",
    "./dist/index.mjs": "./dist/index.browser.mjs"
  },
  "exports": {
    "types": "./dist/index.d.ts",
    "node": {
      "import": "./dist/index.mjs",
      "require": "./dist/index.cjs"
    },
    "browser": {
      "require": "./dist/index.browser.cjs",
      "import": "./dist/index.browser.mjs"
    },
    "default": "./dist/index.mjs"
  },
  "main": "./dist/index.cjs",
  "module": "./dist/index.mjs",
  "source": "./src/index.ts",
  "types": "./dist/index.d.ts",
  "files": [
    "dist/**"
  ],
  "scripts": {
    "build": "rollup -c",
    "build:watch": "rollup -c --watch",
    "build:prod": "NODE_ENV=production rollup -c",
    "check": "tsc ./**/*.ts --noEmit --esModuleInterop",
    "test": "jest src/*.spec.ts --detectOpenHandles",
    "test:unit": "jest src/*.spec.ts --testPathIgnorePatterns=.*\\.e2e\\.spec\\.ts$ --detectOpenHandles",
    "test:e2e": "jest src/*.e2e.spec.ts --detectOpenHandles"
  },
  "keywords": [
    "flatfile-plugins",
    "category-extractors"
  ],
  "author": "FlatFilers",
  "repository": {
    "type": "git",
    "url": "https://github.com/FlatFilers/flatfile-plugins.git",
    "directory": "plugins/markdown-extractor"
  },
  "license": "ISC",
  "dependencies": {
    "@flatfile/util-extractor": "^2.1.2"
  },
  "devDependencies": {
    "@flatfile/rollup-config": "0.1.1"
  }
}
5 changes: 5 additions & 0 deletions plugins/markdown-extractor/rollup.config.mjs
@@ -0,0 +1,5 @@
import { buildConfig } from '@flatfile/rollup-config'

const config = buildConfig({})

export default config
21 changes: 21 additions & 0 deletions plugins/markdown-extractor/samples/complex_table.md
@@ -0,0 +1,21 @@
# Complex Table Example

This Markdown file contains a more complex table with various data types and potential parsing challenges.

| Product | Price | Stock | Last Updated | Features | On Sale |
|---------|-------|-------|--------------|----------|--------|
| Laptop | $999.99 | 50 | 2023-05-01 | 15" screen, 16GB RAM | true |
| Smartphone | $599.99 | 100 | 2023-05-02 | 6.5" display, 128GB storage | false |
| Tablet | $399.99 | 75 | 2023-05-03 | 10" screen, 64GB storage | true |
| Headphones | $149.99 | 200 | 2023-05-04 | Noise-cancelling, Bluetooth 5.0 | false |
| Smart Watch | $249.99 | 30 | 2023-05-05 | Heart rate monitor, GPS | true |
| External SSD | $89.99 | 150 | 2023-05-06 | 1TB, USB 3.1 | false |

This table includes:
- Currency values
- Integers
- Dates
- Booleans
- Strings with commas

It should test the parser's ability to handle various data types and potential edge cases.
28 changes: 28 additions & 0 deletions plugins/markdown-extractor/samples/lenient_tables.md
@@ -0,0 +1,28 @@
# Lenient Tables Example

This Markdown file contains multiple tables with mismatched column counts.

## Table 1: Employees

| ID | Name | Department |
|----|------|------------|
| 1 | John Doe | HR |
| 2 | Jane Smith |
| 3 | Mike Johnson | Finance |

## Table 2: Projects

| Project Name | Start Date | End Date |
|--------------|------------|----------|
| Website Redesign | 2023-01-01 | 2023-06-30 |
| Mobile App | 2023-03-15 | 2023-12-31 | extra column |

## Table 3: Budget

| Category | Q1 | Q2 | Q3 | Q4 |
|----------|----|----|----|----|----|
| Marketing | $10,000 | $15,000 | $20,000 | $25,000 |
| R&D | $50,000 | $60,000 | $70,000 | $80,000 | extra column |
| Operations | $100,000 | $110,000 | $120,000 | $130,000 |

End of the file.
28 changes: 28 additions & 0 deletions plugins/markdown-extractor/samples/multiple_tables.md
@@ -0,0 +1,28 @@
# Multiple Tables Example

This Markdown file contains multiple tables.

## Table 1: Employees

| ID | Name | Department |
|----|------|------------|
| 1 | John Doe | HR |
| 2 | Jane Smith | IT |
| 3 | Mike Johnson | Finance |

## Table 2: Projects

| Project Name | Start Date | End Date |
|--------------|------------|----------|
| Website Redesign | 2023-01-01 | 2023-06-30 |
| Mobile App | 2023-03-15 | 2023-12-31 |

## Table 3: Budget

| Category | Q1 | Q2 | Q3 | Q4 |
|----------|----|----|----|----|
| Marketing | $10,000 | $15,000 | $20,000 | $25,000 |
| R&D | $50,000 | $60,000 | $70,000 | $80,000 |
| Operations | $100,000 | $110,000 | $120,000 | $130,000 |

End of the file.
11 changes: 11 additions & 0 deletions plugins/markdown-extractor/samples/simple_table.md
@@ -0,0 +1,11 @@
# Simple Table Example

This is a simple Markdown file with a single table.

| Name | Age | City |
|------|-----|------|
| John | 30 | New York |
| Alice | 25 | London |
| Bob | 35 | Paris |

End of the file.
14 changes: 14 additions & 0 deletions plugins/markdown-extractor/src/index.ts
@@ -0,0 +1,14 @@
import { Extractor } from '@flatfile/util-extractor'
import { parseBuffer } from './parser'

export interface MarkdownExtractorOptions {
  maxTables?: number
  errorHandling?: 'strict' | 'lenient'
  debug?: boolean
}

export const MarkdownExtractor = (options: MarkdownExtractorOptions = {}) => {
  return Extractor('.md', 'markdown', parseBuffer, options)
}

export const markdownParser = parseBuffer