Extract the main article content from any URL using Mozilla Readability and JSDOM — available as a CLI and a minimal web UI.
- Clean article extraction (removes ads/nav/sidebar cruft)
- Outputs title, optional byline, plain text, and original HTML
- CLI with --json and --html modes
- Simple Express web UI to paste a URL and view the result
- Node.js ≥ 18 (fetch, WHATWG APIs)
- npm ci to install exact dependencies
- Dev (TypeScript via tsx):
npm run dev -- <url> [--json] [--html]
- Build + run:
npm run build
npm start -- <url> [--json] [--html]
- Installed binary (after build):
pure-article <url> [--json] [--html]
Examples:
- Text:
npm run dev -- https://example.com/article
- JSON:
npm run dev -- https://example.com/article --json
- HTML fragment:
npm run dev -- https://example.com/article --html
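The flags in the examples above select the output mode. As a rough illustration, here is a minimal sketch of how such arguments might be parsed. The `parseCliArgs` helper and its types are hypothetical and not taken from the project's source; the "last flag wins" behavior when both flags are given is also an assumption.

```typescript
// Hypothetical helper: the project's real CLI parsing may differ.
type OutputMode = 'text' | 'json' | 'html';

interface CliArgs {
  url: string;
  mode: OutputMode;
}

function parseCliArgs(argv: string[]): CliArgs {
  let url: string | undefined;
  let mode: OutputMode = 'text'; // plain text when no flag is given

  for (const arg of argv) {
    if (arg === '--json') mode = 'json';
    else if (arg === '--html') mode = 'html';
    else if (arg.startsWith('--')) throw new Error(`Unknown flag: ${arg}`);
    else if (url === undefined) url = arg;
    else throw new Error(`Unexpected extra argument: ${arg}`);
  }

  if (url === undefined) {
    throw new Error('Usage: pure-article <url> [--json] [--html]');
  }
  new URL(url); // throws a TypeError if the URL is malformed
  return { url, mode };
}
```

In a real entry point this would typically be called as parseCliArgs(process.argv.slice(2)).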
- Dev server:
npm run web, then open http://localhost:3000
- Change port:
PORT=3001 npm run web, or pick a random free port with PORT=0 npm run web
- Built server:
npm run build, then npm run web:start
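The PORT=0 option works because a Node server asked to listen on port 0 lets the operating system assign a free port. The sketch below illustrates that behavior with the built-in node:http module standing in for the Express server; `resolvePort` and `listen` are hypothetical names, not the project's actual code.

```typescript
import { createServer, type Server } from 'node:http';
import type { AddressInfo } from 'node:net';

// Hypothetical helper: reads PORT and falls back to 3000, the default shown above.
function resolvePort(env: Record<string, string | undefined>): number {
  const raw = env.PORT;
  if (raw === undefined || raw === '') return 3000;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${raw}`);
  }
  return port;
}

// Starts a bare HTTP server and resolves with the port that was actually bound.
// With port 0 the bound port is whatever free port the OS picked.
function listen(port: number): Promise<{ server: Server; port: number }> {
  return new Promise((resolve, reject) => {
    const server = createServer((_req, res) => res.end('ok'));
    server.once('error', reject);
    server.listen(port, () => {
      const { port: bound } = server.address() as AddressInfo;
      resolve({ server, port: bound });
    });
  });
}
```

Calling listen(resolvePort(process.env)) and logging the resolved port is one way to find out which port was chosen when PORT=0 is used.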
import { extractArticle } from './src/index.js';
const article = await extractArticle('https://example.com/post', {
  userAgent: 'MyBot/1.0',
  timeoutMs: 15000,
});
console.log(article.title);
console.log(article.byline);
console.log(article.contentText);
// article.contentHtml contains the Readability HTML fragment
Returned shape:
- url: string
- title: string
- byline?: string
- contentText: string (plain text, paragraphs preserved)
- contentHtml?: string (original Readability HTML)
- excerpt?: string | null, length?: number | null, siteName?: string | null
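For TypeScript consumers, the returned shape can be written down as an interface. The sketch below mirrors the fields listed above; the names `Article` and `summarize` are hypothetical, and splitting paragraphs on newlines is an assumption about how contentText preserves them.

```typescript
// Hypothetical interface name; the fields mirror the documented returned shape.
interface Article {
  url: string;
  title: string;
  byline?: string;
  contentText: string;
  contentHtml?: string;
  excerpt?: string | null;
  length?: number | null;
  siteName?: string | null;
}

// Builds a one-line summary, tolerating the optional and nullable fields.
function summarize(article: Article): string {
  const by = article.byline ? ` by ${article.byline}` : '';
  const site = article.siteName ? ` (${article.siteName})` : '';
  // Assumption: paragraphs in contentText are separated by one or more newlines.
  const paragraphs = article.contentText
    .split(/\n+/)
    .filter((p) => p.trim() !== '').length;
  return `${article.title}${by}${site}: ${paragraphs} paragraph(s)`;
}
```

Because byline, excerpt, length, and siteName may be absent or null, callers should check them before use, as summarize does.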
- npm run dev - CLI in watch/dev mode
- npm run web - start web UI in dev (set PORT as needed)
- npm run build - type-check and compile to dist/
- npm start - run compiled CLI (node dist/cli.js)
- npm run web:start - run compiled web server (node dist/server.js)
- npm test - run Vitest
- npm run lint / npm run format - ESLint / Prettier
- src/ - TypeScript source (CLI, server, extractor)
- tests/ - Vitest specs
- dist/ - compiled JS output (generated)
- Extraction quality depends on page markup; some sites may not parse perfectly.
- Respect target sites’ Terms of Service and robots policies. Use responsibly.
- The network timeout and user agent can be adjusted via ExtractOptions (timeoutMs and userAgent).
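To illustrate how a timeout and user agent can be applied to a request on Node.js 18 or later, here is a minimal sketch using the built-in fetch and AbortSignal.timeout. The names `FetchOptions`, `buildRequestInit`, and `fetchHtml` are hypothetical, as is the 15000 ms default; the library's internal implementation is not shown in this README and may differ.

```typescript
// Hypothetical options type, modeled on the userAgent/timeoutMs fields shown earlier.
interface FetchOptions {
  userAgent?: string;
  timeoutMs?: number;
}

// Builds the init object passed to fetch: an optional User-Agent header
// and a signal that aborts the request once the timeout elapses.
function buildRequestInit(opts: FetchOptions = {}) {
  const headers: Record<string, string> = {};
  if (opts.userAgent) headers['User-Agent'] = opts.userAgent;
  // Assumed default of 15 seconds when no timeout is supplied.
  const signal = AbortSignal.timeout(opts.timeoutMs ?? 15000);
  return { headers, signal };
}

// Fetches a page as text, rejecting on timeout or a non-2xx status.
async function fetchHtml(url: string, opts: FetchOptions = {}): Promise<string> {
  const res = await fetch(url, buildRequestInit(opts));
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}
```

If the timeout elapses first, the fetch promise rejects, so callers that want to retry or report a friendlier message should wrap fetchHtml in try/catch.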
- Install deps:
npm ci
- Lint/format:
npm run lint / npm run format
- Tests:
npm test (add -- --coverage for coverage)