diff --git a/.gitignore b/.gitignore
index 265f17f..e69de29 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +0,0 @@
-DS_Store
diff --git a/api-reference/endpoint/smartcrawler/start.mdx b/api-reference/endpoint/smartcrawler/start.mdx
index cee1feb..c28c7a6 100644
--- a/api-reference/endpoint/smartcrawler/start.mdx
+++ b/api-reference/endpoint/smartcrawler/start.mdx
@@ -229,7 +229,7 @@ sha256={your_webhook_secret}
To verify that a webhook request is authentic:
-1. Retrieve your webhook secret from the [dashboard](https://scrapegraphai.com/dashboard)
+1. Retrieve your webhook secret from the [dashboard](https://dashboard.scrapegraphai.com)
2. Compare the `X-Webhook-Signature` header value with `sha256={your_secret}`
@@ -305,5 +305,5 @@ The webhook POST request contains the following JSON payload:
| result | string | The crawl result data (null if failed) |
-Make sure to configure your webhook secret in the [dashboard](https://scrapegraphai.com/dashboard) before using webhooks. Each user has a unique webhook secret for secure verification.
+Make sure to configure your webhook secret in the [dashboard](https://dashboard.scrapegraphai.com) before using webhooks. Each user has a unique webhook secret for secure verification.
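The verification steps above boil down to a constant-time string comparison. A minimal sketch, assuming the scheme described (the `X-Webhook-Signature` header carries the literal string `sha256={your_secret}`); the function name is illustrative, not part of the SDK:

```python
import hmac

def is_authentic(header_value: str, webhook_secret: str) -> bool:
    """Check the X-Webhook-Signature header against the expected value.

    Per the scheme above, the header should equal ``sha256={your_secret}``;
    hmac.compare_digest avoids leaking the secret via timing differences.
    """
    expected = f"sha256={webhook_secret}"
    return hmac.compare_digest(header_value, expected)

print(is_authentic("sha256=abc123", "abc123"))  # True for a matching secret
```

Use `hmac.compare_digest` (or your language's equivalent) rather than `==` so the comparison takes the same time whether the prefix matches or not.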
diff --git a/api-reference/errors.mdx b/api-reference/errors.mdx
index 2a2c0cd..5933a08 100644
--- a/api-reference/errors.mdx
+++ b/api-reference/errors.mdx
@@ -139,17 +139,17 @@ except APIError as e:
```
```javascript JavaScript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
-const result = await extract('your-api-key', {
- url: 'https://example.com',
- prompt: 'Extract data',
+const apiKey = 'your-api-key';
+
+const response = await smartScraper(apiKey, {
+ website_url: 'https://example.com',
+ user_prompt: 'Extract data',
});
-if (result.status === 'success') {
- console.log('Data:', result.data);
-} else {
- console.error('Error:', result.error);
+if (response.status === 'error') {
+ console.error('Error:', response.error);
}
```
diff --git a/api-reference/introduction.mdx b/api-reference/introduction.mdx
index 872bcb1..bee4eb5 100644
--- a/api-reference/introduction.mdx
+++ b/api-reference/introduction.mdx
@@ -9,7 +9,7 @@ The ScrapeGraphAI API provides powerful endpoints for AI-powered web scraping an
## Authentication
-All API requests require authentication using an API key. You can get your API key from the [dashboard](https://scrapegraphai.com/dashboard).
+All API requests require authentication using an API key. You can get your API key from the [dashboard](https://dashboard.scrapegraphai.com).
```bash
SGAI-APIKEY: your-api-key-here
diff --git a/cookbook/examples/pagination.mdx b/cookbook/examples/pagination.mdx
index de5710f..e078401 100644
--- a/cookbook/examples/pagination.mdx
+++ b/cookbook/examples/pagination.mdx
@@ -349,16 +349,21 @@ if __name__ == "__main__":
## JavaScript SDK Example
```javascript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
import 'dotenv/config';
-const result = await extract(process.env.SGAI_APIKEY, {
- url: 'https://www.amazon.in/s?k=tv&crid=1TEF1ZFVLU8R8&sprefix=t%2Caps%2C390&ref=nb_sb_noss_2',
- prompt: 'Extract all product info including name, price, rating, and image_url',
+const apiKey = process.env.SGAI_APIKEY;
+
+const response = await smartScraper(apiKey, {
+ website_url: 'https://www.amazon.in/s?k=tv&crid=1TEF1ZFVLU8R8&sprefix=t%2Caps%2C390&ref=nb_sb_noss_2',
+ user_prompt: 'Extract all product info including name, price, rating, and image_url',
+ total_pages: 3,
});
-if (result.status === 'success') {
- console.log('Response:', JSON.stringify(result.data?.json, null, 2));
+if (response.status === 'error') {
+ console.error('Error:', response.error);
+} else {
+ console.log('Response:', JSON.stringify(response.data, null, 2));
}
```
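The `total_pages` option above asks the service to aggregate pages for you. As a rough client-side sketch of the same idea, here is how per-page results could be merged into one list; `fetch_page` is a hypothetical stand-in for a real per-page scrape call:

```python
def fetch_page(page: int) -> list[dict]:
    """Stand-in for a per-page scrape request; replace with a real API call."""
    return [{"name": f"TV model {page}-{i}", "page": page} for i in range(2)]

def scrape_all(total_pages: int) -> list[dict]:
    """Fetch each page in order and merge the results into one flat list."""
    products: list[dict] = []
    for page in range(1, total_pages + 1):
        products.extend(fetch_page(page))
    return products

results = scrape_all(3)
print(len(results))  # 6 items across 3 pages
```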
diff --git a/cookbook/introduction.mdx b/cookbook/introduction.mdx
index ca0c3e8..b888df7 100644
--- a/cookbook/introduction.mdx
+++ b/cookbook/introduction.mdx
@@ -87,7 +87,7 @@ Each example is available in multiple implementations:
4. Experiment and adapt the code for your needs
-Make sure to have your ScrapeGraphAI API key ready. Get one from the [dashboard](https://scrapegraphai.com/dashboard) if you haven't already.
+Make sure to have your ScrapeGraphAI API key ready. Get one from the [dashboard](https://dashboard.scrapegraphai.com) if you haven't already.
## Additional Resources
diff --git a/dashboard/overview.mdx b/dashboard/overview.mdx
index 4c26957..df173d8 100644
--- a/dashboard/overview.mdx
+++ b/dashboard/overview.mdx
@@ -19,6 +19,21 @@ The ScrapeGraphAI dashboard is your central hub for managing all your web scrapi
- **Last Used**: Timestamp of your most recent API request
- **Quick Actions**: Buttons to start new scraping jobs or access common features
+## Usage Analytics
+
+Track your API usage patterns with our detailed analytics view:
+
+
+
+
+
+The usage graph provides:
+- **Service-specific metrics**: Track usage for SmartScraper, SearchScraper, and Markdownify separately
+- **Time-based analysis**: View usage patterns over different time periods
+- **Interactive tooltips**: Hover over data points to see detailed information
+- **Trend analysis**: Identify usage patterns and optimize your API consumption
+
+
## Key Features
- **Usage Statistics**: Monitor your API usage and remaining credits
@@ -28,7 +43,7 @@ The ScrapeGraphAI dashboard is your central hub for managing all your web scrapi
## Getting Started
-1. Log in to your [dashboard](https://scrapegraphai.com/dashboard)
+1. Log in to your [dashboard](https://dashboard.scrapegraphai.com)
2. View your API key in the settings section
3. Check your available credits
4. Start your first scraping job
diff --git a/developer-guides/llm-sdks-and-frameworks/anthropic.mdx b/developer-guides/llm-sdks-and-frameworks/anthropic.mdx
index 2f5e767..f01b8e5 100644
--- a/developer-guides/llm-sdks-and-frameworks/anthropic.mdx
+++ b/developer-guides/llm-sdks-and-frameworks/anthropic.mdx
@@ -27,24 +27,24 @@ If using Node < 20, install `dotenv` and add `import 'dotenv/config'` to your co
This example demonstrates a simple workflow: scrape a website and summarize the content using Claude.
```typescript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
import Anthropic from '@anthropic-ai/sdk';
+const apiKey = process.env.SGAI_APIKEY;
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
-const result = await extract(process.env.SGAI_APIKEY!, {
- url: 'https://scrapegraphai.com',
- prompt: 'Extract all content from this page',
+const scrapeResult = await smartScraper(apiKey, {
+ website_url: 'https://scrapegraphai.com',
+ user_prompt: 'Extract all content from this page',
});
-const data = result.data?.json;
-console.log('Scraped content length:', JSON.stringify(data).length);
+console.log('Scraped content length:', JSON.stringify(scrapeResult.data.result).length);
const message = await anthropic.messages.create({
model: 'claude-haiku-4-5',
max_tokens: 1024,
messages: [
- { role: 'user', content: `Summarize in 100 words: ${JSON.stringify(data)}` }
+ { role: 'user', content: `Summarize in 100 words: ${JSON.stringify(scrapeResult.data.result)}` }
]
});
@@ -56,11 +56,12 @@ console.log('Response:', message);
This example shows how to use Claude's tool use feature to let the model decide when to scrape websites based on user requests.
```typescript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
import { Anthropic } from '@anthropic-ai/sdk';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
+const apiKey = process.env.SGAI_APIKEY;
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY
});
@@ -90,13 +91,12 @@ if (toolUse && toolUse.type === 'tool_use') {
const input = toolUse.input as { url: string };
console.log(`Calling tool: ${toolUse.name} | URL: ${input.url}`);
- const result = await extract(process.env.SGAI_APIKEY!, {
- url: input.url,
- prompt: 'Extract all content from this page',
+ const result = await smartScraper(apiKey, {
+ website_url: input.url,
+ user_prompt: 'Extract all content from this page',
});
- const data = result.data?.json;
- console.log(`Scraped content preview: ${JSON.stringify(data)?.substring(0, 300)}...`);
+ console.log(`Scraped content preview: ${JSON.stringify(result.data.result)?.substring(0, 300)}...`);
// Continue with the conversation or process the scraped content as needed
}
```
@@ -106,10 +106,11 @@ if (toolUse && toolUse.type === 'tool_use') {
This example demonstrates how to use Claude to extract structured data from scraped website content.
```typescript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';
+const apiKey = process.env.SGAI_APIKEY;
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const CompanyInfoSchema = z.object({
@@ -118,11 +119,10 @@ const CompanyInfoSchema = z.object({
description: z.string().optional()
});
-const result = await extract(process.env.SGAI_APIKEY!, {
- url: 'https://stripe.com',
- prompt: 'Extract all content from this page',
+const scrapeResult = await smartScraper(apiKey, {
+ website_url: 'https://stripe.com',
+ user_prompt: 'Extract all content from this page',
});
-const data = result.data?.json;
const prompt = `Extract company information from this website content.
@@ -135,7 +135,7 @@ Output ONLY valid JSON in this exact format (no markdown, no explanation):
}
Website content:
-${JSON.stringify(data)}`;
+${JSON.stringify(scrapeResult.data.result)}`;
const message = await anthropic.messages.create({
model: 'claude-haiku-4-5',
diff --git a/developer-guides/llm-sdks-and-frameworks/gemini.mdx b/developer-guides/llm-sdks-and-frameworks/gemini.mdx
index 663c6a7..0e710c3 100644
--- a/developer-guides/llm-sdks-and-frameworks/gemini.mdx
+++ b/developer-guides/llm-sdks-and-frameworks/gemini.mdx
@@ -27,22 +27,22 @@ If using Node < 20, install `dotenv` and add `import 'dotenv/config'` to your co
This example demonstrates a simple workflow: scrape a website and summarize the content using Gemini.
```typescript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
import { GoogleGenAI } from '@google/genai';
+const apiKey = process.env.SGAI_APIKEY;
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
-const result = await extract(process.env.SGAI_APIKEY!, {
- url: 'https://scrapegraphai.com',
- prompt: 'Extract all content from this page',
+const scrapeResult = await smartScraper(apiKey, {
+ website_url: 'https://scrapegraphai.com',
+ user_prompt: 'Extract all content from this page',
});
-const data = result.data?.json;
-console.log('Scraped content length:', JSON.stringify(data).length);
+console.log('Scraped content length:', JSON.stringify(scrapeResult.data.result).length);
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
- contents: `Summarize: ${JSON.stringify(data)}`,
+ contents: `Summarize: ${JSON.stringify(scrapeResult.data.result)}`,
});
console.log('Summary:', response.text);
@@ -53,18 +53,18 @@ console.log('Summary:', response.text);
This example shows how to analyze website content using Gemini's multi-turn conversation capabilities.
```typescript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
import { GoogleGenAI } from '@google/genai';
+const apiKey = process.env.SGAI_APIKEY;
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
-const result = await extract(process.env.SGAI_APIKEY!, {
- url: 'https://news.ycombinator.com/',
- prompt: 'Extract all content from this page',
+const scrapeResult = await smartScraper(apiKey, {
+ website_url: 'https://news.ycombinator.com/',
+ user_prompt: 'Extract all content from this page',
});
-const data = result.data?.json;
-console.log('Scraped content length:', JSON.stringify(data).length);
+console.log('Scraped content length:', JSON.stringify(scrapeResult.data.result).length);
const chat = ai.chats.create({
model: 'gemini-2.5-flash'
@@ -72,7 +72,7 @@ const chat = ai.chats.create({
// Ask for the top 3 stories on Hacker News
const result1 = await chat.sendMessage({
- message: `Based on this website content from Hacker News, what are the top 3 stories right now?\n\n${JSON.stringify(data)}`
+ message: `Based on this website content from Hacker News, what are the top 3 stories right now?\n\n${JSON.stringify(scrapeResult.data.result)}`
});
console.log('Top 3 Stories:', result1.text);
@@ -88,22 +88,22 @@ console.log('4th and 5th Stories:', result2.text);
This example demonstrates how to extract structured data using Gemini's JSON mode from scraped website content.
```typescript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
import { GoogleGenAI, Type } from '@google/genai';
+const apiKey = process.env.SGAI_APIKEY;
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
-const result = await extract(process.env.SGAI_APIKEY!, {
- url: 'https://stripe.com',
- prompt: 'Extract all content from this page',
+const scrapeResult = await smartScraper(apiKey, {
+ website_url: 'https://stripe.com',
+ user_prompt: 'Extract all content from this page',
});
-const data = result.data?.json;
-console.log('Scraped content length:', JSON.stringify(data).length);
+console.log('Scraped content length:', JSON.stringify(scrapeResult.data.result).length);
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
- contents: `Extract company information: ${JSON.stringify(data)}`,
+ contents: `Extract company information: ${JSON.stringify(scrapeResult.data.result)}`,
config: {
responseMimeType: 'application/json',
responseSchema: {
diff --git a/docs.json b/docs.json
index 68da0db..a15a999 100644
--- a/docs.json
+++ b/docs.json
@@ -3,313 +3,250 @@
"theme": "mint",
"name": "ScrapeGraphAI",
"colors": {
- "primary": "#AC6DFF",
- "light": "#AC6DFF",
- "dark": "#AC6DFF"
+ "primary": "#9333ea",
+ "light": "#9f52eb",
+ "dark": "#1f2937"
},
"favicon": "/favicon.svg",
"navigation": {
- "versions": [
+ "tabs": [
{
- "version": "v2",
- "default": true,
- "tabs": [
+ "tab": "Home",
+ "groups": [
{
- "tab": "Home",
- "groups": [
+ "group": "Get Started",
+ "pages": [
+ "introduction",
+ "install",
{
- "group": "Get Started",
+ "group": "Use Cases",
"pages": [
- "introduction",
- "install",
- "transition-from-v1-to-v2",
- {
- "group": "Use Cases",
- "pages": [
- "use-cases/overview",
- "use-cases/ai-llm",
- "use-cases/lead-generation",
- "use-cases/market-intelligence",
- "use-cases/content-aggregation",
- "use-cases/research-analysis",
- "use-cases/seo-analytics"
- ]
- },
- {
- "group": "Dashboard",
- "pages": [
- "dashboard/overview",
- "dashboard/settings"
- ]
- }
+ "use-cases/overview",
+ "use-cases/ai-llm",
+ "use-cases/lead-generation",
+ "use-cases/market-intelligence",
+ "use-cases/content-aggregation",
+ "use-cases/research-analysis",
+ "use-cases/seo-analytics"
]
},
{
- "group": "Services",
+ "group": "Dashboard",
"pages": [
- "services/scrape",
- "services/extract",
- "services/search",
- "services/crawl",
- "services/monitor",
- {
- "group": "Additional Parameters",
- "pages": [
- "services/additional-parameters/headers",
- "services/additional-parameters/pagination",
- "services/additional-parameters/proxy",
- "services/additional-parameters/wait-ms"
- ]
- }
- ]
- },
- {
- "group": "Official SDKs",
- "pages": [
- "sdks/python",
- "sdks/javascript",
- "sdks/mocking",
- {
- "group": "CLI",
- "icon": "terminal",
- "pages": [
- "services/cli/introduction",
- "services/cli/commands",
- "services/cli/json-mode",
- "services/cli/ai-agent-skill",
- "services/cli/examples"
- ]
- },
- {
- "group": "MCP Server",
- "icon": "/logo/mcp.svg",
- "pages": [
- "services/mcp-server/introduction",
- "services/mcp-server/cursor",
- "services/mcp-server/claude",
- "services/mcp-server/smithery"
- ]
- },
- "services/toonify"
- ]
- },
- {
- "group": "LLM SDKs & Frameworks",
- "pages": [
- "developer-guides/llm-sdks-and-frameworks/gemini",
- "developer-guides/llm-sdks-and-frameworks/anthropic"
- ]
- },
- {
- "group": "Contribute",
- "pages": [
- "contribute/opensource"
+ "dashboard/overview",
+ "dashboard/playground",
+ "dashboard/settings"
]
}
]
},
{
- "tab": "Knowledge Base",
- "groups": [
- {
- "group": "Knowledge Base",
- "pages": [
- "knowledge-base/introduction"
- ]
- },
- {
- "group": "Scraping Tools",
- "pages": [
- "knowledge-base/ai-tools/lovable",
- "knowledge-base/ai-tools/v0",
- "knowledge-base/ai-tools/bolt",
- "knowledge-base/ai-tools/cursor"
- ]
- },
+ "group": "Services",
+ "pages": [
+ "services/smartscraper",
+ "services/searchscraper",
+ "services/markdownify",
+ "services/scrape",
+ "services/smartcrawler",
+ "services/sitemap",
+ "services/agenticscraper",
{
"group": "CLI",
+ "icon": "terminal",
"pages": [
- "knowledge-base/cli/getting-started",
- "knowledge-base/cli/json-mode",
- "knowledge-base/cli/ai-agent-skill",
- "knowledge-base/cli/command-examples"
- ]
- },
- {
- "group": "Troubleshooting",
- "pages": [
- "knowledge-base/troubleshooting/cors-error",
- "knowledge-base/troubleshooting/empty-results",
- "knowledge-base/troubleshooting/rate-limiting",
- "knowledge-base/troubleshooting/timeout-errors"
+ "services/cli/introduction",
+ "services/cli/commands",
+ "services/cli/json-mode",
+ "services/cli/ai-agent-skill",
+ "services/cli/examples"
]
},
{
- "group": "Scraping Guides",
+ "group": "MCP Server",
+ "icon": "/logo/mcp.svg",
"pages": [
- "knowledge-base/scraping/javascript-rendering",
- "knowledge-base/scraping/pagination",
- "knowledge-base/scraping/custom-headers",
- "knowledge-base/scraping/proxy"
+ "services/mcp-server/introduction",
+ "services/mcp-server/cursor",
+ "services/mcp-server/claude",
+ "services/mcp-server/smithery"
]
},
+ "services/toonify",
{
- "group": "Account & Credits",
+ "group": "Additional Parameters",
"pages": [
- "knowledge-base/account/pricing",
- "knowledge-base/account/api-keys",
- "knowledge-base/account/credits",
- "knowledge-base/account/rate-limits"
+ "services/additional-parameters/headers",
+ "services/additional-parameters/pagination",
+ "services/additional-parameters/proxy",
+ "services/additional-parameters/wait-ms"
]
}
]
},
{
- "tab": "Cookbook",
- "groups": [
- {
- "group": "Cookbook",
- "pages": [
- "cookbook/introduction"
- ]
- },
- {
- "group": "Examples",
- "pages": [
- "cookbook/examples/company-info",
- "cookbook/examples/github-trending",
- "cookbook/examples/wired",
- "cookbook/examples/homes",
- "cookbook/examples/research-agent",
- "cookbook/examples/chat-webpage",
- "cookbook/examples/pagination"
- ]
- }
+ "group": "Official SDKs",
+ "pages": [
+ "sdks/python",
+ "sdks/javascript",
+ "sdks/mocking"
]
},
{
- "tab": "API Reference",
- "groups": [
- {
- "group": "API Documentation",
- "pages": [
- "api-reference/introduction",
- "api-reference/errors"
- ]
- },
- {
- "group": "SmartScraper",
- "pages": [
- "api-reference/endpoint/smartscraper/start",
- "api-reference/endpoint/smartscraper/get-status"
- ]
- },
- {
- "group": "SearchScraper",
- "pages": [
- "api-reference/endpoint/searchscraper/start",
- "api-reference/endpoint/searchscraper/get-status"
- ]
- },
- {
- "group": "Markdownify",
- "pages": [
- "api-reference/endpoint/markdownify/start",
- "api-reference/endpoint/markdownify/get-status"
- ]
- },
- {
- "group": "SmartCrawler",
- "pages": [
- "api-reference/endpoint/smartcrawler/start",
- "api-reference/endpoint/smartcrawler/get-status"
- ]
- },
- {
- "group": "Sitemap",
- "pages": [
- "api-reference/endpoint/sitemap/start",
- "api-reference/endpoint/sitemap/get-status"
- ]
- },
- {
- "group": "User",
- "pages": [
- "api-reference/endpoint/user/get-credits",
- "api-reference/endpoint/user/submit-feedback"
- ]
- }
+ "group": "Integrations",
+ "pages": [
+ "integrations/langchain",
+ "integrations/llamaindex",
+ "integrations/crewai",
+ "integrations/agno",
+ "integrations/langflow",
+ "integrations/vercel_ai",
+ "integrations/google-adk",
+ "integrations/x402"
+ ]
+ },
+ {
+ "group": "LLM SDKs & Frameworks",
+ "pages": [
+ "developer-guides/llm-sdks-and-frameworks/gemini",
+ "developer-guides/llm-sdks-and-frameworks/anthropic"
+ ]
+ },
+ {
+ "group": "Contribute",
+ "pages": [
+ "contribute/opensource"
]
}
]
},
{
- "version": "v1",
- "tabs": [
+ "tab": "Knowledge Base",
+ "groups": [
{
- "tab": "Home",
- "groups": [
- {
- "group": "Get Started",
- "pages": [
- "v1/introduction",
- "v1/quickstart"
- ]
- },
- {
- "group": "Services",
- "pages": [
- "v1/smartscraper",
- "v1/searchscraper",
- "v1/markdownify",
- "v1/scrape",
- "v1/smartcrawler",
- "v1/sitemap",
- "v1/agenticscraper",
- {
- "group": "CLI",
- "icon": "terminal",
- "pages": [
- "v1/cli/introduction",
- "v1/cli/commands",
- "v1/cli/json-mode",
- "v1/cli/ai-agent-skill",
- "v1/cli/examples"
- ]
- },
- {
- "group": "MCP Server",
- "icon": "/logo/mcp.svg",
- "pages": [
- "v1/mcp-server/introduction",
- "v1/mcp-server/cursor",
- "v1/mcp-server/claude",
- "v1/mcp-server/smithery"
- ]
- },
- "v1/toonify",
- {
- "group": "Additional Parameters",
- "pages": [
- "v1/additional-parameters/headers",
- "v1/additional-parameters/pagination",
- "v1/additional-parameters/proxy",
- "v1/additional-parameters/wait-ms"
- ]
- }
- ]
- }
+ "group": "Knowledge Base",
+ "pages": [
+ "knowledge-base/introduction"
]
},
{
- "tab": "API Reference",
- "groups": [
- {
- "group": "API Documentation",
- "pages": [
- "v1/api-reference/introduction"
- ]
- }
+ "group": "Scraping Tools",
+ "pages": [
+ "knowledge-base/ai-tools/lovable",
+ "knowledge-base/ai-tools/v0",
+ "knowledge-base/ai-tools/bolt",
+ "knowledge-base/ai-tools/cursor"
+ ]
+ },
+ {
+ "group": "CLI",
+ "pages": [
+ "knowledge-base/cli/getting-started",
+ "knowledge-base/cli/json-mode",
+ "knowledge-base/cli/ai-agent-skill",
+ "knowledge-base/cli/command-examples"
+ ]
+ },
+ {
+ "group": "Troubleshooting",
+ "pages": [
+ "knowledge-base/troubleshooting/cors-error",
+ "knowledge-base/troubleshooting/empty-results",
+ "knowledge-base/troubleshooting/rate-limiting",
+ "knowledge-base/troubleshooting/timeout-errors"
+ ]
+ },
+ {
+ "group": "Scraping Guides",
+ "pages": [
+ "knowledge-base/scraping/javascript-rendering",
+ "knowledge-base/scraping/pagination",
+ "knowledge-base/scraping/custom-headers",
+ "knowledge-base/scraping/proxy"
+ ]
+ },
+ {
+ "group": "Account & Credits",
+ "pages": [
+ "knowledge-base/account/api-keys",
+ "knowledge-base/account/credits",
+ "knowledge-base/account/rate-limits"
+ ]
+ }
+ ]
+ },
+ {
+ "tab": "Cookbook",
+ "groups": [
+ {
+ "group": "Cookbook",
+ "pages": [
+ "cookbook/introduction"
+ ]
+ },
+ {
+ "group": "Examples",
+ "pages": [
+ "cookbook/examples/company-info",
+ "cookbook/examples/github-trending",
+ "cookbook/examples/wired",
+ "cookbook/examples/homes",
+ "cookbook/examples/research-agent",
+ "cookbook/examples/chat-webpage",
+ "cookbook/examples/pagination"
+ ]
+ }
+ ]
+ },
+ {
+ "tab": "API Reference",
+ "groups": [
+ {
+ "group": "API Documentation",
+ "pages": [
+ "api-reference/introduction",
+ "api-reference/errors"
+ ]
+ },
+ {
+ "group": "SmartScraper",
+ "pages": [
+ "api-reference/endpoint/smartscraper/start",
+ "api-reference/endpoint/smartscraper/get-status"
+ ]
+ },
+ {
+ "group": "SearchScraper",
+ "pages": [
+ "api-reference/endpoint/searchscraper/start",
+ "api-reference/endpoint/searchscraper/get-status"
+ ]
+ },
+ {
+ "group": "Markdownify",
+ "pages": [
+ "api-reference/endpoint/markdownify/start",
+ "api-reference/endpoint/markdownify/get-status"
+ ]
+ },
+ {
+ "group": "SmartCrawler",
+ "pages": [
+ "api-reference/endpoint/smartcrawler/start",
+ "api-reference/endpoint/smartcrawler/get-status"
+ ]
+ },
+ {
+ "group": "Sitemap",
+ "pages": [
+ "api-reference/endpoint/sitemap/start",
+ "api-reference/endpoint/sitemap/get-status"
+ ]
+ },
+ {
+ "group": "User",
+ "pages": [
+ "api-reference/endpoint/user/get-credits",
+ "api-reference/endpoint/user/submit-feedback"
]
}
]
@@ -322,7 +259,12 @@
"href": "https://scrapegraphai.com/",
"icon": "globe"
},
-{
+ {
+ "anchor": "Community",
+ "href": "https://discord.gg/uJN7TYcpNa",
+ "icon": "discord"
+ },
+ {
"anchor": "Blog",
"href": "https://scrapegraphai.com/blog",
"icon": "newspaper"
@@ -331,24 +273,13 @@
}
},
"logo": {
- "light": "/logos/logo-light.svg",
- "dark": "/logos/logo-dark.svg",
+ "light": "https://raw.githubusercontent.com/ScrapeGraphAI/docs-mintlify/main/logo/light.svg",
+ "dark": "https://raw.githubusercontent.com/ScrapeGraphAI/docs-mintlify/main/logo/dark.svg",
"href": "https://docs.scrapegraphai.com"
},
"background": {
"color": {
- "dark": "#242424",
- "light": "#EFEFEF"
- }
- },
- "fonts": {
- "heading": {
- "family": "IBM Plex Sans",
- "weight": 500
- },
- "body": {
- "family": "IBM Plex Sans",
- "weight": 400
+ "dark": "#101725"
}
},
"navbar": {
@@ -369,7 +300,7 @@
"primary": {
"type": "button",
"label": "Dashboard",
- "href": "https://scrapegraphai.com/dashboard"
+ "href": "https://dashboard.scrapegraphai.com"
}
},
"footer": {
@@ -391,4 +322,4 @@
"vscode"
]
}
-}
+}
\ No newline at end of file
diff --git a/favicon.svg b/favicon.svg
index 6fb828b..33285d6 100644
--- a/favicon.svg
+++ b/favicon.svg
@@ -1,15 +1,145 @@
-
+
+
+
+
diff --git a/images/dashboard/dashboard-1.png b/images/dashboard/dashboard-1.png
index 2249c72..1120f7e 100644
Binary files a/images/dashboard/dashboard-1.png and b/images/dashboard/dashboard-1.png differ
diff --git a/images/dashboard/settings-1.png b/images/dashboard/settings-1.png
index 16f93b3..87ea08e 100644
Binary files a/images/dashboard/settings-1.png and b/images/dashboard/settings-1.png differ
diff --git a/images/introduction/docs-banner-dark.png b/images/introduction/docs-banner-dark.png
deleted file mode 100644
index b153c33..0000000
Binary files a/images/introduction/docs-banner-dark.png and /dev/null differ
diff --git a/images/introduction/docs-banner-ligth.png b/images/introduction/docs-banner-ligth.png
deleted file mode 100644
index e8abe91..0000000
Binary files a/images/introduction/docs-banner-ligth.png and /dev/null differ
diff --git a/images/introduction/docs-banner.png b/images/introduction/docs-banner.png
new file mode 100644
index 0000000..05e3164
Binary files /dev/null and b/images/introduction/docs-banner.png differ
diff --git a/install.md b/install.md
index 07cb1d5..1f1165d 100644
--- a/install.md
+++ b/install.md
@@ -1,11 +1,11 @@
---
title: Installation
-description: 'Install and get started with ScrapeGraphAI v2 SDKs'
+description: 'Install and get started with ScrapeGraphAI SDKs'
---
## Prerequisites
-- Obtain your **API key** by signing up on the [ScrapeGraphAI Dashboard](https://scrapegraphai.com/dashboard)
+- Obtain your **API key** by signing up on the [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com)
---
@@ -22,10 +22,10 @@ from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
-# Extract data from a website
-response = client.extract(
- url="https://scrapegraphai.com",
- prompt="Extract information about the company"
+# Scrape a website
+response = client.smartscraper(
+ website_url="https://scrapegraphai.com",
+ user_prompt="Extract information about the company"
)
print(response)
```
@@ -40,8 +40,6 @@ For more advanced usage, see the [Python SDK documentation](/sdks/python).
## JavaScript SDK
-Requires **Node.js >= 22**.
-
Install using npm, pnpm, yarn, or bun:
```bash
@@ -61,16 +59,20 @@ bun add scrapegraph-js
**Usage:**
```javascript
-import scrapegraphai from "scrapegraph-js";
+import { smartScraper } from "scrapegraph-js";
-const sgai = scrapegraphai({ apiKey: "your-api-key-here" });
+const apiKey = "your-api-key-here";
-const { data } = await sgai.extract(
- "https://scrapegraphai.com",
- { prompt: "What does the company do?" }
-);
+const response = await smartScraper(apiKey, {
+ website_url: "https://scrapegraphai.com",
+ user_prompt: "What does the company do?",
+});
-console.log(data);
+if (response.status === "error") {
+ console.error("Error:", response.error);
+} else {
+ console.log(response.data.result);
+}
```
@@ -83,20 +85,17 @@ For more advanced usage, see the [JavaScript SDK documentation](/sdks/javascript
## Key Concepts
-### Scrape (formerly Markdownify)
-Convert any webpage into markdown, HTML, screenshot, or branding format. [Learn more](/services/scrape)
-
-### Extract (formerly SmartScraper)
-Extract specific information from any webpage using AI. Provide a URL and a prompt describing what you want to extract. [Learn more](/services/extract)
+### SmartScraper
+Extract specific information from any webpage using AI. Provide a URL and a prompt describing what you want to extract. [Learn more](/services/smartscraper)
-### Search (formerly SearchScraper)
-Search and extract information from multiple web sources using AI. Start with just a query - Search will find relevant websites and extract the information you need. [Learn more](/services/search)
+### SearchScraper
+Search and extract information from multiple web sources using AI. Start with just a prompt, and SearchScraper will find relevant websites and extract the information you need. [Learn more](/services/searchscraper)
-### Crawl (formerly SmartCrawler)
-Multi-page website crawling with flexible output formats. Traverse multiple pages, follow links, and return content in your preferred format. [Learn more](/services/crawl)
+### SmartCrawler
+AI-powered extraction for any webpage with crawl capabilities. Automatically navigate and extract data from multiple pages. [Learn more](/services/smartcrawler)
-### Monitor
-Scheduled web monitoring with AI-powered extraction. Set up recurring scraping jobs that automatically extract data on a cron schedule. [Learn more](/services/monitor)
+### Markdownify
+Convert any webpage into clean, formatted markdown. Perfect for content aggregation and processing. [Learn more](/services/markdownify)
### Structured Output with Schemas
Both SDKs support structured output using schemas:
@@ -120,37 +119,34 @@ class CompanyInfo(BaseModel):
industry: str = Field(description="Industry sector")
client = Client(api_key="your-api-key")
-response = client.extract(
- url="https://scrapegraphai.com",
- prompt="Extract company information",
+result = client.smartscraper(
+ website_url="https://scrapegraphai.com",
+ user_prompt="Extract company information",
output_schema=CompanyInfo
)
-print(response)
+print(result)
```
### JavaScript Example
```javascript
-import scrapegraphai from "scrapegraph-js";
+import { smartScraper } from "scrapegraph-js";
import { z } from "zod";
-const sgai = scrapegraphai({ apiKey: "your-api-key" });
-
const CompanySchema = z.object({
- companyName: z.string().describe("The company name"),
+ company_name: z.string().describe("The company name"),
description: z.string().describe("Company description"),
website: z.string().url().describe("Company website URL"),
industry: z.string().describe("Industry sector"),
});
-const { data } = await sgai.extract(
- "https://scrapegraphai.com",
- {
- prompt: "Extract company information",
- schema: CompanySchema,
- }
-);
-console.log(data);
+const apiKey = "your-api-key";
+const response = await smartScraper(apiKey, {
+ website_url: "https://scrapegraphai.com",
+ user_prompt: "Extract company information",
+ output_schema: CompanySchema,
+});
+console.log(response.data.result);
```
---
diff --git a/integrations/claude-code-skill.mdx b/integrations/claude-code-skill.mdx
index 74768c3..4f85e3c 100644
--- a/integrations/claude-code-skill.mdx
+++ b/integrations/claude-code-skill.mdx
@@ -6,16 +6,14 @@ icon: '/logo/claude-color.svg'
## Overview
-The ScrapeGraphAI Claude Code Skill ships with [just-scrape](https://github.com/ScrapeGraphAI/just-scrape), the official CLI for the **v2 API**. Once installed, agents like Claude Code, Cursor, Copilot, Cline, and Windsurf can scrape websites, extract structured data, search the web, crawl sites, and set up page-change monitors — all from natural language prompts.
-
-The skill wires `just-scrape` into your agent's skill directory so the agent knows when and how to invoke the CLI.
+The ScrapeGraphAI [Claude Code Skill](https://github.com/ScrapeGraphAI/skill) gives AI coding agents full access to ScrapeGraphAI's web scraping, search, and crawling APIs. Once installed, agents like Claude Code, Cursor, Copilot, and Cline can scrape websites, extract structured data, and crawl pages — all from natural language prompts.
- Browse the CLI and skill source
+ View the skill source code and documentation
## Installation
@@ -24,25 +22,21 @@ The skill wires `just-scrape` into your agent's skill directory so the agent kno
### Option 1: Install via skills.sh (Recommended)
-The fastest way to install. Requires [Node.js](https://nodejs.org) or [Bun](https://bun.sh).
+The fastest way to install. Requires [Node.js](https://nodejs.org).
```bash
-bunx skills add https://github.com/ScrapeGraphAI/just-scrape
-# or
-npx skills add https://github.com/ScrapeGraphAI/just-scrape
+npx skills add ScrapeGraphAI/skill
```
-This symlinks `skills/just-scrape/SKILL.md` into your `~/.claude/skills/` directory automatically.
-
-You can also browse the published skill at [skills.sh/scrapegraphai/just-scrape/just-scrape](https://skills.sh/scrapegraphai/just-scrape/just-scrape).
+This clones the skill and symlinks it into your `~/.claude/skills/` directory automatically.
### Option 2: Manual install
Clone the repository and create the symlink yourself:
```bash
-git clone https://github.com/ScrapeGraphAI/just-scrape.git ~/.claude/skills/just-scrape
-ln -sf ~/.claude/skills/just-scrape/skills/just-scrape/SKILL.md ~/.claude/skills/just-scrape.md
+git clone https://github.com/ScrapeGraphAI/skill.git ~/.claude/skills/scrapegraphai
+ln -sf ~/.claude/skills/scrapegraphai/SKILL.md ~/.claude/skills/scrapegraphai.md
```
### Option 3: Project-level install
@@ -51,54 +45,58 @@ Install the skill for a single project only:
```bash
mkdir -p .claude/skills
-git clone https://github.com/ScrapeGraphAI/just-scrape.git .claude/skills/just-scrape
-ln -sf .claude/skills/just-scrape/skills/just-scrape/SKILL.md .claude/skills/just-scrape.md
+git clone https://github.com/ScrapeGraphAI/skill.git .claude/skills/scrapegraphai
+ln -sf .claude/skills/scrapegraphai/SKILL.md .claude/skills/scrapegraphai.md
```
## Setup
-Install the CLI and set your ScrapeGraphAI API key:
+Set your ScrapeGraphAI API key as an environment variable:
```bash
-npm install -g just-scrape@latest
export SGAI_API_KEY="sgai-..."
```
-Get your API key from the [dashboard](https://scrapegraphai.com/dashboard). The CLI also accepts the key via a `.env` file, `~/.scrapegraphai/config.json`, or an interactive prompt.
+Get your API key from the [dashboard](https://dashboard.scrapegraphai.com).
-## Capabilities
+## What's Included
-The skill maps to the v2 API surface via `just-scrape`:
+The skill installs the following files:
+
+| File | Description |
+|------|-------------|
+| `SKILL.md` | Main skill file with API reference, examples, and decision guide |
+| `references/api-endpoints.md` | Full parameter tables for all endpoints |
+| `references/sdk-examples.md` | Python and JavaScript SDK examples |
+| `references/advanced-features.md` | Stealth mode, schemas, scrolling, pagination, and more |
+
+## Capabilities
-
- Extract structured data from any URL using AI (`just-scrape extract`)
+
+ Extract structured data from any webpage using natural language prompts
-
- Search the web and extract structured results (`just-scrape search`)
+
+ Search the web and extract results with AI or as markdown
-
- Fetch a page in 8 formats: markdown, html, screenshot, branding, links, images, summary, json
+
+ Convert any webpage into clean, formatted markdown
-
- Convert any webpage into clean markdown (wraps `scrape -f markdown`)
+
+ Crawl multiple pages from a website with depth and path controls
-
- Crawl multi-page sites with depth, link, and pattern controls
+
+ Extract all URLs from a website's sitemap
-
- Schedule page-change monitors with cron intervals, webhooks, and activity polling
+
+ Browser automation — login, click, navigate, fill forms, then extract
-
-Removed from v1: `sitemap`, `agentic_scraper`, `generate-schema`, `validate`. There is no direct replacement on v2.
-
-
## Example Prompts
Once the skill is installed, you can use natural language prompts directly in your AI coding agent:
@@ -120,18 +118,14 @@ Crawl https://example.com/blog with depth 2 and extract the title and summary fr
```
```text
-Monitor https://store.example.com/pricing every hour and webhook me when it changes
-```
-
-```text
-Create a 30m monitor on https://example.com and poll its activity feed, printing new ticks as they come in
+Get all URLs from the sitemap of https://example.com
```
```text
-Fetch a full-page screenshot and branding assets for https://example.com
+Log into https://example.com/dashboard, click "Reports", and extract the table data
```
-The agent will automatically select the right `just-scrape` command, handle authentication, poll for async results (crawls), and return structured data.
+The agent will automatically select the right ScrapeGraphAI endpoint, handle authentication, poll for async results, and return structured data.
## Supported Agents
@@ -152,7 +146,7 @@ Need help with the skill?
Report bugs and request features
diff --git a/integrations/crewai.mdx b/integrations/crewai.mdx
index 7288b59..dba500c 100644
--- a/integrations/crewai.mdx
+++ b/integrations/crewai.mdx
@@ -100,7 +100,7 @@ SCRAPEGRAPH_API_KEY=your_api_key_here
```
-Get your API key from the [dashboard](https://scrapegraphai.com/dashboard)
+Get your API key from the [dashboard](https://dashboard.scrapegraphai.com)
## Use Cases
diff --git a/integrations/google-adk.mdx b/integrations/google-adk.mdx
index dc87dee..1d3c7f9 100644
--- a/integrations/google-adk.mdx
+++ b/integrations/google-adk.mdx
@@ -84,7 +84,7 @@ SGAI_API_KEY = "your-api-key-here"
```
-Get your API key from the [dashboard](https://scrapegraphai.com/dashboard)
+Get your API key from the [dashboard](https://dashboard.scrapegraphai.com)
## Tool Filtering
diff --git a/integrations/langchain.mdx b/integrations/langchain.mdx
index ad0a97f..aed504f 100644
--- a/integrations/langchain.mdx
+++ b/integrations/langchain.mdx
@@ -25,20 +25,20 @@ pip install langchain-scrapegraph
## Available Tools
-### ExtractTool
+### SmartScraperTool
Extract structured data from any webpage using natural language prompts:
```python
-from langchain_scrapegraph.tools import ExtractTool
+from langchain_scrapegraph.tools import SmartScraperTool
# Initialize the tool (uses SGAI_API_KEY from environment)
-tool = ExtractTool()
+tool = SmartScraperTool()
# Extract information using natural language
result = tool.invoke({
- "url": "https://www.example.com",
- "prompt": "Extract the main heading and first paragraph"
+ "website_url": "https://www.example.com",
+ "user_prompt": "Extract the main heading and first paragraph"
})
```
@@ -46,51 +46,60 @@ result = tool.invoke({
Define the structure of the output using Pydantic models:
```python
+from typing import List
from pydantic import BaseModel, Field
-from langchain_scrapegraph.tools import ExtractTool
+from langchain_scrapegraph.tools import SmartScraperTool
class WebsiteInfo(BaseModel):
- title: str = Field(description="The main title of the page")
- description: str = Field(description="The main description")
+ title: str = Field(description="The main title of the webpage")
+ description: str = Field(description="The main description or first paragraph")
+ urls: List[str] = Field(description="The URLs inside the webpage")
-# Initialize with output schema
-tool = ExtractTool(llm_output_schema=WebsiteInfo)
+# Initialize with schema
+tool = SmartScraperTool(llm_output_schema=WebsiteInfo)
result = tool.invoke({
- "url": "https://example.com",
- "prompt": "Extract the title and description"
+ "website_url": "https://www.example.com",
+ "user_prompt": "Extract the website information"
})
```
-### SearchTool
+### SearchScraperTool
-Search the web and extract structured results using AI:
+Search the web and extract structured results using AI:
```python
-from langchain_scrapegraph.tools import SearchTool
+from langchain_scrapegraph.tools import SearchScraperTool
-tool = SearchTool()
+
+tool = SearchScraperTool()
result = tool.invoke({
- "query": "Find the best restaurants in San Francisco",
+ "user_prompt": "Find the best restaurants in San Francisco",
})
+
```
-### ScrapeTool
+
+```python
+from typing import Optional
+from pydantic import BaseModel, Field
+from langchain_scrapegraph.tools import SearchScraperTool
-Scrape a webpage and return it in the desired format:
+class RestaurantInfo(BaseModel):
+ name: str = Field(description="The restaurant name")
+ address: str = Field(description="The restaurant address")
+ rating: float = Field(description="The restaurant rating")
-```python
-from langchain_scrapegraph.tools import ScrapeTool
-tool = ScrapeTool()
+tool = SearchScraperTool(llm_output_schema=RestaurantInfo)
-# Scrape as markdown (default)
-result = tool.invoke({"url": "https://example.com"})
+result = tool.invoke({
+ "user_prompt": "Find the best restaurants in San Francisco"
+})
-# Scrape as HTML
-result = tool.invoke({"url": "https://example.com", "format": "html"})
```
+
### MarkdownifyTool
@@ -103,146 +112,34 @@ tool = MarkdownifyTool()
markdown = tool.invoke({"website_url": "https://example.com"})
```
-### Crawl Tools
-
-Start and manage crawl jobs with `CrawlStartTool`, `CrawlStatusTool`, `CrawlStopTool`, and `CrawlResumeTool`:
-
-```python
-import time
-from langchain_scrapegraph.tools import CrawlStartTool, CrawlStatusTool
-
-start_tool = CrawlStartTool()
-status_tool = CrawlStatusTool()
-
-# Start a crawl job
-result = start_tool.invoke({
- "url": "https://example.com",
- "depth": 2,
- "max_pages": 5,
- "format": "markdown",
-})
-print("Crawl started:", result)
-
-# Check status
-crawl_id = result.get("id")
-if crawl_id:
- time.sleep(5)
- status = status_tool.invoke({"crawl_id": crawl_id})
- print("Crawl status:", status)
-```
-
-### Monitor Tools
-
-Create and manage monitors (replaces scheduled jobs) with `MonitorCreateTool`, `MonitorListTool`, `MonitorGetTool`, `MonitorPauseTool`, `MonitorResumeTool`, and `MonitorDeleteTool`:
-
-```python
-from langchain_scrapegraph.tools import MonitorCreateTool, MonitorListTool
-
-create_tool = MonitorCreateTool()
-list_tool = MonitorListTool()
-
-# Create a monitor
-result = create_tool.invoke({
- "name": "Price Monitor",
- "url": "https://example.com/products",
- "prompt": "Extract current product prices",
- "cron": "0 9 * * *", # Daily at 9 AM
-})
-print("Monitor created:", result)
-
-# List all monitors
-monitors = list_tool.invoke({})
-print("All monitors:", monitors)
-```
-
-### HistoryTool
-
-Retrieve request history:
-
-```python
-from langchain_scrapegraph.tools import HistoryTool
-
-tool = HistoryTool()
-history = tool.invoke({})
-```
-
-### GetCreditsTool
-
-Check your remaining API credits:
-
-```python
-from langchain_scrapegraph.tools import GetCreditsTool
-
-tool = GetCreditsTool()
-credits = tool.invoke({})
-```
-
## Example Agent
Create a research agent that can gather and analyze web data:
```python
-from langchain.agents import AgentExecutor, create_openai_functions_agent
-from langchain_core.messages import SystemMessage
-from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
+from langchain.agents import initialize_agent, AgentType
+from langchain_scrapegraph.tools import SmartScraperTool
from langchain_openai import ChatOpenAI
-from langchain_scrapegraph.tools import ExtractTool, GetCreditsTool, SearchTool
-# Initialize the tools
+# Initialize tools
tools = [
- ExtractTool(),
- GetCreditsTool(),
- SearchTool(),
+ SmartScraperTool(),
]
-# Create the prompt template
-prompt = ChatPromptTemplate.from_messages([
- SystemMessage(
- content=(
- "You are a helpful AI assistant that can analyze websites and extract information. "
- "You have access to tools that can help you scrape and process web content. "
- "Always explain what you're doing before using a tool."
- )
- ),
- MessagesPlaceholder(variable_name="chat_history", optional=True),
- ("user", "{input}"),
- MessagesPlaceholder(variable_name="agent_scratchpad"),
-])
-
-# Initialize the LLM
-llm = ChatOpenAI(temperature=0)
-
-# Create the agent
-agent = create_openai_functions_agent(llm, tools, prompt)
-agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
-
-# Example usage
-response = agent_executor.invoke({
- "input": "Extract the main products from https://www.scrapegraphai.com/"
-})
-print(response["output"])
+# Create an agent
+agent = initialize_agent(
+ tools=tools,
+ llm=ChatOpenAI(temperature=0),
+ agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
+ verbose=True
+)
+
+# Use the agent
+response = agent.run("""
+ Visit example.com, summarize the content, and extract the main heading and first paragraph
+""")
```
-## Migration from v1
-
-If you're upgrading from v1, here are the key changes:
-
-| v1 Tool | v2 Tool |
-|---------|---------|
-| `SmartScraperTool` | `ExtractTool` |
-| `SearchScraperTool` | `SearchTool` |
-| `SmartCrawlerTool` | `CrawlStartTool` / `CrawlStatusTool` / `CrawlStopTool` / `CrawlResumeTool` |
-| `CreateScheduledJobTool` | `MonitorCreateTool` |
-| `GetScheduledJobsTool` | `MonitorListTool` |
-| `GetScheduledJobTool` | `MonitorGetTool` |
-| `PauseScheduledJobTool` | `MonitorPauseTool` |
-| `ResumeScheduledJobTool` | `MonitorResumeTool` |
-| `DeleteScheduledJobTool` | `MonitorDeleteTool` |
-| `MarkdownifyTool` | `MarkdownifyTool` (unchanged) |
-| `GetCreditsTool` | `GetCreditsTool` (unchanged) |
-| `AgenticScraperTool` | Removed |
-| -- | `HistoryTool` (new) |
-
## Configuration
Set your ScrapeGraph API key in your environment:
@@ -259,7 +156,7 @@ os.environ["SGAI_API_KEY"] = "your-api-key-here"
```
-Get your API key from the [dashboard](https://scrapegraphai.com/dashboard)
+Get your API key from the [dashboard](https://dashboard.scrapegraphai.com)
## Use Cases
diff --git a/integrations/vercel_ai.mdx b/integrations/vercel_ai.mdx
index 9a4e3f9..889df6b 100644
--- a/integrations/vercel_ai.mdx
+++ b/integrations/vercel_ai.mdx
@@ -5,19 +5,19 @@ description: "Integrate ScrapeGraphAI into Vercel AI"
## Overview
-[Vercel AI SDK](https://ai-sdk.dev/) is a popular JavaScript/TypeScript framework to interact with various LLM providers. This page shows how to integrate it with ScrapeGraph.
+[Vercel AI SDK](https://ai-sdk.dev/) is a popular JavaScript/TypeScript framework for interacting with various LLM providers. This page shows how to integrate it with ScrapeGraph.
- View the Vercel AI SDK documentation
+ View the Vercel AI SDK documentation
## Installation
-Follow our [JavaScript SDK installation steps](/sdks/javascript) using your favourite package manager:
+Follow our [JavaScript SDK installation steps](/sdks/javascript) using your favourite package manager:
```bash
# Using npm
@@ -33,7 +33,7 @@ yarn add scrapegraph-js
bun add scrapegraph-js
```
-Then, install [Vercel AI](https://ai-sdk.dev/docs/getting-started) with their [OpenAI provider](https://ai-sdk.dev/providers/ai-sdk-providers/openai):
+Then, install [Vercel AI](https://ai-sdk.dev/docs/getting-started) with their [OpenAI provider](https://ai-sdk.dev/providers/ai-sdk-providers/openai):
```bash
# Using npm
@@ -51,46 +51,43 @@ bun add ai @ai-sdk/openai
## Usage
-The ScrapeGraph SDK can be used like any other tool. See [Vercel AI tool calling docs](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling).
+The ScrapeGraph SDK can be used like any other tool. See the [Vercel AI tool calling docs](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling).
```ts
import { z } from "zod";
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
-import { extract } from "scrapegraph-js";
+import { smartScraper } from "scrapegraph-js";
+
+const apiKey = process.env.SGAI_APIKEY;
+
+const ArticleSchema = z.object({
+ title: z.string().describe("The article title"),
+ author: z.string().describe("The author's name"),
+ publishDate: z.string().describe("Article publication date"),
+ content: z.string().describe("Main article content"),
+ category: z.string().describe("Article category"),
+});
+
+const ArticlesArraySchema = z
+ .array(ArticleSchema)
+ .describe("Array of articles");
const result = await generateText({
model: openai("gpt-4.1-mini"),
tools: {
scrape: tool({
- description: "Extract articles information from a given URL.",
+ description: "Extract article information from a given URL.",
parameters: z.object({
- url: z.string().describe("The exact URL."),
+ url: z.string().describe("The exact URL."),
}),
execute: async ({ url }) => {
- const response = await extract(process.env.SGAI_API_KEY!, {
- url,
- prompt: "Extract the article information",
- schema: {
- type: "object",
- properties: {
- articles: {
- type: "array",
- items: {
- type: "object",
- properties: {
- title: { type: "string" },
- author: { type: "string" },
- publishDate: { type: "string" },
- content: { type: "string" },
- category: { type: "string" },
- },
- },
- },
- },
- },
+ const response = await smartScraper(apiKey, {
+ website_url: url,
+ user_prompt: "Extract the article information",
+ output_schema: ArticlesArraySchema,
});
- return response.data?.json;
+ return response.data;
},
}),
},
@@ -100,6 +97,8 @@ const result = await generateText({
console.log(result);
```
## Support
Need help with the integration?
@@ -108,7 +107,7 @@ Need help with the integration?
Report bugs and request features
diff --git a/introduction.mdx b/introduction.mdx
index 6792aa2..d848a3c 100644
--- a/introduction.mdx
+++ b/introduction.mdx
@@ -4,16 +4,8 @@ description: 'Welcome to ScrapeGraphAI - AI-Powered Web Data Extraction'
---
-
## Overview
@@ -41,7 +33,7 @@ description: 'Welcome to ScrapeGraphAI - AI-Powered Web Data Extraction'
- Sign up and access your API key from the [dashboard](https://scrapegraphai.com/dashboard)
+ Sign up and access your API key from the [dashboard](https://dashboard.scrapegraphai.com)
Select from our specialized extraction services based on your needs
diff --git a/knowledge-base/account/api-keys.mdx b/knowledge-base/account/api-keys.mdx
index 5bbb48d..71d593d 100644
--- a/knowledge-base/account/api-keys.mdx
+++ b/knowledge-base/account/api-keys.mdx
@@ -7,7 +7,7 @@ Your API key authenticates every request you make to the ScrapeGraphAI API. Keep
## Finding your API key
-1. Log in to the [ScrapeGraphAI dashboard](https://scrapegraphai.com/dashboard).
+1. Log in to the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com).
2. Navigate to **Settings**.
3. Your API key is displayed in the **API Key** section.
@@ -24,12 +24,9 @@ client = Client(api_key="your-api-key")
```
```javascript JavaScript
-import { extract } from "scrapegraph-js";
+import { smartScraper } from "scrapegraph-js";
-const result = await extract("your-api-key", {
- url: "https://example.com",
- prompt: "Extract the title",
-});
+const result = await smartScraper(
+  "your-api-key",
+  "https://example.com",
+  "Extract the title"
+);
```
```bash cURL
@@ -64,7 +61,7 @@ client = Client(api_key=os.getenv("SGAI_API_KEY"))
If your key has been exposed or you want to rotate it for security:
-1. Go to **Settings** in the [dashboard](https://scrapegraphai.com/dashboard).
+1. Go to **Settings** in the [dashboard](https://dashboard.scrapegraphai.com).
2. Click **Regenerate API Key**.
3. Copy the new key immediately — it will only be shown once.
4. Update all services and environment variables that use the old key.
diff --git a/knowledge-base/account/credits.mdx b/knowledge-base/account/credits.mdx
index 2fd375d..72d38f0 100644
--- a/knowledge-base/account/credits.mdx
+++ b/knowledge-base/account/credits.mdx
@@ -7,29 +7,14 @@ ScrapeGraphAI uses a credit system to measure API usage. Each successful API cal
## Credit costs per service
-| Service | Credits per request | Details |
-|---|---|---|
-| **Scrape** (markdown) | 1 | Basic page scrape returning markdown |
-| **Scrape** (screenshot) | 2 | Page scrape with a screenshot |
-| **Scrape** (branding analysis) | 25 | Full branding analysis of a page |
-| **Extract** | 5 | Structured data extraction |
-| **Search** (no prompt) | 2 per result | Search results without LLM processing |
-| **Search** (with prompt) | 5 per result | Search results processed by an LLM |
-| **Crawl** | 2 startup + per-page scrape cost | Startup fee plus scrape cost for each page |
-| **Monitor** | +5 | Additional credits when a change is detected |
-
-### Proxy modifiers
-
-Using a proxy adds extra credits on top of the base service cost:
-
-| Proxy mode | Additional credits |
+| Service | Credits per request |
|---|---|
-| Fast / JS rendering | +0 |
-| Stealth | +4 |
-| JS + Stealth | +5 |
-| Auto (worst case) | +9 |
-
-For a full breakdown of plans and monthly credit allowances, see [Plans & Pricing](/knowledge-base/account/pricing).
+| SmartScraper | 1 |
+| SearchScraper | 5 |
+| Markdownify | 1 |
+| SmartCrawler | 1 per page crawled |
+| Sitemap | 1 |
+| AgenticScraper | Variable |
Failed requests and requests that return an error are not charged.
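The per-request costs above are easy to tally when budgeting a job. A minimal illustrative sketch (the service names and costs mirror the table; the `estimate_credits` helper and its `pages` parameter are hypothetical, not part of the SDK):

```python
# Illustrative credit estimator based on the table above.
# Costs are per request; SmartCrawler is charged per page crawled.
CREDIT_COSTS = {
    "smartscraper": 1,
    "searchscraper": 5,
    "markdownify": 1,
    "sitemap": 1,
}

def estimate_credits(service: str, pages: int = 1) -> int:
    """Return the estimated credit cost for one request."""
    if service == "smartcrawler":
        return pages  # 1 credit per page crawled
    return CREDIT_COSTS[service]

print(estimate_credits("searchscraper"))           # 5
print(estimate_credits("smartcrawler", pages=10))  # 10
```

Remember that failed requests are not charged, so the estimate is an upper bound for a batch that partially errors out.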
@@ -37,7 +22,7 @@ For a full breakdown of plans and monthly credit allowances, see [Plans & Pricin
## Checking your credit balance
-Log in to the [dashboard](https://scrapegraphai.com/dashboard) to see:
+Log in to the [dashboard](https://dashboard.scrapegraphai.com) to see:
- **Remaining credits** for your current billing period
- **Usage history** broken down by service and date
@@ -69,7 +54,7 @@ When your credits are exhausted, the API returns an HTTP `402 Payment Required`
}
```
-Upgrade your plan or purchase additional credits from the [dashboard](https://scrapegraphai.com/dashboard).
+Upgrade your plan or purchase additional credits from the [dashboard](https://dashboard.scrapegraphai.com).
## Tips to reduce credit usage
diff --git a/knowledge-base/account/pricing.mdx b/knowledge-base/account/pricing.mdx
deleted file mode 100644
index 77124a4..0000000
--- a/knowledge-base/account/pricing.mdx
+++ /dev/null
@@ -1,109 +0,0 @@
----
-title: Plans & Pricing
-description: 'Overview of ScrapeGraphAI plans, pricing, and what each tier includes'
----
-
-ScrapeGraphAI offers flexible plans to fit teams of every size — from hobbyists to enterprises. All plans include access to every service; higher tiers unlock more credits, throughput, and support.
-
-## Plans
-
-
-
- **$0 / month**
-
- - 500 API credits / month
- - 10 requests / min
- - 1 monitor
- - 1 concurrent crawl
-
-
-
- **$17 / month** (or $204 / year — save $36)
-
- - 10,000 API credits / month
- - 100 requests / min
- - 5 monitors
- - 3 concurrent crawls
-
-
-
- **$85 / month** (or $1,020 / year — save $180)
-
- - 100,000 API credits / month
- - 500 requests / min
- - 25 monitors
- - 15 concurrent crawls
- - Basic Proxy Rotation
-
-
-
- **$425 / month** (or $5,100 / year — save $900)
-
- - 750,000 API credits / month
- - 5,000 requests / min
- - 100 monitors
- - 50 concurrent crawls
- - Advanced Proxy Rotation
- - Priority support
-
-
-
-Need more? **Enterprise** plans offer custom credit volumes, custom rate limits, dedicated support, and SLA guarantees. [Contact us](mailto:contact@scrapegraphai.com) for details.
-
-## Credit costs per service
-
-Every API call consumes credits. The exact cost depends on the service and the options you use.
-
-| Service | Base cost | Details |
-|---|---|---|
-| **Scrape** (markdown) | 1 credit | Basic page scrape returning markdown |
-| **Scrape** (screenshot) | 2 credits | Page scrape with a screenshot |
-| **Scrape** (branding analysis) | 25 credits | Full branding analysis of a page |
-| **Extract** | 5 credits | Structured data extraction |
-| **Search** (no prompt) | 2 credits / result | Search results without LLM processing |
-| **Search** (with prompt) | 5 credits / result | Search results processed by an LLM |
-| **Crawl** | 2 credits startup + per-page scrape cost | Startup fee plus scrape cost for each page |
-| **Monitor** | +5 credits | Additional credits charged when a change is detected |
-
-### Proxy modifiers
-
-Using a proxy adds extra credits on top of the base service cost:
-
-| Proxy mode | Additional credits |
-|---|---|
-| Fast / JS rendering | +0 |
-| Stealth | +4 |
-| JS + Stealth | +5 |
-| Auto (worst case) | +9 |
-
-
- Failed requests and requests that return an error are **not** charged.
-
-
-## Comparing plans at a glance
-
-| | Free | Starter | Growth | Pro | Enterprise |
-|---|---|---|---|---|---|
-| **Monthly price** | $0 | $17 | $85 | $425 | Custom |
-| **Annual price** | $0 | $204 | $1,020 | $5,100 | Custom |
-| **Credits / month** | 500 | 10,000 | 100,000 | 750,000 | Custom |
-| **Requests / min** | 10 | 100 | 500 | 5,000 | Custom |
-| **Monitors** | 1 | 5 | 25 | 100 | Custom |
-| **Concurrent crawls** | 1 | 3 | 15 | 50 | Custom |
-| **Proxy rotation** | — | — | Basic | Advanced | Custom |
-| **Priority support** | — | — | — | Yes | Yes |
-| **SLA guarantee** | — | — | — | — | Yes |
-
-## Upgrading or downgrading
-
-You can change your plan at any time from the [dashboard](https://scrapegraphai.com/dashboard). When upgrading mid-cycle, you receive the additional credits immediately. Downgrades take effect at the start of the next billing period.
-
-## Annual billing
-
-All paid plans offer an annual billing option with significant savings:
-
-- **Starter** — save $36 / year
-- **Growth** — save $180 / year
-- **Pro** — save $900 / year
-
-Switch to annual billing from the [dashboard](https://scrapegraphai.com/dashboard).
diff --git a/knowledge-base/account/rate-limits.mdx b/knowledge-base/account/rate-limits.mdx
index 8c54205..5a495d5 100644
--- a/knowledge-base/account/rate-limits.mdx
+++ b/knowledge-base/account/rate-limits.mdx
@@ -7,15 +7,12 @@ ScrapeGraphAI enforces rate limits to ensure reliable performance for all users.
## Limits overview
-| Plan | Requests per minute | Concurrent crawls | Monitors | Monthly credits |
-|---|---|---|---|---|
-| Free | 10 | 1 | 1 | 500 |
-| Starter | 100 | 3 | 5 | 10,000 |
-| Growth | 500 | 15 | 25 | 100,000 |
-| Pro | 5,000 | 50 | 100 | 750,000 |
-| Enterprise | Custom | Custom | Custom | Custom |
-
-For full pricing details, see [Plans & Pricing](/knowledge-base/account/pricing).
+| Plan | Requests per minute | Concurrent jobs | Monthly credits |
+|---|---|---|---|
+| Free | 5 | 1 | 100 |
+| Starter | 30 | 5 | 5,000 |
+| Pro | 100 | 20 | 50,000 |
+| Enterprise | Custom | Custom | Custom |
Contact [support](mailto:contact@scrapegraphai.com) for custom limits or high-volume plans.
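When a request is rejected with HTTP `429 Too Many Requests`, back off exponentially before retrying. A minimal sketch of the delay schedule only (the 1-second doubling base is an assumption for illustration, not the documented client behaviour):

```python
def backoff_delays(max_retries: int = 5, base: float = 1.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... per retry."""
    return [base * (2 ** attempt) for attempt in range(max_retries)]

for attempt, delay in enumerate(backoff_delays(), start=1):
    # In a real client you would re-issue the request here and
    # time.sleep(delay) only when the API returns HTTP 429.
    print(f"retry {attempt}: wait {delay:.0f}s")
```

Capping the number of retries (here 5) keeps a persistently rate-limited job from looping forever.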
@@ -62,5 +59,5 @@ def scrape_with_backoff(client, url, prompt, max_retries=5):
## Increasing your limits
-- **Upgrade your plan** from the [dashboard](https://scrapegraphai.com/dashboard) to get higher limits immediately.
+- **Upgrade your plan** from the [dashboard](https://dashboard.scrapegraphai.com) to get higher limits immediately.
- **Enterprise customers** can request custom rate limit configurations by contacting [support](mailto:contact@scrapegraphai.com).
diff --git a/knowledge-base/ai-tools/cursor.mdx b/knowledge-base/ai-tools/cursor.mdx
index b725b2e..017d321 100644
--- a/knowledge-base/ai-tools/cursor.mdx
+++ b/knowledge-base/ai-tools/cursor.mdx
@@ -53,13 +53,14 @@ Ask Cursor:
> Write a JavaScript function using scrapegraph-js that extracts product details from an e-commerce page.
```javascript
-import { extract } from "scrapegraph-js";
+import { smartScraper } from "scrapegraph-js";
-async function extractProduct(apiKey, url) {
- return await extract(apiKey, {
+async function extractProduct(url) {
+ return await smartScraper(
+ "your-api-key",
url,
- prompt: "Extract the product name, price, and availability",
- });
+ "Extract the product name, price, and availability"
+ );
}
```
diff --git a/knowledge-base/ai-tools/lovable.mdx b/knowledge-base/ai-tools/lovable.mdx
index 4d8b928..ab252ff 100644
--- a/knowledge-base/ai-tools/lovable.mdx
+++ b/knowledge-base/ai-tools/lovable.mdx
@@ -13,7 +13,7 @@ Because Lovable apps run in the browser, API calls to ScrapeGraphAI must be made
### 1. Get your API key
-Log in to the [ScrapeGraphAI dashboard](https://scrapegraphai.com/dashboard) and copy your API key from the Settings page.
+Log in to the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com) and copy your API key from the Settings page.
### 2. Create a Supabase Edge Function
diff --git a/knowledge-base/cli/getting-started.mdx b/knowledge-base/cli/getting-started.mdx
index d68c913..cb64ee3 100644
--- a/knowledge-base/cli/getting-started.mdx
+++ b/knowledge-base/cli/getting-started.mdx
@@ -39,7 +39,7 @@ Package: [just-scrape](https://www.npmjs.com/package/just-scrape) on npm | [GitH
## Setting up your API key
-The CLI needs a ScrapeGraphAI API key. Get one from the [dashboard](https://scrapegraphai.com/dashboard). The CLI checks for it in this order:
+The CLI needs a ScrapeGraphAI API key. Get one from the [dashboard](https://dashboard.scrapegraphai.com). The CLI checks for it in this order:
1. **Environment variable** — `export SGAI_API_KEY="sgai-..."`
2. **`.env` file** — `SGAI_API_KEY=sgai-...` in the project root
@@ -53,14 +53,19 @@ The easiest approach for a new machine is to just run any command — the CLI wi
| Variable | Description | Default |
|---|---|---|
| `SGAI_API_KEY` | ScrapeGraphAI API key | — |
-| `SGAI_API_URL` | Override the API base URL | `https://api.scrapegraphai.com` |
-| `SGAI_TIMEOUT_S` | Request timeout in seconds | `30` |
-
-Legacy variables (`JUST_SCRAPE_API_URL`, `JUST_SCRAPE_TIMEOUT_S`, `JUST_SCRAPE_DEBUG`) are still bridged.
+| `JUST_SCRAPE_API_URL` | Override the API base URL | `https://api.scrapegraphai.com/v1` |
+| `JUST_SCRAPE_TIMEOUT_S` | Request/polling timeout in seconds | `120` |
+| `JUST_SCRAPE_DEBUG` | Set to `1` to enable debug logging to stderr | `0` |
## Verify your setup
-Check your credit balance to confirm the key is valid:
+Run a quick health check to confirm the key is valid:
+
+```bash
+just-scrape validate
+```
+
+Check your credit balance:
```bash
just-scrape credits
@@ -69,7 +74,7 @@ just-scrape credits
## Your first scrape
```bash
-just-scrape extract https://news.ycombinator.com \
+just-scrape smart-scraper https://news.ycombinator.com \
-p "Extract the top 5 story titles and their URLs"
```
diff --git a/knowledge-base/scraping/custom-headers.mdx b/knowledge-base/scraping/custom-headers.mdx
index 53c1505..fd4b482 100644
--- a/knowledge-base/scraping/custom-headers.mdx
+++ b/knowledge-base/scraping/custom-headers.mdx
@@ -26,18 +26,19 @@ response = client.smartscraper(
```
```javascript
-import { extract } from "scrapegraph-js";
+import { smartScraper } from "scrapegraph-js";
-const result = await extract("your-api-key", {
- url: "https://example.com/protected-page",
- prompt: "Extract the main content",
- fetchConfig: {
+const result = await smartScraper(
+ "your-api-key",
+ "https://example.com/protected-page",
+ "Extract the main content",
+ {
headers: {
Authorization: "Bearer your-token-here",
Cookie: "session=abc123",
},
- },
-});
+ }
+);
```
See the [headers parameter documentation](/services/additional-parameters/headers) for the full reference.
diff --git a/knowledge-base/scraping/javascript-rendering.mdx b/knowledge-base/scraping/javascript-rendering.mdx
index a96738a..5ab0afe 100644
--- a/knowledge-base/scraping/javascript-rendering.mdx
+++ b/knowledge-base/scraping/javascript-rendering.mdx
@@ -26,13 +26,14 @@ response = client.smartscraper(
```
```javascript
-import { extract } from "scrapegraph-js";
-
-const result = await extract("your-api-key", {
- url: "https://example.com/products",
- prompt: "Extract all product names and prices",
- fetchConfig: { wait: 2000 },
-});
+import { smartScraper } from "scrapegraph-js";
+
+const result = await smartScraper(
+ "your-api-key",
+ "https://example.com/products",
+ "Extract all product names and prices",
+ { wait_ms: 2000 }
+);
```
See the [wait_ms parameter documentation](/services/additional-parameters/wait-ms) for more details.
diff --git a/knowledge-base/scraping/pagination.mdx b/knowledge-base/scraping/pagination.mdx
index d291d3d..b086aaa 100644
--- a/knowledge-base/scraping/pagination.mdx
+++ b/knowledge-base/scraping/pagination.mdx
@@ -43,19 +43,19 @@ print(f"Total products extracted: {len(all_results)}")
```
```javascript
-import { extract } from "scrapegraph-js";
+import { smartScraper } from "scrapegraph-js";
+const apiKey = "your-api-key";
const allResults = [];
for (let page = 1; page <= 5; page++) {
const url = `https://example.com/products?page=${page}`;
- const result = await extract("your-api-key", {
+ const result = await smartScraper(
+ apiKey,
url,
- prompt: "Extract all product names and prices on this page",
- });
- if (result.status === "success") {
- allResults.push(...(result.data?.json?.products ?? []));
- }
+ "Extract all product names and prices on this page"
+ );
+ allResults.push(...(result?.products ?? []));
}
```
diff --git a/knowledge-base/scraping/proxy.mdx b/knowledge-base/scraping/proxy.mdx
index 15ca750..1350b71 100644
--- a/knowledge-base/scraping/proxy.mdx
+++ b/knowledge-base/scraping/proxy.mdx
@@ -1,138 +1,88 @@
---
-title: Proxy & Fetch Configuration
-description: 'Control proxy routing, stealth mode, and geo-targeting with FetchConfig'
+title: Scraping behind a proxy
+description: 'Route requests through your own proxy for geo-targeting or privacy'
---
-In v2, all proxy and fetch behaviour is controlled through the `FetchConfig` object. You can set the proxy strategy (`mode`), country-based geotargeting (`country`), wait times, scrolling, custom headers, and more.
+Using a proxy lets you route ScrapeGraphAI requests through a specific IP address or geographic location. This is useful for accessing geo-restricted content, bypassing IP-based blocks, or testing region-specific pages.
-See the [full proxy reference](/services/additional-parameters/proxy) for all available options.
+## How to pass a proxy
-## Choosing a fetch mode
+Use the `proxy` parameter available in SmartScraper, SearchScraper, and Markdownify:
-The `mode` parameter controls how pages are retrieved:
-
-| Mode | Description |
-|------|-------------|
-| `auto` | Automatically selects the best strategy (default) |
-| `fast` | Direct HTTP fetch, no JS rendering — fastest option |
-| `js` | Headless browser for JavaScript-heavy pages |
-
-Set `stealth: true` alongside any mode to enable residential proxy with anti-bot headers.
-
-## Examples
-
-### Geo-targeted content
-
-Access content from a specific country using the `country` parameter:
-
-
-
-```python Python
-from scrapegraph_py import Client, FetchConfig
+```python
+from scrapegraph_py import Client
client = Client(api_key="your-api-key")
-response = client.extract(
- url="https://example.com",
- prompt="Extract the main content",
- fetch_config=FetchConfig(country="de"), # Route through Germany
+response = client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract the main content",
+ proxy="http://username:password@proxy-host:8080",
)
```
-```javascript JavaScript
-import { extract } from 'scrapegraph-js';
-
-const result = await extract('your-api-key', {
- url: 'https://example.com',
- prompt: 'Extract the main content',
- fetchConfig: { country: 'de' },
-});
+```javascript
+import { smartScraper } from "scrapegraph-js";
+
+const result = await smartScraper(
+ "your-api-key",
+ "https://example.com",
+ "Extract the main content",
+ {
+ proxy: "http://username:password@proxy-host:8080",
+ }
+);
```
-
+See the [proxy parameter documentation](/services/additional-parameters/proxy) for the full reference.
-### Stealth mode for protected sites
+## Proxy URL format
-Use stealth modes to bypass anti-bot protections:
-
-
-
-```python Python
-from scrapegraph_py import Client, FetchConfig
+```
+http://username:password@host:port
+socks5://username:password@host:port
+```
-client = Client(api_key="your-api-key")
+If the proxy does not require authentication:
-response = client.scrape(
- url="https://protected-site.com",
- format="markdown",
- fetch_config=FetchConfig(
- mode="js",
- stealth=True,
- wait=3000,
- scrolls=3,
- country="us",
- ),
-)
```
-
-```javascript JavaScript
-import { scrape } from 'scrapegraph-js';
-
-const result = await scrape('your-api-key', {
- url: 'https://protected-site.com',
- formats: [{ type: 'markdown' }],
- fetchConfig: {
- mode: 'js',
- stealth: true,
- wait: 3000,
- scrolls: 3,
- country: 'us',
- },
-});
+http://host:port
```
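Both shapes parse as standard URLs, so a quick client-side sanity check (a hypothetical helper, not part of the scrapegraph-js SDK) can catch malformed proxy strings before a request is wasted:

```javascript
// Hypothetical helper (not part of scrapegraph-js): sanity-check a
// proxy string against the format above before using it.
function isValidProxyUrl(proxy) {
  let url;
  try {
    url = new URL(proxy);
  } catch {
    return false; // not parseable as a URL at all
  }
  // Accept http(s) and socks5; credentials and port are optional here,
  // since default ports (e.g. 80 for http) are normalized away by URL.
  return ["http:", "https:", "socks5:"].includes(url.protocol) && url.hostname !== "";
}

console.log(isValidProxyUrl("http://user:pass@proxy-host:8080")); // true
console.log(isValidProxyUrl("socks5://user:pass@host:1080"));     // true
console.log(isValidProxyUrl("not a proxy"));                      // false
```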
-
+## Common use cases
-### Custom headers and cookies
+### Geo-targeted content
-Pass custom HTTP headers or cookies with your requests:
+Access content that is only available in a specific country:
-
+```python
+# Using a proxy located in Germany
+proxy = "http://user:pass@de-proxy.example.com:8080"
+```
-```python Python
-from scrapegraph_py import Client, FetchConfig
+### Bypassing IP-based rate limits
-client = Client(api_key="your-api-key")
+If the target website blocks your IP after too many requests, rotate through a pool of proxy IPs:
-response = client.extract(
- url="https://example.com",
- prompt="Extract product details",
- fetch_config=FetchConfig(
- headers={"Accept-Language": "en-US"},
- cookies={"session": "abc123"},
- ),
-)
-```
+```python
+import itertools
-```javascript JavaScript
-import { extract } from 'scrapegraph-js';
-
-const result = await extract('your-api-key', {
- url: 'https://example.com',
- prompt: 'Extract product details',
- fetchConfig: {
- headers: { 'Accept-Language': 'en-US' },
- cookies: { session: 'abc123' },
- },
-});
-```
+proxies = itertools.cycle([
+ "http://user:pass@proxy1.example.com:8080",
+ "http://user:pass@proxy2.example.com:8080",
+ "http://user:pass@proxy3.example.com:8080",
+])
-
+for url in urls_to_scrape:
+ response = client.smartscraper(
+ website_url=url,
+ user_prompt="Extract the product details",
+ proxy=next(proxies),
+ )
+```
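The same rotation pattern in JavaScript, as a sketch — the round-robin helper below is hypothetical, not an SDK export:

```javascript
// Sketch of a round-robin proxy pool, mirroring the Python
// itertools.cycle example above (hypothetical helper, not SDK API).
function makeProxyPool(proxies) {
  let i = 0;
  return () => proxies[i++ % proxies.length];
}

const nextProxy = makeProxyPool([
  "http://user:pass@proxy1.example.com:8080",
  "http://user:pass@proxy2.example.com:8080",
]);

console.log(nextProxy()); // proxy1
console.log(nextProxy()); // proxy2
console.log(nextProxy()); // wraps back to proxy1
```

Each call hands back the next proxy in the pool, wrapping around when it reaches the end.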
## Tips
-- Start with `mode: "auto"` and only switch to a specific mode if you need to.
-- Set `stealth: true` for sites with strong anti-bot protections (combine with `mode: "js"` for dynamic sites).
-- Add `wait` time for pages that load content dynamically after the initial render.
-- Use `scrolls` to trigger lazy-loaded content on infinite-scroll pages.
-- The `country` parameter doesn't affect pricing — credits are charged the same regardless of proxy location.
+- Use a reputable proxy provider for reliable uptime and performance.
+- Test your proxy connection independently before passing it to ScrapeGraphAI to rule out proxy-side issues.
+- Do not use public/free proxies for sensitive data — they may log or modify your traffic.
diff --git a/knowledge-base/troubleshooting/empty-results.mdx b/knowledge-base/troubleshooting/empty-results.mdx
index 0163d04..b74e467 100644
--- a/knowledge-base/troubleshooting/empty-results.mdx
+++ b/knowledge-base/troubleshooting/empty-results.mdx
@@ -47,10 +47,10 @@ If you define an `output_schema` with required fields, the LLM will return `null
If you have exhausted your credits or are being rate-limited, the API may return an empty or error response.
-**Fix:** Check your [dashboard](https://scrapegraphai.com/dashboard) for remaining credits and current usage.
+**Fix:** Check your [dashboard](https://dashboard.scrapegraphai.com) for remaining credits and current usage.
## Debugging tips
- Log the full API response — the `result` key contains the extracted data; `status` and `error` keys may contain useful information.
- Test the URL with a simple prompt like `"What is the main heading of this page?"` to verify that extraction works at all.
-- Use the [interactive playground](https://scrapegraphai.com/dashboard) to test your URL and prompt before integrating.
+- Use the [interactive playground](https://dashboard.scrapegraphai.com) to test your URL and prompt before integrating.
diff --git a/knowledge-base/troubleshooting/rate-limiting.mdx b/knowledge-base/troubleshooting/rate-limiting.mdx
index d5460c1..6a5732f 100644
--- a/knowledge-base/troubleshooting/rate-limiting.mdx
+++ b/knowledge-base/troubleshooting/rate-limiting.mdx
@@ -28,7 +28,7 @@ When you exceed the rate limit, the API returns an HTTP `429 Too Many Requests`
| Enterprise | Custom | Custom |
- Check the [dashboard](https://scrapegraphai.com/dashboard) for up-to-date limits for your current plan.
+ Check the [dashboard](https://dashboard.scrapegraphai.com) for up-to-date limits for your current plan.
## How to handle rate limits in code
@@ -56,16 +56,21 @@ def scrape_with_retry(url: str, prompt: str, max_retries: int = 3):
### JavaScript — with retry
```javascript
-import { extract } from "scrapegraph-js";
+import { smartScraper } from "scrapegraph-js";
async function scrapeWithRetry(apiKey, url, prompt, retries = 3) {
for (let i = 0; i < retries; i++) {
- const result = await extract(apiKey, { url, prompt });
- if (result.status === "success") return result;
- // Exponential backoff
- const wait = Math.pow(2, i) * 1000;
- console.log(`Attempt ${i + 1} failed: ${result.error}. Retrying in ${wait}ms...`);
- await new Promise((r) => setTimeout(r, wait));
+ try {
+ return await smartScraper(apiKey, url, prompt);
+ } catch (err) {
+ if (err.status === 429) {
+ const wait = Math.pow(2, i) * 1000;
+ console.log(`Rate limited. Retrying in ${wait}ms...`);
+ await new Promise((r) => setTimeout(r, wait));
+ } else {
+ throw err;
+ }
+ }
}
throw new Error("Max retries exceeded");
}
diff --git a/logo/dark.svg b/logo/dark.svg
new file mode 100644
index 0000000..33285d6
--- /dev/null
+++ b/logo/dark.svg
@@ -0,0 +1,145 @@
+
+
+
+
diff --git a/logo/light.svg b/logo/light.svg
new file mode 100644
index 0000000..33285d6
--- /dev/null
+++ b/logo/light.svg
@@ -0,0 +1,145 @@
+
+
+
+
diff --git a/logos/logo-color.svg b/logos/logo-color.svg
deleted file mode 100644
index 6fb828b..0000000
--- a/logos/logo-color.svg
+++ /dev/null
@@ -1,15 +0,0 @@
-
diff --git a/logos/logo-dark-alt.svg b/logos/logo-dark-alt.svg
deleted file mode 100644
index cd47e15..0000000
--- a/logos/logo-dark-alt.svg
+++ /dev/null
@@ -1,15 +0,0 @@
-
diff --git a/logos/logo-dark.svg b/logos/logo-dark.svg
deleted file mode 100644
index 8545571..0000000
--- a/logos/logo-dark.svg
+++ /dev/null
@@ -1,15 +0,0 @@
-
diff --git a/logos/logo-light.svg b/logos/logo-light.svg
deleted file mode 100644
index 6fb828b..0000000
--- a/logos/logo-light.svg
+++ /dev/null
@@ -1,15 +0,0 @@
-
diff --git a/resources/blog.mdx b/resources/blog.mdx
index 2076c5e..0d4aa0a 100644
--- a/resources/blog.mdx
+++ b/resources/blog.mdx
@@ -44,7 +44,7 @@ Master the art of prompt engineering for AI web scraping. This comprehensive gui
## Additional Resources
- **Complete Guide**: [The Art of Prompting](https://scrapegraphai.com/blog/prompt-engineering-guide)
-- **Practice in Playground**: [Test your prompts](https://scrapegraphai.com/dashboard)
+- **Practice in Playground**: [Test your prompts](https://dashboard.scrapegraphai.com/playground)
- **Community Support**: [Discord discussions](https://discord.gg/uJN7TYcpNa)
- **Examples**: Check our [Cookbook](/cookbook/introduction) for real-world implementations
diff --git a/sdks/javascript.mdx b/sdks/javascript.mdx
index 6cdeaa5..ed4fb55 100644
--- a/sdks/javascript.mdx
+++ b/sdks/javascript.mdx
@@ -1,6 +1,6 @@
---
title: "JavaScript SDK"
-description: "Official JavaScript/TypeScript SDK for ScrapeGraphAI v2"
+description: "Official JavaScript/TypeScript SDK for ScrapeGraphAI"
icon: "js"
---
@@ -22,6 +22,8 @@ icon: "js"
## Installation
+Install the package using npm, pnpm, yarn, or bun:
+
```bash
# Using npm
npm i scrapegraph-js
@@ -36,32 +38,55 @@ yarn add scrapegraph-js
bun add scrapegraph-js
```
+## Features
+
+- **AI-Powered Extraction**: Smart web scraping with artificial intelligence
+- **Async by Design**: Fully asynchronous architecture
+- **Type Safety**: Built-in TypeScript support with Zod schemas
+- **Zero Exceptions**: All errors wrapped in `ApiResult` — no try/catch needed
+- **Developer Friendly**: Comprehensive error handling and debug logging
+
## Quick Start
+### Basic example
+
+
+ Store your API keys securely in environment variables. Use `.env` files and
+ libraries like `dotenv` to load them into your app.
+
+
```javascript
-import { scrape } from "scrapegraph-js";
+import { smartScraper } from "scrapegraph-js";
+import "dotenv/config";
+
+const apiKey = process.env.SGAI_APIKEY;
-const result = await scrape("your-api-key", {
- url: "https://example.com",
- formats: [{ type: "markdown" }],
+const response = await smartScraper(apiKey, {
+ website_url: "https://example.com",
+ user_prompt: "What does the company do?",
});
-if (result.status === "success") {
- console.log(result.data?.results.markdown?.data);
+if (response.status === "error") {
+ console.error("Error:", response.error);
} else {
- console.error(result.error);
+ console.log(response.data.result);
}
```
-
-Store your API keys securely in environment variables. Use `.env` files and
-libraries like `dotenv` to load them into your app.
-
+## Services
-## Return Type
+### SmartScraper
-All functions return `ApiResult`:
+Extract specific information from any webpage using AI:
+```javascript
+const response = await smartScraper(apiKey, {
+ website_url: "https://example.com",
+ user_prompt: "Extract the main content",
+});
+```
+
+All functions return an `ApiResult` object:
```typescript
type ApiResult = {
status: "success" | "error";
@@ -71,401 +96,330 @@ type ApiResult = {
};
```
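Since `status` is the discriminant, a common pattern is a tiny unwrap helper that checks it before touching `data` — a sketch against the `ApiResult` shape above, not an SDK export:

```javascript
// Sketch: unwrap an ApiResult, assuming the shape shown above.
// Throws on error so the rest of the code only sees the success payload.
function unwrap(result) {
  if (result.status === "error") {
    throw new Error(result.error ?? "Unknown API error");
  }
  return result.data;
}

// With a hand-built result object:
const ok = { status: "success", data: { result: "Acme builds widgets" } };
console.log(unwrap(ok).result); // "Acme builds widgets"
```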
-Check `result.status` before accessing `result.data`.
-
-## Services
-
-### scrape
+#### Parameters
-Scrape a webpage in multiple formats (markdown, html, screenshot, json, etc).
+| Parameter | Type | Required | Description |
+| --------------- | ------- | -------- | ----------------------------------------------------------------------------------- |
+| apiKey | string | Yes | The ScrapeGraph API Key (first argument). |
+| user_prompt | string | Yes | A textual description of what you want to extract. |
+| website_url | string | No* | The URL of the webpage to scrape. *One of `website_url`, `website_html`, or `website_markdown` is required. |
+| output_schema | object | No | A Zod schema (converted to JSON) that describes the structure of the response. |
+| number_of_scrolls | number | No | Number of scrolls for infinite scroll pages (0-50). |
+| stealth | boolean | No | Enable anti-detection mode (+4 credits). |
+| headers | object | No | Custom HTTP headers. |
+| mock | boolean | No | Enable mock mode for testing. |
+| wait_ms | number | No | Page load wait time in ms (default: 3000). |
+| country_code | string | No | Proxy routing country code (e.g., "us"). |
+
+
+Define a simple schema using Zod:
```javascript
-import { scrape } from "scrapegraph-js";
-
-const result = await scrape("your-api-key", {
- url: "https://example.com",
- formats: [
- { type: "markdown", mode: "reader" },
- { type: "screenshot", fullPage: true, width: 1440, height: 900 },
- { type: "json", prompt: "Extract product info" },
- ],
- contentType: "text/html", // optional, auto-detected
- fetchConfig: { // optional
- mode: "js",
- stealth: true,
- timeout: 30000,
- wait: 2000,
- scrolls: 3,
- },
+import { z } from "zod";
+
+const ArticleSchema = z.object({
+ title: z.string().describe("The article title"),
+ author: z.string().describe("The author's name"),
+ publishDate: z.string().describe("Article publication date"),
+ content: z.string().describe("Main article content"),
+ category: z.string().describe("Article category"),
});
-```
-#### Parameters
+const ArticlesArraySchema = z
+ .array(ArticleSchema)
+ .describe("Array of articles");
-| Parameter | Type | Required | Description |
-| -------------------- | ------------- | -------- | -------------------------------------------------------- |
-| url | string | Yes | The URL of the webpage to scrape |
-| formats | FormatEntry[] | No | Array of format entries. Defaults to `[{ type: "markdown" }]` |
-| contentType | string | No | Override the detected content type (e.g. `"application/pdf"`) |
-| fetchConfig | FetchConfig | No | Fetch configuration |
-
-**Formats:**
-- `markdown` -- Clean markdown (modes: `normal`, `reader`, `prune`)
-- `html` -- Raw HTML (modes: `normal`, `reader`, `prune`)
-- `links` -- All links on the page
-- `images` -- All image URLs
-- `summary` -- AI-generated summary
-- `json` -- Structured extraction with prompt/schema
-- `branding` -- Brand colors, typography, logos
-- `screenshot` -- Page screenshot (fullPage, width, height, quality)
-
-
-```javascript
-import { scrape } from "scrapegraph-js";
-
-const result = await scrape("your-api-key", {
- url: "https://example.com",
- formats: [
- { type: "markdown", mode: "reader" },
- { type: "links" },
- { type: "images" },
- { type: "screenshot", fullPage: false, width: 1440, height: 900, quality: 90 },
- ],
+const response = await smartScraper(apiKey, {
+ website_url: "https://example.com/blog/article",
+ user_prompt: "Extract the article information",
+ output_schema: ArticlesArraySchema,
});
-if (result.status === "success") {
- const results = result.data?.results;
- console.log("Markdown:", results?.markdown?.data);
- console.log("Links:", results?.links?.data);
- console.log("Screenshot URL:", results?.screenshot?.data.url);
-}
+const [article] = response.data.result;
+console.log(`Title: ${article.title}`);
+console.log(`Author: ${article.author}`);
+console.log(`Published: ${article.publishDate}`);
```
-
-### extract
+
-Extract structured data from a URL, HTML, or markdown using AI.
+
+Define a complex schema for nested data structures:
```javascript
-import { extract } from "scrapegraph-js";
+import { z } from "zod";
-const result = await extract("your-api-key", {
- url: "https://example.com",
- prompt: "Extract the main heading and description",
+const EmployeeSchema = z.object({
+ name: z.string().describe("Employee's full name"),
+ position: z.string().describe("Job title"),
+ department: z.string().describe("Department name"),
+ email: z.string().describe("Email address"),
});
-if (result.status === "success") {
- console.log(result.data?.json);
-}
-```
-
-#### Parameters
+const OfficeSchema = z.object({
+ location: z.string().describe("Office location/city"),
+ address: z.string().describe("Full address"),
+ phone: z.string().describe("Contact number"),
+});
-| Parameter | Type | Required | Description |
-| -------------------- | ----------- | -------- | -------------------------------------------------------- |
-| url | string | Yes\* | The URL of the webpage to scrape |
-| prompt | string | Yes | A description of what you want to extract |
-| schema | object | No | JSON schema for structured response |
-| mode | string | No | HTML processing mode: `"normal"`, `"reader"`, `"prune"` |
-| contentType | string | No | Override the detected content type |
-| html | string | No | Raw HTML input (alternative to `url`) |
-| markdown | string | No | Raw markdown input (alternative to `url`) |
-| fetchConfig | FetchConfig | No | Fetch configuration |
+const CompanySchema = z.object({
+ name: z.string().describe("Company name"),
+ description: z.string().describe("Company description"),
+ industry: z.string().describe("Industry sector"),
+ foundedYear: z.number().describe("Year company was founded"),
+ employees: z.array(EmployeeSchema).describe("List of key employees"),
+ offices: z.array(OfficeSchema).describe("Company office locations"),
+ website: z.string().url().describe("Company website URL"),
+});
-
-\*One of `url`, `html`, or `markdown` is required.
-
+const response = await smartScraper(apiKey, {
+ website_url: "https://example.com/about",
+ user_prompt: "Extract detailed company information including employees and offices",
+ output_schema: CompanySchema,
+});
-
-```javascript
-import { extract } from "scrapegraph-js";
-
-const result = await extract("your-api-key", {
- url: "https://example.com/article",
- prompt: "Extract the article information",
- schema: {
- type: "object",
- properties: {
- title: { type: "string" },
- author: { type: "string" },
- publishDate: { type: "string" },
- content: { type: "string" },
- },
- required: ["title"],
- },
+console.log(`Company: ${response.data.result.name}`);
+console.log("\nKey Employees:");
+response.data.result.employees.forEach((employee) => {
+ console.log(`- ${employee.name} (${employee.position})`);
});
-if (result.status === "success") {
- console.log("Extracted:", result.data?.json);
- console.log("Tokens:", result.data?.usage);
-}
+console.log("\nOffice Locations:");
+response.data.result.offices.forEach((office) => {
+ console.log(`- ${office.location}: ${office.address}`);
+});
```
-
-### search
+
-Search the web and optionally extract structured data.
+
+For modern web applications built with React, Vue, Angular, or other JavaScript frameworks:
```javascript
-import { search } from "scrapegraph-js";
+import { smartScraper } from 'scrapegraph-js';
+import { z } from 'zod';
-const result = await search("your-api-key", {
- query: "best programming languages 2024",
- numResults: 5,
-});
+const apiKey = 'your-api-key';
-if (result.status === "success") {
- for (const r of result.data?.results ?? []) {
- console.log(`${r.title} - ${r.url}`);
- }
-}
-```
-
-#### Parameters
+const ProductSchema = z.object({
+ name: z.string().describe('Product name'),
+ price: z.string().describe('Product price'),
+ description: z.string().describe('Product description'),
+ availability: z.string().describe('Product availability status')
+});
-| Parameter | Type | Required | Description |
-| ------------------------ | ----------- | -------- | -------------------------------------------------------- |
-| query | string | Yes | The search query (1-500 chars) |
-| numResults | number | No | Number of results (1-20). Default: 3 |
-| prompt | string | No | Prompt for AI extraction from results |
-| schema | object | No | JSON schema for structured response (requires `prompt`) |
-| format | string | No | `"markdown"` (default) or `"html"` |
-| mode | string | No | HTML processing mode: `"normal"`, `"reader"`, `"prune"` (default) |
-| locationGeoCode | string | No | Country code for localized search (e.g. `"us"`) |
-| timeRange | string | No | Recency filter: `"past_hour"`, `"past_24_hours"`, `"past_week"`, `"past_month"`, `"past_year"` |
-| fetchConfig | FetchConfig | No | Fetch configuration |
-
-
-```javascript
-import { search } from "scrapegraph-js";
-
-const result = await search("your-api-key", {
- query: "typescript best practices",
- numResults: 5,
- prompt: "Extract the main tips and recommendations",
- schema: {
- type: "object",
- properties: {
- tips: { type: "array", items: { type: "string" } },
- },
- },
+const response = await smartScraper(apiKey, {
+ website_url: 'https://example-react-store.com/products/123',
+ user_prompt: 'Extract product details including name, price, description, and availability',
+ output_schema: ProductSchema,
});
-if (result.status === "success") {
- console.log("Results:", result.data?.results.length);
- console.log("Extracted:", result.data?.json);
+if (response.status === 'error') {
+ console.error('Error:', response.error);
+} else {
+ console.log('Product:', response.data.result.name);
+ console.log('Price:', response.data.result.price);
+ console.log('Available:', response.data.result.availability);
}
```
+
-### generateSchema
+### SearchScraper
-Generate a JSON schema from a natural language description.
+Search and extract information from multiple web sources using AI:
```javascript
-import { generateSchema } from "scrapegraph-js";
-
-const result = await generateSchema("your-api-key", {
- prompt: "Schema for a product with name, price, and rating",
- existingSchema: { /* optional, to modify */ },
+import { searchScraper } from "scrapegraph-js";
+
+const response = await searchScraper(apiKey, {
+ user_prompt: "Find the best restaurants in San Francisco",
+ location_geo_code: "us",
+ time_range: "past_week",
});
-
-if (result.status === "success") {
- console.log("Refined prompt:", result.data?.refinedPrompt);
- console.log("Schema:", result.data?.schema);
-}
```
-### crawl
+#### Parameters
+
+| Parameter | Type | Required | Description |
+| ------------------ | ------- | -------- | ---------------------------------------------------------------------------------- |
+| apiKey | string | Yes | The ScrapeGraph API Key (first argument). |
+| user_prompt | string | Yes | A textual description of what you want to achieve. |
+| num_results | number | No | Number of websites to search (3-20). Default: 3. |
+| extraction_mode | boolean | No | **true** = AI extraction mode (10 credits/page), **false** = markdown mode (2 credits/page). |
+| output_schema | object | No | Zod schema for structured response format (AI extraction mode only). |
+| location_geo_code | string | No | Geo code for location-based search (e.g., "us"). |
+| time_range | string | No | Time range filter. Options: "past_hour", "past_24_hours", "past_week", "past_month", "past_year". |
-Crawl a website and its linked pages.
+
+Define a simple schema using Zod:
```javascript
-import { crawl } from "scrapegraph-js";
-
-// Start a crawl
-const result = await crawl.start("your-api-key", {
- url: "https://example.com",
- formats: [{ type: "markdown" }],
- maxPages: 50,
- maxDepth: 2,
- maxLinksPerPage: 10,
- includePatterns: ["/blog/*"],
- excludePatterns: ["/admin/*"],
- fetchConfig: { /* ... */ },
+import { searchScraper } from "scrapegraph-js";
+import { z } from "zod";
+
+const ArticleSchema = z.object({
+ title: z.string().describe("The article title"),
+ author: z.string().describe("The author's name"),
+ publishDate: z.string().describe("Article publication date"),
+ content: z.string().describe("Main article content"),
+ category: z.string().describe("Article category"),
});
-console.log("Crawl ID:", result.data?.id);
-
-// Check status
-const status = await crawl.get("your-api-key", result.data?.id);
+const response = await searchScraper(apiKey, {
+ user_prompt: "Find news about the latest trends in AI",
+ output_schema: ArticleSchema,
+ location_geo_code: "us",
+ time_range: "past_week",
+});
-// Control crawl
-await crawl.stop("your-api-key", result.data?.id);
-await crawl.resume("your-api-key", result.data?.id);
-await crawl.delete("your-api-key", result.data?.id);
+console.log(`Title: ${response.data.result.title}`);
+console.log(`Author: ${response.data.result.author}`);
+console.log(`Published: ${response.data.result.publishDate}`);
```
-#### crawl.start() Parameters
+
-| Parameter | Type | Required | Description |
-| --------------------------- | ------------- | -------- | -------------------------------------------------------- |
-| url | string | Yes | The starting URL |
-| formats | FormatEntry[] | No | Output formats per page. Defaults to `[{ type: "markdown" }]` |
-| maxDepth | number | No | Maximum crawl depth. Default: `2` |
-| maxPages | number | No | Maximum pages to crawl (1-1000). Default: `50` |
-| maxLinksPerPage | number | No | Maximum links followed per page. Default: `10` |
-| allowExternal | boolean | No | Allow crossing domains. Default: `false` |
-| includePatterns | string[] | No | URL patterns to include |
-| excludePatterns | string[] | No | URL patterns to exclude |
-| contentTypes | string[] | No | Allowed content types |
-| fetchConfig | FetchConfig | No | Fetch configuration |
+
+Define a schema for structured restaurant results:
-### monitor
+```javascript
+import { searchScraper } from "scrapegraph-js";
+import { z } from "zod";
-Monitor a webpage for changes on a schedule.
-```javascript
-import { monitor } from "scrapegraph-js";
-
-// Create a monitor
-const result = await monitor.create("your-api-key", {
- url: "https://example.com",
- name: "Price Monitor",
- interval: "0 * * * *", // cron expression
- formats: [{ type: "markdown" }],
- webhookUrl: "https://...", // optional
- fetchConfig: { /* ... */ },
-console.log("Monitor ID:", result.data?.cronId);
+const RestaurantSchema = z.object({
+ name: z.string().describe("Restaurant name"),
+ address: z.string().describe("Restaurant address"),
+ rating: z.number().describe("Restaurant rating"),
+ website: z.string().url().describe("Restaurant website URL"),
+});
-// Manage monitors
-const all = await monitor.list("your-api-key");
-const details = await monitor.get("your-api-key", cronId);
-await monitor.update("your-api-key", cronId, { interval: "0 */6 * * *" });
-await monitor.pause("your-api-key", cronId);
-await monitor.resume("your-api-key", cronId);
-await monitor.delete("your-api-key", cronId);
+const response = await searchScraper(apiKey, {
+ user_prompt: "Find the best restaurants in San Francisco",
+ output_schema: RestaurantSchema,
+ location_geo_code: "us",
+ time_range: "past_month",
+});
```
-#### monitor.activity() — poll tick history
+
-Paginate through per-run ticks for a monitor (what changed on each scheduled run).
+
+Use markdown mode for cost-effective content gathering:
```javascript
-import { monitor } from "scrapegraph-js";
-
-const activity = await monitor.activity("your-api-key", cronId, { limit: 20 });
-
-if (activity.status === "success") {
- for (const tick of activity.data?.ticks ?? []) {
- const changed = tick.changed ? "CHANGED" : "no change";
- console.log(`[${tick.createdAt}] ${tick.status} - ${changed} (${tick.elapsedMs}ms)`);
- }
-
- if (activity.data?.nextCursor) {
- const next = await monitor.activity("your-api-key", cronId, {
- limit: 20,
- cursor: activity.data.nextCursor,
- });
- }
+import { searchScraper } from 'scrapegraph-js';
+
+const apiKey = 'your-api-key';
+
+const response = await searchScraper(apiKey, {
+ user_prompt: 'Latest developments in artificial intelligence',
+ num_results: 3,
+ extraction_mode: false,
+ location_geo_code: "us",
+ time_range: "past_week",
+});
+
+if (response.status === 'error') {
+ console.error('Error:', response.error);
+} else {
+ const markdownContent = response.data.markdown_content;
+ console.log('Markdown content length:', markdownContent.length);
+ console.log('Reference URLs:', response.data.reference_urls);
+ console.log('Content preview:', markdownContent.substring(0, 500) + '...');
}
```
-Params: `limit` (1–100, default `20`), `cursor` (opaque pagination token). Each tick has `id`, `createdAt`, `status`, `changed`, `elapsedMs`, and `diffs` with per-format deltas.
+**Markdown Mode Benefits:**
+- **Cost-effective**: Only 2 credits per page (vs 10 credits for AI extraction)
+- **Full content**: Get complete page content in markdown format
+- **Faster**: No AI processing overhead
+- **Perfect for**: Content analysis, bulk data collection, building datasets
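To see what that difference means at volume, a quick back-of-envelope calculation using the per-page figures above (the credit numbers come from this page, not from the billing API):

```javascript
// Back-of-envelope cost comparison using the per-page credit
// figures quoted above (2 for markdown, 10 for AI extraction).
const CREDITS_PER_PAGE = { markdown: 2, ai_extraction: 10 };

function estimateCredits(mode, pages) {
  return CREDITS_PER_PAGE[mode] * pages;
}

console.log(estimateCredits("markdown", 100));      // 200
console.log(estimateCredits("ai_extraction", 100)); // 1000
```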
-### getCredits
+
-Check your account credit balance.
+
+Filter search results by date range to get only recent information:
```javascript
-import { getCredits } from "scrapegraph-js";
+import { searchScraper } from 'scrapegraph-js';
-const result = await getCredits("your-api-key");
+const apiKey = 'your-api-key';
-if (result.status === "success") {
- console.log(`Remaining: ${result.data?.remaining}`);
- console.log(`Used: ${result.data?.used}`);
- console.log(`Plan: ${result.data?.plan}`);
+const response = await searchScraper(apiKey, {
+ user_prompt: 'Latest news about AI developments',
+ num_results: 5,
+ time_range: 'past_week', // Options: 'past_hour', 'past_24_hours', 'past_week', 'past_month', 'past_year'
+});
+
+if (response.status === 'error') {
+ console.error('Error:', response.error);
+} else {
+ console.log('Recent AI news:', response.data.result);
+ console.log('Reference URLs:', response.data.reference_urls);
}
```
-### checkHealth
+**Time Range Options:**
+- `past_hour` - Results from the past hour
+- `past_24_hours` - Results from the past 24 hours
+- `past_week` - Results from the past week
+- `past_month` - Results from the past month
+- `past_year` - Results from the past year
-Check API health status.
+**Use Cases:**
+- Finding recent news and updates
+- Tracking time-sensitive information
+- Getting latest product releases
+- Monitoring recent market changes
-```javascript
-import { checkHealth } from "scrapegraph-js";
-
-const result = await checkHealth("your-api-key");
-// { status: "ok", uptime: 12345 }
-```
+
-### history
+### Markdownify
-Fetch request history.
+Convert any webpage into clean, formatted markdown:
```javascript
-import { history } from "scrapegraph-js";
-
-const list = await history.list("your-api-key", {
- service: "scrape", // optional filter
- page: 1,
- limit: 20,
+import { markdownify } from 'scrapegraph-js';
+
+const apiKey = 'your-api-key';
+
+const response = await markdownify(apiKey, {
+  website_url: 'https://example.com',
+});
-
-const entry = await history.get("your-api-key", "request-id");
```
-## Configuration Objects
-
-### FetchConfig
-
-Controls how pages are fetched. See the [proxy configuration guide](/services/additional-parameters/proxy) for details on modes and geotargeting.
-
-```javascript
-{
- mode: 'js', // Fetch mode: auto, fast, js
- stealth: true, // Enable stealth mode (residential proxy, anti-bot headers)
- timeout: 15000, // Request timeout in ms (1000-60000)
- wait: 2000, // Wait after page load in ms (0-30000)
- scrolls: 3, // Number of scrolls (0-100)
- country: 'us', // Proxy country code (ISO 3166-1 alpha-2)
- headers: { 'X-Custom': 'header' },
- cookies: { key: 'value' },
- mock: false, // Enable mock mode for testing
-}
-```
+#### Parameters
+| Parameter | Type | Required | Description |
+| ----------- | ------- | -------- | ---------------------------------------------- |
+| apiKey | string | Yes | The ScrapeGraph API Key (first argument). |
+| website_url | string | Yes | The URL of the webpage to convert to markdown. |
+| wait_ms | number | No | Page load wait time in ms (default: 3000). |
+| stealth | boolean | No | Enable anti-detection mode (+4 credits). |
+| country_code | string  | No       | Proxy routing country code (e.g., "us").       |
-## Error Handling
+## API Credits
-Functions return `ApiResult` with a `status` field. Check `status` before accessing `data`:
+Check your available API credits:
```javascript
-import { extract } from "scrapegraph-js";
+import { getCredits } from "scrapegraph-js";
-const result = await extract("your-api-key", {
- url: "https://example.com",
- prompt: "Extract the title",
-});
+const apiKey = 'your-api-key';
+
+const credits = await getCredits(apiKey);
-if (result.status === "success") {
- console.log(result.data);
+if (credits.status === "error") {
+ console.error("Error fetching credits:", credits.error);
} else {
- console.error(`Request failed: ${result.error}`);
+ console.log("Remaining credits:", credits.data.remaining_credits);
+ console.log("Total used:", credits.data.total_credits_used);
}
```
-## Environment Variables
-
-| Variable | Description | Default |
-|----------|-------------|---------|
-| `SGAI_API_URL` | Override API base URL | `https://api.scrapegraphai.com/api/v2` |
-| `SGAI_DEBUG` | Enable debug logging (`"1"`) | off |
-| `SGAI_TIMEOUT` | Request timeout in seconds | `120` |
-
## Support
@@ -484,3 +438,9 @@ if (result.status === "success") {
Get help from our development team
+
+
+ This project is licensed under the MIT License. See the
+ [LICENSE](https://github.com/ScrapeGraphAI/scrapegraph-js/blob/main/LICENSE)
+ file for details.
+
diff --git a/sdks/mocking.mdx b/sdks/mocking.mdx
index 7a77981..592de50 100644
--- a/sdks/mocking.mdx
+++ b/sdks/mocking.mdx
@@ -1,264 +1,594 @@
---
title: 'Mocking & Testing'
-description: 'Test ScrapeGraphAI v2 functionality without consuming API credits'
+description: 'Test ScrapeGraphAI functionality in an isolated environment without consuming API credits'
icon: 'test-tube'
---
+
+
-
- Use familiar testing tools for mocking
+
+ Test your code without making real API calls
-
- Test without consuming API credits
+
+ Override responses for specific endpoints
## Overview
-In v2, the built-in mock mode (`mock=True`, `mock_handler`, `mock_responses`) has been removed from the SDKs. Instead, use standard mocking libraries for your language to test ScrapeGraphAI integrations without making real API calls or consuming credits.
+Mock mode gives you an isolated test environment: you can exercise ScrapeGraphAI functionality in your application and experiment with new features without affecting your live integration or consuming API credits. Scraping requests created in mock mode aren't processed by our servers and aren't counted against your credit usage.
+
+## Use cases
+
+Mock mode provides an environment for testing a wide range of functionality and scenarios without making real API calls. Below are some common use cases for mocking in your ScrapeGraphAI integrations:
+
+| Scenario | Description |
+|----------|-------------|
+| **Simulate scraping responses to test without real API calls** | Use mock mode to test scraping functionality without real API calls. Create mock responses in your application to test data processing logic or use custom handlers to simulate various response scenarios. |
+| **Scale isolated testing for teams** | Your team can test in separate mock environments to make sure that data and actions are completely isolated from other tests. Changes made in one mock configuration don't interfere with changes in another. |
+| **Test without API key requirements** | You can test your integration without providing real API keys, making it easier for external developers, implementation partners, or design agencies to work with your code without access to your live API credentials. |
+| **Test in development or CI/CD pipelines** | Access mock mode from your development environment or continuous integration pipelines. Test ScrapeGraphAI functionality directly in your code or use familiar testing frameworks and fixtures. |
-
-If you're migrating from v1, replace `Client(mock=True)` with standard mocking patterns shown below.
-
+## Test in mock mode
-## Python SDK Testing
+You can simulate scraping responses and use mock data to confirm that your integration works correctly, without consuming API credits.
-### Using `unittest.mock`
+## Basic Mock Usage
+
+Enable mock mode by setting `mock=True` when initializing the client:
```python
-from unittest.mock import patch, MagicMock
from scrapegraph_py import Client
+from scrapegraph_py.logger import sgai_logger
-def test_extract():
- client = Client(api_key="test-key")
+# Set logging level for better visibility
+sgai_logger.set_logging(level="INFO")
- mock_response = {
- "data": {
- "title": "Test Page",
- "content": "This is test content"
- },
- "request_id": "test-request-123"
- }
+def basic_mock_usage():
+ # Initialize the client with mock mode enabled
+ client = Client.from_env(mock=True)
- with patch.object(client, "extract", return_value=mock_response):
- response = client.extract(
- url="https://example.com",
- prompt="Extract title and content"
- )
+ print("\n-- get_credits (mock) --")
+ print(client.get_credits())
+
+ print("\n-- markdownify (mock) --")
+ md = client.markdownify(website_url="https://example.com")
+ print(md)
+
+ print("\n-- get_markdownify (mock) --")
+ md_status = client.get_markdownify("00000000-0000-0000-0000-000000000123")
+ print(md_status)
+
+ print("\n-- smartscraper (mock) --")
+ ss = client.smartscraper(user_prompt="Extract title", website_url="https://example.com")
+ print(ss)
- assert response["data"]["title"] == "Test Page"
- assert response["request_id"] == "test-request-123"
+if __name__ == "__main__":
+ basic_mock_usage()
```
-### Using `responses` Library
+
+When mock mode is enabled, all API calls return predefined mock responses instead of making real HTTP requests. This ensures your tests run quickly and don't consume API credits.
+
-Mock HTTP requests at the transport layer:
+## Custom Response Overrides
-```python
-import responses
-from scrapegraph_py import Client
+You can override specific endpoint responses using the `mock_responses` parameter:
-@responses.activate
-def test_extract_http():
- responses.post(
- "https://api.scrapegraphai.com/api/v2/extract",
- json={
- "data": {"title": "Mock Title"},
- "request_id": "mock-123"
+```python
+def mock_with_path_overrides():
+ # Initialize the client with mock mode and custom responses
+ client = Client.from_env(
+ mock=True,
+ mock_responses={
+            "/v1/credits": {"remaining_credits": 42, "total_credits_used": 58, "mock": True}
},
- status=200,
)
- client = Client(api_key="test-key")
- response = client.extract(
- url="https://example.com",
- prompt="Extract the title"
- )
+ print("\n-- get_credits with override (mock) --")
+ print(client.get_credits())
+```
- assert response["data"]["title"] == "Mock Title"
+
+You can override responses for any endpoint by providing the path and expected response:
+
+```python
+client = Client.from_env(
+ mock=True,
+ mock_responses={
+ "/v1/credits": {
+ "remaining_credits": 100,
+ "total_credits_used": 0,
+            "mock": True
+ },
+ "/v1/smartscraper/start": {
+ "job_id": "mock-job-123",
+ "status": "processing",
+            "mock": True
+ },
+ "/v1/smartscraper/status/mock-job-123": {
+ "job_id": "mock-job-123",
+ "status": "completed",
+ "result": {
+ "title": "Mock Title",
+ "content": "Mock content from the webpage",
+                "mock": True
+ }
+ },
+ "/v1/markdownify/start": {
+ "job_id": "mock-markdown-456",
+ "status": "processing",
+            "mock": True
+ },
+ "/v1/markdownify/status/mock-markdown-456": {
+ "job_id": "mock-markdown-456",
+ "status": "completed",
+ "result": "# Mock Markdown\n\nThis is mock markdown content.",
+            "mock": True
+ }
+ }
+)
```
+
+
+## Custom Handler Functions
-### Using `pytest` Fixtures
+For more complex mocking scenarios, you can provide a custom handler function:
```python
-import pytest
-from unittest.mock import MagicMock
-from scrapegraph_py import Client
+def mock_with_custom_handler():
+ def handler(method, url, kwargs):
+ return {"handled_by": "custom_handler", "method": method, "url": url}
-@pytest.fixture
-def mock_client():
- client = Client(api_key="test-key")
- client.extract = MagicMock(return_value={
- "data": {"title": "Mock Title"},
- "request_id": "mock-123"
- })
- client.search = MagicMock(return_value={
- "data": {"results": []},
- "request_id": "mock-456"
- })
- client.credits = MagicMock(return_value={
- "remaining_credits": 100,
- "total_credits_used": 0
- })
- return client
-
-def test_extract(mock_client):
- response = mock_client.extract(
- url="https://example.com",
- prompt="Extract the title"
- )
- assert response["data"]["title"] == "Mock Title"
+ # Initialize the client with mock mode and custom handler
+ client = Client.from_env(mock=True, mock_handler=handler)
-def test_credits(mock_client):
- credits = mock_client.credits()
- assert credits["remaining_credits"] == 100
+ print("\n-- searchscraper via custom handler (mock) --")
+ resp = client.searchscraper(user_prompt="Search something")
+ print(resp)
```
-### Async Testing with `aioresponses`
+
+Create sophisticated mock responses based on request parameters:
```python
-import pytest
-import asyncio
-from aioresponses import aioresponses
-from scrapegraph_py import AsyncClient
+def advanced_custom_handler():
+ def smart_handler(method, url, kwargs):
+ # Handle different endpoints with custom logic
+ if "/v1/credits" in url:
+ return {
+ "remaining_credits": 50,
+ "total_credits_used": 50,
+                "mock": True
+ }
+ elif "/v1/smartscraper" in url:
+ # Extract user_prompt from kwargs to create contextual responses
+ user_prompt = kwargs.get("user_prompt", "")
+ if "title" in user_prompt.lower():
+ return {
+ "job_id": "mock-title-job",
+ "status": "completed",
+ "result": {
+ "title": "Extracted Title",
+ "content": "This is the extracted content",
+                        "mock": True
+ }
+ }
+ else:
+ return {
+ "job_id": "mock-generic-job",
+ "status": "completed",
+ "result": {
+ "data": "Generic extracted data",
+                        "mock": True
+ }
+ }
+ else:
+ return {"error": "Unknown endpoint", "url": url}
+
+ client = Client.from_env(mock=True, mock_handler=smart_handler)
+
+ # Test different scenarios
+ print("Credits:", client.get_credits())
+ print("Title extraction:", client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract the title"
+ ))
+ print("Generic extraction:", client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract some data"
+ ))
+```
+
-@pytest.mark.asyncio
-async def test_async_extract():
- with aioresponses() as mocked:
- mocked.post(
- "https://api.scrapegraphai.com/api/v2/extract",
- payload={
- "data": {"title": "Async Mock"},
- "request_id": "async-123"
- },
+## Testing Best Practices
+
+### Unit Testing with Mocks
+
+```python
+import unittest
+from unittest.mock import patch
+from scrapegraph_py import Client
+
+class TestScrapeGraphAI(unittest.TestCase):
+ def setUp(self):
+ self.client = Client.from_env(mock=True)
+
+ def test_get_credits(self):
+ credits = self.client.get_credits()
+ self.assertIn("remaining_credits", credits)
+ self.assertIn("total_credits_used", credits)
+
+ def test_smartscraper_with_schema(self):
+ from pydantic import BaseModel, Field
+
+ class TestSchema(BaseModel):
+ title: str = Field(description="Page title")
+ content: str = Field(description="Page content")
+
+ response = self.client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract title and content",
+ output_schema=TestSchema
)
+
+ self.assertIsInstance(response, TestSchema)
+ self.assertIsNotNone(response.title)
+ self.assertIsNotNone(response.content)
+
+if __name__ == "__main__":
+ unittest.main()
+```
- async with AsyncClient(api_key="test-key") as client:
- response = await client.extract(
- url="https://example.com",
- prompt="Extract data"
- )
+### Integration Testing
- assert response["data"]["title"] == "Async Mock"
+```python
+def test_integration_flow():
+ """Test a complete workflow using mocks"""
+ client = Client.from_env(
+ mock=True,
+ mock_responses={
+            "/v1/credits": {"remaining_credits": 10, "total_credits_used": 90, "mock": True},
+ "/v1/smartscraper/start": {
+ "job_id": "test-job-123",
+ "status": "processing",
+                "mock": True
+ },
+ "/v1/smartscraper/status/test-job-123": {
+ "job_id": "test-job-123",
+ "status": "completed",
+ "result": {
+ "title": "Test Page",
+ "content": "Test content",
+                    "mock": True
+ }
+ }
+ }
+ )
+
+ # Test the complete flow
+ credits = client.get_credits()
+ assert credits["remaining_credits"] == 10
+
+ # Start a scraping job
+ job = client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract title and content"
+ )
+
+ # Check job status
+ status = client.get_smartscraper("test-job-123")
+ assert status["status"] == "completed"
+ assert "title" in status["result"]
```
-## JavaScript SDK Testing
+## Environment Variables
-### Using Jest / Vitest
+You can also control mocking through environment variables:
-```javascript
-import { describe, it, expect, vi } from "vitest";
-import { extract, getCredits, search } from "scrapegraph-js";
-
-// Mock the module
-vi.mock("scrapegraph-js", () => ({
- extract: vi.fn().mockResolvedValue({
- status: "success",
- data: { raw: null, json: { title: "Mock Title" }, usage: {}, metadata: {} },
- elapsedMs: 100,
- }),
- search: vi.fn().mockResolvedValue({
- status: "success",
- data: { results: [] },
- elapsedMs: 100,
- }),
- getCredits: vi.fn().mockResolvedValue({
- status: "success",
- data: { remaining: 100, used: 50, plan: "pro" },
- elapsedMs: 50,
- }),
- };
-}));
-
-describe("ScrapeGraphAI", () => {
- it("should extract data", async () => {
- const result = await extract("test-key", {
- url: "https://example.com",
- prompt: "Extract the title",
- });
- expect(result.data?.json?.title).toBe("Mock Title");
- });
-
- it("should check credits", async () => {
- const result = await getCredits("test-key");
- expect(result.data?.remaining).toBe(100);
- });
-});
+```bash
+# Enable mock mode via environment variable
+export SGAI_MOCK=true
+
+# Set custom mock responses (JSON format)
+export SGAI_MOCK_RESPONSES='{"/v1/credits": {"remaining_credits": 100, "mock": true}}'
```
-### Using MSW (Mock Service Worker)
+```python
+# The client will automatically detect mock mode from environment
+client = Client.from_env() # Will use mock mode if SGAI_MOCK=true
+```
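Since `SGAI_MOCK_RESPONSES` must contain valid JSON, a malformed value is an easy way to end up with silently ignored overrides. As a quick stdlib-only sanity check (independent of the SDK), you can confirm the value parses before the client reads it:

```python
import json
import os

# Illustrative value; in practice this is set in your shell or CI config.
os.environ["SGAI_MOCK_RESPONSES"] = (
    '{"/v1/credits": {"remaining_credits": 100, "mock": true}}'
)

# json.loads raises an error on malformed input, so a bad value fails fast
# here instead of being silently ignored at client startup.
overrides = json.loads(os.environ["SGAI_MOCK_RESPONSES"])
print(overrides["/v1/credits"]["remaining_credits"])  # 100
```

Note that inside the environment variable the value is JSON, so `true` (lowercase) is correct there, while Python code uses `True`.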
-Mock at the network level for more realistic testing:
+## Async Mocking
-```javascript
-import { http, HttpResponse } from "msw";
-import { setupServer } from "msw/node";
-import { extract } from "scrapegraph-js";
-
-const server = setupServer(
- http.post("https://api.scrapegraphai.com/v2/extract", () => {
- return HttpResponse.json({
- raw: null,
- json: { title: "MSW Mock Title" },
- usage: { promptTokens: 100, completionTokens: 50 },
- metadata: { chunker: { chunks: [] } },
- });
- })
-);
-
-beforeAll(() => server.listen());
-afterAll(() => server.close());
-afterEach(() => server.resetHandlers());
-
-test("extract returns mocked data", async () => {
- const result = await extract("test-key", {
- url: "https://example.com",
- prompt: "Extract the title",
- });
- expect(result.data?.json?.title).toBe("MSW Mock Title");
-});
+Mocking works seamlessly with async clients:
+
+```python
+import asyncio
+from scrapegraph_py import AsyncClient
+
+async def async_mock_example():
+ async with AsyncClient(mock=True) as client:
+ # All async methods work with mocks
+ credits = await client.get_credits()
+ print(f"Mock credits: {credits}")
+
+ response = await client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract data"
+ )
+ print(f"Mock response: {response}")
+
+# Run the async example
+asyncio.run(async_mock_example())
```
-## Testing with cURL
+## HTTP Method Mocking with cURL
+
+You can also test ScrapeGraphAI endpoints directly using cURL with mock responses. This is useful for testing API integrations without using SDKs.
+
+### Basic cURL Mock Usage
+
+```bash
+# Enable mock mode via environment variable
+export SGAI_MOCK=true
+
+# Test credits endpoint with mock
+curl -X GET "https://api.scrapegraphai.com/v1/credits" \
+ -H "Authorization: Bearer $SGAI_API_KEY" \
+ -H "Content-Type: application/json"
+```
-Test API endpoints directly using cURL against a local mock server or staging environment:
+### Custom Mock Responses with cURL
```bash
-# Test extract endpoint
-curl -X POST "https://api.scrapegraphai.com/api/v2/extract" \
- -H "Authorization: Bearer your-api-key" \
+# Set custom mock responses via environment variable
+export SGAI_MOCK_RESPONSES='{
+ "/v1/credits": {
+ "remaining_credits": 100,
+ "total_credits_used": 0,
+ "mock": true
+  }
+}'
+
+# Test smartscraper endpoint
+curl -X POST "https://api.scrapegraphai.com/v1/smartscraper" \
+ -H "Authorization: Bearer $SGAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
- "url": "https://example.com",
- "prompt": "Extract the title"
+ "website_url": "https://example.com",
+    "user_prompt": "Extract title and content",
+ "mock": true
}'
+```
+
+### Testing Different HTTP Methods
+
+```bash
+# POST request - to SmartScraper
+curl --location 'https://api.scrapegraphai.com/v1/smartscraper' \
+--data '{
+    "website_url": "https://www.scrapegraphai.com",
+    "user_prompt": "Extract founder info",
+    "mock": true
+}'
+```
+
+```bash
+# POST request - to Markdownify
+curl --location 'https://api.scrapegraphai.com/v1/markdownify' \
+--data '{
+    "website_url": "https://www.scrapegraphai.com",
+ "mock":true
+}'
+```
+
+```bash
+# POST request - to SearchScraper
+curl --location 'https://api.scrapegraphai.com/v1/searchscraper' \
+--data '{
+    "user_prompt": "Extract info about ScrapeGraphAI",
+    "mock": true,
+    "output_schema": {},
+    "num_results": 3
+}'
+```
+
+
+## JavaScript SDK Mocking
+
+The JavaScript SDK supports per-request mocking via the `mock` parameter. Pass `mock: true` in the params object of any function to receive mock data instead of making a real API call.
+
+### Per-Request Mock Mode
+
+```javascript
+import { smartScraper, scrape, searchScraper, getCredits } from 'scrapegraph-js';
+
+const API_KEY = 'your-api-key';
+
+// SmartScraper with mock
+const smartResult = await smartScraper(API_KEY, {
+ website_url: 'https://example.com',
+ user_prompt: 'Extract the title',
+ mock: true,
+});
+console.log('SmartScraper mock:', smartResult.data);
+
+// Scrape with mock
+const scrapeResult = await scrape(API_KEY, {
+ website_url: 'https://example.com',
+ mock: true,
+});
+console.log('Scrape mock:', scrapeResult.data);
-# Test credits endpoint
-curl -X GET "https://api.scrapegraphai.com/api/v2/credits" \
- -H "Authorization: Bearer your-api-key"
+// SearchScraper with mock
+const searchResult = await searchScraper(API_KEY, {
+ user_prompt: 'Find AI news',
+ mock: true,
+});
+console.log('SearchScraper mock:', searchResult.data);
```
+
+The JavaScript SDK does not have global mock functions like `enableMock()` or `setMockResponses()`. Mock mode is controlled per-request via the `mock: true` parameter. All functions return `ApiResult` — errors are never thrown.
+
+
## SDK Comparison
-| Feature | Python | JavaScript |
-|---------|--------|------------|
-| **Mock library** | `unittest.mock`, `responses` | Jest/Vitest mocks, MSW |
-| **HTTP-level mocking** | `responses`, `aioresponses` | MSW (Mock Service Worker) |
-| **Async mocking** | `aioresponses`, `unittest.mock` | Native async/await |
-| **Fixture support** | pytest fixtures | beforeEach/afterEach |
+
+
+ - `Client(mock=True)` initialization
+ - `mock_responses` parameter for overrides
+ - `mock_handler` for custom logic
+ - Environment variable: `SGAI_MOCK=true`
+
+
+ - `mock: true` in per-request params
+ - All functions support mock parameter
+ - Native async/await
+
+
+ - Environment variable: `SGAI_MOCK=true`
+ - `SGAI_MOCK_RESPONSES` for custom responses
+ - Direct HTTP method testing
+ - No SDK dependencies required
+
+
+
+### Feature Comparison
+
+| Feature | Python SDK | JavaScript SDK | cURL/HTTP |
+|---------|------------|----------------|-----------|
+| **Global Mock Mode** | `Client(mock=True)` | N/A | `SGAI_MOCK=true` |
+| **Per-Request Mock** | `mock=True` in params | `mock: true` in params | N/A |
+| **Custom Responses** | `mock_responses` dict | N/A | `SGAI_MOCK_RESPONSES` |
+| **Custom Handler** | `mock_handler` function | N/A | N/A |
+| **Environment Variable** | `SGAI_MOCK=true` | N/A | `SGAI_MOCK=true` |
+| **Async Support** | `AsyncClient(mock=True)` | Native async/await | N/A |
+| **Dependencies** | Python SDK required | JavaScript SDK required | None |
+
+## Limitations
+
+* You can't test real-time scraping performance in mock mode.
+* Mock responses don't reflect actual website changes or dynamic content.
+* Rate limiting and credit consumption are not simulated in mock mode.
+* Some advanced features may behave differently in mock mode compared to live mode.
+
+## Troubleshooting
-## Best Practices
+
-- Mock at the **client method level** for unit tests (fastest, simplest)
-- Mock at the **HTTP level** for integration tests (validates request/response shapes)
-- Use **fixtures** to share mock configurations across tests
-- Keep mock responses **realistic** - match the actual API response structure
-- Test both **success and error** scenarios
+### Mock responses not working
+- Ensure `mock=True` is set when initializing the client
+- Check that your mock response paths match the actual API endpoints
+- Verify the response format matches the expected schema
+
+### Custom handler not being called
+- Make sure you're passing the `mock_handler` parameter correctly
+- Check that your handler function accepts the correct parameters: `(method, url, kwargs)`
+- Ensure the handler returns a valid response object
+
+### Schema validation errors
+- Mock responses must match the expected Pydantic schema structure
+- Use the same field names and types as defined in your schema
+- Test your mock responses with the actual schema classes
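Because a `mock_handler` is a plain function that takes `(method, url, kwargs)` and returns a dict, you can unit-test it in isolation, without instantiating the client at all. A minimal stdlib-only sketch (the endpoint and values are illustrative):

```python
# A mock_handler is a plain function: (method, url, kwargs) -> dict.
# It can therefore be tested directly, without the SDK or a client instance.
def handler(method, url, kwargs):
    if "/v1/credits" in url:
        return {"remaining_credits": 100, "total_credits_used": 0, "mock": True}
    return {"error": "Unknown endpoint", "url": url}

# Exercise the handler the same way the client would call it:
ok = handler("GET", "https://api.scrapegraphai.com/v1/credits", {})
bad = handler("POST", "https://api.scrapegraphai.com/v1/unknown", {})
print(ok["remaining_credits"])  # 100
print(bad["error"])             # Unknown endpoint
```

Verifying the handler this way helps separate "my handler logic is wrong" from "the handler isn't being called" when troubleshooting.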
+
+
+
+## Examples
+
+
+Here's a complete example showing all mocking features:
+
+```python
+from scrapegraph_py import Client
+from scrapegraph_py.logger import sgai_logger
+from pydantic import BaseModel, Field
+from typing import List
+
+# Set up logging
+sgai_logger.set_logging(level="INFO")
+
+class ProductInfo(BaseModel):
+ name: str = Field(description="Product name")
+ price: str = Field(description="Product price")
+ features: List[str] = Field(description="Product features")
+
+def complete_mock_demo():
+ # Initialize with comprehensive mock responses
+ client = Client.from_env(
+ mock=True,
+ mock_responses={
+ "/v1/credits": {
+ "remaining_credits": 25,
+ "total_credits_used": 75,
+                "mock": True
+ },
+ "/v1/smartscraper/start": {
+ "job_id": "demo-job-789",
+ "status": "processing",
+                "mock": True
+ },
+ "/v1/smartscraper/status/demo-job-789": {
+ "job_id": "demo-job-789",
+ "status": "completed",
+ "result": {
+ "name": "iPhone 15 Pro",
+ "price": "$999",
+ "features": [
+ "A17 Pro chip",
+ "48MP camera system",
+ "Titanium design",
+ "Action Button"
+ ],
+                    "mock": True
+ }
+ }
+ }
+ )
+
+ print("=== ScrapeGraphAI Mock Demo ===\n")
+
+ # Test credits endpoint
+ print("1. Checking credits:")
+ credits = client.get_credits()
+ print(f" Remaining: {credits['remaining_credits']}")
+ print(f" Used: {credits['total_credits_used']}\n")
+
+ # Test smartscraper with schema
+ print("2. Extracting product information:")
+ product = client.smartscraper(
+ website_url="https://apple.com/iphone-15-pro",
+ user_prompt="Extract product name, price, and key features",
+ output_schema=ProductInfo
+ )
+
+ print(f" Product: {product.name}")
+ print(f" Price: {product.price}")
+ print(" Features:")
+ for feature in product.features:
+ print(f" - {feature}")
+
+ print("\n3. Testing markdownify:")
+ markdown = client.markdownify(website_url="https://example.com")
+ print(f" Markdown length: {len(markdown)} characters")
+
+ print("\n=== Demo Complete ===")
+
+if __name__ == "__main__":
+ complete_mock_demo()
+```
+
## Support
-
+
Report bugs or request features
@@ -266,4 +596,4 @@ curl -X GET "https://api.scrapegraphai.com/api/v2/credits" \
-Need help with testing? Join our [Discord community](https://discord.gg/uJN7TYcpNa) for support.
+Need help with mocking? Check out our [Python SDK documentation](/sdks/python) or join our [Discord community](https://discord.gg/uJN7TYcpNa) for support.
diff --git a/sdks/python.mdx b/sdks/python.mdx
index 17a8c7d..43da3f2 100644
--- a/sdks/python.mdx
+++ b/sdks/python.mdx
@@ -1,9 +1,15 @@
---
title: 'Python SDK'
-description: 'Official Python SDK for ScrapeGraphAI v2'
+description: 'Official Python SDK for ScrapeGraphAI'
icon: 'python'
---
+
+
[](https://badge.fury.io/py/scrapegraph-py)
@@ -15,415 +21,369 @@ icon: 'python'
## Installation
+Install the package using pip:
+
```bash
pip install scrapegraph-py
-# or
-uv add scrapegraph-py
```
-## What's New in v2
+## Features
-- **Complete rewrite** built on [Pydantic v2](https://docs.pydantic.dev) + [httpx](https://www.python-httpx.org).
-- **Client rename**: `Client` → `ScrapeGraphAI`, `AsyncClient` → `AsyncScrapeGraphAI`.
-- **Typed request models**: every method takes a Pydantic request (`ScrapeRequest`, `ExtractRequest`, `SearchRequest`, `CrawlRequest`, `MonitorCreateRequest`, …).
-- **`ApiResult[T]` wrapper**: no exceptions on API errors — every call returns `status: "success" | "error"`, `data`, `error`, and `elapsed_ms`.
-- **Nested resources**: `sgai.crawl.*`, `sgai.monitor.*`, `sgai.history.*`.
-- **camelCase on the wire, snake_case in Python**: automatic via Pydantic's `alias_generator`.
-- **Removed**: `markdownify()`, `agenticscraper()`, `sitemap()`, `feedback()` — use `scrape()` with the appropriate format entry instead.
-
-
-v2 is a breaking release. See the [Migration Guide](/transition-from-v1-to-v2) if you're upgrading from v1.
-
+- **AI-Powered Extraction**: Advanced web scraping using artificial intelligence
+- **Flexible Clients**: Both synchronous and asynchronous support
+- **Type Safety**: Structured output with Pydantic schemas
+- **Production Ready**: Detailed logging and automatic retries
+- **Developer Friendly**: Comprehensive error handling
## Quick Start
-```python
-from scrapegraph_py import ScrapeGraphAI, ScrapeRequest
-
-# reads SGAI_API_KEY from env, or pass it explicitly:
-# sgai = ScrapeGraphAI(api_key="sgai-...")
-sgai = ScrapeGraphAI()
-
-result = sgai.scrape(ScrapeRequest(url="https://example.com"))
-
-if result.status == "success":
- print(result.data.results["markdown"]["data"])
-else:
- print(result.error)
-```
-
-### ApiResult
-
-Every method returns `ApiResult[T]` — no try/except needed for API errors:
+Initialize the client with your API key:
```python
-from typing import Generic, Literal, TypeVar
-from pydantic import BaseModel
+from scrapegraph_py import Client
-T = TypeVar("T")
-
-class ApiResult(BaseModel, Generic[T]):
- status: Literal["success", "error"]
- data: T | None
- error: str | None = None
- elapsed_ms: int
+client = Client(api_key="your-api-key-here")
```
-### Environment Variables
-
-| Variable | Description | Default |
-| --------------- | -------------------------------------------- | --------------------------------------- |
-| `SGAI_API_KEY` | Your ScrapeGraphAI API key | — |
-| `SGAI_API_URL` | Override API base URL | `https://api.scrapegraphai.com/api/v2` |
-| `SGAI_TIMEOUT` | Request timeout in seconds | `120` |
-| `SGAI_DEBUG` | Enable debug logging (set to `"1"`) | off |
-
-The client supports context managers for automatic session cleanup:
-
-```python
-with ScrapeGraphAI() as sgai:
- result = sgai.scrape(ScrapeRequest(url="https://example.com"))
-```
+
+You can also set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `client = Client()`
+
## Services
-### Scrape
+### SmartScraper
-Fetch a page in one or more formats (markdown, html, screenshot, json, links, images, summary, branding).
+Extract specific information from any webpage using AI:
```python
-from scrapegraph_py import (
- ScrapeGraphAI, ScrapeRequest, FetchConfig,
- MarkdownFormatConfig, ScreenshotFormatConfig, JsonFormatConfig,
+response = client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract the main heading and description"
)
-
-sgai = ScrapeGraphAI()
-
-res = sgai.scrape(ScrapeRequest(
- url="https://example.com",
- formats=[
- MarkdownFormatConfig(mode="reader"),
- ScreenshotFormatConfig(full_page=True, width=1440, height=900),
- JsonFormatConfig(prompt="Extract product info"),
- ],
- content_type="text/html", # optional, auto-detected
- fetch_config=FetchConfig(
- mode="js",
- stealth=True,
- timeout=30000,
- wait=2000,
- scrolls=3,
- ),
-))
-
-if res.status == "success":
- markdown = res.data.results["markdown"]["data"]
```
-#### `ScrapeRequest` fields
-
-| Field | Type | Required | Description |
-| -------------- | ------------------------ | -------- | --------------------------------------------------------------------------- |
-| `url` | `HttpUrl` | Yes | URL to scrape |
-| `formats` | `list[ScrapeFormatEntry]`| No | Defaults to `[MarkdownFormatConfig()]` |
-| `content_type` | `str` | No | Override detected content type (e.g. `"application/pdf"`, `"text/html"`) |
-| `fetch_config` | `FetchConfig` | No | Fetch configuration (mode, stealth, timeout, cookies, country, …) |
+#### Parameters
-#### Format entries
+| Parameter | Type | Required | Description |
+| ---------------- | ------- | -------- | ---------------------------------------------------------------------------------- |
+| website_url | string | Yes | The URL of the webpage that needs to be scraped. |
+| user_prompt | string | Yes | A textual description of what you want to achieve. |
+| output_schema | object | No | The Pydantic object that describes the structure and format of the response. |
-| Class | Fields |
-| ------------------------- | -------------------------------------------------------------- |
-| `MarkdownFormatConfig` | `mode`: `"normal" \| "reader" \| "prune"` |
-| `HtmlFormatConfig` | `mode`: same as above |
-| `ScreenshotFormatConfig` | `full_page`, `width` (320–3840), `height` (200–2160), `quality`|
-| `JsonFormatConfig` | `prompt` (1–10k chars), `schema`, `mode` |
-| `LinksFormatConfig` | — |
-| `ImagesFormatConfig` | — |
-| `SummaryFormatConfig` | — |
-| `BrandingFormatConfig` | — |
+
+Define a simple schema for basic data extraction:
-
-Duplicate `type` entries in `formats` are rejected by a Pydantic validator.
-
+```python
+from pydantic import BaseModel, Field
+
+class ArticleData(BaseModel):
+ title: str = Field(description="The article title")
+ author: str = Field(description="The author's name")
+ publish_date: str = Field(description="Article publication date")
+ content: str = Field(description="Main article content")
+ category: str = Field(description="Article category")
+
+response = client.smartscraper(
+ website_url="https://example.com/blog/article",
+ user_prompt="Extract the article information",
+ output_schema=ArticleData
+)
-### Extract
+print(f"Title: {response.title}")
+print(f"Author: {response.author}")
+print(f"Published: {response.publish_date}")
+```
+
-Run structured extraction against a URL, HTML, or markdown using AI.
+
+Define a complex schema for nested data structures:
```python
-from scrapegraph_py import ScrapeGraphAI, ExtractRequest
-
-sgai = ScrapeGraphAI()
-
-res = sgai.extract(ExtractRequest(
- url="https://example.com",
- prompt="Extract product names and prices",
- schema={
- "type": "object",
- "properties": {
- "products": {
- "type": "array",
- "items": {
- "type": "object",
- "properties": {
- "name": {"type": "string"},
- "price": {"type": "string"},
- },
- },
- },
- },
- },
-))
-
-if res.status == "success":
- print(res.data.json_data)
- print(f"Tokens: {res.data.usage.prompt_tokens} / {res.data.usage.completion_tokens}")
-```
-
-#### `ExtractRequest` fields
+from typing import List
+from pydantic import BaseModel, Field
+
+class Employee(BaseModel):
+ name: str = Field(description="Employee's full name")
+ position: str = Field(description="Job title")
+ department: str = Field(description="Department name")
+ email: str = Field(description="Email address")
+
+class Office(BaseModel):
+ location: str = Field(description="Office location/city")
+ address: str = Field(description="Full address")
+ phone: str = Field(description="Contact number")
+
+class CompanyData(BaseModel):
+ name: str = Field(description="Company name")
+ description: str = Field(description="Company description")
+ industry: str = Field(description="Industry sector")
+ founded_year: int = Field(description="Year company was founded")
+ employees: List[Employee] = Field(description="List of key employees")
+ offices: List[Office] = Field(description="Company office locations")
+ website: str = Field(description="Company website URL")
+
+# Extract comprehensive company information
+response = client.smartscraper(
+ website_url="https://example.com/about",
+ user_prompt="Extract detailed company information including employees and offices",
+ output_schema=CompanyData
+)
-| Field | Type | Required | Description |
-| -------------- | ------------- | -------- | --------------------------------------------------------------------------------- |
-| `url` | `HttpUrl` | Yes\* | Page URL |
-| `html` | `str` | Yes\* | Raw HTML (alternative to `url`) |
-| `markdown` | `str` | Yes\* | Raw markdown (alternative to `url`) |
-| `prompt` | `str` | Yes | 1–10,000 chars |
-| `schema` | `dict` | No | JSON Schema for the structured output |
-| `mode` | `str` | No | `"normal"` (default), `"reader"`, `"prune"` |
-| `content_type` | `str` | No | Override detected content type |
-| `fetch_config` | `FetchConfig` | No | Fetch configuration |
+# Access nested data
+print(f"Company: {response.name}")
+print("\nKey Employees:")
+for employee in response.employees:
+ print(f"- {employee.name} ({employee.position})")
-
-\*At least one of `url`, `html`, or `markdown` is required — enforced by a Pydantic validator.
-
+print("\nOffice Locations:")
+for office in response.offices:
+ print(f"- {office.location}: {office.address}")
+```
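Nested responses like `CompanyData` often need to end up in a spreadsheet or database, which means flattening one row per employee. A sketch using plain dicts standing in for the response (the sample values are illustrative):

```python
# Sample data shaped like a CompanyData response (illustrative values)
company = {
    "name": "Acme",
    "employees": [
        {"name": "Ada", "position": "CTO"},
        {"name": "Bob", "position": "Engineer"},
    ],
}

# One flat row per employee, each carrying the company name
rows = [
    {"company": company["name"], **emp}
    for emp in company["employees"]
]
for row in rows:
    print(row)
```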
+
-### Search
+### SearchScraper
-Run a web search and optionally extract structured data from the results.
+Search and extract information from multiple web sources using AI:
```python
-from scrapegraph_py import ScrapeGraphAI, SearchRequest
+from scrapegraph_py.models import TimeRange
-sgai = ScrapeGraphAI()
-
-res = sgai.search(SearchRequest(
- query="best programming languages 2024",
- num_results=5,
- prompt="Summarize the top languages and reasons",
- time_range="past_week",
- location_geo_code="us",
-))
-
-if res.status == "success":
- for hit in res.data.results:
- print(hit.title, hit.url)
- print(res.data.json_data) # when prompt/schema are set
+response = client.searchscraper(
+ user_prompt="What are the key features and pricing of ChatGPT Plus?",
+ time_range=TimeRange.PAST_WEEK # Optional: Filter results by time range
+)
```
-#### `SearchRequest` fields
-
-| Field | Type | Required | Description |
-| ------------------- | ------------- | -------- | ------------------------------------------------------------------------- |
-| `query` | `str` | Yes | 1–500 chars |
-| `num_results` | `int` | No | 1–20, default `3` |
-| `format` | `str` | No | `"markdown"` (default) or `"html"` |
-| `mode` | `str` | No | HTML processing: `"prune"` (default), `"normal"`, `"reader"` |
-| `prompt` | `str` | No | Required when `schema` is set |
-| `schema` | `dict` | No | JSON Schema for structured output |
-| `location_geo_code` | `str` | No | Two-letter country code (e.g. `"us"`, `"it"`) |
-| `time_range` | `str` | No | `"past_hour"`, `"past_24_hours"`, `"past_week"`, `"past_month"`, `"past_year"` |
-| `fetch_config` | `FetchConfig` | No | Fetch configuration |
+#### Parameters
-### Crawl
+| Parameter         | Type      | Required | Description |
+| ----------------- | --------- | -------- | ----------- |
+| user_prompt       | string    | Yes      | A description of the information you want to find and extract. |
+| num_results       | number    | No       | Number of websites to search (3-20). Default: 3. |
+| extraction_mode   | boolean   | No       | **True** = AI extraction mode (10 credits/page), **False** = markdown mode (2 credits/page). Default: True. |
+| output_schema     | object    | No       | A Pydantic model describing the structure and format of the response (AI extraction mode only). |
+| location_geo_code | string    | No       | Optional geo code for location-based search (e.g., "us"). |
+| time_range        | TimeRange | No       | Optional time range filter for search results: TimeRange.PAST_HOUR, TimeRange.PAST_24_HOURS, TimeRange.PAST_WEEK, TimeRange.PAST_MONTH, or TimeRange.PAST_YEAR. |
-Crawl a site and its linked pages asynchronously. Access via the `sgai.crawl` resource.
+
+Define a simple schema for structured search results:
```python
-from scrapegraph_py import ScrapeGraphAI, CrawlRequest, MarkdownFormatConfig
-
-sgai = ScrapeGraphAI()
-
-# Start
-start = sgai.crawl.start(CrawlRequest(
- url="https://example.com",
- formats=[MarkdownFormatConfig()],
- max_depth=2,
- max_pages=50,
- max_links_per_page=10,
- include_patterns=["/blog/*"],
- exclude_patterns=["/admin/*"],
-))
-
-crawl_id = start.data.id
-
-# Poll
-status = sgai.crawl.get(crawl_id)
-print(f"{status.data.finished}/{status.data.total} - {status.data.status}")
-
-# Control
-sgai.crawl.stop(crawl_id)
-sgai.crawl.resume(crawl_id)
-sgai.crawl.delete(crawl_id)
-```
-
-#### `CrawlRequest` fields
-
-| Field | Type | Required | Description |
-| -------------------- | ------------------------- | -------- | -------------------------------------------------------- |
-| `url` | `HttpUrl` | Yes | Starting URL |
-| `formats` | `list[ScrapeFormatEntry]` | No | Defaults to `[MarkdownFormatConfig()]` |
-| `max_depth` | `int` | No | `≥ 0`, default `2` |
-| `max_pages` | `int` | No | `1–1000`, default `50` |
-| `max_links_per_page` | `int` | No | `≥ 1`, default `10` |
-| `allow_external` | `bool` | No | Default `False` |
-| `include_patterns` | `list[str]` | No | URL glob patterns to include |
-| `exclude_patterns` | `list[str]` | No | URL glob patterns to exclude |
-| `content_types` | `list[str]` | No | Allowed response content types |
-| `fetch_config` | `FetchConfig` | No | Fetch configuration |
+from typing import List
+
+from pydantic import BaseModel, Field
+from scrapegraph_py.models import TimeRange
+
+class ProductInfo(BaseModel):
+    name: str = Field(description="Product name")
+    description: str = Field(description="Product description")
+    price: str = Field(description="Product price")
+    features: List[str] = Field(description="List of key features")
+    availability: str = Field(description="Availability information")
+
+response = client.searchscraper(
+ user_prompt="Find information about iPhone 15 Pro",
+ output_schema=ProductInfo,
+ location_geo_code="us", # Optional: Geo code for location-based search
+ time_range=TimeRange.PAST_MONTH # Optional: Filter results by time range
+)
-### Monitor
+print(f"Product: {response.name}")
+print(f"Price: {response.price}")
+print("\nFeatures:")
+for feature in response.features:
+ print(f"- {feature}")
+```
+
-Scheduled extraction jobs. Access via the `sgai.monitor` resource.
+
+Define a complex schema for comprehensive market research:
```python
-from scrapegraph_py import (
- ScrapeGraphAI, MonitorCreateRequest, MonitorUpdateRequest, MarkdownFormatConfig,
+from typing import List
+
+from pydantic import BaseModel, Field
+from scrapegraph_py.models import TimeRange
+
+class MarketPlayer(BaseModel):
+    name: str = Field(description="Company name")
+    market_share: str = Field(description="Market share percentage")
+    key_products: List[str] = Field(description="Main products in market")
+    strengths: List[str] = Field(description="Company's market strengths")
+
+class MarketTrend(BaseModel):
+    name: str = Field(description="Trend name")
+    description: str = Field(description="Trend description")
+    impact: str = Field(description="Expected market impact")
+    timeframe: str = Field(description="Trend timeframe")
+
+class MarketAnalysis(BaseModel):
+    market_size: str = Field(description="Total market size")
+    growth_rate: str = Field(description="Annual growth rate")
+    key_players: List[MarketPlayer] = Field(description="Major market players")
+    trends: List[MarketTrend] = Field(description="Market trends")
+    challenges: List[str] = Field(description="Industry challenges")
+    opportunities: List[str] = Field(description="Market opportunities")
+
+# Perform comprehensive market research
+response = client.searchscraper(
+ user_prompt="Analyze the current AI chip market landscape",
+ output_schema=MarketAnalysis,
+ location_geo_code="us", # Optional: Geo code for location-based search
+ time_range=TimeRange.PAST_MONTH # Optional: Filter results by time range
)
-sgai = ScrapeGraphAI()
+# Access structured market data
+print(f"Market Size: {response.market_size}")
+print(f"Growth Rate: {response.growth_rate}")
+
+print("\nKey Players:")
+for player in response.key_players:
+ print(f"\n{player.name}")
+ print(f"Market Share: {player.market_share}")
+ print("Key Products:")
+ for product in player.key_products:
+ print(f"- {product}")
+
+print("\nMarket Trends:")
+for trend in response.trends:
+ print(f"\n{trend.name}")
+ print(f"Impact: {trend.impact}")
+ print(f"Timeframe: {trend.timeframe}")
+```
+
-mon = sgai.monitor.create(MonitorCreateRequest(
- url="https://example.com",
- name="Price Monitor",
- interval="0 * * * *", # cron expression
- formats=[MarkdownFormatConfig()],
- webhook_url="https://example.com/webhook",
-))
+
+Use markdown mode for cost-effective content gathering:
-cron_id = mon.data.cron_id
+```python
+from scrapegraph_py import Client
-sgai.monitor.list()
-sgai.monitor.get(cron_id)
-sgai.monitor.update(cron_id, MonitorUpdateRequest(interval="0 */6 * * *"))
-sgai.monitor.pause(cron_id)
-sgai.monitor.resume(cron_id)
-sgai.monitor.delete(cron_id)
-```
+from scrapegraph_py.models import TimeRange
-#### `monitor.activity()` — poll tick history
+client = Client(api_key="your-api-key")
-Paginate through the per-run ticks a monitor has produced (what changed on each scheduled run).
+# Enable markdown mode for cost-effective content gathering
+response = client.searchscraper(
+ user_prompt="Latest developments in artificial intelligence",
+ num_results=3,
+ extraction_mode=False, # Enable markdown mode (2 credits per page vs 10 credits)
+ location_geo_code="us", # Optional: Geo code for location-based search
+ time_range=TimeRange.PAST_WEEK # Optional: Filter results by time range
+)
-```python
-from scrapegraph_py import MonitorActivityRequest
+# Access the raw markdown content
+markdown_content = response['markdown_content']
+reference_urls = response['reference_urls']
-act = sgai.monitor.activity(cron_id, MonitorActivityRequest(limit=20))
+print(f"Markdown content length: {len(markdown_content)} characters")
+print(f"Reference URLs: {len(reference_urls)}")
-if act.status == "success":
- for tick in act.data.ticks:
- status = "CHANGED" if tick.changed else "no change"
- print(f"[{tick.created_at}] {tick.status} - {status} ({tick.elapsed_ms}ms)")
+# Process the markdown content
+print("Content preview:", markdown_content[:500] + "...")
- if act.data.next_cursor:
- more = sgai.monitor.activity(
- cron_id, MonitorActivityRequest(limit=20, cursor=act.data.next_cursor),
- )
-```
+# Save to file for analysis
+with open('ai_research_content.md', 'w', encoding='utf-8') as f:
+ f.write(markdown_content)
-`MonitorActivityRequest` fields: `limit` (1–100, default `20`) and optional `cursor` for pagination. Each `MonitorTickEntry` exposes `id`, `created_at`, `status`, `changed`, `elapsed_ms`, and a `diffs` model with per-format deltas.
-
-#### `MonitorCreateRequest` fields
+print("Content saved to ai_research_content.md")
+```
-| Field | Type | Required | Description |
-| -------------- | ------------------------- | -------- | ---------------------------------------------- |
-| `url` | `HttpUrl` | Yes | URL to monitor |
-| `interval` | `str` | Yes | Cron expression (1–100 chars) |
-| `name` | `str` | No | ≤ 200 chars |
-| `formats` | `list[ScrapeFormatEntry]` | No | Defaults to `[MarkdownFormatConfig()]` |
-| `webhook_url` | `HttpUrl` | No | Webhook invoked on change detection |
-| `fetch_config` | `FetchConfig` | No | Fetch configuration |
+**Markdown Mode Benefits:**
+- **Cost-effective**: Only 2 credits per page (vs 10 credits for AI extraction)
+- **Full content**: Get complete page content in markdown format
+- **Faster**: No AI processing overhead
+- **Perfect for**: Content analysis, bulk data collection, building datasets
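Because markdown mode returns plain text, downstream processing needs no extra tooling. For example, splitting the content into sections keyed by heading (the sample content below is illustrative, not real API output):

```python
# Stand-in for response['markdown_content']
markdown_content = (
    "# AI Roundup\nIntro paragraph.\n"
    "## Models\nNew releases.\n"
    "## Hardware\nChip news.\n"
)

# Group body lines under the most recent heading
sections = {}
current = None
for line in markdown_content.splitlines():
    if line.startswith("#"):
        current = line.lstrip("# ").strip()
        sections[current] = []
    elif current and line.strip():
        sections[current].append(line)

print(list(sections))
```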
-### History
+
-Fetch recent request history. Access via the `sgai.history` resource.
+
+Filter search results by date range to get only recent information:
```python
-from scrapegraph_py import ScrapeGraphAI, HistoryFilter
+from scrapegraph_py import Client
+from scrapegraph_py.models import TimeRange
-sgai = ScrapeGraphAI()
+client = Client(api_key="your-api-key")
-page = sgai.history.list(HistoryFilter(service="scrape", page=1, limit=20))
-for entry in page.data.data:
- print(entry.id, entry.service, entry.status, entry.elapsed_ms)
+# Search for recent news from the past week
+response = client.searchscraper(
+ user_prompt="Latest news about AI developments",
+ num_results=5,
+ time_range=TimeRange.PAST_WEEK # Options: PAST_HOUR, PAST_24_HOURS, PAST_WEEK, PAST_MONTH, PAST_YEAR
+)
-one = sgai.history.get("request-id")
+print("Recent AI news:", response['result'])
+print("Reference URLs:", response['reference_urls'])
```
-### Credits / Health
+**Time Range Options:**
+- `TimeRange.PAST_HOUR` - Results from the past hour
+- `TimeRange.PAST_24_HOURS` - Results from the past 24 hours
+- `TimeRange.PAST_WEEK` - Results from the past week
+- `TimeRange.PAST_MONTH` - Results from the past month
+- `TimeRange.PAST_YEAR` - Results from the past year
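`TimeRange` behaves like a standard Python `Enum` whose values map to the API's filter strings. A simplified stand-in to illustrate the pattern (the real definition lives in `scrapegraph_py.models`, and its exact string values may differ):

```python
from enum import Enum

# Illustrative stand-in, not the SDK's actual definition
class TimeRange(Enum):
    PAST_HOUR = "past_hour"
    PAST_24_HOURS = "past_24_hours"
    PAST_WEEK = "past_week"
    PAST_MONTH = "past_month"
    PAST_YEAR = "past_year"

# The enum's value is what ends up in the request payload
print(TimeRange.PAST_WEEK.value)
```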
-```python
-credits = sgai.credits()
-# ApiResult[CreditsResponse] with .remaining, .used, .plan, .jobs.crawl, .jobs.monitor
-
-health = sgai.health()
-# ApiResult[HealthResponse] with .status, .uptime, .services
-```
+**Use Cases:**
+- Finding recent news and updates
+- Tracking time-sensitive information
+- Getting latest product releases
+- Monitoring recent market changes
-## Configuration Objects
+
-### FetchConfig
+### Markdownify
-Controls how pages are fetched. See the [proxy configuration guide](/services/additional-parameters/proxy) for details on modes and geotargeting.
+Convert any webpage into clean, formatted markdown:
```python
-from scrapegraph_py import FetchConfig
-
-config = FetchConfig(
- mode="js", # "auto" (default), "fast", "js"
- stealth=True, # Residential proxies / anti-bot headers (+5 credits)
- timeout=30000, # 1,000–60,000 ms
- wait=2000, # 0–30,000 ms
- scrolls=3, # 0–100
- country="us", # ISO 3166-1 alpha-2
- headers={"X-Custom": "header"},
- cookies={"session": "abc"},
- mock=False, # Or a MockConfig object for testing
+response = client.markdownify(
+ website_url="https://example.com"
)
```
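When converting many pages, deriving each output filename from its URL keeps results organized. A small helper you might pair with `markdownify` (the helper is illustrative, not part of the SDK):

```python
import re
from urllib.parse import urlparse

def slug_for(url: str) -> str:
    """Turn a URL into a filesystem-safe .md filename."""
    parsed = urlparse(url)
    slug = f"{parsed.netloc}{parsed.path}".strip("/")
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", slug).strip("-").lower()
    return f"{slug or 'page'}.md"

print(slug_for("https://example.com/blog/My Post"))
```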
## Async Support
-Every sync method has an async equivalent on `AsyncScrapeGraphAI`:
+All endpoints support asynchronous operations:
```python
import asyncio
-from scrapegraph_py import AsyncScrapeGraphAI, ScrapeRequest, CrawlRequest
+from scrapegraph_py import AsyncClient
async def main():
- async with AsyncScrapeGraphAI() as sgai:
- res = await sgai.scrape(ScrapeRequest(url="https://example.com"))
- if res.status == "success":
- print(res.data.results["markdown"]["data"])
+ async with AsyncClient() as client:
+ response = await client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract the main content"
+ )
+ print(response)
- start = await sgai.crawl.start(CrawlRequest(
- url="https://example.com", max_pages=25,
- ))
- status = await sgai.crawl.get(start.data.id)
- print(status.data.status)
+asyncio.run(main())
+```
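The async client pairs naturally with `asyncio.gather` to run several scrapes concurrently. A sketch of the fan-out pattern, with a stub coroutine in place of real `smartscraper` calls (no API key needed):

```python
import asyncio

async def fake_scrape(url: str) -> dict:
    # Stand-in for `await client.smartscraper(...)`
    await asyncio.sleep(0)
    return {"url": url, "status": "success"}

async def main():
    urls = ["https://example.com/a", "https://example.com/b"]
    # Launch all requests at once; gather preserves input order
    return await asyncio.gather(*(fake_scrape(u) for u in urls))

results = asyncio.run(main())
print([r["url"] for r in results])
```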
- credits = await sgai.credits()
- print(credits.data.remaining)
+## Feedback
-asyncio.run(main())
+Help us improve by submitting feedback programmatically:
+
+```python
+client.submit_feedback(
+ request_id="your-request-id",
+ rating=5,
+ feedback_text="Great results!"
+)
```
## Support
-
+
Report issues and contribute to the SDK
Get help from our development team
+
+
+ This project is licensed under the MIT License. See the [LICENSE](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/LICENSE) file for details.
+
diff --git a/services/additional-parameters/headers.mdx b/services/additional-parameters/headers.mdx
index 0bb5cd1..53446b5 100644
--- a/services/additional-parameters/headers.mdx
+++ b/services/additional-parameters/headers.mdx
@@ -77,7 +77,9 @@ response = client.markdownify(
```
```javascript JavaScript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
+
+const apiKey = 'your-api-key';
// Define custom headers
const headers = {
@@ -86,11 +88,11 @@ const headers = {
'Sec-Ch-Ua-Platform': '"Windows"',
};
-// Use with extract
-const result = await extract('your-api-key', {
- url: 'https://example.com',
- prompt: 'Extract the main content',
- fetchConfig: { headers },
+// Use with SmartScraper
+const response = await smartScraper(apiKey, {
+ website_url: 'https://example.com',
+ user_prompt: 'Extract the main content',
+ headers: headers,
});
```
@@ -137,7 +139,9 @@ response = client.smartscraper(
```
```javascript JavaScript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
+
+const apiKey = 'your-api-key';
// Example with session cookies
const headers = {
@@ -145,10 +149,10 @@ const headers = {
'Cookie': 'session_id=abc123; user_id=12345; theme=dark',
};
-const result = await extract('your-api-key', {
- url: 'https://example.com/dashboard',
- prompt: 'Extract user information',
- fetchConfig: { headers },
+const response = await smartScraper(apiKey, {
+ website_url: 'https://example.com/dashboard',
+ user_prompt: 'Extract user information',
+ headers: headers,
});
```
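Hand-writing the `Cookie` string is error-prone when cookies come from a dict; a small helper keeps the `name=value; name=value` format consistent (the helper name is illustrative):

```python
def cookie_header(cookies: dict) -> str:
    """Serialize a dict into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

headers = {
    "Cookie": cookie_header({"session_id": "abc123", "user_id": "12345"}),
}
print(headers["Cookie"])
```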
diff --git a/services/additional-parameters/pagination.mdx b/services/additional-parameters/pagination.mdx
index 63cfc5f..207833f 100644
--- a/services/additional-parameters/pagination.mdx
+++ b/services/additional-parameters/pagination.mdx
@@ -65,12 +65,15 @@ response = client.smartscraper(
### JavaScript SDK
```javascript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
-// Basic extraction
-const result = await extract('your-api-key', {
- url: 'https://example-store.com/products',
- prompt: 'Extract all product information',
+const apiKey = 'your-api-key';
+
+// Basic pagination - scrape 3 pages
+const response = await smartScraper(apiKey, {
+ website_url: 'https://example-store.com/products',
+ user_prompt: 'Extract all product information',
+ total_pages: 3,
});
```
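When scraping several pages, the same item can appear on consecutive pages; deduplicating by a stable key before storage avoids double counting. A sketch over sample per-page results (illustrative, not real API output):

```python
# Sample per-page results; page 2 repeats an item from page 1
pages = [
    [{"id": 1, "name": "Widget"}, {"id": 2, "name": "Gadget"}],
    [{"id": 2, "name": "Gadget"}, {"id": 3, "name": "Gizmo"}],
]

seen = set()
products = []
for page in pages:
    for product in page:
        if product["id"] not in seen:  # keep first occurrence only
            seen.add(product["id"])
            products.append(product)

print([p["name"] for p in products])
```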
diff --git a/services/additional-parameters/proxy.mdx b/services/additional-parameters/proxy.mdx
index d9bc152..adcedd8 100644
--- a/services/additional-parameters/proxy.mdx
+++ b/services/additional-parameters/proxy.mdx
@@ -1,6 +1,6 @@
---
title: 'Proxy Configuration'
-description: 'Configure proxy settings, fetch modes, and geotargeting for web scraping requests'
+description: 'Configure proxy settings and geotargeting for web scraping requests'
icon: 'globe'
---
@@ -10,12 +10,10 @@ icon: 'globe'
## Overview
-The ScrapeGraphAI API uses an intelligent proxy system that automatically handles web scraping requests through multiple proxy providers. The system uses a fallback strategy to ensure maximum reliability — if one provider fails, it automatically tries the next one.
+The ScrapeGraphAI API uses an intelligent proxy system that automatically handles web scraping requests through multiple proxy providers. The system uses a fallback strategy to ensure maximum reliability - if one provider fails, it automatically tries the next one.
**No configuration required**: The proxy system is fully automatic and transparent to API users. You don't need to configure proxy credentials or settings yourself.
-In v2, all proxy and fetch behaviour is controlled through the `FetchConfig` object, which you can pass to any service method (`extract`, `scrape`, `search`, `crawl`, etc.).
-
## How It Works
The API automatically routes your scraping requests through multiple proxy providers in a smart order:
@@ -23,59 +21,11 @@ The API automatically routes your scraping requests through multiple proxy provi
1. The system tries different proxy providers automatically
2. If one provider fails, it automatically falls back to the next one
3. Successful providers are cached for each domain to improve performance
-4. Everything happens transparently — you just make your API request as normal
-
-## Fetch Modes
-
-The `mode` parameter inside `FetchConfig` controls how pages are retrieved and which proxy strategy is used:
-
-| Mode | Description | JS Rendering | Best For |
-|------|-------------|:------------:|----------|
-| `auto` | Automatically selects the best provider chain | Adaptive | General use (default) |
-| `fast` | Direct HTTP fetch via impit | No | Static pages, maximum speed |
-| `js` | Headless browser rendering | Yes | JavaScript-heavy SPAs |
-
-To enable stealth mode (residential proxy with anti-bot headers), set the separate `stealth` boolean to `true` alongside any mode. For example, `mode: "js"` with `stealth: true` provides JS rendering through a residential proxy — equivalent to the old `js+stealth` mode.
-
-
-
-```python Python
-from scrapegraph_py import Client, FetchConfig
-
-client = Client(api_key="your-api-key")
-
-# Use stealth mode with JS rendering
-response = client.extract(
- url="https://example.com",
- prompt="Extract product information",
- fetch_config=FetchConfig(
- mode="js",
- stealth=True,
- wait=2000,
- ),
-)
-```
-
-```javascript JavaScript
-import { extract } from 'scrapegraph-js';
-
-// Use stealth mode with JS rendering
-const result = await extract('your-api-key', {
- url: 'https://example.com',
- prompt: 'Extract product information',
- fetchConfig: {
- mode: 'js',
- stealth: true,
- wait: 2000,
- },
-});
-```
-
-
+4. Everything happens transparently - you just make your API request as normal
## Country Selection (Geotargeting)
-You can optionally specify a two-letter country code via `FetchConfig.country` to route requests through proxies in a specific country. This is useful for:
+You can optionally specify a country code to route requests through proxies in a specific country. This is useful for:
- Accessing geo-restricted content
- Getting localized versions of websites
@@ -84,45 +34,46 @@ You can optionally specify a two-letter country code via `FetchConfig.country` t
### Using Country Code
+Include the `country_code` parameter in your API request:
+
```python Python
-from scrapegraph_py import Client, FetchConfig
+from scrapegraph_py import Client
client = Client(api_key="your-api-key")
-# Route through US proxies
-response = client.extract(
- url="https://example.com",
- prompt="Extract product information",
- fetch_config=FetchConfig(country="us"),
+# Request with country code
+response = client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract product information",
+ country_code="us" # Route through US proxies
)
```
```javascript JavaScript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
-// Route through US proxies
-const result = await extract('your-api-key', {
- url: 'https://example.com',
- prompt: 'Extract product information',
- fetchConfig: { country: 'us' },
+const apiKey = 'your-api-key';
+
+// Request with country code
+const response = await smartScraper(apiKey, {
+ website_url: 'https://example.com',
+ user_prompt: 'Extract product information',
+ country_code: 'us',
});
```
```bash cURL
curl -X 'POST' \
- 'https://api.scrapegraphai.com/api/v2/extract' \
+ 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'accept: application/json' \
- -H 'Authorization: Bearer your-api-key' \
-H 'SGAI-APIKEY: your-api-key' \
-H 'Content-Type: application/json' \
-d '{
- "url": "https://example.com",
- "prompt": "Extract product information",
- "fetchConfig": {
- "country": "us"
- }
+ "website_url": "https://example.com",
+ "user_prompt": "Extract product information",
+ "country_code": "us"
}'
```
@@ -155,55 +106,16 @@ And many more! The API supports over 100 countries. Use standard ISO 3166-1 alph
-## FetchConfig Reference
-
-All proxy and fetch behaviour is configured through the `FetchConfig` object:
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `mode` | string | `"auto"` | Fetch mode: `auto`, `fast`, `js` |
-| `stealth` | bool | `false` | Enable stealth mode with residential proxy and anti-bot headers |
-| `timeout` | int | `30000` | Request timeout in milliseconds (1000–60000) |
-| `wait` | int | `0` | Milliseconds to wait after page load before scraping (0–30000) |
-| `scrolls` | int | `0` | Number of page scrolls to perform (0–100) |
-| `country` | string | — | Two-letter ISO country code for geo-located proxy routing (e.g. `"us"`) |
-| `headers` | object | — | Custom HTTP headers to send with the request |
-| `cookies` | object | — | Cookies to send with the request |
-| `mock` | bool | `false` | Enable mock mode for testing (no real request is made) |
-
-
-
-```python Python
-from scrapegraph_py import FetchConfig
-
-config = FetchConfig(
- mode="js", # Fetch mode
- stealth=True, # Stealth proxy
- timeout=15000, # 15s timeout
- wait=2000, # Wait 2s after page load
- scrolls=3, # Scroll 3 times
- country="us", # Route through US proxies
- headers={"Accept-Language": "en-US"},
- cookies={"session": "abc123"},
- mock=False,
-)
-```
+## Available Parameters
-```javascript JavaScript
-const fetchConfig = {
- mode: 'js', // Fetch mode
- stealth: true, // Stealth proxy
- timeout: 15000, // 15s timeout
- wait: 2000, // Wait 2s after page load
- scrolls: 3, // Scroll 3 times
- country: 'us', // Route through US proxies
- headers: { 'Accept-Language': 'en-US' },
- cookies: { session: 'abc123' },
- mock: false,
-};
-```
+The following request parameters affect proxy behavior:
-
+### `country_code` (optional)
+- **Type**: String
+- **Description**: Two-letter ISO country code to route requests through proxies in a specific country
+- **Example**: `"us"`, `"uk"`, `"de"`, `"it"`, `"fr"`
+- **Default**: No specific country (uses optimal routing)
+- **Format**: ISO 3166-1 alpha-2 (e.g., `us`, `gb`, `de`)
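Validating the code client-side catches typos before a request is spent. A minimal normalization check (a sketch; it verifies the two-letter shape only, not membership in the API's supported-country list):

```python
def normalize_country_code(code: str) -> str:
    """Lowercase and sanity-check a two-letter country code."""
    code = code.strip().lower()
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter country code, got {code!r}")
    return code

print(normalize_country_code(" US "))
```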
## Usage Examples
@@ -216,20 +128,22 @@ from scrapegraph_py import Client
client = Client(api_key="your-api-key")
-# Automatic proxy selection — no configuration needed
-response = client.extract(
- url="https://example.com",
- prompt="Extract product information",
+# Automatic proxy selection - no configuration needed
+response = client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract product information"
)
```
```javascript JavaScript
-import { extract } from 'scrapegraph-js';
+import { smartScraper } from 'scrapegraph-js';
+
+const apiKey = 'your-api-key';
// Automatic proxy selection
-const result = await extract('your-api-key', {
- url: 'https://example.com',
- prompt: 'Extract product information',
+const response = await smartScraper(apiKey, {
+ website_url: 'https://example.com',
+ user_prompt: 'Extract product information',
});
```
@@ -240,80 +154,35 @@ const result = await extract('your-api-key', {
```python Python
-from scrapegraph_py import Client, FetchConfig
+from scrapegraph_py import Client
client = Client(api_key="your-api-key")
# Route through US proxies
-response = client.extract(
- url="https://example.com",
- prompt="Extract product information",
- fetch_config=FetchConfig(country="us"),
+response = client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract product information",
+ country_code="us"
)
# Route through UK proxies
-response = client.extract(
- url="https://example.com",
- prompt="Extract product information",
- fetch_config=FetchConfig(country="gb"),
+response = client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract product information",
+ country_code="uk"
)
```
```javascript JavaScript
-import { extract } from 'scrapegraph-js';
-
-// Route through US proxies
-const result = await extract('your-api-key', {
- url: 'https://example.com',
- prompt: 'Extract product information',
- fetchConfig: { country: 'us' },
-});
-
-// Route through UK proxies
-const ukResult = await extract('your-api-key', {
- url: 'https://example.com',
- prompt: 'Extract product information',
- fetchConfig: { country: 'gb' },
-});
-```
-
-
-
-### Stealth Mode with JS Rendering
-
-
-
-```python Python
-from scrapegraph_py import Client, FetchConfig
+import { smartScraper } from 'scrapegraph-js';
-client = Client(api_key="your-api-key")
-
-response = client.scrape(
- url="https://heavily-protected-site.com",
- format="markdown",
- fetch_config=FetchConfig(
- mode="js",
- stealth=True,
- wait=3000,
- scrolls=5,
- country="us",
- ),
-)
-```
+const apiKey = 'your-api-key';
-```javascript JavaScript
-import { scrape } from 'scrapegraph-js';
-
-const result = await scrape('your-api-key', {
- url: 'https://heavily-protected-site.com',
- formats: [{ type: 'markdown' }],
- fetchConfig: {
- mode: 'js',
- stealth: true,
- wait: 3000,
- scrolls: 5,
- country: 'us',
- },
+// Route through US proxies
+const response = await smartScraper(apiKey, {
+ website_url: 'https://example.com',
+ user_prompt: 'Extract product information',
+ country_code: 'us',
});
```
@@ -323,107 +192,75 @@ const result = await scrape('your-api-key', {
#### Accessing Geo-Restricted Content
-
-
-```python Python
-from scrapegraph_py import Client, FetchConfig
+```python
+from scrapegraph_py import Client
client = Client(api_key="your-api-key")
# Access US-only content
-response = client.extract(
- url="https://us-only-service.com",
- prompt="Extract available services",
- fetch_config=FetchConfig(country="us"),
+response = client.smartscraper(
+ website_url="https://us-only-service.com",
+ user_prompt="Extract available services",
+ country_code="us"
)
```
-```javascript JavaScript
-import { extract } from 'scrapegraph-js';
-
-const result = await extract('your-api-key', {
- url: 'https://us-only-service.com',
- prompt: 'Extract available services',
- fetchConfig: { country: 'us' },
-});
-```
-
-
-
#### Getting Localized Content
```python
-from scrapegraph_py import Client, FetchConfig
-
-client = Client(api_key="your-api-key")
-
# Get German version of a website
-response = client.extract(
- url="https://example.com",
- prompt="Extract product prices in local currency",
- fetch_config=FetchConfig(country="de"),
+response = client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract product prices in local currency",
+ country_code="de"
)
# Get French version
-response = client.extract(
- url="https://example.com",
- prompt="Extract product prices in local currency",
- fetch_config=FetchConfig(country="fr"),
+response = client.smartscraper(
+ website_url="https://example.com",
+ user_prompt="Extract product prices in local currency",
+ country_code="fr"
)
```
#### E-commerce Price Comparison
```python
-from scrapegraph_py import Client, FetchConfig
-
-client = Client(api_key="your-api-key")
-
# Compare prices from different regions
-countries = ["us", "gb", "de", "fr"]
+countries = ["us", "uk", "de", "fr"]
for country in countries:
- response = client.extract(
- url="https://ecommerce-site.com/product/123",
- prompt="Extract product price and availability",
- fetch_config=FetchConfig(country=country),
+ response = client.smartscraper(
+ website_url="https://ecommerce-site.com/product/123",
+ user_prompt="Extract product price and availability",
+ country_code=country
)
- print(f"{country}: {response['data']}")
+ print(f"{country}: {response['result']}")
```
## Best Practices
-### 1. Choose the Right Fetch Mode
-
-Pick the mode that matches your target site:
-- **`auto`** (default) — let the system decide; works for most sites
-- **`fast`** — use for simple, static HTML pages
-- **`js`** — use for SPAs and JavaScript-rendered content
-- Add **`stealth: true`** for anti-bot sites — combine with any mode (e.g., `mode: "js"` + `stealth: true` for dynamic anti-bot sites)
-
-### 2. Use Country Code When Needed
+### 1. Use Country Code When Needed
Only specify a country code if you have a specific requirement:
-- Accessing geo-restricted content
-- Getting localized versions of websites
-- Complying with regional requirements
-- Don't specify if you don't need it — let the system optimize automatically
+- ✅ Accessing geo-restricted content
+- ✅ Getting localized versions of websites
+- ✅ Complying with regional requirements
+- ❌ Don't specify one if you don't need it; let the system optimize automatically
-### 3. Let the System Handle Routing
+### 2. Let the System Handle Routing
The API automatically selects the best proxy provider for each request:
- No manual proxy selection needed
- Automatic failover ensures reliability
- Performance is optimized automatically
-### 4. Handle Errors Gracefully
+### 3. Handle Errors Gracefully
If a request fails, the system has already tried multiple providers:
-
-
-```python Python
-from scrapegraph_py import Client, FetchConfig
+```python
+from scrapegraph_py import Client
import time
client = Client(api_key="your-api-key")
@@ -431,10 +268,10 @@ client = Client(api_key="your-api-key")
def scrape_with_retry(url, prompt, max_retries=3):
for attempt in range(max_retries):
try:
- response = client.extract(
- url=url,
- prompt=prompt,
- fetch_config=FetchConfig(country="us"),
+ response = client.smartscraper(
+ website_url=url,
+ user_prompt=prompt,
+ country_code="us"
)
return response
except Exception as e:
@@ -445,29 +282,7 @@ def scrape_with_retry(url, prompt, max_retries=3):
raise e
```
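The delay schedule between attempts is elided above; a common choice is exponential backoff. A minimal sketch (the helper name and defaults are illustrative, not part of the SDK):

```python
def backoff_delays(max_retries=3, base=1.0, cap=30.0):
    """Return the sleep in seconds before each retry: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]

# Pair with time.sleep(delay) in the except branch of the retry loop.
print(backoff_delays())  # [1.0, 2.0, 4.0]
```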
-```javascript JavaScript
-import { extract } from 'scrapegraph-js';
-
-async function scrapeWithRetry(apiKey, url, prompt, maxRetries = 3) {
- for (let attempt = 0; attempt < maxRetries; attempt++) {
- const result = await extract(apiKey, {
- url,
- prompt,
- fetchConfig: { country: 'us' },
- });
- if (result.status === 'success') return result;
- if (attempt < maxRetries - 1) {
- console.log(`Attempt ${attempt + 1} failed: ${result.error}`);
- await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
- }
- }
- throw new Error('All retries failed');
-}
-```
-
-
-
-### 5. Monitor Rate Limits
+### 4. Monitor Rate Limits
Be aware of your API rate limits:
- The proxy system respects these limits automatically
@@ -483,9 +298,8 @@ If your scraping request fails:
1. **Verify the URL**: Make sure the URL is correct and accessible
2. **Check the website**: Some websites may block automated access regardless of the proxy used
-3. **Try a different mode**: Use `mode: "js"` with `stealth: true` for heavily-protected sites
-4. **Retry the request**: The system uses automatic retries, but you can manually retry after a delay
-5. **Try a different country**: If geo-restriction is the issue, try a different `country`
+3. **Retry the request**: The system uses automatic retries, but you can manually retry after a delay
+4. **Try a different country**: If geo-restriction is the issue, try a different `country_code`
### Rate Limiting
@@ -504,21 +318,21 @@ If you receive rate limit errors (HTTP 429):
If you're trying to access geo-restricted content:
-- Use the `country` parameter inside `FetchConfig` to specify the required country
+- Use the `country_code` parameter to specify the required country
- Make sure the content is available in that country
- Some content may still be restricted regardless of proxy location
- Try multiple country codes if one doesn't work
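The last point can be wrapped in a small fallback helper. A minimal sketch; the helper is illustrative, and `fetch` stands in for a call such as `client.smartscraper(website_url=..., user_prompt=..., country_code=code)`:

```python
def first_available(fetch, countries=("us", "uk", "de", "fr")):
    """Try each country code in order; return the first successful result."""
    last_err = None
    for code in countries:
        try:
            return code, fetch(code)
        except Exception as e:  # e.g. geo-restriction or a block for that region
            last_err = e
    raise last_err

# Example with a stub in place of a real SDK call:
def stub(code):
    if code != "de":
        raise RuntimeError(f"blocked in {code}")
    return {"result": "ok"}

print(first_available(stub))  # ('de', {'result': 'ok'})
```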
-### Anti-Bot Protection
+### Proxy Selection Issues
-
-If a website is blocking your requests:
+
+If you're experiencing proxy-related issues:
-- Set `stealth: true` in `FetchConfig` (combine with `mode: "js"` for dynamic sites)
-- Add a `wait` time to let the page fully load
-- Use `scrolls` to trigger lazy-loaded content
-- Add custom `headers` if the site expects specific ones
+- The system automatically tries multiple providers
+- No manual configuration is needed
+- If issues persist, contact support with your request ID
+- Check if the issue is specific to certain websites or domains
## FAQ
@@ -527,30 +341,42 @@ If a website is blocking your requests:
**A**: No, the proxy system is fully managed and automatic. You don't need to provide any proxy credentials or configuration.
-
-**A**: Use the `mode` parameter in `FetchConfig` (`auto`, `fast`, or `js`) and set `stealth: true` when you need residential proxy with anti-bot headers.
+
+**A**: No, the system automatically selects the best proxy provider for each request. This ensures optimal performance and reliability.
-
-**A**: No, the system automatically selects the best proxy provider for each request. You can influence the strategy by setting the `mode` parameter.
+
+**A**: The proxy selection is handled automatically and transparently. You don't need to know which proxy was used; just use the API as usual.
+
+
+
+**A**: The API uses managed proxy services. If you have specific proxy requirements, please contact support.
**A**: The API will return an error. The system tries multiple providers with automatic fallback, so this is rare. If it happens, verify the URL and try again.
-
-**A**: No, the `country` parameter doesn't affect pricing. Credits are charged the same regardless of proxy location.
+
+**A**: No, the `country_code` parameter doesn't affect pricing. Credits are charged the same regardless of proxy location.
-
-**A**: Yes, `FetchConfig` is available for all services including `extract`, `scrape`, `search`, `crawl`, and `monitor`.
+
+**A**: Yes, `country_code` is available for all scraping services including SmartScraper, SearchScraper, SmartCrawler, and Markdownify.
**A**: Both `uk` and `gb` refer to the United Kingdom. The API accepts both codes for compatibility.
+## API Reference
+
+For detailed API documentation, see:
+- [SmartScraper Start Job](/api-reference/endpoint/smartscraper/start)
+- [SearchScraper Start Job](/api-reference/endpoint/searchscraper/start)
+- [SmartCrawler Start Job](/api-reference/endpoint/smartcrawler/start)
+- [Markdownify Start Job](/api-reference/endpoint/markdownify/start)
+
## Support & Resources
diff --git a/services/additional-parameters/wait-ms.mdx b/services/additional-parameters/wait-ms.mdx
index 1dd0774..45a4646 100644
--- a/services/additional-parameters/wait-ms.mdx
+++ b/services/additional-parameters/wait-ms.mdx
@@ -67,20 +67,27 @@ response = client.markdownify(
### JavaScript SDK
```javascript
-import { extract, scrape } from 'scrapegraph-js';
+import { smartScraper, scrape, markdownify } from 'scrapegraph-js';
-// Extract with custom wait time
-const result = await extract('your-api-key', {
- url: 'https://example.com',
- prompt: 'Extract product information',
- fetchConfig: { wait: 5000 },
+const apiKey = 'your-api-key';
+
+// SmartScraper with custom wait time
+const response = await smartScraper(apiKey, {
+ website_url: 'https://example.com',
+ user_prompt: 'Extract product information',
+ wait_ms: 5000,
});
// Scrape with custom wait time
-const scrapeResult = await scrape('your-api-key', {
- url: 'https://example.com',
- formats: [{ type: 'markdown' }],
- fetchConfig: { wait: 5000 },
+const scrapeResponse = await scrape(apiKey, {
+ website_url: 'https://example.com',
+ wait_ms: 5000,
+});
+
+// Markdownify with custom wait time
+const mdResponse = await markdownify(apiKey, {
+ website_url: 'https://example.com',
+ wait_ms: 5000,
});
```
diff --git a/services/agenticscraper.mdx b/services/agenticscraper.mdx
index b2167d6..45e0c65 100644
--- a/services/agenticscraper.mdx
+++ b/services/agenticscraper.mdx
@@ -17,7 +17,7 @@ Agentic Scraper is our most advanced service for automating browser actions and
- **Optionally** use AI to extract structured data according to a schema
-Try it instantly in our [interactive playground](https://scrapegraphai.com/dashboard) – no coding required!
+Try it instantly in our [interactive playground](https://dashboard.scrapegraphai.com/) – no coding required!
## Difference: With vs Without AI Extraction
@@ -39,7 +39,7 @@ const apiKey = process.env.SGAI_APIKEY;
// Basic scraping without AI extraction
const response = await agenticScraper(apiKey, {
- url: 'https://scrapegraphai.com/dashboard',
+ url: 'https://dashboard.scrapegraphai.com/',
steps: [
'Type email@gmail.com in email input box',
'Type test-password@123 in password inputbox',
@@ -52,7 +52,7 @@ console.log(response.data);
// With AI extraction
const aiResponse = await agenticScraper(apiKey, {
- url: 'https://scrapegraphai.com/dashboard',
+ url: 'https://dashboard.scrapegraphai.com/',
steps: [
'Type email@gmail.com in email input box',
'Type test-password@123 in password inputbox',
@@ -86,7 +86,7 @@ curl -X 'POST' \
-H 'SGAI-APIKEY: your-api-key' \
-H 'Content-Type: application/json' \
-d '{
- "url": "https://scrapegraphai.com/dashboard",
+ "url": "https://dashboard.scrapegraphai.com/",
"use_session": true,
"steps": ["Type email@gmail.com in email input box", "Type test-password@123 in password inputbox", "click on login"],
"ai_extraction": false
@@ -99,7 +99,7 @@ curl -X 'POST' \
-H 'SGAI-APIKEY: your-api-key' \
-H 'Content-Type: application/json' \
-d '{
- "url": "https://scrapegraphai.com/dashboard",
+ "url": "https://dashboard.scrapegraphai.com/",
"use_session": true,
"steps": ["Type email@gmail.com in email input box", "Type test-password@123 in password inputbox", "click on login", "wait for dashboard to load completely"],
"user_prompt": "Extract user info, dashboard sections, and remaining credits",
@@ -132,7 +132,7 @@ client = Client(api_key=api_key)
# Basic example: login and scrape without AI
response = client.agenticscraper(
- url="https://scrapegraphai.com/dashboard",
+ url="https://dashboard.scrapegraphai.com/",
use_session=True,
steps=[
"Type email@gmail.com in email input box",
@@ -157,7 +157,7 @@ output_schema = {
}
}
ai_response = client.agenticscraper(
- url="https://scrapegraphai.com/dashboard",
+ url="https://dashboard.scrapegraphai.com/",
use_session=True,
steps=[
"Type email@gmail.com in email input box",
@@ -175,12 +175,12 @@ client.close()
```bash CLI
# Basic scraping without AI extraction
-just-scrape agentic-scraper https://scrapegraphai.com/dashboard \
+just-scrape agentic-scraper https://dashboard.scrapegraphai.com/ \
-s "Type email@gmail.com in email input box,Type test-password@123 in password inputbox,Click login" \
--use-session
# With AI extraction
-just-scrape agentic-scraper https://scrapegraphai.com/dashboard \
+just-scrape agentic-scraper https://dashboard.scrapegraphai.com/ \
-s "Type email@gmail.com in email input box,Type test-password@123 in password inputbox,Click login,wait for dashboard to load" \
--ai-extraction -p "Extract user info, dashboard sections, and remaining credits" \
--use-session
@@ -201,7 +201,7 @@ just-scrape agentic-scraper https://scrapegraphai.com/dashboard \
| ai_extraction | bool | No | true = AI extraction, false = raw content only |
-Get your API key from the [dashboard](https://scrapegraphai.com/dashboard)
+Get your API key from the [dashboard](https://dashboard.scrapegraphai.com)
## Use Cases
@@ -245,6 +245,6 @@ For technical details see:
-
+
Get your API key and start using Agentic Scraper now!
diff --git a/services/cli.mdx b/services/cli.mdx
index 1eb9b8f..ab551d2 100644
--- a/services/cli.mdx
+++ b/services/cli.mdx
@@ -6,10 +6,10 @@ icon: 'terminal'
## Overview
-`just-scrape` is the official CLI for [ScrapeGraph AI](https://scrapegraphai.com) — AI-powered web scraping, data extraction, search, and crawling, straight from your terminal. Uses the **v2 API**.
+`just-scrape` is the official CLI for [ScrapeGraph AI](https://scrapegraphai.com) — AI-powered web scraping, data extraction, search, and crawling, straight from your terminal.
-Get your API key from the [dashboard](https://scrapegraphai.com/dashboard)
+Get your API key from the [dashboard](https://dashboard.scrapegraphai.com)
## Installation
@@ -58,144 +58,110 @@ The CLI needs a ScrapeGraph API key. Four ways to provide it (checked in order):
| Variable | Description | Default |
|---|---|---|
| `SGAI_API_KEY` | ScrapeGraph API key | — |
-| `SGAI_API_URL` | Override API base URL | `https://api.scrapegraphai.com` |
-| `SGAI_TIMEOUT_S` | Request timeout in seconds | `30` |
-
-Legacy variables (`JUST_SCRAPE_API_URL`, `JUST_SCRAPE_TIMEOUT_S`, `JUST_SCRAPE_DEBUG`) are still bridged.
+| `JUST_SCRAPE_API_URL` | Override API base URL | `https://api.scrapegraphai.com/v1` |
+| `JUST_SCRAPE_TIMEOUT_S` | Request/polling timeout in seconds | `120` |
+| `JUST_SCRAPE_DEBUG` | Set to `1` to enable debug logging | `0` |
## JSON Mode
All commands support `--json` for machine-readable output. Banner, spinners, and interactive prompts are suppressed — only minified JSON on stdout. Saves tokens when piped to AI agents.
```bash
-just-scrape credits --json | jq '.remainingCredits'
-just-scrape extract https://example.com -p "Extract data" --json > result.json
+just-scrape credits --json | jq '.remaining_credits'
+just-scrape smart-scraper https://example.com -p "Extract data" --json > result.json
```
## Commands
-### Extract
+### SmartScraper
-Extract structured data from any URL using AI (replaces `smart-scraper`). [Full docs →](/api-reference/extract)
+Extract structured data from any URL using AI. [Full docs →](/services/smartscraper)
```bash
-just-scrape extract -p
-just-scrape extract -p --schema
-just-scrape extract -p --scrolls
-just-scrape extract -p --mode reader # HTML mode: normal, reader, prune
-just-scrape extract -p --stealth
-just-scrape extract -p --cookies --headers
-just-scrape extract -p --country
+just-scrape smart-scraper -p
+just-scrape smart-scraper -p --schema
+just-scrape smart-scraper -p --scrolls
+just-scrape smart-scraper -p --pages
+just-scrape smart-scraper -p --stealth
+just-scrape smart-scraper -p --cookies --headers
+just-scrape smart-scraper -p --plain-text
```
-### Search
+### SearchScraper
-Search the web and extract structured data from results (replaces `search-scraper`). [Full docs →](/api-reference/search)
+Search the web and extract structured data from results. [Full docs →](/services/searchscraper)
```bash
-just-scrape search
-just-scrape search --num-results
-just-scrape search -p
-just-scrape search --schema
-just-scrape search --format markdown # or html
-just-scrape search --country # e.g. us, de, jp
-just-scrape search --time-range past_week # past_hour|past_24_hours|past_week|past_month|past_year
-just-scrape search --headers
+just-scrape search-scraper
+just-scrape search-scraper --num-results
+just-scrape search-scraper --no-extraction
+just-scrape search-scraper --schema
+just-scrape search-scraper --stealth --headers
```
-### Scrape
-
-Scrape a URL into one or more of 8 output formats. Multi-format is supported via comma-separated `-f`. [Full docs →](/api-reference/scrape)
-
-```bash
-just-scrape scrape # markdown (default)
-just-scrape scrape -f html
-just-scrape scrape -f screenshot
-just-scrape scrape -f markdown,links,images # multi-format
-just-scrape scrape -f json -p "Extract the title" # json format requires --prompt
-just-scrape scrape -f json -p --schema
-just-scrape scrape --html-mode reader # normal | reader | prune
-just-scrape scrape --scrolls 3
-just-scrape scrape -m js --stealth
-just-scrape scrape --country
-```
-
-#### Formats
-
-| Format | Description |
-|---|---|
-| `markdown` | Clean markdown conversion (default). Respects `--html-mode`. |
-| `html` | Raw / processed HTML. Respects `--html-mode`. |
-| `screenshot` | Page screenshot (PNG). |
-| `branding` | Extracted brand assets (logos, colors, fonts). |
-| `links` | All links on the page. |
-| `images` | All images on the page. |
-| `summary` | AI-generated page summary. |
-| `json` | Structured JSON via `--prompt` (+ optional `--schema`). |
-
### Markdownify
-Convert any webpage to clean markdown (convenience wrapper for `scrape --format markdown`). [Full docs →](/api-reference/scrape)
+Convert any webpage to clean markdown. [Full docs →](/services/markdownify)
```bash
just-scrape markdownify
-just-scrape markdownify -m js --stealth
+just-scrape markdownify --stealth
just-scrape markdownify --headers
```
### Crawl
-Crawl multiple pages. The CLI starts the crawl and polls until completion. [Full docs →](/api-reference/crawl)
+Crawl multiple pages and extract data from each. [Full docs →](/services/smartcrawler)
```bash
-just-scrape crawl
-just-scrape crawl --max-pages
-just-scrape crawl --max-depth
-just-scrape crawl --max-links-per-page
-just-scrape crawl --allow-external
-just-scrape crawl -f markdown # or html, json, etc.
-just-scrape crawl -m js --stealth
+just-scrape crawl -p
+just-scrape crawl -p --max-pages
+just-scrape crawl -p --depth
+just-scrape crawl --no-extraction --max-pages
+just-scrape crawl -p --schema
+just-scrape crawl -p --rules
+just-scrape crawl -p --no-sitemap
+just-scrape crawl -p --stealth
```
-### Fetch Modes
-
-Use `-m / --mode` on `scrape`, `markdownify`, and `crawl` to choose how pages are fetched. Add `--stealth` to enable anti-bot bypass.
+### Scrape
-| Mode | Description |
-|---|---|
-| `auto` | Automatic selection (default) |
-| `fast` | Fastest, no JS rendering |
-| `js` | Full JS rendering |
+Get raw HTML content from a URL. [Full docs →](/services/scrape)
-
-On `extract`, `--mode` sets the **HTML processing mode** (`normal`, `reader`, `prune`) instead. Use `--stealth` separately for anti-bot bypass.
-
+```bash
+just-scrape scrape
+just-scrape scrape --stealth
+just-scrape scrape --branding
+just-scrape scrape --country-code
+```
-### Monitor
+### Sitemap
-Create and manage page-change monitors that track changes on a URL at a set interval.
+Get all URLs from a website's sitemap. [Full docs →](/services/sitemap)
```bash
-# Create a monitor
-just-scrape monitor create --url --interval
-just-scrape monitor create --url --interval 1h --name "My Monitor"
-just-scrape monitor create --url --interval 30m -f markdown,links --webhook-url
-just-scrape monitor create --url --interval 1d -m js --stealth
+just-scrape sitemap
+just-scrape sitemap --json | jq -r '.urls[]'
+```
+
+### Agentic Scraper
+
+Browser automation with AI — login, click, navigate, fill forms. [Full docs →](/services/agenticscraper)
-# List all monitors
-just-scrape monitor list
+```bash
+just-scrape agentic-scraper -s
+just-scrape agentic-scraper -s --ai-extraction -p
+just-scrape agentic-scraper -s --schema
+just-scrape agentic-scraper -s --use-session
+```
-# Get a specific monitor
-just-scrape monitor get --id
+### Generate Schema
-# Update a monitor
-just-scrape monitor update --id --interval 2h
-just-scrape monitor update --id --name "New Name" -f html,screenshot
+Generate a JSON schema from a natural language description.
-# Pause / resume / delete
-just-scrape monitor pause --id
-just-scrape monitor resume --id
-just-scrape monitor delete --id
+```bash
+just-scrape generate-schema
+just-scrape generate-schema --existing-schema
```
### History
@@ -210,7 +176,7 @@ just-scrape history --page-size
just-scrape history --json
```
-Services: `scrape`, `extract`, `schema`, `search`, `monitor`, `crawl`
+Services: `markdownify`, `smartscraper`, `searchscraper`, `scrape`, `crawl`, `agentic-scraper`, `sitemap`
### Credits
@@ -218,7 +184,14 @@ Check your credit balance.
```bash
just-scrape credits
-just-scrape credits --json | jq '.remainingCredits'
+```
+
+### Validate
+
+Validate your API key.
+
+```bash
+just-scrape validate
```
## AI Agent Integration
@@ -241,7 +214,7 @@ bunx skills add https://github.com/ScrapeGraphAI/just-scrape
Join our Discord community
-
+
Get your API key
diff --git a/services/cli/ai-agent-skill.mdx b/services/cli/ai-agent-skill.mdx
index 4c64a98..50ee527 100644
--- a/services/cli/ai-agent-skill.mdx
+++ b/services/cli/ai-agent-skill.mdx
@@ -17,10 +17,9 @@ Browse the skill: [skills.sh/scrapegraphai/just-scrape/just-scrape](https://skil
Once installed, your coding agent can:
-- Extract structured data from any website using AI
+- Scrape a website to gather data needed for a task
- Convert documentation pages to markdown for context
- Search the web and extract structured results
-- Crawl multiple pages and collect data
- Check your credit balance mid-session
- Browse request history
@@ -29,13 +28,13 @@ Once installed, your coding agent can:
Agents call `just-scrape` in `--json` mode for clean, token-efficient output:
```bash
-just-scrape extract https://api.example.com/docs \
+just-scrape smart-scraper https://api.example.com/docs \
-p "Extract all endpoint names, methods, and descriptions" \
--json
```
```bash
-just-scrape search "latest release notes for react-query" \
+just-scrape search-scraper "latest release notes for react-query" \
--num-results 3 --json
```
@@ -77,17 +76,15 @@ This project uses `just-scrape` (ScrapeGraph AI CLI) for web scraping.
The API key is set via the SGAI_API_KEY environment variable.
Available commands (always use --json flag):
-- `just-scrape extract -p --json` — AI extraction from a URL
-- `just-scrape search --json` — search the web and extract data
+- `just-scrape smart-scraper -p --json` — AI extraction from a URL
+- `just-scrape search-scraper --json` — search the web and extract data
- `just-scrape markdownify --json` — convert a page to markdown
-- `just-scrape crawl --json` — crawl multiple pages
-- `just-scrape scrape --json` — get page content (markdown, html, screenshot, branding, links, images, summary, json)
-- `just-scrape credits --json` — check credit balance
+- `just-scrape crawl -p --json` — crawl multiple pages
+- `just-scrape scrape --json` — get raw HTML
+- `just-scrape sitemap --json` — get all URLs from a sitemap
Use --schema to enforce a JSON schema on the output.
-Use --mode direct+stealth or --mode js+stealth for sites with anti-bot protection.
-Use -f to pick scrape format(s), e.g. -f markdown,links,images for multi-format.
-Use --location-geo-code and --time-range with search for geo/time filtering.
+Use --stealth for sites with anti-bot protection.
```
### Example prompts for Claude Code
@@ -123,7 +120,7 @@ claude -p "Use just-scrape to scrape https://example.com/changelog \
- Pass `--schema` with a JSON schema to get typed, predictable output:
```bash
-just-scrape extract https://example.com \
+just-scrape smart-scraper https://example.com \
-p "Extract company info" \
--schema '{"type":"object","properties":{"name":{"type":"string"},"founded":{"type":"number"}}}' \
--json
diff --git a/services/cli/commands.mdx b/services/cli/commands.mdx
index 295fd86..566f827 100644
--- a/services/cli/commands.mdx
+++ b/services/cli/commands.mdx
@@ -3,113 +3,110 @@ title: 'Commands'
description: 'Full reference for every just-scrape command and its flags'
---
-## extract
+## smart-scraper
-Extract structured data from any URL using AI (replaces `smart-scraper`). [Full docs →](/api-reference/extract)
+Extract structured data from any URL using AI. [Full docs →](/services/smartscraper)
```bash
-just-scrape extract -p
-just-scrape extract -p --schema
-just-scrape extract -p --scrolls # infinite scroll (0-100)
-just-scrape extract -p --stealth # anti-bot bypass
-just-scrape extract -p --mode reader # HTML mode: normal (default), reader, prune
-just-scrape extract -p --cookies --headers
-just-scrape extract -p --country # geo-targeting
+just-scrape smart-scraper -p
+just-scrape smart-scraper -p --schema
+just-scrape smart-scraper -p --scrolls # infinite scroll (0-100)
+just-scrape smart-scraper -p --pages # multi-page (1-100)
+just-scrape smart-scraper -p --stealth # anti-bot bypass (+4 credits)
+just-scrape smart-scraper -p --cookies --headers
+just-scrape smart-scraper -p --plain-text # plain text instead of JSON
```
-## search
+## search-scraper
-Search the web and extract structured data from results (replaces `search-scraper`). [Full docs →](/api-reference/search)
+Search the web and extract structured data from results. [Full docs →](/services/searchscraper)
```bash
-just-scrape search
-just-scrape search -p # extraction prompt for results
-just-scrape search --num-results # sources to scrape (1-20, default 3)
-just-scrape search --schema
-just-scrape search --country # geo-target (e.g. 'us', 'de', 'jp-tk')
-just-scrape search --time-range # past_hour | past_24_hours | past_week | past_month | past_year
-just-scrape search --format # result format (default markdown)
-just-scrape search --headers
+just-scrape search-scraper
+just-scrape search-scraper --num-results # sources to scrape (3-20, default 3)
+just-scrape search-scraper --no-extraction # markdown only (2 credits vs 10)
+just-scrape search-scraper --schema
+just-scrape search-scraper --stealth --headers
```
## markdownify
-Convert any webpage to clean markdown (convenience wrapper for `scrape --format markdown`). [Full docs →](/api-reference/scrape)
+Convert any webpage to clean markdown. [Full docs →](/services/markdownify)
```bash
just-scrape markdownify
-just-scrape markdownify -m js --stealth # anti-bot bypass
+just-scrape markdownify --stealth
just-scrape markdownify --headers
```
+## crawl
+
+Crawl multiple pages and extract data from each. [Full docs →](/services/smartcrawler)
+
+```bash
+just-scrape crawl -p
+just-scrape crawl -p --max-pages # max pages (default 10)
+just-scrape crawl -p --depth # crawl depth (default 1)
+just-scrape crawl --no-extraction --max-pages # markdown only (2 credits/page)
+just-scrape crawl -p --schema
+just-scrape crawl -p --rules # include_paths, same_domain
+just-scrape crawl -p