feat: CLI migration + progressive disclosure redesign for ultimate-scraper by lukas-bekr · Pull Request #33 · apify/agent-skills

lukas-bekr · 2026-03-30T22:59:48Z

Summary

Major upgrade to the apify-ultimate-scraper skill: migrates from REST API scripts to Apify CLI, restructures the information architecture using progressive disclosure, and enriches all workflow guides with 58 research-backed data pipeline patterns.

Phase 1: CLI migration

Replaced 3 Node.js scripts (search_actors.js, run_actor.js, fetch_actor_details.js) with CLI commands (apify actors call --json, actors search, actors info, datasets get-items)
--json output as stable API contract - immune to upcoming CLI UI changes (Markdown default, colors)
OAuth-first authentication (apify login) with env var fallback. Fixed a security contradiction in the actorization skill (it was using apify login -t, which exposes tokens in shell history); aligned with PR #31, "fix: migrate security fixes to actorization skill"

Phase 2: Progressive disclosure restructure

Replaced monolithic 400-line Actor index with hub-and-spoke architecture
SKILL.md (~109 lines) routes to lean actor-index (206 lines) + 14 workflow guides + gotchas (108 lines)
Simple task ("scrape Nike's Instagram") loads ~300 lines. Complex pipeline loads ~500. Neither loads the other 13 guides.

Phase 3: Research-driven workflow enrichment

4-workstream research: Notion internal use cases + AI research (Perplexity/Gemini/ChatGPT) + n8n template library scraping (85+ templates, 26 use Apify) + social media scraping
58 distinct workflow patterns mapped to Apify Actors, ranked by cross-source frequency
Every workflow guide now has 4-6 pipelines with explicit Actor chaining, data piping (results[].website -> startUrls), PPE cost estimates, and gotchas
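The piping notation above (results[].website -> startUrls) can be sketched as a small transformation between two Actor runs. This is only an illustration: the website field comes from the PR text, while the helper name to_start_urls, the sample items, and the {"url": ...} item shape for startUrls are assumptions that should be checked against the downstream Actor's input schema.

```python
from typing import Any


def to_start_urls(results: list[dict[str, Any]], field: str = "website") -> list[dict[str, str]]:
    """Map one field of upstream dataset items into a startUrls-style list.

    Assumption: the downstream Actor accepts startUrls as [{"url": ...}, ...].
    Items with a missing or empty field are skipped, and duplicates are dropped
    while preserving first-seen order.
    """
    seen: set[str] = set()
    start_urls: list[dict[str, str]] = []
    for item in results:
        url = item.get(field)
        if not isinstance(url, str) or not url.strip():
            continue
        url = url.strip()
        if url in seen:
            continue
        seen.add(url)
        start_urls.append({"url": url})
    return start_urls


# Hypothetical upstream items, made up for this example.
results = [
    {"title": "Acme", "website": "https://acme.example"},
    {"title": "No site"},
    {"title": "Acme duplicate", "website": "https://acme.example"},
    {"title": "Globex", "website": " https://globex.example "},
]
next_input = {"startUrls": to_start_urls(results)}
print(next_input)
```

Skipping empty values and deduplicating before the second run keeps the downstream input small, which matters when the second Actor is billed per event.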

Phase 4: New content

4 new workflow categories: e-commerce price monitoring, contact enrichment, knowledge base/RAG, company research (covers 5,000+ Store Actors with previously zero workflow coverage)
Enriched gotchas with anti-bot guidance (Cloudflare, SPA, fingerprinting), platform rate limits, cost estimation protocols

By the numbers

17 files, 1,597 lines (was 13 files, 782 lines)
Token budget for simple tasks: ~300 lines (unchanged, progressive disclosure)
14 workflow guides with 4-6 pipelines each (was 10 with 1-4 each)
Design principles: Anthropic's "Lessons from Building Skills" - skip the obvious, gotchas are highest-signal, hub-and-spoke progressive disclosure, don't railroad

Scope

apify-ultimate-scraper skill only (full rewrite)
apify-actorization auth fix (aligned with PR #31, "fix: migrate security fixes to actorization skill")
apify-actor-development minor auth alignment (OAuth-first)
commands/create-actor.md auth alignment
Did NOT touch developer skill content (actor-development, actorization workflows) - Patrik's territory


- Standardize auth to OAuth-first across all skills
- Fix security contradiction in actorization (remove -t flag)
- Delete legacy Node.js scripts (replaced by CLI commands)
- Bump version to 2.0.0
- Add design spec and implementation plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove error handling table (moving non-obvious errors to gotchas.md), add 4 new routing rows for e-commerce, contact enrichment, knowledge base/RAG, and company research, and replace error section with a brief troubleshooting pointer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…low guides

Added 7 new pipelines across 3 files from combined-patterns research:
- brand-monitoring: Twitter/X real-time mention routing (P16), Reddit brand monitoring (P17), multi-platform social listening with sentiment (P18)
- review-analysis: competitor review intelligence (P21), Google Play app review monitoring (P22), multi-platform hospitality aggregation (P20)
- content-and-seo: SERP content brief generation (P23), sitemap content audit (P24), keyword rank tracking with alerts (P26), deep research agent (P54)

All pipelines include explicit pipe field paths, PPE cost estimates where applicable, and non-obvious gotchas only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…with research patterns

Added 3 new pipelines to lead-generation.md (Sales Navigator bulk, SERP discovery, Apollo icebreakers, Reddit lead mining), 3 to competitive-intel.md (website change detection, SERP position monitoring, feature benchmarking), and 3 to influencer-vetting.md (TikTok creator vetting, YouTube channel audit, cross-platform hashtag discovery). All entries include explicit field paths, cost estimates for PPE Actors, and per-pipeline gotchas.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…flow guides

Add 2 pipelines to each guide from research patterns: Instagram competitor analysis + LinkedIn company page analytics (social); Reddit trend mining + YouTube outlier discovery (trend); sales signal outreach + Upwork monitoring (jobs); lead scoring/routing + construction discovery (real estate).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…esearch)

Adds workflow reference guides for the 4 new categories identified in combined-patterns.md research: e-commerce price monitoring (patterns 45-49), contact enrichment (50-52), knowledge base and RAG pipelines (53-55), and company research (56-58). Each guide follows the existing format with When/Pipeline/Output fields/Cost estimate/Gotcha sections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


vystrcild · 2026-04-14T11:40:41Z

A few issues that need to be fixed:
1 - Delete your docs/superpowers folder
2 - Unnecessary auth check before every run

The skill instructs the agent to run apify info as an authentication check before every Actor call. This is wasteful: if auth is missing, apify actors call will fail with a clear error. The check should be removed from the workflow, and auth should only be handled reactively on failure.

In other words, you don't want to run this every time when I'm already logged in.
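The reactive pattern requested here can be sketched as "try the call first, fix auth only if it fails". The sketch below is illustrative and runs without the Apify CLI: run_with_reactive_auth and fake_login are hypothetical names, the child process is a stand-in for a CLI call, and treating any non-zero exit code as an auth failure is a simplifying assumption.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, Sequence


def run_with_reactive_auth(
    cmd: Sequence[str],
    ensure_auth: Callable[[], None],
) -> subprocess.CompletedProcess:
    """Run cmd once; only if it fails, fix auth and retry a single time.

    Assumption: a non-zero exit code is treated as a possible auth failure.
    A real implementation would inspect stderr to tell auth errors apart
    from other failures before retrying.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return result  # Already logged in: no extra auth round trip.
    ensure_auth()  # e.g. ask the user to log in, or load APIFY_TOKEN
    return subprocess.run(cmd, capture_output=True, text=True)


# Stand-in for a CLI call: the child succeeds only when a marker file exists.
marker = os.path.join(tempfile.mkdtemp(), "logged_in")
child = [
    sys.executable,
    "-c",
    f"import os, sys; sys.exit(0 if os.path.exists({marker!r}) else 1)",
]

auth_calls: list[str] = []


def fake_login() -> None:
    """Hypothetical login step that records the call and creates the marker."""
    auth_calls.append("login")
    open(marker, "w").close()


first = run_with_reactive_auth(child, fake_login)   # fails, logs in, retries
second = run_with_reactive_auth(child, fake_login)  # succeeds, no auth step
print(first.returncode, second.returncode, auth_calls)
```

The second call shows the point of the review comment: once the session is valid, no auth-related work happens at all.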

3 - Missing stderr redirect causes JSON parsing failures

The skill says to always pass --json to CLI commands, but doesn't mention that apify actors call --json writes progress messages to stderr. When the output is piped to a JSON parser, stderr and stdout get mixed, producing invalid JSON. This caused JSONDecodeError during our test run.

Fix: All CLI command examples that are meant to be parsed programmatically should include 2>/dev/null. For example:

apify actors call "ACTOR_ID" -i 'JSON_INPUT' --json 2>/dev/null

Alternatively, add a global note to the existing rule on line 10:
Rule: Always pass --json and 2>/dev/null to CLI commands. JSON output is stable across CLI versions. Never parse human-readable output.

This applies to all commands where JSON output is consumed: apify actors call, apify actors info, apify runs info, apify datasets get-items, etc.
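When the CLI is driven from a script rather than a shell, the same separation can be done by capturing stdout on its own. The sketch below is a minimal illustration, not the skill's implementation: run_json is a hypothetical helper, and fake_cli is a stand-in process (with a made-up status payload) that imitates a command writing progress to stderr and JSON to stdout, so the example runs without the Apify CLI installed.

```python
import json
import subprocess
import sys


def run_json(cmd: list[str]) -> object:
    """Run cmd and parse stdout as JSON, keeping stderr out of the parse.

    stderr is discarded here, which is the programmatic equivalent of
    appending 2>/dev/null in a shell.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Stand-in for a CLI that logs progress on stderr and prints JSON on stdout.
fake_cli = [
    sys.executable,
    "-c",
    "import sys, json; print('Run started...', file=sys.stderr); "
    "print(json.dumps({'status': 'SUCCEEDED'}))",
]

# Merging the two streams reproduces the failure described in the review.
mixed = subprocess.run(
    fake_cli, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
).stdout
try:
    json.loads(mixed)
    mixed_ok = True
except json.JSONDecodeError:
    mixed_ok = False

clean = run_json(fake_cli)
print(mixed_ok, clean)
```

Discarding stderr hides real error messages too, so a caller that needs diagnostics may prefer to capture stderr into a separate buffer instead of dropping it.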

4 - Pricing is not working. I already tried this in a previous version of these skills and never got exact, correct costs. For example, I got estimates 4x lower than the real costs. That's just confusing for users.

Just a few notes:

I still haven't tried many cases; these were just the first obvious issues
We need to be clear that the CLI team will test these skills with every update of the Apify CLI
I still believe that we should have two versions of this skill: one for CLI and one for API. Reason: you can't install the CLI everywhere (some VMs, CI runners, or sandboxes where you don't have the rights, etc.). OAuth needs a browser, so login will not work in a headless VM or a container without a GUI, although this should be fixed by the fallback to a .env file (btw, I see that you're reading APIFY_TOKEN from an env var in the shell and not from a .env file, so that should be added too).
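The fallback order suggested in the last note (shell environment first, then a .env file) could look roughly like this. It is a sketch under assumptions: resolve_apify_token is a hypothetical helper, the .env parser is deliberately minimal, and only the APIFY_TOKEN name comes from the discussion.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_apify_token(
    env: Mapping[str, str] = os.environ,
    dotenv_path: Path = Path(".env"),
) -> Optional[str]:
    """Return APIFY_TOKEN from the environment, else from a .env file.

    The parser handles KEY=VALUE lines, an optional `export ` prefix,
    optional surrounding quotes, and # comment lines. Nothing else.
    """
    token = env.get("APIFY_TOKEN")
    if token:
        return token
    if not dotenv_path.is_file():
        return None
    for raw in dotenv_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep and key.strip() == "APIFY_TOKEN":
            return value.strip().strip("'\"") or None
    return None


# Demonstration with a temporary .env file and made-up token values.
with tempfile.TemporaryDirectory() as tmp:
    dotenv = Path(tmp) / ".env"
    dotenv.write_text('# local secrets\nAPIFY_TOKEN="token-from-dotenv"\n', encoding="utf-8")
    from_file = resolve_apify_token(env={}, dotenv_path=dotenv)
    from_env = resolve_apify_token(env={"APIFY_TOKEN": "token-from-env"}, dotenv_path=dotenv)
    nothing = resolve_apify_token(env={}, dotenv_path=Path(tmp) / "absent.env")
print(from_file, from_env, nothing)
```

Letting the shell variable win over the file keeps CI runners, which usually inject secrets as environment variables, from being overridden by a stale local .env.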

1. Delete docs/superpowers/ (specs/plans don't belong in repo)
2. Remove pre-run auth check (apify info) - handle auth reactively on failure instead of checking before every run. Added .env file sourcing as auth option.
3. Add 2>/dev/null to all CLI command examples in SKILL.md to prevent stderr mixing with JSON output (causes JSONDecodeError in parsers)
4. Strip all dollar-amount cost estimates from workflow guides (were 4x inaccurate in testing). Keep pricing model awareness (FREE/PPE/FLAT) in gotchas.md but without specific amounts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Restore all cost estimates removed in the previous commit. Add a mandatory disclaimer to gotchas.md cost estimation protocol: agents must always present estimates as rough guidance with a warning that actual costs can vary significantly. This addresses the accuracy concern while keeping the estimates useful as rough signals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lukas-bekr · 2026-04-14T15:03:37Z

Hey @vystrcild, thanks for the thorough review. All points addressed in 2 commits:

Fixed:

docs/superpowers/ deleted
Removed apify info pre-check - auth is now reactive (only on failure). Added .env file sourcing as auth option alongside env var and OAuth
Added 2>/dev/null to all CLI command examples in SKILL.md. Updated the global rule to: "Always pass --json and 2>/dev/null to CLI commands"
Cost estimates restored but with a mandatory disclaimer - agents must always caveat estimates as rough guidance that can vary significantly

On your strategic notes:

CLI team testing: agreed, let's discuss with Patrik how to integrate skill testing into the CLI release process
CLI vs API versions: parking for now, the env var + .env file fallback covers most headless scenarios

Jakub-Vacek · 2026-04-21T11:39:37Z

I like the approach; OAuth in the CLI is a pretty cool unlock.

Maybe one suggestion would be to use Markdown links for referencing other files (like when SKILL.md references a file from /references). This way you can use an extension to check the validity of the links, so you are not pointing to non-existent files when modifying skills/agents/commands.
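To illustrate why Markdown links make references checkable, here is a minimal sketch of the kind of validation an editor extension or CI step could perform. The function broken_links and the file missing.md are hypothetical; SKILL.md, the references folder, and gotchas.md are names taken from this PR, laid out here in an assumed directory structure.

```python
import re
import tempfile
from pathlib import Path

# Matches inline Markdown links [text](target); image links match too.
# Reference-style links are out of scope for this sketch.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in markdown_file that do not exist on disk."""
    missing = []
    for target in LINK_RE.findall(markdown_file.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # skip URLs (https:, mailto:) and in-page anchors
        path_part = target.split("#", 1)[0]
        if not (markdown_file.parent / path_part).exists():
            missing.append(target)
    return missing


# Demonstration in a temporary directory with one valid and one stale link.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "references").mkdir()
    (root / "references" / "gotchas.md").write_text("# Gotchas\n", encoding="utf-8")
    skill = root / "SKILL.md"
    skill.write_text(
        "See [gotchas](references/gotchas.md), [missing](references/missing.md), "
        "and [docs](https://example.com).\n",
        encoding="utf-8",
    )
    missing = broken_links(skill)
print(missing)
```

A plain-text mention such as "see references/gotchas.md" gives a checker nothing to anchor on, which is the practical argument for the link syntax.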

And I have one high-level question (not really blocking this PR): would it make sense in some cases to use MCP instead of the CLI?

I think the biggest difference between CLI and MCP is that the CLI can do some of the calls "for free"/without auth, which is super useful in the discovery phase. I am wondering how we can reuse this skill on platforms where it is not possible to install the CLI (non-technical platforms). I would say this leads to a version of this skill which uses MCP instead of tools. This version would make sense for platforms aimed at a less technical audience.

Generally there are at least 3 approaches to serve Apify:

Native connector (Strands, LangChain, Claude Desktop, N8N) => great UX, needs to be developed = high investment/low portability
MCP + skills (and other MD AI files) => improving UX (MCP auth), improving portability (skills standard, likely incoming plugin standards). Low investment/high portability
CLI + skills (and other MD AI files) => UX depends on the CLI (and how the platform will handle it), improving portability (skills standard, likely incoming plugin standards). Low investment/high portability

Ideally we should have all of these, but maybe I just don't have enough of your context :)

lukas-bekr and others added 11 commits March 30, 2026 14:29

refactor: replace monolithic actor index with lean lookup + add gotchas

9f0f3a0

feat: add workflow guides for lead-gen, competitive-intel, influencer, brand, reviews

6d6dbf8

feat: add workflow guides for content/SEO, social analytics, trends, jobs, real estate

f325c05

refactor: rewrite SKILL.md with three-layer progressive disclosure routing

36c25ce

feat: enrich gotchas with error recovery, anti-bot guidance, platform rate limits

0bba7c4

patrikbraborec mentioned this pull request Apr 7, 2026

Extend CLI telemetry apify/apify-cli#1073

Open

1 task

lukas-bekr and others added 2 commits April 14, 2026 15:27

vystrcild approved these changes Apr 21, 2026


vystrcild merged commit 2227b17 into apify:main Apr 21, 2026
2 checks passed

vystrcild merged 13 commits into apify:main from lukas-bekr:feat/ultimate-scraper-cli-migration-and-workflow-upgrade

4 participants
