feat: CLI migration + progressive disclosure redesign for ultimate-scraper #33
…jobs, real estate
- Standardize auth to OAuth-first across all skills
- Fix security contradiction in actorization (remove -t flag)
- Delete legacy Node.js scripts (replaced by CLI commands)
- Bump version to 2.0.0
- Add design spec and implementation plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove error handling table (moving non-obvious errors to gotchas.md), add 4 new routing rows for e-commerce, contact enrichment, knowledge base/RAG, and company research, and replace error section with a brief troubleshooting pointer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…low guides
Added 7 new pipelines across 3 files from combined-patterns research:
- brand-monitoring: Twitter/X real-time mention routing (P16), Reddit brand monitoring (P17), multi-platform social listening with sentiment (P18)
- review-analysis: competitor review intelligence (P21), Google Play app review monitoring (P22), multi-platform hospitality aggregation (P20)
- content-and-seo: SERP content brief generation (P23), sitemap content audit (P24), keyword rank tracking with alerts (P26), deep research agent (P54)

All pipelines include explicit pipe field paths, PPE cost estimates where applicable, and non-obvious gotchas only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…with research patterns Added 3 new pipelines to lead-generation.md (Sales Navigator bulk, SERP discovery, Apollo icebreakers, Reddit lead mining), 3 to competitive-intel.md (website change detection, SERP position monitoring, feature benchmarking), and 3 to influencer-vetting.md (TikTok creator vetting, YouTube channel audit, cross-platform hashtag discovery). All entries include explicit field paths, cost estimates for PPE Actors, and per-pipeline gotchas. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…flow guides Add 2 pipelines to each guide from research patterns: Instagram competitor analysis + LinkedIn company page analytics (social); Reddit trend mining + YouTube outlier discovery (trend); sales signal outreach + Upwork monitoring (jobs); lead scoring/routing + construction discovery (real estate). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…esearch) Adds workflow reference guides for the 4 new categories identified in combined-patterns.md research: e-commerce price monitoring (patterns 45-49), contact enrichment (50-52), knowledge base and RAG pipelines (53-55), and company research (56-58). Each guide follows the existing format with When/Pipeline/Output fields/Cost estimate/Gotcha sections. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A few issues that need to be fixed:

The skill instructs the agent to run apify info as an authentication check before every Actor call. This is wasteful: if auth is missing, apify actors call will fail with a clear error. The check should be removed from the workflow, and auth should only be handled reactively on failure. In other words, you don't want to run this every time when I'm already logged in.

3 - Missing stderr redirect causes JSON parsing failures
The skill says to always pass --json to CLI commands, but doesn't mention that apify actors call --json writes progress messages to stderr. When the output is piped to a JSON parser, stderr and stdout get mixed, producing invalid JSON. This caused a JSONDecodeError during our test run.
Fix: All CLI command examples that are meant to be parsed programmatically should include 2>/dev/null. For example: apify actors call "ACTOR_ID" -i 'JSON_INPUT' --json 2>/dev/null
Alternatively, add a global note to the existing rule on line 10: This applies to all commands where JSON output is consumed: apify actors call, apify actors info, apify runs info, apify datasets get-items, etc.

4 - Pricing is not working
I already tried to do this in a previous version of these skills and never got exact, correct costs. For example, I got estimates 4x lower than the actual costs. That's just confusing for users.

Just a few notes:
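The stderr problem described in point 3 can be reproduced without the Apify CLI. The sketch below is only an illustration under stated assumptions: `FAKE_CLI` is a hypothetical stand-in script that prints a progress line to stderr and a JSON document to stdout, and `run_merged` / `run_stdout_only` are names invented here. It shows that capturing both streams together breaks `json.loads`, while discarding stderr (the Python equivalent of `2>/dev/null`) leaves parseable JSON.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a CLI that reports progress on stderr and emits
# JSON on stdout. This is NOT the Apify CLI; it only imitates the behaviour
# described in the review comment.
FAKE_CLI = (
    "import sys, json\n"
    "print('Run started...', file=sys.stderr)\n"
    "print(json.dumps({'status': 'SUCCEEDED', 'items': 3}))\n"
)


def run_merged() -> str:
    """Capture stdout and stderr as one stream (the failure mode)."""
    proc = subprocess.run(
        [sys.executable, "-c", FAKE_CLI],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # progress text lands in the same buffer
        text=True,
        check=True,
    )
    return proc.stdout


def run_stdout_only() -> dict:
    """Discard stderr (like 2>/dev/null) and parse stdout as JSON."""
    proc = subprocess.run(
        [sys.executable, "-c", FAKE_CLI],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    try:
        json.loads(run_merged())
    except json.JSONDecodeError as exc:
        print(f"merged streams are not valid JSON: {exc}")
    print(run_stdout_only())
```

Discarding stderr also hides real error messages, so a wrapper that needs diagnostics could capture stderr into a separate buffer instead of sending it to `DEVNULL`.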
1. Delete docs/superpowers/ (specs/plans don't belong in repo)
2. Remove pre-run auth check (apify info) - handle auth reactively on failure instead of checking before every run. Added .env file sourcing as auth option.
3. Add 2>/dev/null to all CLI command examples in SKILL.md to prevent stderr mixing with JSON output (causes JSONDecodeError in parsers)
4. Strip all dollar-amount cost estimates from workflow guides (were 4x inaccurate in testing). Keep pricing model awareness (FREE/PPE/FLAT) in gotchas.md but without specific amounts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
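The "handle auth reactively" idea from item 2 can be sketched in a few lines. This is a minimal illustration, not the skill's actual logic: `AuthError`, `call_with_reactive_auth`, and the two callables are hypothetical names introduced here, and how a real wrapper would recognise an auth failure in CLI output is deliberately left out because the thread does not specify it.

```python
from typing import Callable


class AuthError(Exception):
    """Hypothetical marker: the command failed because auth is missing."""


def call_with_reactive_auth(
    run: Callable[[], str],
    authenticate: Callable[[], None],
) -> str:
    """Run first; authenticate and retry once only if the run reports an auth failure."""
    try:
        return run()
    except AuthError:
        authenticate()
        return run()


if __name__ == "__main__":
    state = {"logged_in": False, "auth_calls": 0}

    def fake_run() -> str:
        # Stand-in for an Actor call; fails until "logged in".
        if not state["logged_in"]:
            raise AuthError("not logged in")
        return '{"ok": true}'

    def fake_login() -> None:
        # Stand-in for whatever login step the user prefers.
        state["logged_in"] = True
        state["auth_calls"] += 1

    print(call_with_reactive_auth(fake_run, fake_login))
    print(call_with_reactive_auth(fake_run, fake_login))
    print("login attempts:", state["auth_calls"])
```

The second call never touches the login step, which is the point of the change: a user who is already logged in pays no extra cost per run.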
Restore all cost estimates removed in the previous commit. Add a mandatory disclaimer to gotchas.md cost estimation protocol: agents must always present estimates as rough guidance with a warning that actual costs can vary significantly. This addresses the accuracy concern while keeping the estimates useful as rough signals. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hey @vystrcild, thanks for the thorough review. All points addressed in 2 commits: Fixed:
On your strategic notes:
I like the approach; OAuth in the CLI is a pretty cool unlock. One suggestion would be to use Markdown links when referencing other files (for example, when SKILL.md references a file from /references). That way you can use an extension to check link validity, so you are not pointing to non-existent files when modifying skills/agents/commands.

And I have one high-level question (not really blocking this PR): would it make sense in some cases to use MCP instead of the CLI? I think the biggest difference between CLI and MCP is that the CLI can make some of the calls "for free", without auth, which is super useful in the discovery phase. I am wondering how we can reuse this skill on platforms where the CLI cannot be installed (non-technical platforms). I would say this leads to a version of this skill which uses MCP instead of tools. That version would make sense for platforms aimed at a less technical audience.

Generally there are at least 3 approaches to serve Apify:
Ideally we should have all of these, but maybe I just don't have enough of your context :)
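The Markdown-link suggestion above can also be checked with a short script instead of an editor extension. The following is a rough sketch under stated assumptions: `broken_links` and `LINK_RE` are hypothetical names, only inline links of the form `[text](target)` are handled, and the example path `references/gotchas.md` is a guess at the repository layout.

```python
import re
from pathlib import Path

# Matches inline Markdown links: [text](target). Reference-style links and
# targets containing spaces are not handled in this sketch.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything starting with a URI scheme (https:, mailto:, ...) is not a file.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in markdown_file that do not exist on disk."""
    text = markdown_file.read_text(encoding="utf-8")
    missing = []
    for target in LINK_RE.findall(text):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # external URL or in-page anchor
        path_part = target.split("#", 1)[0]  # drop any #fragment
        if not (markdown_file.parent / path_part).exists():
            missing.append(target)
    return missing


if __name__ == "__main__":
    # Assumed layout: a SKILL.md in the current directory.
    skill = Path("SKILL.md")
    if skill.exists():
        for target in broken_links(skill):
            print(f"broken link in {skill}: {target}")
```

Run from a skill directory, this prints one line per link whose target file is missing, which would catch a renamed or deleted reference file before review.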
Summary
Major upgrade to the `apify-ultimate-scraper` skill: migrates from REST API scripts to Apify CLI, restructures the information architecture using progressive disclosure, and enriches all workflow guides with 58 research-backed data pipeline patterns.

Phase 1: CLI migration
- … `apify actors call --json`, `actors search`, `actors info`, `datasets get-items`)
- `--json` output as stable API contract - immune to upcoming CLI UI changes (Markdown default, colors)
- … `apify login`) with env var fallback. Fixed security contradiction in actorization skill (was using `apify login -t`, exposing tokens in shell history; aligned with PR "fix: migrate security fixes to actorization skill" #31)

Phase 2: Progressive disclosure restructure
Phase 3: Research-driven workflow enrichment
- … `results[].website` -> `startUrls`), PPE cost estimates, and gotchas

Phase 4: New content
By the numbers
Scope
- `apify-ultimate-scraper` skill only (full rewrite)
- `apify-actorization` auth fix (aligned with PR "fix: migrate security fixes to actorization skill" #31)
- `apify-actor-development` minor auth alignment (OAuth-first)
- `commands/create-actor.md` auth alignment