The python-crawlee-beautifulsoup template produces a scraper which crashes as an Actor on Apify #352

@honzajavorek

Steps to reproduce:

  1. mkdir -p ~/Projects
  2. cd ~/Projects
  3. apify create crashy-crash --template=python-crawlee-beautifulsoup
  4. cd ./crashy-crash
  5. apify push
  6. Answer Y to the question "Do you want to open the Actor detail in your browser?" in your terminal.
  7. Hit the blue Start Actor button.
  8. Hit the green Save & Start button.
  9. See the Log.

The steps assume you are logged in to an Apify account in your default browser. The same scraper finishes without errors locally with apify run. The log from the platform run:

2025-03-12T10:20:47.862Z ACTOR: Pulling Docker image of build nkuJNcJqM53LdOTZH from repository.
2025-03-12T10:20:48.590Z ACTOR: Creating Docker container.
2025-03-12T10:20:49.046Z ACTOR: Starting Docker container.
2025-03-12T10:20:51.068Z Downloading model definition files...
2025-03-12T10:20:51.190Z Error downloading fingerprint-network.zip: [Errno 13] Permission denied: '/usr/local/lib/python3.13/site-packages/browserforge/fingerprints/data/fingerprint-network.zip'
2025-03-12T10:20:51.191Z Error downloading fingerprint-network.zip: [Errno 13] Permission denied: '/usr/local/lib/python3.13/site-packages/browserforge/fingerprints/data/fingerprint-network.zip'
2025-03-12T10:20:51.191Z Error downloading fingerprint-network.zip: [Errno 13] Permission denied: '/usr/local/lib/python3.13/site-packages/browserforge/fingerprints/data/fingerprint-network.zip'
2025-03-12T10:20:51.192Z Error downloading fingerprint-network.zip: [Errno 13] Permission denied: '/usr/local/lib/python3.13/site-packages/browserforge/fingerprints/data/fingerprint-network.zip'
2025-03-12T10:20:51.192Z Error downloading fingerprint-network.zip: [Errno 13] Permission denied: '/usr/local/lib/python3.13/site-packages/browserforge/fingerprints/data/fingerprint-network.zip'
2025-03-12T10:20:51.193Z Downloading model definition files...
2025-03-12T10:20:51.211Z Error downloading input-network.zip: [Errno 13] Permission denied: '/usr/local/lib/python3.13/site-packages/browserforge/headers/data/input-network.zip'
2025-03-12T10:20:51.212Z Error downloading input-network.zip: [Errno 13] Permission denied: '/usr/local/lib/python3.13/site-packages/browserforge/headers/data/input-network.zip'
2025-03-12T10:20:51.213Z Error downloading input-network.zip: [Errno 13] Permission denied: '/usr/local/lib/python3.13/site-packages/browserforge/headers/data/input-network.zip'
2025-03-12T10:20:51.213Z Error downloading input-network.zip: [Errno 13] Permission denied: '/usr/local/lib/python3.13/site-packages/browserforge/headers/data/input-network.zip'
2025-03-12T10:20:51.229Z Traceback (most recent call last):
2025-03-12T10:20:51.237Z   File "<frozen runpy>", line 198, in _run_module_as_main
2025-03-12T10:20:51.239Z   File "<frozen runpy>", line 88, in _run_code
2025-03-12T10:20:51.239Z   File "/usr/src/app/src/__main__.py", line 3, in <module>
2025-03-12T10:20:51.240Z     from .main import main
2025-03-12T10:20:51.240Z   File "/usr/src/app/src/main.py", line 9, in <module>
2025-03-12T10:20:51.240Z     from apify import Actor
2025-03-12T10:20:51.241Z   File "/usr/local/lib/python3.13/site-packages/apify/__init__.py", line 15, in <module>
2025-03-12T10:20:51.241Z     from apify._actor import Actor
2025-03-12T10:20:51.241Z   File "/usr/local/lib/python3.13/site-packages/apify/_actor.py", line 28, in <module>
2025-03-12T10:20:51.242Z     from apify._charging import ChargeResult, ChargingManager, ChargingManagerImplementation
2025-03-12T10:20:51.242Z   File "/usr/local/lib/python3.13/site-packages/apify/_charging.py", line 17, in <module>
2025-03-12T10:20:51.242Z     from apify.storages import Dataset
2025-03-12T10:20:51.243Z   File "/usr/local/lib/python3.13/site-packages/apify/storages/__init__.py", line 3, in <module>
2025-03-12T10:20:51.246Z     from ._request_list import RequestList
2025-03-12T10:20:51.246Z   File "/usr/local/lib/python3.13/site-packages/apify/storages/_request_list.py", line 13, in <module>
2025-03-12T10:20:51.247Z     from crawlee.http_clients import HttpClient, HttpxHttpClient
2025-03-12T10:20:51.247Z   File "/usr/local/lib/python3.13/site-packages/crawlee/http_clients/__init__.py", line 6, in <module>
2025-03-12T10:20:51.247Z     from ._httpx import HttpxHttpClient
2025-03-12T10:20:51.248Z   File "/usr/local/lib/python3.13/site-packages/crawlee/http_clients/_httpx.py", line 13, in <module>
2025-03-12T10:20:51.248Z     from crawlee.fingerprint_suite import HeaderGenerator
2025-03-12T10:20:51.249Z   File "/usr/local/lib/python3.13/site-packages/crawlee/fingerprint_suite/__init__.py", line 1, in <module>
2025-03-12T10:20:51.249Z     from ._browserforge_adapter import BrowserforgeFingerprintGenerator as DefaultFingerprintGenerator
2025-03-12T10:20:51.249Z   File "/usr/local/lib/python3.13/site-packages/crawlee/fingerprint_suite/_browserforge_adapter.py", line 10, in <module>
2025-03-12T10:20:51.250Z     from browserforge.fingerprints import Fingerprint as bf_Fingerprint
2025-03-12T10:20:51.250Z   File "/usr/local/lib/python3.13/site-packages/browserforge/fingerprints/__init__.py", line 5, in <module>
2025-03-12T10:20:51.251Z     from browserforge.headers import Browser
2025-03-12T10:20:51.252Z   File "/usr/local/lib/python3.13/site-packages/browserforge/headers/__init__.py", line 5, in <module>
2025-03-12T10:20:51.253Z     from .generator import Browser, HeaderGenerator
2025-03-12T10:20:51.253Z   File "/usr/local/lib/python3.13/site-packages/browserforge/headers/generator.py", line 80, in <module>
2025-03-12T10:20:51.254Z     class HeaderGenerator:
2025-03-12T10:20:51.254Z     ...<470 lines>...
2025-03-12T10:20:51.254Z             )
2025-03-12T10:20:51.255Z   File "/usr/local/lib/python3.13/site-packages/browserforge/headers/generator.py", line 86, in HeaderGenerator
2025-03-12T10:20:51.255Z     input_generator_network = BayesianNetwork(DATA_DIR / "input-network.zip")
2025-03-12T10:20:51.256Z   File "/usr/local/lib/python3.13/site-packages/browserforge/bayesian_network.py", line 103, in __init__
2025-03-12T10:20:51.256Z     network_definition = extract_json(path)
2025-03-12T10:20:51.256Z   File "/usr/local/lib/python3.13/site-packages/browserforge/bayesian_network.py", line 288, in extract_json
2025-03-12T10:20:51.257Z     with zipfile.ZipFile(path, 'r') as zf:
2025-03-12T10:20:51.257Z          ~~~~~~~~~~~~~~~^^^^^^^^^^^
2025-03-12T10:20:51.257Z   File "/usr/local/lib/python3.13/zipfile/__init__.py", line 1367, in __init__
2025-03-12T10:20:51.260Z     self.fp = io.open(file, filemode)
2025-03-12T10:20:51.261Z               ~~~~~~~^^^^^^^^^^^^^^^^
2025-03-12T10:20:51.261Z FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.13/site-packages/browserforge/headers/data/input-network.zip'
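
Reading the log, it looks like browserforge tries to download its model files into site-packages when it is imported, the write fails with Permission denied, and the subsequent attempt to open the missing file raises FileNotFoundError. That is an interpretation of the log, not a confirmed root cause. A minimal sketch for checking it from inside the container could look like the following. The two paths are copied from the log; the helper name check_model_file is made up for illustration and is not part of browserforge, Crawlee, or the Apify SDK.

```python
import os
from pathlib import Path

# Paths copied from the log above. They assume Python 3.13 installed under
# /usr/local, as in the failing run; adjust them for a different image.
SITE_PACKAGES = Path("/usr/local/lib/python3.13/site-packages")
MODEL_FILES = [
    SITE_PACKAGES / "browserforge/fingerprints/data/fingerprint-network.zip",
    SITE_PACKAGES / "browserforge/headers/data/input-network.zip",
]


def check_model_file(path: Path) -> dict:
    """Hypothetical helper: report whether a model file exists and whether
    the current user could write it into its parent directory."""
    parent = path.parent
    return {
        "file": str(path),
        "exists": path.is_file(),
        "dir_exists": parent.is_dir(),
        # os.access tests the permissions of the user running this process.
        "dir_writable": parent.is_dir() and os.access(parent, os.W_OK),
    }


if __name__ == "__main__":
    # The user ID helps compare the platform run with a local `apify run`.
    print(f"uid: {os.getuid()}")
    for model_file in MODEL_FILES:
        print(check_model_file(model_file))
```

If the files are reported as missing and the directories as not writable on the platform, while the same check passes locally, that would be consistent with the log above.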


This blocks my work on the Web scraping basics for Python devs, specifically apify/apify-docs#1424, where I'm trying to show people how they can use the platform for their benefit.


Labels: t-dx (Issues owned by the DX team.)
