Merged
70 commits
625649e
Improving the security section
mtrunkat Feb 28, 2023
ee7422d
Upgrading limits section
mtrunkat Feb 28, 2023
58f90ec
WIP
mtrunkat Mar 1, 2023
e8488cb
Update limits.md
mtrunkat Mar 1, 2023
078b3e5
Update limits.md
mtrunkat Mar 1, 2023
ec8666c
WIP
mtrunkat Mar 1, 2023
790f593
Merge branch 'feature/docs-improvements-1' of github.com:apify/apify-…
mtrunkat Mar 1, 2023
2f89914
WIP
mtrunkat Mar 1, 2023
2ccbad0
Improving Mara's docs on actors in store (#535)
mhamas Mar 3, 2023
e9ef876
WIP
mtrunkat Mar 6, 2023
3454718
Merge branch 'master' of github.com:apify/apify-docs into feature/doc…
mtrunkat Mar 6, 2023
2f0119a
WIP
mtrunkat Mar 6, 2023
12313cf
WIP
mtrunkat Mar 8, 2023
f68bddd
WIP
mtrunkat Mar 8, 2023
cfb14d8
Merge branch 'master' of github.com:apify/apify-docs into feature/doc…
mtrunkat Mar 8, 2023
571a6e2
Linting
mtrunkat Mar 8, 2023
f2c3539
Fixing container web server page
mtrunkat Mar 8, 2023
2cdc2f4
WIP
mtrunkat Mar 8, 2023
6ebb933
WIP
mtrunkat Mar 8, 2023
c429d03
WIP
mtrunkat Mar 8, 2023
fad5b95
WIP
mtrunkat Mar 8, 2023
a652475
WIP
mtrunkat Mar 8, 2023
6efd5cc
WIP
mtrunkat Mar 8, 2023
cb0f1d7
WIP
mtrunkat Mar 8, 2023
18d7f1d
feat: Moving getting started into actors/running and actors/developme…
mtrunkat Mar 10, 2023
2c13f5d
Renaming AcademyCard to Card component
mtrunkat Mar 10, 2023
0265e4f
Merge branch 'feature/docs-improvements-2' of github.com:apify/apify-…
mtrunkat Mar 10, 2023
4f8215e
fix: academy tutorials sections position
PerVillalva Mar 15, 2023
4c4fe3a
Update sources/academy/tutorials/api/index.md
mtrunkat Mar 16, 2023
75068aa
Update sources/academy/tutorials/api/index.md
mtrunkat Mar 16, 2023
c505907
Update sources/platform/actors/development/index.md
mtrunkat Mar 16, 2023
54ca25d
Update sources/platform/actors/development/index.md
mtrunkat Mar 16, 2023
df54cdc
Update sources/platform/actors/running/index.md
mtrunkat Mar 16, 2023
fe966dd
Update sources/platform/actors/running/index.md
mtrunkat Mar 16, 2023
bb6401d
Update sources/platform/actors/running/index.md
mtrunkat Mar 16, 2023
a856265
Update sources/platform/actors/running/input_and_output.md
mtrunkat Mar 16, 2023
bf6b1cb
Update sources/platform/actors/running/index.md
mtrunkat Mar 16, 2023
4c0cf59
Update sources/platform/actors/development/continuous_integration.md
mtrunkat Mar 16, 2023
f41765a
Update continuous_integration.md
mtrunkat Mar 16, 2023
d130799
Update sources/platform/actors/development/index.md
mtrunkat Mar 16, 2023
dd7cb12
Apply suggestions from code review
mtrunkat Mar 16, 2023
b54809c
Merge branch 'master' of github.com:apify/apify-docs into feature/doc…
mtrunkat Mar 16, 2023
fd2f80a
Merge branch 'master' into feature/docs-improvements-2
mtrunkat Mar 16, 2023
8706171
text improvements
PerVillalva Mar 16, 2023
a562bcb
Merge branch 'feature/docs-improvements-2' of github.com:apify/apify-…
PerVillalva Mar 16, 2023
f2e9849
fix: broken link
mnmkng Mar 16, 2023
6f7dfd9
fix outdated examples of running programmatically
mnmkng Mar 17, 2023
ed73fed
fix link to outdated content
mnmkng Mar 17, 2023
b53003d
fix some minor things
mnmkng Mar 17, 2023
26c8699
Update sources/platform/actors/running/index.md
mtrunkat Mar 20, 2023
d7f903a
Update sources/platform/index.mdx
mtrunkat Mar 20, 2023
9563120
Update sources/platform/index.mdx
mtrunkat Mar 20, 2023
f4b8c83
Fixing search vs maps
mtrunkat Mar 20, 2023
8ef3015
Fixing search vs maps
mtrunkat Mar 20, 2023
8264282
Merge branch 'master' of github.com:apify/apify-docs into feature/doc…
mtrunkat Mar 20, 2023
4c9ade8
Merge branch 'feature/docs-improvements-2' of github.com:apify/apify-…
mtrunkat Mar 20, 2023
595554a
Linting
mtrunkat Mar 20, 2023
037599a
Lint
mtrunkat Mar 20, 2023
f05ffde
Lint
mtrunkat Mar 20, 2023
9281dc7
Update sources/academy/platform/expert_scraping_with_apify/actors_web…
mtrunkat Mar 21, 2023
36cdf40
Apply suggestions from code review
mtrunkat Mar 21, 2023
a9392b3
Apply suggestions from code review
mtrunkat Mar 21, 2023
dd36bd2
Apply suggestions from code review
mtrunkat Mar 21, 2023
e5da3ca
Update sources/platform/homepage_content.json
mtrunkat Mar 21, 2023
0a5ab7e
Update sources/platform/homepage_content.json
mtrunkat Mar 21, 2023
30655f9
feat: Reorganizing access rights (#543)
mtrunkat Mar 21, 2023
4076e3a
Apply suggestions from code review
mtrunkat Mar 21, 2023
1976c28
Update sources/academy/platform/deploying_your_code/deploying.md
mtrunkat Mar 21, 2023
919dc6f
Merge branch 'feature/docs-improvements-2' of github.com:apify/apify-…
mtrunkat Mar 21, 2023
a2af8ae
Merge pull request #542 from apify/feature/home
mtrunkat Mar 21, 2023
4 changes: 4 additions & 0 deletions docusaurus.config.js
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ module.exports = {
},
}),
],
'@docusaurus/theme-mermaid',
],
presets: /** @type {import('@docusaurus/types').PresetConfig[]} */ ([
[
Expand Down Expand Up @@ -102,6 +103,9 @@ module.exports = {
],
...config.plugins,
],
markdown: {
mermaid: true,
},
themeConfig: config.themeConfig,
staticDirectories: ['apify-docs-theme/static', 'static'],
};
9,188 changes: 3,351 additions & 5,837 deletions package-lock.json

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -56,9 +56,12 @@
"@docusaurus/plugin-client-redirects": "^2.3.1",
"@docusaurus/preset-classic": "^2.3.1",
"@docusaurus/theme-common": "^2.3.1",
"@docusaurus/theme-mermaid": "^2.3.1",
"@giscus/react": "^2.2.8",
"clsx": "^1.2.1",
"form-data": "^4.0.0",
"prop-types": "^15.8.1",
"proxy-from-env": "^1.1.0",
"raw-loader": "^4.0.2",
"react": "^17.0.2",
"react-dom": "^17.0.2",
4 changes: 2 additions & 2 deletions sources/academy/index.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ slug: /
displayed_sidebar: courses
hide_table_of_contents: true
---
import AcademyCard from "@site/src/components/AcademyCard";
import Card from "@site/src/components/Card";
import CardGrid from "@site/src/components/CardGrid";
import homepageContent from "./homepage_content.json";

Expand All @@ -20,7 +20,7 @@ Learn everything about web scraping and automation with our free courses that wi
<CardGrid>
{
sections.map((section) =>
<AcademyCard
<Card
title={section.title}
desc={section.description}
imageUrl={section.imageUrl}
Original file line number Diff line number Diff line change
Expand Up @@ -67,4 +67,4 @@ The next step is to test your actor and experiment with the vast amount of featu

## Wrap up {#next}

That's it! In this short section, you've learned how to take your code written in any programming language and turn it into a usable actor that can run on the Apify platform! The next step is to start looking into the [paid actors](/platform/actors/paid-actors) program, which allows you to monetize your work.
That's it! In this short section, you've learned how to take your code written in any programming language and turn it into a usable Actor that can run on the Apify platform! The next step is to start looking into the [paid Actors](/platform/actors/publishing) program, which allows you to monetize your work.
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ Prior to moving forward, please read over these resources:

- Read about [running actors, handling actor inputs, memory and CPU](/platform/actors/running).
- Learn about [actor webhooks](/platform/integrations/webhooks), which we will implement in the next lesson.
- Learn [how to run actors](/platform/tutorials/run-actor-and-retrieve-data-via-api#run-an-actor-or-task) using Apify's REST API.
- Learn [how to run Actors](/academy/api/run-actor-and-retrieve-data-via-api) using Apify's REST API.

## Knowledge check 📝 {#quiz}

Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@ Now we're done, and we can push it up to the Apify platform with the `apify push

## Setting up the webhook {#setting-up-the-webhook}

Since we'll be calling the actor via the [Apify API](/platform/tutorials/run-actor-and-retrieve-data-via-api#run-an-actor-or-task), we'll need to grab hold of the ID of the actor we just created and pushed to the platform. The ID is always accessible through the **Settings** page of the actor.
Since we'll be calling the Actor via the [Apify API](/academy/api/run-actor-and-retrieve-data-via-api), we'll need to grab hold of the ID of the Actor we just created and pushed to the platform. The ID is always accessible through the **Settings** page of the actor.

![Actor ID in actor settings](./images/actor-settings.jpg)

Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ Storage allows us to save persistent data for further processing. As you'll lear

## Learning 🧠 {#learning}

- Check out [the docs about actor tasks](/platform/actors/tasks).
- Check out [the docs about actor tasks](/platform/actors/running/tasks).
- Read about the [two main storage options](/platform/storage#dataset) on the Apify platform.
- Understand the [crucial differences between named and unnamed storages](/platform/storage#named-and-unnamed-storages).
- Learn about the [`Dataset`](/sdk/js/reference/class/Dataset) and [`KeyValueStore`](/sdk/js/reference/class/KeyValueStore) objects in the Apify SDK.
19 changes: 19 additions & 0 deletions sources/academy/tutorials/api/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
---
title: API tutorials
description: A collection of various tutorials explaining how to interact with the Apify platform programmatically using its API.
sidebar_position: 20
category: tutorials
slug: /api
---

# API Tutorials 💻📚

**A collection of various tutorials explaining how to interact with the Apify platform programmatically using its API.**

---

This section explains how you can run [Apify Actors](/platform/actors) using Apify's [API](/api/v2), retrieve their results, and integrate them into your own product and workflows. You can do this using a raw HTTP client, or you can benefit from using one of our API clients for:

- [JavaScript](/api/client/js/)
- [Python](/api/client/python)

Original file line number Diff line number Diff line change
@@ -1,17 +1,15 @@
---
title: Run actor and retrieve data via API
description: Learn how to run an actor/task via the Apify API, wait for the job to finish, and retrieve its output data. Your key to integrating actors with your projects.
title: Run Actor and retrieve data via API
description: Learn how to run an Actor/task via the Apify API, wait for the job to finish, and retrieve its output data. Your key to integrating Actors with your projects.
sidebar_position: 6
slug: /tutorials/run-actor-and-retrieve-data-via-api
slug: /api/run-actor-and-retrieve-data-via-api
---

# Run an actor or task and retrieve data via API

**Learn how to run an actor/task via the Apify API, wait for the job to finish, and retrieve its output data. Your key to integrating actors with your projects.**
**Learn how to run an Actor/task via the Apify API, wait for the job to finish, and retrieve its output data. Your key to integrating Actors with your projects.**

---

The most popular way of [integrating](https://help.apify.com/en/collections/1669767-integrating-with-apify) the Apify platform with an external project/application is by programmatically running an [actor](../actors/index.md) or [task](../actors/tasks.md), waiting for it to complete its run, then collecting its data and using it within the project. Though this process sounds somewhat complicated, it's actually quite easy to do; however, due to the plethora of features offered on the Apify platform, new users may not be sure how exactly to implement this type of integration. So, let's dive in and see how you can do it.
The most popular way of [integrating](https://help.apify.com/en/collections/1669767-integrating-with-apify) the Apify platform with an external project/application is by programmatically running an [Actor](/platform/actors) or [task](/platform/actors/running/tasks), waiting for it to complete its run, then collecting its data and using it within the project. Though this process sounds somewhat complicated, it's actually quite easy to do; however, due to the plethora of features offered on the Apify platform, new users may not be sure how exactly to implement this type of integration. So, let's dive in and see how you can do it.

> Remember to check out our [API documentation](/api/v2) with examples in different languages and a live API console. We also recommend testing the API with a nice desktop client like [Postman](https://www.getpostman.com/) or [Insomnia](https://insomnia.rest).

Expand All @@ -22,17 +20,17 @@ There are 2 main ways of using the Apify API:

If the actor being run via API takes 5 minutes or less to complete a typical run, it should be called **synchronously**. Otherwise, (if a typical run takes longer than 5 minutes), it should be called **asynchronously**.
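
The rule above can be sketched as a tiny helper. This is only an illustration: the function name `chooseCallMode` and the constant `SYNC_LIMIT_SECONDS` are hypothetical and are not part of the Apify API or its clients.

```javascript
// Hypothetical helper: picks a calling mode from the duration of a typical run.
// The 5-minute threshold comes from the paragraph above.
const SYNC_LIMIT_SECONDS = 5 * 60;

function chooseCallMode(typicalRunSeconds) {
    // Runs that take 5 minutes or less are called synchronously;
    // anything longer is called asynchronously.
    return typicalRunSeconds <= SYNC_LIMIT_SECONDS ? 'synchronous' : 'asynchronous';
}

console.log(chooseCallMode(90)); // synchronous
console.log(chooseCallMode(1200)); // asynchronous
```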

## Run an actor or task {#run-an-actor-or-task}
## Run an Actor or task {#run-an-actor-or-task}

> If you are unsure about the differences between an actor and task, you can read about them in the [tasks](../actors/tasks.md) documentation. In brief, tasks are just pre-configured inputs for actors.
> If you are unsure about the differences between an Actor and a task, you can read about them in the [tasks](/platform/actors/running/tasks) documentation. In brief, tasks are just pre-configured inputs for Actors.

The API endpoints and usage (for both sync and async) for [actors](/api/v2#/reference/actors/run-collection/run-actor) and [tasks](/api/v2#/reference/actor-tasks/run-collection/run-task) are essentially the same.
The API endpoints and usage (for both sync and async) for [Actors](/api/v2#/reference/actors/run-collection/run-actor) and [tasks](/api/v2#/reference/actor-tasks/run-collection/run-task) are essentially the same.

To run, or **call**, an actor/task, you will need a few things:
To run, or **call**, an Actor/task, you will need a few things:

- The name or ID of the actor/task. The name looks like `username~actorName` or `username~taskName`. The ID can be retrieved on the **Settings** page of the actor/task.
- The name or ID of the Actor/task. The name looks like `username~actorName` or `username~taskName`. The ID can be retrieved on the **Settings** page of the Actor/task.

- Your [API token](../integrations/index.md), which you can find on the **Integrations** page in the [Apify Console](https://console.apify.com/account?tab=integrations) (make sure it does not get leaked anywhere!).
- Your [API token](/platform/integrations), which you can find on the **Integrations** page in [Apify Console](https://console.apify.com/account?tab=integrations) (do not share it with anyone!).

- Possibly an input, which is passed in JSON format as the request's **body**.
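
As a rough sketch, the three items above could be assembled into a request like this. The helper `buildRunRequest`, the sample input, and the choice of the `POST` method with a `Content-Type: application/json` header are assumptions made for illustration; check the API reference for the exact contract.

```javascript
// Sketch only: turns a name/ID, an API token, and an optional input
// into a URL plus options that could be passed to `fetch`.
function buildRunRequest({ actorNameOrId, token, input }) {
    const url = new URL(`https://api.apify.com/v2/acts/${encodeURIComponent(actorNameOrId)}/runs`);
    url.searchParams.set('token', token);

    // Assumption: the run is started with a POST request.
    const options = { method: 'POST' };
    if (input !== undefined) {
        // The input travels as JSON in the request body.
        options.headers = { 'Content-Type': 'application/json' };
        options.body = JSON.stringify(input);
    }
    return { url: url.toString(), options };
}

// Hypothetical values; replace them with your own name/ID, token, and input.
const exampleRequest = buildRunRequest({
    actorNameOrId: 'username~actorName',
    token: 'YOUR_TOKEN',
    input: { exampleField: 'example value' },
});
console.log(exampleRequest.url);
// To actually start the run: await fetch(exampleRequest.url, exampleRequest.options);
```

Keeping the token out of source control (for example, reading it from an environment variable) is a sensible default, since anyone who has it can act on your account.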

Expand Down Expand Up @@ -60,7 +58,7 @@ We can also add settings for the actor (which will override the default settings
https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs?token=YOUR_TOKEN&memory=8192&build=beta
```

This works nearly identically for both actors and tasks; however, for tasks there is no reason to specify a [`build`](../actors/development/builds.md) parameter, as a task already has only one specific actor build which cannot be changed with query parameters.
This works in almost exactly the same way for both Actors and tasks; however, for tasks, there is no reason to specify a [`build`](/platform/actors/development/builds) parameter, as a task already has only one specific Actor build which cannot be changed with query parameters.
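
A minimal sketch of how these query parameters could be built in code. The helper name `buildRunUrl` is hypothetical, and the `actor-tasks` path used for tasks is an assumption based on the API reference rather than something shown in this tutorial.

```javascript
// Sketch only: builds a run URL with optional `memory` and `build` settings.
function buildRunUrl({ nameOrId, token, memory, build, isTask = false }) {
    // Assumption: tasks live under `actor-tasks`, Actors under `acts`.
    const collection = isTask ? 'actor-tasks' : 'acts';
    const url = new URL(`https://api.apify.com/v2/${collection}/${nameOrId}/runs`);
    url.searchParams.set('token', token);
    if (memory !== undefined) url.searchParams.set('memory', String(memory));
    // A task is already tied to one Actor build, so `build` is skipped for tasks.
    if (build !== undefined && !isTask) url.searchParams.set('build', build);
    return url.toString();
}

console.log(buildRunUrl({ nameOrId: 'ACTOR_NAME_OR_ID', token: 'YOUR_TOKEN', memory: 8192, build: 'beta' }));
```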

### Input JSON {#input-json}

Expand Down Expand Up @@ -94,7 +92,7 @@ If your synchronous run exceeds the 5-minute time limit, the response will be a

### Synchronous runs with dataset output {#synchronous-runs-with-dataset-output}

Most actor runs will store their data in the default [dataset](../storage/dataset.md). The Apify API provides **run-sync-get-dataset-items** endpoints for [actors](/api/v2#/reference/actors/run-actor-synchronously-and-get-dataset-items/run-actor-synchronously-with-input-and-get-dataset-items) and [tasks](/api/v2#/reference/actor-tasks/run-task-synchronously-and-get-dataset-items/run-task-synchronously-and-get-dataset-items-(post)), which allow you to run an actor and receive the items from the default dataset once the run has completed.
Most Actor runs will store their data in the default [dataset](/platform/storage/dataset). The Apify API provides **run-sync-get-dataset-items** endpoints for [actors](/api/v2#/reference/actors/run-actor-synchronously-and-get-dataset-items/run-actor-synchronously-with-input-and-get-dataset-items) and [tasks](/api/v2#/reference/actor-tasks/run-task-synchronously-and-get-dataset-items/run-task-synchronously-and-get-dataset-items-(post)), which allow you to run an Actor and receive the items from the default dataset once the run has finished.

Here is a simple Node.js example of calling a task via the API and logging the dataset items to the console:

Expand Down Expand Up @@ -131,7 +129,7 @@ items.forEach((item) => {

### Synchronous runs with key-value store output {#synchronous-runs-with-key-value-store-output}

[Key-value stores](../storage/key_value_store.md) are useful for storing files like images, HTML snapshots, or JSON data. The Apify API provides **run-sync** endpoints for [actors](/api/v2#/reference/actors/run-actor-synchronously/with-input) and [tasks](/api/v2#/reference/actor-tasks/run-task-synchronously/run-task-synchronously), which allow you to run a specific task and receive the output. By default, they return the `OUTPUT` record from the default key-value store.
[Key-value stores](/platform/storage/key-value-store) are useful for storing files like images, HTML snapshots, or JSON data. The Apify API provides **run-sync** endpoints for [actors](/api/v2#/reference/actors/run-actor-synchronously/with-input) and [tasks](/api/v2#/reference/actor-tasks/run-task-synchronously/run-task-synchronously), which allow you to run a specific task and receive the output. By default, they return the `OUTPUT` record from the default key-value store.

> For more detailed information, check the [API reference](/api/v2#/reference/actors/run-actor-synchronously-and-get-dataset-items/run-actor-synchronously-with-input-and-get-dataset-items).

Expand Down Expand Up @@ -165,13 +163,13 @@ Once again, the final response will be the **run info object**; however, now its

#### Webhooks {#webhooks}

If you have a server, [webhooks](../integrations/webhooks/index.md) are the most elegant and flexible solution for integrations with Apify. You can simply set up a webhook for any actor or task, and that webhook will send a POST request to your server after an [event](../integrations/webhooks/events.md) has occurred.
If you have a server, [webhooks](/platform/integrations/webhooks) are the most elegant and flexible solution for integrations with Apify. You can simply set up a webhook for any Actor or task, and that webhook will send a POST request to your server after an [event](/platform/integrations/webhooks/events) has occurred.

Usually, this event is a successfully finished run, but you can also set a different webhook for failed runs, etc.

![Webhook example](./images/webhook.png)

The webhook will send you a pretty complicated [JSON object](../integrations/webhooks/actions.md), but usually you are only interested in the `resource` object within the response, which is essentially just the **run info** JSON from the previous sections. We can leave the payload template as is as for our example use case, since it is what we need.
The webhook will send you a pretty complicated [JSON object](/platform/integrations/webhooks/actions), but usually, you would only be interested in the `resource` object within the response, which is essentially just the **run info** JSON from the previous sections. We can leave the payload template as is for our example since it is all we need.

Once your server receives this request from the webhook, you know that the event happened, and you can ask for the complete data.
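
On the server side, picking the useful part out of the webhook payload could look roughly like this. The function name `extractRunInfo` and the sample payload are hypothetical, and the `id` and `status` field names are assumptions about the run info object; only `resource`, `defaultDatasetId`, and `defaultKeyValueStoreId` are named in this tutorial.

```javascript
// Sketch only: pulls the run info out of a parsed webhook payload.
function extractRunInfo(payload) {
    const resource = payload && payload.resource;
    if (!resource) {
        throw new Error('Webhook payload has no `resource` object');
    }
    return {
        runId: resource.id, // assumed field name
        status: resource.status, // assumed field name
        defaultDatasetId: resource.defaultDatasetId,
        defaultKeyValueStoreId: resource.defaultKeyValueStoreId,
    };
}

// Made-up payload for illustration; a real one contains many more fields.
const samplePayload = {
    resource: {
        id: 'RUN_ID',
        status: 'SUCCEEDED',
        defaultDatasetId: 'DATASET_ID',
        defaultKeyValueStoreId: 'STORE_ID',
    },
};
console.log(extractRunInfo(samplePayload).defaultDatasetId);
```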

Expand All @@ -195,7 +193,7 @@ Once a status of `SUCCEEDED` or `FAILED` has been received, we know the run has

Unless you used the [synchronous call](#synchronous-flow) mentioned above, you will have to make one additional request to the API to retrieve the data.

The **run info** JSON also contains the IDs of the default [dataset](../storage/dataset.md) and [key-value store](../storage/key_value_store.md) that are allocated separately for each run, which is usually everything you need. The fields are called `defaultDatasetId` and `defaultKeyValueStoreId`.
The **run info** JSON also contains the IDs of the default [dataset](/platform/storage/dataset) and [key-value store](/platform/storage/key-value-store) that are allocated separately for each run, which is usually everything you need. The fields are called `defaultDatasetId` and `defaultKeyValueStoreId`.
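
As a small sketch, the `defaultDatasetId` field can be turned into a dataset items URL like the one used later in this tutorial. The helper name `datasetItemsUrl` and its option names are hypothetical.

```javascript
// Sketch only: builds the dataset items URL from a run info object.
function datasetItemsUrl(runInfo, { format = 'json', offset } = {}) {
    const url = new URL(`https://api.apify.com/v2/datasets/${runInfo.defaultDatasetId}/items`);
    url.searchParams.set('format', format);
    if (offset !== undefined) url.searchParams.set('offset', String(offset));
    return url.toString();
}

// Made-up run info; a real one comes from the API or a webhook.
console.log(datasetItemsUrl({ defaultDatasetId: 'DATASET_ID' }, { format: 'csv', offset: 250000 }));
```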

#### Retrieving a dataset {#retrieve-a-dataset}

Expand All @@ -219,7 +217,7 @@ https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv&offset=250000

#### Retrieving a key-value store {#retrieve-a-key-value-store}

> [Key-value stores](../storage/key_value_store.md) are mainly useful if you have a single output or any kind of files that cannot be [stringified](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify) (such as images or PDFs).
> [Key-value stores](/platform/storage/key-value-store) are mainly useful if you have a single output or any kind of files that cannot be [stringified](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify) (such as images or PDFs).

When you want to retrieve something from a key-value store, the `defaultKeyValueStoreId` is _not_ enough. You also need to know the name (or **key**) of the record you want to retrieve.
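
To illustrate why both pieces are needed, here is a sketch that builds a record URL from a store ID and a key. The helper name `recordUrl` is hypothetical, and the URL path is an assumption based on the API reference; verify it there before relying on it.

```javascript
// Sketch only: a record is addressed by the store ID *and* the record key.
function recordUrl(storeId, key) {
    if (!storeId || !key) {
        throw new Error('Both the store ID and the record key are required');
    }
    // Assumed path layout: /v2/key-value-stores/{storeId}/records/{key}
    return `https://api.apify.com/v2/key-value-stores/${encodeURIComponent(storeId)}/records/${encodeURIComponent(key)}`;
}

// `OUTPUT` is the record name mentioned earlier in this tutorial.
console.log(recordUrl('STORE_ID', 'OUTPUT'));
```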

Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ This takes you to the **Input and options** tab of the task configuration. Befor

Scroll down to the **Performance and limits** section and set the **Max pages per run** option to **10**. This tells your task to finish after 10 pages have been visited. We don't need to crawl the whole domain just to see that the actor works.

> This also helps with keeping your [compute unit](/platform/actors/running/compute-units) (CU) consumption low. Just to get an idea, our free plan includes 10 CUs and this run will consume about 0.04 CU, so you can run it 250 times a month for free. If you accidentally go over the limit, no worries, we won't charge you for it. You just won't be able to run more tasks that month.
> This also helps with keeping your [compute unit](/platform/actors/running/usage-and-resources) (CU) consumption low. Just to get an idea, our free plan includes 10 CUs and this run will consume about 0.04 CU, so you can run it 250 times a month for free. If you accidentally go over the limit, no worries, we won't charge you for it. You just won't be able to run more tasks that month.
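
The arithmetic behind that estimate, as a quick sketch (the function name `runsPerMonth` is hypothetical, and the 0.04 CU figure is the approximate value quoted above):

```javascript
// Sketch only: how many runs fit into a monthly compute unit (CU) allowance.
function runsPerMonth(includedCu, cuPerRun) {
    // Work in hundredths of a CU to avoid floating-point rounding surprises.
    const included = Math.round(includedCu * 100);
    const perRun = Math.round(cuPerRun * 100);
    return Math.floor(included / perRun);
}

console.log(runsPerMonth(10, 0.04)); // 250
```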

Now click **Save & Run**! *(in the bottom-left part of your screen)*

3 changes: 1 addition & 2 deletions sources/academy/tutorials/apify_scrapers/index.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
title: Apify scrapers
description: Discover Apify's ready-made web scraping and automation tools. Compare Web Scraper, Cheerio Scraper and Puppeteer Scraper to decide which is right for you.
sidebar_position: 3.2
sidebar_position: 13.2
slug: /apify-scrapers
---

Expand Down Expand Up @@ -42,4 +42,3 @@ Puppeteer Scraper is the most powerful scraper tool in our arsenal (aside from d
Puppeteer is a Node.js library, so knowledge of Node.js and its paradigms is expected when working with Puppeteer Scraper.

[Visit the Puppeteer Scraper tutorial to get started!](./puppeteer_scraper.md)
