Improve the deduplication of requests #178

@vdusek

Description

Context

A while ago, Honza Javorek raised some good points regarding the deduplication process in the request queue (#190).

The first one:

Is it possible that Apify's request queue dedupes the requests only based on the URL? Because the POSTs all have the same URL, just different payload. Which should be very common - by definition of what POST is, or even in practical terms with all the GraphQL APIs around.

In response, we improved the unique key generation logic in the Python SDK (PR #193) to align with the TypeScript Crawlee. This logic was later copied to crawlee-python and can be found in crawlee/_utils/requests.py.
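For illustration, here is a minimal sketch of what an extended unique key computation can look like. The normalization rules and function names below are simplified assumptions; the real logic lives in crawlee/_utils/requests.py and differs in detail:

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def normalize_url(url: str) -> str:
    # Assumed rules: lowercase the scheme and host, sort query parameters,
    # and drop the fragment, so trivially different URLs map to one key.
    parts = urlparse(url)
    query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunparse((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.params, query, ''))


def compute_extended_unique_key(url: str, method: str = 'GET', payload: bytes = b'') -> str:
    # The extended key folds in the HTTP method and a payload hash, so two
    # POSTs to the same URL with different bodies are no longer deduplicated.
    payload_hash = hashlib.sha256(payload).hexdigest()[:13]
    return f'{method.upper()}({payload_hash}):{normalize_url(url)}'
```

With such a scheme, two POSTs to the same GraphQL endpoint with different query payloads produce different keys and both stay in the queue.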

The second one:

Also wondering whether two identical requests with one different HTTP header should be considered same or different. Even with a simple GET request, I could make one with Accept-Language: cs, another with Accept-Language: en, and I can get two wildly different responses from the same server.

Currently, HTTP headers are not considered in the computation of unique keys. Additionally, we do not offer an option to explicitly bypass request deduplication, unlike the dont_filter option in Scrapy (docs).
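If headers were included, one possible direction is to hash only an allowlisted subset of headers that actually changes the response. The allowlist and function name below are assumptions for illustration, not an agreed design:

```python
import hashlib

# Hypothetical allowlist: headers that commonly change the response body.
# Hashing all headers indiscriminately would defeat deduplication, since
# incidental headers (User-Agent rotation, tracing IDs) vary per request.
HEADERS_IN_KEY = ('accept', 'accept-language', 'content-type')


def headers_fingerprint(headers: dict[str, str]) -> str:
    # Lowercase names and sort entries so header order does not matter.
    relevant = sorted(
        (name.lower(), value.strip())
        for name, value in headers.items()
        if name.lower() in HEADERS_IN_KEY
    )
    return hashlib.sha256(repr(relevant).encode()).hexdigest()[:8]
```

Appended to the extended unique key, such a fingerprint would keep the two Accept-Language GETs from the quote above apart.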

Questions

  • Should we include HTTP headers in the unique_key and extended_unique_key computation?
    • Yes.
  • Should we implement a dont_filter feature?
    • It would be just syntactic sugar that appends a random string to the unique key (see the sketch after this list).
    • We should also consider a better name (e.g. always_enqueue).
  • Should use_extended_unique_key be set as the default behavior?
    • Probably not now.
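For the deduplication bypass discussed above, a minimal sketch assuming a hypothetical always_enqueue flag (not an existing SDK option):

```python
from uuid import uuid4


def unique_key_for(url: str, *, always_enqueue: bool = False) -> str:
    # Salting the key with a random suffix guarantees it never matches an
    # earlier request, so the queue accepts the request unconditionally.
    if always_enqueue:
        return f'{url}|{uuid4().hex}'
    return url  # in the SDK this would be the normalized unique key
```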

Labels: solutioning (the issue is only analyzed and planned, not implemented), t-tooling (owned by the tooling team)