HttpClient without initial proxy auth limits web scraping capability #100515

@ksmib

Description

When using HttpClient with a proxy, it defaults to omitting proxy authentication in its initial CONNECT request, sending the credentials only after receiving a 407 HTTP status code from the proxy.
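To make the difference concrete at the wire level, here is a minimal sketch of the two CONNECT requests involved. It is not .NET code and not how HttpClient builds its requests internally; the function name `build_connect_request`, the host `example.com`, the `user`/`pass` credentials, and the use of the Basic scheme are all assumptions made for illustration.

```python
import base64


def build_connect_request(host, port, username=None, password=None):
    """Return the bytes of an HTTP/1.1 CONNECT request for host:port.

    When both username and password are given, a Basic Proxy-Authorization
    header is included up front (preemptive proxy authentication).
    """
    authority = f"{host}:{port}"
    lines = [f"CONNECT {authority} HTTP/1.1", f"Host: {authority}"]
    if username is not None and password is not None:
        # Basic scheme (assumed here): base64 of "username:password".
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        lines.append(f"Proxy-Authorization: Basic {token}")
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# First attempt as described above: no credentials, so an authenticating
# proxy is expected to answer 407 before the client retries.
first_attempt = build_connect_request("example.com", 443)

# What a preemptive client would send instead: credentials on the very
# first CONNECT, so no 407 round trip is needed.
preemptive = build_connect_request("example.com", 443, "user", "pass")
```

The only difference between the two requests is the `Proxy-Authorization` line, but without it every new tunnel costs one 407 response from the proxy.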

Unlike with server authentication, where SocketsHttpHandler.PreAuthenticate allows for sending credentials preemptively, there is no equivalent option for proxy authentication.

This behavior limits its suitability for web scraping tasks. Every major proxy provider uses backconnect proxies, which means clients connect to the same address even when they are routed through different proxies.

These proxy providers temporarily refuse new connections from a client once it has triggered dozens of "407 Proxy Authentication Required" responses in a short time.

To anyone unfamiliar with the underlying issue, it will appear as though the majority of their requests are failing with an HttpRequestException.

The lack of a preemptive proxy authentication feature makes .NET unsuitable for web scraping.

In contrast, curl and Python's urllib send proxy authentication credentials on the initial CONNECT request by default.

I tried manually adding a "Proxy-Authorization" header, but that helps only with HTTP requests, not with HTTPS requests (which are most cases), because the headers of the CONNECT request are independent of the headers set on the request itself.
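The reason the manual header only helps for plain HTTP can be sketched with a simplified model of what the proxy is able to read in each case. This is an illustration of the behavior described above, not .NET code; the function name `request_headers_seen_by_proxy` and the example URLs and credentials are hypothetical.

```python
from urllib.parse import urlsplit


def request_headers_seen_by_proxy(url, request_headers):
    """Return (request_line, headers) that the proxy can read.

    Simplified model:
    - http://  : the request itself is sent to the proxy in absolute form,
                 so headers set on the request are visible to the proxy.
    - https:// : the client first sends a separate CONNECT. The real request
                 then travels encrypted inside the tunnel, so none of the
                 headers set on it reach the proxy.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        return f"GET {url} HTTP/1.1", dict(request_headers)
    if parts.scheme == "https":
        port = parts.port or 443
        return f"CONNECT {parts.hostname}:{port} HTTP/1.1", {}
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


manual = {"Proxy-Authorization": "Basic dXNlcjpwYXNz"}

# Plain HTTP: the manually added header reaches the proxy.
http_line, http_seen = request_headers_seen_by_proxy("http://example.com/page", manual)

# HTTPS: the proxy sees only the CONNECT, without the manually added header.
https_line, https_seen = request_headers_seen_by_proxy("https://example.com/page", manual)
```

In this model `https_seen` is empty, which is why the credentials have to be attached to the CONNECT request by the handler itself rather than by the caller.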

The only fix I can think of is a PreAuthenticateProxy option, analogous to SocketsHttpHandler.PreAuthenticate for preemptive server authentication, that adds the "Proxy-Authorization" header to the initial CONNECT request.
