Skip to content

Conversation

@fralken
Copy link
Collaborator

@fralken fralken commented Oct 21, 2025

PAC files can only be loaded from local file system. This PR implements the download of a PAC file from an http server.
Now it is possible to define, e.g., PAC http://intra.example.com/proxy.pac in the cntlm.conf.
The file is downloaded just before the first connection request.

@jschwartzenberg
Copy link
Collaborator

Be aware that it can be quite tricky to do this securely, especially over HTTP as this connection could be intercepted, DNS could be poisoned, etc.

@fralken
Copy link
Collaborator Author

fralken commented Oct 23, 2025

You are right, but consider that if you need this tool you are probably in a corporate network behind a proxy, and the pac file would be served by a corporate server via an unsecure connection. The dns would be internal, too.

@jschwartzenberg
Copy link
Collaborator

Yes, I know it's a common pattern, but that doesn't make it necessarily a good idea. Maybe you could check if https is supported in your environment? That used to be an issue where I'm using cntlm, but that was fixed.

@fralken
Copy link
Collaborator Author

fralken commented Oct 24, 2025

Implementing https support would add a lot of complexity and also a dependency to openssl, just for a single request during startup. In case there are security concerns, there's still the choice of downloading the pac file in advance and pass it to cntlm from the local file system.

Indeed adding ssl support would be an interesting challange that with the help of copilot would be manageable in a reasonable amount of time.

My proposal is to provide support to http connections as a first implementation, and then extend to support https connections.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements the ability to fetch PAC (Proxy Auto-Configuration) files from HTTP URLs, expanding beyond the previous local-filesystem-only support. The implementation includes lazy fetching on first connection, new HTTP client functionality, extensive code style modernization, and comprehensive documentation updates.

Key changes:

  • Added fetch_url() and http_read_body() functions to enable HTTP GET requests for retrieving remote PAC files
  • Implemented lazy PAC fetching that defers HTTP downloads until the first incoming connection
  • Modernized C code style across multiple files (variable declarations in loop initializers, explicit casts, removed goto statements)

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
http.c Added new HTTP client functions fetch_url() and http_read_body() to support fetching PAC files from remote URLs; includes chunked transfer encoding support
http.h Declared new fetch_url() function
main.c Implemented lazy PAC fetching logic that downloads remote PACs on first connection; moved syslog configuration earlier; refactored password hashing conditionals
pac.c Added check for empty PAC files before allocation
doc/cntlm.conf Updated documentation to explain both local file and HTTP URL PAC configurations
doc/cntlm.1 Updated man page with detailed PAC file specification including HTTP URL support
utils.c Modernized code style: C99 loop variable declarations, explicit casts, improved formatting
socket.c Code style improvements: loop variable declarations, explicit casts, removed commented code
scanner.c Code style improvements: variable type changes, explicit casts, formatting
proxy.c Added explicit cast for file descriptor
ntlm.c Code style improvements: removed commented code, explicit casts for uint16_t
xcrypt.c Code style improvements: removed trailing semicolons from macros, modernized variable declarations
forward.c Refactored error handling to remove goto statements; improved control flow clarity
direct.c Refactored error handling to remove goto statements; improved memory management
config.c Code style improvements: variable type changes, explicit casts
config.h Added const-qualified config type
Makefile Changed CFLAGS assignment for macOS to allow appending

c = strlen(pos);
for (i = 0; i < c && pos[i] != '"'; ++i) {};
c = (int)strlen(pos);
for (i = 0; i < c && pos[i] != '"'; ++i);
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty for-loop body is now on a separate line with just a semicolon. While this is technically correct, it would be clearer to either add a comment explaining the intentionally empty loop body or use an empty brace block {}. This makes it more obvious that the empty body is intentional and not a missing implementation.

Copilot uses AI. Check for mistakes.
Comment on lines +920 to +922
/* Build Host header - include port for non-standard ports */
if (port != 80) {
size_t host_len = strlen(host) + 7; /* host + ":65535" + null */
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The calculation for host_len assumes a maximum port of 5 digits (65535), but the comment states "host + ':65535' + null" which would be 6 characters for the port number plus 1 for the colon plus 1 for null. The calculation strlen(host) + 7 is correct (host length + ':' + 5 digits + null), but the comment is misleading as it suggests counting the literal string ":65535" which is 6 characters, not 7. Consider updating the comment to clarify: "host + ':' + up to 5 digits + null".

Copilot uses AI. Check for mistakes.
Comment on lines +728 to +755
// no body
*outbuf = zmalloc(1);
*outlen = 0;
return 1;
}

if (hlist_subcmp(response->headers, "Transfer-Encoding", "chunked")) {
/* read chunked body by reading lines and chunks from fd */
int bsize = BUFSIZE;
char *line = zmalloc(bsize);
char *err = NULL;
long csize;
do {
int r = so_recvln(fd, &line, &bsize);
if (r <= 0) {
free(line);
free(buf);
return 0;
}
trimr(line);
csize = strtol(line, &err, 16);
if (!isspace((u_char)*err) && *err != ';' && *err != '\0') {
free(line);
free(buf);
return 0;
}
if (csize > 0) {
// ensure capacity
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments use C++ style // instead of the C-style /* */ comments used throughout the rest of the codebase. For consistency with the project's coding standards, these should be converted to C-style comments.

Copilot uses AI. Check for mistakes.
Comment on lines +766 to +797
// read csize bytes
length_t need = csize;
while (need > 0) {
size_t toread = (need > BLOCK ? BLOCK : (size_t)need);
ssize_t got = read(fd, buf + filled, toread);
if (got <= 0) {
free(line);
free(buf);
return 0;
}
filled += got;
need -= got;
}
// read and discard CRLF after chunk
char crlf[2];
if (read(fd, crlf, 2) != 2) {
free(line);
free(buf);
return 0;
}
}
} while (csize != 0);

// read possible trailer headers until empty line
do {
int r = so_recvln(fd, &line, &bsize);
if (r <= 0) {
free(line);
free(buf);
return 0;
}
} while (line[0] != '\r' && line[0] != '\n');
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments use C++ style // instead of the C-style /* */ comments used throughout the rest of the codebase. For consistency with the project's coding standards, these should be converted to C-style comments.

Copilot uses AI. Check for mistakes.
Comment on lines +848 to +898
// finalize buffer
if (buf == NULL) {
buf = zmalloc(1);
filled = 0;
}

*outbuf = buf;
*outlen = filled;

return 1;
}

/*
* Minimal HTTP GET fetcher to retrieve a file from URL into memory.
* Supports only http scheme (no HTTPS) and basic parsing of host:port/path.
* The function allocates *outbuf and sets *outlen. Caller must free(*outbuf).
* If outcode != NULL, the HTTP response status code will be stored there
* (or -1 on protocol/error).
* Returns 1 on successful fetch+read, 0 on error.
*/
int fetch_url(const char *url, char **outbuf, size_t *outlen, int *outcode) {
if (!url || !outbuf || !outlen)
return 0;

if (outcode)
*outcode = -1;

// Basic parse: expect http://host[:port]/path
const char *p = url;
if (strncasecmp(p, "http://", 7) == 0)
p += 7;
else
return 0; // only http supported for now

char *host = NULL;
int port = 80;
const char *path = strchr(p, '/');
if (path) {
host = substr(p, 0, (int)(path - p));
} else {
host = strdup(p);
path = "/";
}

// split port
char *colon = strchr(host, ':');
if (colon) {
*colon = '\0';
port = atoi(colon+1);
if (port == 0) port = 80;
}
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments use C++ style // instead of the C-style /* */ comments used throughout the rest of the codebase. For consistency with the project's coding standards, these should be converted to C-style comments.

Copilot uses AI. Check for mistakes.
Comment on lines +756 to +757
if (filled + csize > alloc) {
alloc = (filled + csize) * 2;
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reallocating the buffer for chunked transfer, there's no check for integer overflow in the calculation alloc = (filled + csize) * 2. If filled + csize is large enough, multiplying by 2 could overflow, leading to a smaller allocation than expected and potential buffer overflow. Consider checking for overflow before the multiplication or using a safer allocation strategy with a maximum size check.

Copilot uses AI. Check for mistakes.
Comment on lines +291 to +296
if (buf && pac_init() && pac_parse_string(buf)) {
syslog(LOG_DEBUG, "Fetched PAC from %s\n", pac_file);
} else {
syslog(LOG_ERR, "Failed to parse PAC fetched from %s\n", pac_file);
myexit(1);
}
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When pac_init or pac_parse_string fails, the function calls myexit(1), but there's a potential issue with the logical evaluation. The condition buf && pac_init() && pac_parse_string(buf) will only call myexit if buf is non-NULL but initialization or parsing fails. However, if buf is NULL (which shouldn't happen given the http_code == 200 check, but is theoretically possible), the function would skip the error handling and proceed to free(buf) and return 1, indicating success when it actually failed. Consider restructuring the condition to handle all failure cases explicitly.

Copilot uses AI. Check for mistakes.
assert(buf != NULL);

for (i = strlen(buf)-1; i >= 0 && isspace((u_char)buf[i]); --i) {};
for (i = strlen(buf)-1; i >= 0 && isspace((u_char)buf[i]); --i);
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty for-loop body is now on a separate line (line 690) with just a semicolon. While this is technically correct, it would be clearer to either add a comment explaining the intentionally empty loop body or use an empty brace block {}. This makes it more obvious that the empty body is intentional and not a missing implementation.

Copilot uses AI. Check for mistakes.
Comment on lines +48 to +57
# - Remote URL: an HTTP URL (e.g. http://example.com/proxy.pac). The PAC
# will be fetched lazily on the first incoming connection to avoid
# blocking startup on network access. Only http:// (not https://) is
# supported for remote PAC fetching. After a successful fetch and parse
# the PAC is used for proxy selection. Fetch or parse failures are logged
# and may cause termination depending on configuration.
#
# Examples:
#Pac /etc/proxy.pac
#Pac http://intra.example.com/proxy.pac
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new documentation encourages configuring a remote PAC using http:// URLs, but PAC files are fetched over plain HTTP without TLS or integrity checks. Because the PAC script controls which upstream proxies are used for all traffic, any attacker able to perform DNS spoofing or MitM on that HTTP connection can serve a malicious PAC and silently redirect client traffic through an attacker-controlled proxy. The configuration examples and notes should strongly warn about this risk and, where possible, require or prefer authenticated/HTTPS delivery (or another integrity-protected mechanism) for PAC files rather than unauthenticated HTTP.

Copilot uses AI. Check for mistakes.
Comment on lines +385 to +389
Remote file: an HTTP URL (for example http://example.com/proxy.pac). The PAC
is fetched lazily on the first incoming connection to avoid blocking startup;
after a successful fetch and parse the PAC is used for parent-proxy selection.
Only plain HTTP (http://) is supported for remote PAC fetching (HTTPS is not
supported).
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manpage now documents support for remote PAC files via http:// URLs and explicitly states that only plain HTTP is supported for PAC fetching. Since PAC contents are executed to decide which upstream proxies handle client traffic, loading them over unauthenticated HTTP allows any on-path attacker (e.g., via DNS/ARP spoofing) to tamper with the PAC and silently redirect all proxied traffic through an attacker-controlled proxy. The option description should highlight this as a high-risk configuration and, where feasible, favor HTTPS or another integrity-protected distribution mechanism for PAC files instead of bare HTTP.

Copilot uses AI. Check for mistakes.
@fralken
Copy link
Collaborator Author

fralken commented Dec 16, 2025

I'm implementing https support, and it is already working for macOS, but it is still wip for Windows and Linux. That's why it is not yet ready for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants