-
Notifications
You must be signed in to change notification settings - Fork 40
Fetch PAC files from http url #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
c6da4d0 to
444052e
Compare
444052e to
7381248
Compare
|
Be aware that it can be quite tricky to do this securely, especially over HTTP as this connection could be intercepted, DNS could be poisoned, etc. |
|
You are right, but consider that if you need this tool you are probably in a corporate network behind a proxy, and the pac file would be served by a corporate server via an unsecure connection. The dns would be internal, too. |
|
Yes, I know it's a common pattern, but that doesn't make it necessarily a good idea. Maybe you could check if https is supported in your environment? That used to be an issue where I'm using cntlm, but that was fixed. |
|
Implementing https support would add a lot of complexity and also a dependency to openssl, just for a single request during startup. In case there are security concerns, there's still the choice of downloading the pac file in advance and pass it to cntlm from the local file system. Indeed adding ssl support would be an interesting challange that with the help of copilot would be manageable in a reasonable amount of time. My proposal is to provide support to http connections as a first implementation, and then extend to support https connections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements the ability to fetch PAC (Proxy Auto-Configuration) files from HTTP URLs, expanding beyond the previous local-filesystem-only support. The implementation includes lazy fetching on first connection, new HTTP client functionality, extensive code style modernization, and comprehensive documentation updates.
Key changes:
- Added
fetch_url()andhttp_read_body()functions to enable HTTP GET requests for retrieving remote PAC files - Implemented lazy PAC fetching that defers HTTP downloads until the first incoming connection
- Modernized C code style across multiple files (variable declarations in loop initializers, explicit casts, removed goto statements)
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| http.c | Added new HTTP client functions fetch_url() and http_read_body() to support fetching PAC files from remote URLs; includes chunked transfer encoding support |
| http.h | Declared new fetch_url() function |
| main.c | Implemented lazy PAC fetching logic that downloads remote PACs on first connection; moved syslog configuration earlier; refactored password hashing conditionals |
| pac.c | Added check for empty PAC files before allocation |
| doc/cntlm.conf | Updated documentation to explain both local file and HTTP URL PAC configurations |
| doc/cntlm.1 | Updated man page with detailed PAC file specification including HTTP URL support |
| utils.c | Modernized code style: C99 loop variable declarations, explicit casts, improved formatting |
| socket.c | Code style improvements: loop variable declarations, explicit casts, removed commented code |
| scanner.c | Code style improvements: variable type changes, explicit casts, formatting |
| proxy.c | Added explicit cast for file descriptor |
| ntlm.c | Code style improvements: removed commented code, explicit casts for uint16_t |
| xcrypt.c | Code style improvements: removed trailing semicolons from macros, modernized variable declarations |
| forward.c | Refactored error handling to remove goto statements; improved control flow clarity |
| direct.c | Refactored error handling to remove goto statements; improved memory management |
| config.c | Code style improvements: variable type changes, explicit casts |
| config.h | Added const-qualified config type |
| Makefile | Changed CFLAGS assignment for macOS to allow appending |
| c = strlen(pos); | ||
| for (i = 0; i < c && pos[i] != '"'; ++i) {}; | ||
| c = (int)strlen(pos); | ||
| for (i = 0; i < c && pos[i] != '"'; ++i); |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The empty for-loop body is now on a separate line with just a semicolon. While this is technically correct, it would be clearer to either add a comment explaining the intentionally empty loop body or use an empty brace block {}. This makes it more obvious that the empty body is intentional and not a missing implementation.
| /* Build Host header - include port for non-standard ports */ | ||
| if (port != 80) { | ||
| size_t host_len = strlen(host) + 7; /* host + ":65535" + null */ |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculation for host_len assumes a maximum port of 5 digits (65535), but the comment states "host + ':65535' + null" which would be 6 characters for the port number plus 1 for the colon plus 1 for null. The calculation strlen(host) + 7 is correct (host length + ':' + 5 digits + null), but the comment is misleading as it suggests counting the literal string ":65535" which is 6 characters, not 7. Consider updating the comment to clarify: "host + ':' + up to 5 digits + null".
| // no body | ||
| *outbuf = zmalloc(1); | ||
| *outlen = 0; | ||
| return 1; | ||
| } | ||
|
|
||
| if (hlist_subcmp(response->headers, "Transfer-Encoding", "chunked")) { | ||
| /* read chunked body by reading lines and chunks from fd */ | ||
| int bsize = BUFSIZE; | ||
| char *line = zmalloc(bsize); | ||
| char *err = NULL; | ||
| long csize; | ||
| do { | ||
| int r = so_recvln(fd, &line, &bsize); | ||
| if (r <= 0) { | ||
| free(line); | ||
| free(buf); | ||
| return 0; | ||
| } | ||
| trimr(line); | ||
| csize = strtol(line, &err, 16); | ||
| if (!isspace((u_char)*err) && *err != ';' && *err != '\0') { | ||
| free(line); | ||
| free(buf); | ||
| return 0; | ||
| } | ||
| if (csize > 0) { | ||
| // ensure capacity |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments use C++ style // instead of the C-style /* */ comments used throughout the rest of the codebase. For consistency with the project's coding standards, these should be converted to C-style comments.
| // read csize bytes | ||
| length_t need = csize; | ||
| while (need > 0) { | ||
| size_t toread = (need > BLOCK ? BLOCK : (size_t)need); | ||
| ssize_t got = read(fd, buf + filled, toread); | ||
| if (got <= 0) { | ||
| free(line); | ||
| free(buf); | ||
| return 0; | ||
| } | ||
| filled += got; | ||
| need -= got; | ||
| } | ||
| // read and discard CRLF after chunk | ||
| char crlf[2]; | ||
| if (read(fd, crlf, 2) != 2) { | ||
| free(line); | ||
| free(buf); | ||
| return 0; | ||
| } | ||
| } | ||
| } while (csize != 0); | ||
|
|
||
| // read possible trailer headers until empty line | ||
| do { | ||
| int r = so_recvln(fd, &line, &bsize); | ||
| if (r <= 0) { | ||
| free(line); | ||
| free(buf); | ||
| return 0; | ||
| } | ||
| } while (line[0] != '\r' && line[0] != '\n'); |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments use C++ style // instead of the C-style /* */ comments used throughout the rest of the codebase. For consistency with the project's coding standards, these should be converted to C-style comments.
| // finalize buffer | ||
| if (buf == NULL) { | ||
| buf = zmalloc(1); | ||
| filled = 0; | ||
| } | ||
|
|
||
| *outbuf = buf; | ||
| *outlen = filled; | ||
|
|
||
| return 1; | ||
| } | ||
|
|
||
| /* | ||
| * Minimal HTTP GET fetcher to retrieve a file from URL into memory. | ||
| * Supports only http scheme (no HTTPS) and basic parsing of host:port/path. | ||
| * The function allocates *outbuf and sets *outlen. Caller must free(*outbuf). | ||
| * If outcode != NULL, the HTTP response status code will be stored there | ||
| * (or -1 on protocol/error). | ||
| * Returns 1 on successful fetch+read, 0 on error. | ||
| */ | ||
| int fetch_url(const char *url, char **outbuf, size_t *outlen, int *outcode) { | ||
| if (!url || !outbuf || !outlen) | ||
| return 0; | ||
|
|
||
| if (outcode) | ||
| *outcode = -1; | ||
|
|
||
| // Basic parse: expect http://host[:port]/path | ||
| const char *p = url; | ||
| if (strncasecmp(p, "http://", 7) == 0) | ||
| p += 7; | ||
| else | ||
| return 0; // only http supported for now | ||
|
|
||
| char *host = NULL; | ||
| int port = 80; | ||
| const char *path = strchr(p, '/'); | ||
| if (path) { | ||
| host = substr(p, 0, (int)(path - p)); | ||
| } else { | ||
| host = strdup(p); | ||
| path = "/"; | ||
| } | ||
|
|
||
| // split port | ||
| char *colon = strchr(host, ':'); | ||
| if (colon) { | ||
| *colon = '\0'; | ||
| port = atoi(colon+1); | ||
| if (port == 0) port = 80; | ||
| } |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments use C++ style // instead of the C-style /* */ comments used throughout the rest of the codebase. For consistency with the project's coding standards, these should be converted to C-style comments.
| if (filled + csize > alloc) { | ||
| alloc = (filled + csize) * 2; |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reallocating the buffer for chunked transfer, there's no check for integer overflow in the calculation alloc = (filled + csize) * 2. If filled + csize is large enough, multiplying by 2 could overflow, leading to a smaller allocation than expected and potential buffer overflow. Consider checking for overflow before the multiplication or using a safer allocation strategy with a maximum size check.
| if (buf && pac_init() && pac_parse_string(buf)) { | ||
| syslog(LOG_DEBUG, "Fetched PAC from %s\n", pac_file); | ||
| } else { | ||
| syslog(LOG_ERR, "Failed to parse PAC fetched from %s\n", pac_file); | ||
| myexit(1); | ||
| } |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When pac_init or pac_parse_string fails, the function calls myexit(1), but there's a potential issue with the logical evaluation. The condition buf && pac_init() && pac_parse_string(buf) will only call myexit if buf is non-NULL but initialization or parsing fails. However, if buf is NULL (which shouldn't happen given the http_code == 200 check, but is theoretically possible), the function would skip the error handling and proceed to free(buf) and return 1, indicating success when it actually failed. Consider restructuring the condition to handle all failure cases explicitly.
| assert(buf != NULL); | ||
|
|
||
| for (i = strlen(buf)-1; i >= 0 && isspace((u_char)buf[i]); --i) {}; | ||
| for (i = strlen(buf)-1; i >= 0 && isspace((u_char)buf[i]); --i); |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The empty for-loop body is now on a separate line (line 690) with just a semicolon. While this is technically correct, it would be clearer to either add a comment explaining the intentionally empty loop body or use an empty brace block {}. This makes it more obvious that the empty body is intentional and not a missing implementation.
| # - Remote URL: an HTTP URL (e.g. http://example.com/proxy.pac). The PAC | ||
| # will be fetched lazily on the first incoming connection to avoid | ||
| # blocking startup on network access. Only http:// (not https://) is | ||
| # supported for remote PAC fetching. After a successful fetch and parse | ||
| # the PAC is used for proxy selection. Fetch or parse failures are logged | ||
| # and may cause termination depending on configuration. | ||
| # | ||
| # Examples: | ||
| #Pac /etc/proxy.pac | ||
| #Pac http://intra.example.com/proxy.pac |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new documentation encourages configuring a remote PAC using http:// URLs, but PAC files are fetched over plain HTTP without TLS or integrity checks. Because the PAC script controls which upstream proxies are used for all traffic, any attacker able to perform DNS spoofing or MitM on that HTTP connection can serve a malicious PAC and silently redirect client traffic through an attacker-controlled proxy. The configuration examples and notes should strongly warn about this risk and, where possible, require or prefer authenticated/HTTPS delivery (or another integrity-protected mechanism) for PAC files rather than unauthenticated HTTP.
| Remote file: an HTTP URL (for example http://example.com/proxy.pac). The PAC | ||
| is fetched lazily on the first incoming connection to avoid blocking startup; | ||
| after a successful fetch and parse the PAC is used for parent-proxy selection. | ||
| Only plain HTTP (http://) is supported for remote PAC fetching (HTTPS is not | ||
| supported). |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The manpage now documents support for remote PAC files via http:// URLs and explicitly states that only plain HTTP is supported for PAC fetching. Since PAC contents are executed to decide which upstream proxies handle client traffic, loading them over unauthenticated HTTP allows any on-path attacker (e.g., via DNS/ARP spoofing) to tamper with the PAC and silently redirect all proxied traffic through an attacker-controlled proxy. The option description should highlight this as a high-risk configuration and, where feasible, favor HTTPS or another integrity-protected distribution mechanism for PAC files instead of bare HTTP.
|
I'm implementing https support, and it is already working for macOS, but it is still wip for Windows and Linux. That's why it is not yet ready for this PR. |
PAC files can only be loaded from local file system. This PR implements the download of a PAC file from an http server.
Now it is possible to define, e.g.,
PAC http://intra.example.com/proxy.pacin thecntlm.conf.The file is downloaded just before the first connection request.