
tcp_outgoing_address and HTTPS #180

Closed

michaeladm wants to merge 486 commits into v4.0 from master

Conversation

@michaeladm michaeladm commented Mar 20, 2018

We have a set of rules (ACLs with URL regexes) for content; depending on which rule matches, we choose the outgoing address, for example switching from 10.10.1.xx to 10.10.6.xx.
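
For context, a minimal squid.conf sketch of this kind of setup (the ACL name, regex, and addresses below are hypothetical and only illustrate the shape of our rules):

    # Route matching content through a dedicated outgoing address.
    acl special_content url_regex -i \.ipa$
    tcp_outgoing_address 10.10.5.11 special_content
    # Everything else keeps the default outgoing address.
    tcp_outgoing_address 10.10.1.11

    # HTTPS is intercepted and bumped so that the full URL becomes visible.
    acl step1 at_step SslBump1
    ssl_bump peek step1
    ssl_bump bump all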

Acl.cc(151) matches: checked: tcp_outgoing_address 10.10.5.11 = 1
Checklist.cc(63) markFinished: 0x7fffffffe2b8 answer ALLOWED for match
FilledChecklist.cc(67) ~ACLFilledChecklist: ACLFilledChecklist destroyed 0x7fffffffe2b8
Checklist.cc(197) ~ACLChecklist: ACLChecklist::~ACLChecklist: destroyed 0x7fffffffe2b8
peer_select.cc(1026) handlePath: PeerSelector3438 found local=10.10.5.11 remote=17.253.37.204:80 HIER_DIRECT flags=1, destination #2 for http://iosapps.itunes.apple.com/...xxx...ipa
...
peer_select.cc(1002) interestedInitiator: PeerSelector3438
peer_select.cc(112) ~PeerSelector: http://iosapps.itunes.apple.com/...xxx...ipa
store.cc(464) unlock: peerSelect unlocking key 60080000000000001C0E000001000000 e:=p2IWV/0x815c09500*3
AsyncCallQueue.cc(55) fireNext: entering AsyncJob::start()
AsyncCall.cc(38) make: make call AsyncJob::start [call195753]
AsyncJob.cc(123) callStart: Comm::ConnOpener status in: [ job10909]
comm.cc(348) comm_openex: comm_openex: Attempt open socket for: 10.10.5.11
comm.cc(391) comm_openex: comm_openex: Opened socket local=10.10.5.11 remote=[::] FD 114 flags=1 : family=2, type=1, protocol=6

In the case of normal traffic (HTTP), everything works fine, as it should.

In the case of HTTPS with traffic analysis (ssl_bump), we see the following picture:

Acl.cc(151) matches: checked: tcp_outgoing_address 10.10.5.120 = 1
Checklist.cc(63) markFinished: 0x7fffffffe2b8 answer ALLOWED for match
FilledChecklist.cc(67) ~ACLFilledChecklist: ACLFilledChecklist destroyed 0x7fffffffe2b8
Checklist.cc(197) ~ACLChecklist: ACLChecklist::~ACLChecklist: destroyed 0x7fffffffe2b8
peer_select.cc(1026) handlePath: PeerSelector569 found local=10.10.5.120 remote=23.16.9.11:443 PINNED flags=1, destination #1 for http://app-6.squid.internal/...xxx...zip
peer_select.cc(1027) handlePath: always_direct = DENIED
peer_select.cc(1028) handlePath: never_direct = DENIED
peer_select.cc(1029) handlePath: timedout = 0
peer_select.cc(1002) interestedInitiator: PeerSelector569
FwdState.cc(443) startConnectionOrFail: http://app-6.squid.internal/...xxx...zip
HttpRequest.cc(472) clearError: old error details: 0/0
FwdState.cc(886) connectStart: fwdConnectStart: http://app-6.squid.internal/...xxx...zip
FwdState.cc(905) connectStart: pinned peer connection: 0x812c51018
client_side.cc(4082) borrowPinnedConnection: local=10.10.2.120:47901 remote=23.16.9.11:443 HIER_DIRECT FD 28 flags=1
client_side.cc(4057) validatePinnedConnection: local=10.10.2.120:47901 remote=23.16.9.11:443 HIER_DIRECT FD 28 flags=1
AsyncCall.cc(56) cancel: will not call ConnStateData::clientPinnedConnectionRead [call20129] because comm_read_cancel
AsyncCall.cc(56) cancel: will not call ConnStateData::clientPinnedConnectionRead [call20129] also because comm_read_cancel
ModKqueue.cc(174) SetSelect: FD 28, type=1, handler=0, client_data=0x0, timeout=0
comm.cc(964) comm_add_close_handler: comm_add_close_handler: FD 28, handler=1, data=0x8028bf398
AsyncCall.cc(26) AsyncCall: The AsyncCall SomeCloseHandler constructed, this=0x802a456d0 [call20139]
comm.cc(975) comm_add_close_handler: comm_add_close_handler: FD 28, AsyncCall=0x802a456d0*1
FwdState.cc(987) dispatch: local=127.0.0.1:20990 remote=127.0.0.120:59799 FD 26 flags=1: Fetching GET http://app-6.squid.internal/...xxx...zip
AsyncJob.cc(34) AsyncJob: AsyncJob constructed, this=0x81258fe38 type=HttpStateData [job1763]
store.cc(439) lock: Client locked key 3F020000000000001C0E000001000000 e:=p2IWV/0x812b2df00*4
...
peer_select.cc(112) ~PeerSelector: http://app-6.squid.internal/...xxx...zip
store.cc(464) unlock: peerSelect unlocking key 3F020000000000001C0E000001000000 e:=p2IWV/0x812b2df00*4
AsyncCallQueue.cc(55) fireNext: entering AsyncJob::start()
AsyncCall.cc(38) make: make call AsyncJob::start [call20141]
AsyncJob.cc(123) callStart: HttpStateData status in: [ job1763]
http.cc(2838) sendRequest: local=10.10.2.120:47901 remote=23.16.9.11:443 HIER_DIRECT FD 28 flags=1, request 0x8125e8800*5, this 0x81258fd18.
AsyncCall.cc(26) AsyncCall: The AsyncCall HttpStateData::httpTimeout constructed, this=0x812492f80 [call20142]
comm.cc(554) commSetConnTimeout: local=10.10.2.120:47901 remote=23.16.9.11:443 HIER_DIRECT FD 28 flags=1 timeout 86400
http.cc(2204) maybeMakeSpaceAvailable: may read up to 131072 bytes info buf(0/131072) from local=10.10.2.120:47901 remote=213.156.90.131:443 HIER_DIRECT FD 28 flags=1
AsyncCall.cc(26) AsyncCall: The AsyncCall HttpStateData::readReply constructed, this=0x812493020 [call20143]
Read.cc(57) comm_read_base: comm_read, queueing read for local=10.10.2.120:47901 remote=23.16.9.11:443 HIER_DIRECT FD 28 flags=1; asynCall 0x812493020*1
ModKqueue.cc(174) SetSelect: FD 28, type=1, handler=1, client_data=0x80ce04728, timeout=0
AsyncCall.cc(26) AsyncCall: The AsyncCall HttpStateData::wroteLast constructed, this=0x812493160 [call20144]

I understand that, at the start, without analyzing the traffic we cannot know the final destination, so at that point we cannot steer the process any further.
Question: how can we break the established (pinned) connection on the old route and establish a new connection along the new route, once we already know which route to use?

IN 127.0.0.1:443 (22.33.44.55:443 ???) ---> OUT 10.10.1.1 ---> (Catch 22.33.44.55:443/this/is/it.zip) ---> Kill IN ... ??? OUT 10.10.1.1 ---> Establish OUT 10.10.5.1 ---> 22.33.44.55:443/this/is/it.zip

I am willing to pay a large price in traffic overhead in this case, since the goal justifies it.

yadij and others added 30 commits March 2, 2017 14:26
Convert ftpRealm() from generating char* to SBuf. This fixes issues identified
by GCC 7 where the realm string may be longer than the available buffer and
gets truncated.
The size of the buffer made the occurrence rather rare, but it is still possible.
* replace the String local with an SBuf to get appendf()

* overdue removal of empty lines and '!= NULL' conditions

* reduce scope redux for many out assignments

* use sizeof(tmp) instead of '1024'

* Fixes many GCC 7 compile errors from snprintf() being called with a
  too-small buffer.

* update the for-loops in Adaptation::History to C++11 and produce output
  in an SBuf, removing the need for iterator typedefs and resolving more
  GCC 7 warnings about too-small buffers for snprintf().
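
A generic illustration of the snprintf() pattern fixed above (not the actual Squid code; the function and buffer names are made up): a hard-coded size can silently disagree with the real buffer, while sizeof(tmp) keeps the two in sync and avoids the GCC 7 truncation warnings.

    #include <cstdio>

    static void printRealm(const char *user, const char *host)
    {
        char tmp[256];
        // Bad: claims 1024 bytes that tmp does not have; GCC 7 warns and
        // the result may be truncated:
        //   std::snprintf(tmp, 1024, "ftp %s@%s", user, host);

        // Better: the size argument always matches the buffer.
        std::snprintf(tmp, sizeof(tmp), "ftp %s@%s", user, host);
        std::puts(tmp);
    }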
1. Honor EOF on Squid-to-server connections with full read ahead buffers
   and no clients when --enable-delay-pools is used without any delay
   pools configured in squid.conf.

Since trunk r6150.

Squid delays reading from the server after buffering read_ahead_gap
bytes that are not yet sent to the client. A delayed read is normally
resumed after Squid sends more buffered bytes to the client. See
readAheadPolicyCanRead() and kickReads().

However, Squid was not resuming the delayed read after all Store clients
were gone. If quick_abort prevents Squid from immediately closing the
corresponding Squid-to-server connection, then the connection gets stuck
until read_timeout (15m), even if the server closes much sooner, --
without reading from the server, Squid cannot detect the connection
closure. The affected connections enter the CLOSE_WAIT state.

Kicking delayed read when the last client leaves fixes the problem. The
removal of any client, including the last one, may change
readAheadPolicyCanRead() answer and, hence, deserves a kickReads() call.

Why "without any delay pools configured in squid.conf"? When classic
(i.e., delay_pool_*) delay pools are configured, Squid kicks all delayed
reads every second. That periodic kicking is an old design bug, but it
resumes stuck reads when all Store clients are gone. Without classic
delay pools, there is no periodic kicking. This fix does not address
that old bug but removes Squid hidden dependence on its side effect.

Note that the Squid-to-server connections with full read-ahead buffers
still remain "stuck" if there are non-reading clients. There is nothing
Squid can do about them because we cannot reliably detect EOF without
reading at least one byte and such reading is not allowed by the read
ahead gap. In other words, non-reading clients still stall server
connections.

While fixing this, I moved all CheckQuickAbort() tests into
CheckQuickAbortIsReasonable() because we need a boolean function to
avoid kicking aborted entries and because the old separation was rather
awkward -- CheckQuickAbort() contained "reasonable" tests that were not
in CheckQuickAbortIsReasonable(). All the aborting tests and their order
were preserved during this move. The moved tests gained debugging.

According to the existing test order in CheckQuickAbortIsReasonable(),
the above problem can be caused by:

* non-private responses with a known content length
* non-private responses with unknown content length, having quick_abort_min
  set to -1 KB.
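
As a hedged illustration, a squid.conf along these lines (values are examples only) matches the conditions just described: no delay_pools directives at all, plus a read-ahead gap and a quick_abort_min of -1 KB so Squid keeps fetching after the last client leaves:

    # No delay_pools configured, so nothing periodically kicks delayed reads.
    read_ahead_gap 16 KB     # buffer at most 16 KB ahead of the slowest client
    quick_abort_min -1 KB    # never abort: keep fetching after clients are gone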

2. Honor read_ahead_gap with --disable-delay-pools.

Since trunk r13954.

This fix also addresses "Perhaps these two calls should both live
in MemObject" comment and eliminates existing code duplication.
A parsed value for the AnyP::UriScheme image constructor parameter was
stored without toLower() canonicalization for known protocols (e.g.,
Squid would store "HTTP" instead of "http" after successfully parsing
"HTTP://EXAMPLE.COM/" in urlParseFinish()). Without that
canonicalization step, Squid violated various HTTP caching rules related
to URI comparison (and served fewer hits) when dealing with absolute
URLs containing non-lowercase HTTP scheme.

According to my limited tests, URL-based ACLs are not affected by this
bug, but I have not investigated how URL-based ACL code differs from
caching code when it comes to stored URL access and whether some ACLs
are actually affected in some environments.
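
A generic sketch of the canonicalization step involved (not the actual UriScheme code): lowercasing the scheme before storing it makes later byte-wise URI comparisons treat HTTP:// and http:// as the same cache key.

    #include <algorithm>
    #include <cctype>
    #include <iostream>
    #include <string>

    // Lowercase a URI scheme so equivalent URLs compare equal byte-for-byte.
    static std::string canonicalScheme(std::string scheme)
    {
        std::transform(scheme.begin(), scheme.end(), scheme.begin(),
                       [](unsigned char c) { return std::tolower(c); });
        return scheme;
    }

    int main()
    {
        std::cout << (canonicalScheme("HTTP") == canonicalScheme("http")) << "\n"; // prints 1
    }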
Converts the Http::Message lock/unlock macros to inline functions so the compiler can
catch this type of regression in future Pointer updates.
This apparently will make the clear() operators faster as they no longer
have to data-copy.

  Detected by Coverity Scan. Issues 1364734 and 1364737
Also, add move semantics to Http1::RequestParser. This apparently will
make the clear() operators faster as they no longer have to data-copy.
At least, once the base Parser class supports move as well.

It also contains a small experiment to see whether a virtual destructor alone
allows an automatic move constructor to be added by the compiler.
Squid may fail to load cache entry metadata for several very different
reasons, including the following two relatively common ones:

* A cache_dir entry corruption.
* Huge cache_dir entry metadata that does not fit into the I/O buffer
  used for loading entry metadata.

Knowing the exact failure reason may help triage and guide development.
We refactored existing checks to distinguish various error cases,
including the two above. Refactoring also reduced code duplication.

These improvements also uncovered and fixed a null pointer dereference
inside ufsdump.cc (but ufsdump does not even build right now for reasons
unrelated to these changes).
A destructor is required because this hierarchy contains virtuals, which in turn
means the compiler will not add a move constructor by default. So we must add
the default ones ourselves.

  Detected by Coverity Scan. Issues 1364733 and 1364736.
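
A generic C++ sketch of the point made in the last two commit messages (class names are made up): declaring a destructor, even a defaulted virtual one, suppresses the implicitly-generated move operations, so they must be defaulted back in explicitly for the derived parsers to move rather than copy.

    #include <string>

    class Parser
    {
    public:
        virtual ~Parser() = default;          // user-declared: implicit moves are suppressed

        // Re-enable the compiler-generated special members explicitly.
        Parser() = default;
        Parser(Parser &&) = default;
        Parser &operator=(Parser &&) = default;
        Parser(const Parser &) = default;
        Parser &operator=(const Parser &) = default;

    protected:
        std::string buf_;                     // moving avoids copying this data
    };

    class RequestParser : public Parser
    {
    public:
        void clear() { *this = RequestParser(); }  // move-assigns; no data copy
    };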
... resolving Via header truncation at 1024 bytes.

Also fixes the generated Via values for non-HTTP protocols.
Most of the logic seems to be a hangover from when the session helper was
using the Berkeley DB v1.85 compatibility interface. Some of it is
possibly still necessary for the time_quota helper, but that helper has
not been using it so far and needs an upgrade to match what happened to
the session helper.

Changes:

* The helpers needing -ldb will not be built unless the library and
headers are available. So we can drop the Makefile LIB_DB substitutions
and always just link -ldb explicitly to these helpers.

NP: Anyone who needs small minimal binaries can build with the
--as-needed linker flag, or without these helpers. This change has no
effect on other helpers or the main squid binary.

* Since we no longer need to check if -ldb is necessary, we can drop the
configure.ac and acinclude logic detecting that.

* Remove unused AC_CHECK_DECL(dbopen, ...)
 - resolves one "FIXME"

* Fix the time_quota helper check to only scan the db.h header file contents
if that file exists, and if the db_185.h file is not being used
instead.

* Fix the session helper check to only try compiling with the db.h
header if that header actually exists.

* De-duplicate the library header file detection shared by configure.ac
and the helpers required.m4 files (after the above two changes).

* Remove unused DBLIB variable from configure.ac.
Improves speed in several common header code paths using String.

Detected by Coverity Scan. Issue 1364732.
k0rv1n and others added 25 commits January 16, 2018 16:28
Matches CONNMARK of accepted connections. Takes into account
clientside_mark and qos_flows mark changes (because Squid-set marks are
cached by Squid in conn->nfmark). Ignores 3rd-party marks set after
Squid has accepted the connection from a client (because Squid never
re-queries the connection to update/sync conn->nfmark).

Also added a debugs()-friendly API to print hex values.
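
A hedged squid.conf illustration of how a connection-mark ACL like this is typically used (the ACL name, mark value, and address are hypothetical; check the documentation for the exact syntax of the new ACL type):

    # Match client connections carrying a specific netfilter CONNMARK.
    acl marked_clients clientside_mark 0x22
    tcp_outgoing_address 10.10.6.1 marked_clients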
Alex Rousskov:
  let's consider removing the custom ESI parser from Squid. It is of
terrible quality and "nobody" is testing ESI code when things change. Is
the CVE risk worth supporting the few platforms that do not have the right
parser libraries?

* Fixed configure.ac tests for ESI libraries
Explicit copy construction was slow and unnecessary.
Explicit copy assignment mishandled self copying and was unnecessary.
The remaining memcpy() calls mishandled self copying.

There are no known cases of Ip::Address self copying.
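
A generic illustration of the copy-semantics point above (not the Ip::Address code itself): memcpy() requires non-overlapping source and destination, so a hand-written memcpy()-based copy mishandles self copying, while the compiler-generated member-wise copy is correct and needs no code at all.

    #include <cstring>

    struct RawAddr
    {
        unsigned char bytes[16];

        // Problematic: memcpy() forbids overlapping regions, so a = a
        // (self copy) is not well defined here.
        RawAddr &operator=(const RawAddr &other)
        {
            std::memcpy(bytes, other.bytes, sizeof(bytes));
            return *this;
        }
    };

    struct PlainAddr
    {
        unsigned char bytes[16];
        // No user-declared copy operations: the implicit member-wise copy
        // handles self copying correctly and is at least as fast.
    };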
* Update reference to Squid-3.6

* Add missing squid.conf change details
Broken by commit 76d6111 which (correctly) made createMemObject() assert
but missed one case where the old code should have been converted to
call the new ensureMemObject() instead.

peerDigestRequest() is called every 5 minutes, triggered by the
peerDigestCheck event. Most calls find the old digest entry that has the
same method and URIs.
…e Basic (#104)

Commit 889fc47 was made to fix an issue with Basic authentication and SSL bumping. But after this commit we can no longer properly use http_access with proxy_auth/proxy_auth_regex ACLs because that type of ACL always returns 1 (match) regardless of the conditions in the rules.

Use the cached authentication results (if any) instead of a fixed 1 (match) result.
... and Replace ESIParser::Parsers global with ESIParser::GetRegistry() lookup function

Resolves assertions on shutdown and an outstanding TODO.
…#81)

Move the http_port cert= and key= options logic to libsecurity and add GnuTLS implementation for PEM file loading. Also adds some extra debugging to clarify listening port initialization problems with the PEM files.

Enable most of the http(s)_port listening socket logic to always build except where OpenSSL-specific dependency still exists. It may seem reasonable to leave it optionally excluded for minimal builds, however a minimal proxy that does not support HTTPS in any way is increasingly useless in the modern web so preference is given to building the generic TLS related code. This also simplifies the required testing to detect code portability issues.

GnuTLS implementation is added for https_port configured with static cert=/key= parameters and the resulting TLS handshake behaviour. Squid built with GnuTLS can now act as a useful parent proxy behind an SSL-Bump'ing frontend or for other clients which require a TLS explicit proxy.

Also fixes the definitions for the CertPointer and PrivateKeyPointer.
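
For reference, the kind of listening-port configuration this enables for GnuTLS builds (paths are placeholders): a static certificate and key loaded from PEM files with the existing cert=/key= options.

    # TLS explicit proxy port with a static certificate and key.
    https_port 3130 cert=/etc/squid/proxy-cert.pem key=/etc/squid/proxy-key.pem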
When Squid finds a requested entry in the memory cache, it does not
check whether the same entry is also stored in a cache_dir. The
StoreEntry object may become associated with its store entry in the
memory cache but not with its store entry on disk. This inconsistency
causes two known problems:

1. Squid may needlessly swap out the memory hit to disk, either
   overwriting an existing (and identical) disk entry or, worse,
   creating a duplicate entry on another disk. In the second case, the
   two disk entries are not synchronized and may eventually start to
   differ if one of them is removed or updated.

2. Squid may not delete a stale disk entry when needed, violating
   various HTTP MUSTs, and eventually serving stale [disk] cache entries
   to clients.

Another purging problem is not caused by the above inconsistency:

3. A DELETE request or equivalent may come for the entry which is still
   locked for writing. Squid fails to get a lock for such an entry (in
   order to purge it) and the entry remains in disk and/or memory cache.

To solve the first two problems:

* StoreEntry::mayStartSwapout() now avoids needless swapouts by checking
  whether StoreEntry was fully loaded, is being loaded, or could have
  been loaded from disk. To be able to reject swapouts in the last case,
  we now require that the newer (disk) entries explicitly delete their
  older variants instead of relying on the Store to overwrite the older
  (unlocked) variant. That explicit delete should already be happening
  in higher-level code (that knows which entry is newer and must mark
  any stale entries for deletion anyway).

To fix problem #3:

* A new Store::Controller::evictIfFound(key) method purges (or marks for
  deletion if purging is impossible) all the matching store entries,
  without loading the StoreEntry information from stores. Avoiding
  StoreEntry creation reduces waste of resources (the StoreEntry object
  would have to be deleted anyway) _and_ allows us to mark being-created
  entries (that are locked for writing and, hence, cannot be loaded into
  a StoreEntry object).

XXX: SMP cache purges may continue to malfunction when the Transients
table is missing. Currently, Transients are created only when the
collapsed_forwarding is on. After Squid bug 4579 is fixed, every public
StoreEntry will have the corresponding Transients entry and vice versa,
extending these fixes to all SMP environments.

Note that even if Squid properly avoids storing duplicate disk entries,
some cache_dir manipulations by humans and Squid crashes may lead to
such duplicates being present. This patch leaves dealing with potential
duplicates out of scope, except that it guarantees that if an entry is
deleted, then all [possible] duplicates are deleted as well.

Fixing the above problems required (and/or benefited from) many related
improvements, including some Store API changes. It is impractical to
detail each change here, but several are highlighted below.

To propagate DELETEs across workers, every public StoreEntry now has a
Transients entry.

Prevented concurrent cache readers from aborting when their entry is
release()d. Unlike abort, release should not affect current readers.

Fixed store.log code to avoid "Bug: Missing MemObject::storeId value".

Removed Transients extras used to initialize MemObject StoreID/method in
StoreEntry objects created by Transients::get() for collapsed requests.
Controlled::get() and related Controller APIs do not _require_ setting
those MemObject details: get() methods for all cache stores return
StoreEntry objects without them (because entry basics lack Store ID and
request method). The caller is responsible for cache key collision
detection. Controlled::get() parameters could include Store ID and
request method for early cache key collision detection, but adding a
StoreQuery class and improving collision detection code is outside this
project scope (and requires many changes).

Found more cases where release() should not prevent sharing.
Remaining cases need further analysis as discussed in master 39fe14b.

Greatly simplified UFS store rebuilding, possibly fixing subtle bug(s).

Clarified RELEASE_REQUEST flag meaning, becoming 'a private StoreEntry
which can't become public anymore'. Refactored the related code,
combining two related notions: 'a private entry' and 'an entry marked
for removal'.

Do not abort collapsed StoreEntries during syncing just because the
corresponding shared entry being stored was marked for deletion. Abort
them if the shared entry has also been aborted.

Added StoreEntry helper methods to prevent direct manipulation of
individual disk-related data members (swap_dirn, swap_filen, and
swap_status). These methods help keep these related data members in a
coherent state and minimize code duplication.
... it was lacking 169.254.0.0/16 subnet due to a typo in fe204e1.
* Remove self-signed CA check

This check is not needed when loading the initial cert portion of a PEM file,
as it will be performed later when loading the chain, and it was causing
self-signed CAs to be rejected incorrectly.

* Fix a typo in debugs output

* Always generate static context from tls-cert= parameter

... if a cert= is provided. SSL-Bump still (for now) requires a static context as fallback when generate fails.

* Revert tlsAttemptHandshake to Squid_SSL_Accept API

* Update const correctness

* Document when initialization is skipped
The clientside_mark ACL was not working with TPROXY because a
conntrack query could not find connmark without a true client port.

Ip::Intercept::Lookup() must return the true client address, but its
TproxyTransparent() component was resetting the client port. We should
use a zero port when we compute the source address for the Squid-to-peer
connection instead.
There is no good way to printf() time_t. AFAICT, Squid usually just
casts to (long) int. TODO: Use C++ streams (with manipulators) instead.
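
A standalone illustration of the two options mentioned (not Squid code): casting for the printf()-style path versus letting a C++ stream pick the matching integer overload.

    #include <cstdio>
    #include <ctime>
    #include <iostream>

    int main()
    {
        const std::time_t now = std::time(nullptr);

        // printf() has no conversion specifier for time_t, so cast explicitly
        // (and risk truncation wherever time_t is wider than long).
        std::printf("%ld\n", static_cast<long int>(now));

        // A C++ stream selects a suitable overload without any cast.
        std::cout << now << "\n";
    }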
PR#81 added functions only provided by GnuTLS 3.4.0 and later, but did not bump the
configure.ac check for GnuTLS to enforce that version as minimum.
Also moved private Controller method descriptions to .cc per convention.
To improve cross-compilation support and to simplify code, rely on C++11
cstdarg header instead of ./configure-time va_copy() detection.

Using ./configure-time detection for va_copy() is dangerous because when
it does not work (e.g., during a poorly configured cross-compilation
attempt), Squid may crash if va_copy() was needed but was not detected.

See also: Bug 4821 and bug 753.

Also found and fixed a low-level bug: StoreEntry::vappendf() was not
using va_copy() because store.cc lacked VA_COPY #defines. The affected
code (900+ callers!) is used for cache manager responses and Gopher
gateway response compilation. If any of those calls required a buffer
larger than 4KB, the lack of those va_copy() calls could lead to crashes
and/or data corruption issues on platforms where va_copy() is required.
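
A minimal sketch of the va_copy() pattern involved (generic code, not StoreEntry::vappendf() itself): when a va_list must be consumed twice, e.g. once to measure and once to format into a grown buffer, the second pass has to work on a copy made with va_copy().

    #include <cstdarg>
    #include <cstdio>
    #include <string>

    static std::string vformat(const char *fmt, va_list args)
    {
        va_list args2;
        va_copy(args2, args);                        // keep a pristine copy for pass two

        // First pass: measure the required size.
        const int n = std::vsnprintf(nullptr, 0, fmt, args);

        std::string out;
        if (n > 0) {
            out.resize(static_cast<std::size_t>(n) + 1);  // room for the terminating NUL
            // Second pass must not reuse the already-consumed 'args'.
            std::vsnprintf(&out[0], out.size(), fmt, args2);
            out.resize(static_cast<std::size_t>(n));
        }
        va_end(args2);
        return out;
    }

    static std::string format(const char *fmt, ...)
    {
        va_list args;
        va_start(args, fmt);
        const std::string out = vformat(fmt, args);
        va_end(args);
        return out;
    }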
Also update mkrelease.sh script for git
Solaris 10+ backported IPFilter v5 features to their v4.1.9, which breaks
the IPFilter v4 logic when IPv6 traffic is received, resulting in crashes.
See bug 4828.
@squid-prbot
Collaborator

Can one of the admins verify this patch?

@yadij
Contributor

yadij commented Mar 20, 2018

Do you have actual proposed code changes hidden in those hundreds of commits that make Squid-4 and Squid-5 different? You need to do a cross-forks PR if your personal/private repository has changes, and please make sure any branch proposed is based on our master.

@michaeladm
Author

I do not have "personal/private repository changes". I use the squid-5 master branch. I do not know how to post bugs or notes here other than by this method.

@yadij
Contributor

yadij commented Mar 20, 2018

Ah, we do not use github for either of those.

For discussions about using Squid please contact the squid-users mailing list.
For discussions about Squid code changes please contact the squid-dev mailing list.
http://www.squid-cache.org/Support/mailing-lists.html

For reports about specific code bugs please use our Bugzilla.
http://bugs.squid-cache.org/

From what I can tell from your message you probably want squid-dev to discuss the missing ability to generate CONNECT through cache_peer's.

@yadij yadij closed this Mar 20, 2018