Skip to content

Bug 4505: SMP caches sometimes do not purge entries#46

Merged
rousskov merged 126 commits intosquid-cache:masterfrom
measurement-factory:SQUID-175-methods-do-not-invalidate-cache
Feb 2, 2018
Merged

Bug 4505: SMP caches sometimes do not purge entries#46
rousskov merged 126 commits intosquid-cache:masterfrom
measurement-factory:SQUID-175-methods-do-not-invalidate-cache

Conversation

@eduard-bagdasaryan
Copy link
Contributor

@eduard-bagdasaryan eduard-bagdasaryan commented Aug 16, 2017

When Squid finds a requested entry in the memory cache, it does not
check whether the same entry is also stored in a cache_dir. The
StoreEntry object may become associated with its store entry in the
memory cache but not with its store entry on disk. This inconsistency
causes two known problems:

  1. Squid may needlessly swap out the memory hit to disk, either
    overwriting an existing (and identical) disk entry or, worse,
    creating a duplicate entry on another disk. In the second case, the
    two disk entries are not synchronized and may eventually start to
    differ if one of them is removed or updated.

  2. Squid may not delete a stale disk entry when needed, violating
    various HTTP MUSTs, and eventually serving stale [disk] cache entries
    to clients.

Another purging problem is not caused by the above inconsistency:

  1. A DELETE request or equivalent may come for the entry which is still
    locked for writing. Squid fails to get a lock for such an entry (in
    order to purge it) and the entry remains in disk and/or memory cache.

To solve the first two problems:

  • StoreEntry::mayStartSwapout() now avoids needless swapouts by checking
    whether StoreEntry was fully loaded, is being loaded, or could have
    been loaded from disk. To be able to reject swapouts in the last case,
    we now require that the newer (disk) entries explicitly delete their
    older variants instead of relying on the Store to overwrite the older
    (unlocked) variant. That explicit delete should already be happening
    in higher-level code (that knows which entry is newer and must mark
    any stale entries for deletion anyway).

To fix problem #3:

  • A new Store::Controller::evictIfFound(key) method purges (or marks for
    deletion if purging is impossible) all the matching store entries,
    without loading the StoreEntry information from stores. Avoiding
    StoreEntry creation reduces waste of resources (the StoreEntry object
    would have to be deleted anyway) and allows us to mark being-created
    entries (that are locked for writing and, hence, cannot be loaded into
    a StoreEntry object).

Note that even if Squid properly avoids storing duplicate disk entries,
some cache_dir manipulations by humans and Squid crashes may lead to
such duplicates being present. This patch leaves dealing with potential
duplicates out of scope except it guarantees that if an entry is
deleted, then all [possible] duplicates are deleted as well.

Fixing the above problems required (and/or benefited from) many related
improvements, including some Store API changes. It is impractical to
detail each change here, but several are highlighted below.

To propagate DELETEs across workers, every public StoreEntry now has a
Transients entry.

Prevented concurrent cache readers from aborting when their entry is
release()d. Unlike abort, release should not affect current readers.

Fixed store.log code to avoid "Bug: Missing MemObject::storeId value".

Removed Transients extras used to initialize MemObject StoreID/method in
StoreEntry objects created by Transients::get() for collapsed requests.
Controlled::get() and related Controller APIs do not require setting
those MemObject details: get() methods for all cache stores return
StoreEntry objects without them (because entry basics lack Store ID and
request method). The caller is responsible for cache key collision
detection. Controlled::get() parameters could include Store ID and
request method for early cache key collision detection, but adding a
StoreQuery class and improving collision detection code is outside this
project scope (and requires many changes).

Found more cases where release() should not prevent sharing.
Remaining cases need further analysis as discussed in master 39fe14b.

Greatly simplified UFS store rebuilding, possibly fixing subtle bug(s).

Clarified RELEASE_REQUEST flag meaning, becoming 'a private StoreEntry
which can't become public anymore'. Refactored the related code,
combining two related notions: 'a private entry' and 'an entry marked
for removal'.

Do not abort collapsed StoreEntries during syncing just because the
corresponding being stored shared entry was marked for deletion. Abort
them if the shared entry has been also aborted.

Added StoreEntry helper methods to prevent direct manipulation of
individual disk-related data members (swap_dirn, swap_filen, and
swap_status). These methods help keep these related data members in a
coherent state and minimize code duplication.

XXX: SMP cache purges may continue to malfunction when the Transients
table is missing. Currently, Transients are created only when the
collapsed_forwarding is on. After Squid bug 4579 is fixed, every public
StoreEntry will have the corresponding Transients entry and vice versa,
extending these fixes to all SMP environments.

Copy link
Contributor

@yadij yadij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks mostly okay to me. Please fix the compile issues in stubs. The rest are optional polish requests and I will approve with just the compile ones fixed.

(swap_dirn < 0) == (swap_status == SWAPOUT_NONE) &&
(swap_dirn < 0 || swap_dirn < Config.cacheSwap.n_configured);

if (!ok) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to be missing a WARNING label, unless it is not that important really and more deserving of a level 3+ debugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with 'ERROR' label, since it is an 'error' rather than a 'warning' here.

@@ -1766,19 +1778,22 @@ neighborsHtcpClear(StoreEntry * e, const char *uri, HttpRequest * req, const Htt
char buf[128];

for (p = Config.peers; p; p = p->next) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can be polished up a bit more. Please consider making the p variable be defined as part of the loop definition like neighborsHtcpClearNeeded and the buf a static local inside the new if-condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

virtual bool smpAware() const override;

/// Whether the entry with the key exists and was marked for removal
/// some time ago; get(key) will return NULL in such case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please reference 'nil' or nullptr whichever is more descriptive instead of NULL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

virtual void sync() override;
virtual void maintain() override;
virtual void markForUnlink(StoreEntry &) override;
virtual void unlinkByKeyIfFound(const cache_key *key) override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please follow the updated coding style in this file, not mentioning parameter names unless they are needed to document the parameter better than type or method name can. StoreEntry and cache_key are already descriptive enough types to convey the necessary detail.

Please also consider using the new style in the other new function and methods. I'm not asking for those others as a review item there since it is polish and they are situated in places not using the new style yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also fixed some other new methods.

int64_t MemStore::maxObjectSize() const STUB_RETVAL(0)
bool MemStore::dereference(StoreEntry &) STUB_RETVAL(false)
void MemStore::markForUnlink(StoreEntry&) STUB
void MemStore::unlinkByKeyIfFound(const cache_key *key) STUB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the parameter name here. it leads to compile error 'unused variable' on strict compilers.

same in new stub_store.cc and TestSwapDir.h lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
BTW, there are currently lots of unused variables in Squid sources. I assume this should be fixed first before we are going to test new patches with 'strict' compilers.

@yadij yadij added the S-waiting-for-author author action is expected (and usually required) label Aug 21, 2017
@eduard-bagdasaryan
Copy link
Contributor Author

@yadij , I addressed your review comments and updated the PR.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have discovered a new release() problem, stated my recommendations regarding the syncCollapsed() TODO, and added a few minor polishing requests.

assert(collapsed->mem_obj);
assert(collapsed->mem_obj->smpCollapsed);
// TODO: confirm whether this is reasonable:
// assert(collapsed->store_status == STORE_PENDING);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eduard: what to do with STORE_PENDING check? I suspect that it should always store_status==STORE_PENDING inside Controller::syncCollapsed(), because syncing should be performed on not-yet completed/STORE_OK entry.

Ignoring the trivial case of findCollapsed() returning nil, syncCollapsed() can be called for any transient entry. To assert that any transient entry is STORE_PENDING, we must establish two conditions:

  1. syncCollapsed() relies on the transient entry being STORE_PENDING.
  2. A transient entry cannot be STORE_OK.

For item 1: I do not see why syncCollapsed() would need the transient entry to always be in STORE_PENDING state. Do you? Sure, there are places inside syncCollapsed() where STORE_PENDING is needed (e.g., when we call abort(), but there are also places where it is irrelevant (e.g., when the entry is inSync).

For item 2: AFAICT, a StoreEntry object becomes STORE_OK when it is fully stored in disk or memory cache OR when it is aborted. I see no code that would guarantee that a StoreEntry is always removed from Transients when it becomes STORE_OK. Did I miss it?

Eduard: There is another reason indirectly confirming that: StoreEntry::abort() asserts if store_status!=STORE_PENDING and this method is invoked throughout Controller::syncCollapsed().

This is indeed an important observation -- syncCollapsed() must not abort() aborted or completed StoreEntry objects. However, that makes STORE_PENDING a precondition for the syncCollapsed() code calling abort(). It does not imply that the entire syncCollapsed() code has such a precondition.

If you do not find flaws in the above reasoning, then I suggest removing this TODO comment. More related same-method changes are requested below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then I suggest removing this TODO comment.

Done at c0a470d.

syncCollapsed() must not abort() aborted or completed StoreEntry objects.

The related c0a470d change handles 'aborted' case. Not sure what was specifically suggested for 'completed' case, left as is for now. So, currently, abort() will assert there if the entry is STORE_OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alex: This is indeed an important observation -- syncCollapsed() must not abort() aborted or completed StoreEntry objects.

Eduard: Not sure what was specifically suggested for 'completed' case

Nothing was suggested in that phrase, I was just retelling what you said (not STORE_PENDING means aborted or completed StoreEntry). If the current syncCollapsed() code never calls abort() for aborted or completed StoreEntry objects, then we are all set as far as this "remove TODO" change request is concerned.

debugs(20, 7, "syncing " << *collapsed);

bool abandoned = transients->abandoned(*collapsed);
bool aborted = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/aborted/abortedByWriter/ to clarify what kind of abort we are dealing with here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at c0a470d.

bool waitingToBeFreed = false;
transients->status(*collapsed, aborted, waitingToBeFreed);
if (aborted) {
debugs(20, 3, "aborting " << *collapsed << " due to aborted shared status");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/due to aborted shared status/because its writer has aborted/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at c0a470d.

collapsed->abort();
return;
}
bool found = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an empty line here -- we are done processing the aborted case. The rest of the method can assume that the shared entry is not aborted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at c0a470d.

bool aborted = false;
bool waitingToBeFreed = false;
transients->status(*collapsed, aborted, waitingToBeFreed);
if (aborted) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an empty line above the if statement to better isolate the aborted case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at c0a470d.

{
if (StoreEntry *entry = static_cast<StoreEntry*>(hash_lookup(store_table, key))) {
debugs(20, 3, "got in-transit entry: " << *entry);
entry->release();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this entry is collapsed but still detached, will its local client(s) get stuck for a while, waiting for a syncCollapsed() call (that this particular method does not trigger for local clients)? And then, when that syncCollapsed() call finally comes, we will call abort() because we cannot attach this entry?

AFAICT, collapsed but still detached entries in other workers will not get stuck because StoreEntry::release() eventually calls Transients::abandon() that calls CollapsedForwarding::Broadcast(). Please double check the current code and correct me if I am wrong.

If my concerns are at least partially valid, then we could add more code to StoreEntry::release(), but I am worried that calling abort() from release() would create dangerous reentrant code (i.e., the object calling release() will be synchronously called back to be notified about the abort and then will resume running with an aborted entry). While abort.callback is asynchronous, invokeHandlers() (also called by StoreEntry::abort()) is not!

To address this, perhaps we should do something like this instead:

// entry->release() notifies other waiting workers but
// XXX: it does not abort local collapsed Store clients if needed.
entry->release();
// Work around the above XXX. TODO: Move into release() after making invokeHandlers() asynchronous.
if (... has transient index ...)
    syncCollapsed(...);

Any better ideas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, collapsed but still detached entries in other workers will not get stuck because StoreEntry::release() eventually calls Transients::abandon() that calls CollapsedForwarding::Broadcast().

While experimenting, I saw that this does not work properly with our changes. Currently, 'DELETE' causes purgeEntriesByUrl(), Controller::unlinkByKeyIfFound() and Transients::unlinkByKeyIfFound() (for USE_HTCP off scenario). These were our 'innovations' to optimize 'heavy' StoreEntry loading. I.e., instead of StoreEntry::release() and eventually Transients::abandon() now we have unlinkByKeyIfFound() and no abandon(). So other worker clients keep 'hanging' on detached entry which became deleted.
To fix this, we can either:

  • revert our changes in purgeEntriesByUrl() and load StoreEntry or
  • teach Transients::unlinkByKeyIfFound() to do CollapsedForwarding::Broadcast(). There should be added a suitable api for this though.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert our changes in purgeEntriesByUrl() and load StoreEntry

IIRC, loading StoreEntry is not an option in general because Controller::find() returns nil for not-yet-readable entries. See your PR description near "mark being-created
entries". Our changes were not simply an optimization. They addressed a problem with marking unreadable entries for deletion. We need to fix/improve our changes, not revert them.

teach Transients::unlinkByKeyIfFound() to do CollapsedForwarding::Broadcast().

unlinkByKeyIfFound() or a related method should broadcast (in some cases): A worker must broadcast whenever it changes the state of a shared being-read entry -- the broadcast is the mechanism for readers in other workers to learn about those changes. Marking a transient entry for deletion requires broadcasting (when there are other entry readers).

I am not sure this broadcast should be initiated by the Transients::unlinkByKeyIfFound() method, by the corresponding Controller method, or by similar methods in the cache Stores. I hope you can find and defend the answer to that question. Ideally, there should be at most one broadcast per change. See below for more clues.

There should be added a suitable api for this though.

I would check whether calling Transients::get() and broadcasting the resulting entry (or not broadcasting at all if it returns nil) is sufficient (and a good idea) in this case. That would still be different than relying on Controller::find() for deletion. Here, we are relying on (fast) get() to broadcast the change, not to mark the store entries for deletion.

Note that, AFAICT, we do not need to broadcast if the transients entry has already been marked for deletion (which means we changed nothing) or when the transients entry is locked for writing (which means it cannot have any readers yet).

Also, if there is a local StoreEntry object already, we should probably use that instead of calling Transients::get(). Perhaps the code already has this optimization.

Combining all of the above, you might arrive at something like this (entry locking/unlocking not shown and may be missing):

    if (auto entry = transients->get(key))
        entry->release(); // broadcasts to inform other readers if needed
    else
        Controller::unlinkByKeyIfFound(key); // does not broadcast but there are no readers

As always, please check my reasoning. The overarching "broadcast changes when there are readers" principle should be correct, but I could miss various nuances/details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, loading StoreEntry is not an option in general because Controller::find() returns nil for not-yet-readable entries. See your PR description near "mark being-created
entries". Our changes were not simply an optimization. They addressed a problem with marking unreadable entries for deletion. We need to fix/improve our changes, not revert them.

To clarify: my (1) suggestion did not assume 100% reverting purgeEntriesByUrl(): unlinkByKeyIfFound() method should stay of course, to handle situation when it is already impossible/not_yet_possible to 'get' the entry from store but it is still possible to get it by the key. I suggested to keep getting entry when we can(as the old code did). This would be a simple but not a perfect/complete solution though.

Copy link
Contributor

@rousskov rousskov Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My sketch is similar to what you are suggesting then, except it does not pay the relatively high price of creating a new StoreEntry object when transients do not have a usable StoreEntry object already.

If the optimization in my sketch causes trouble, then reusing the old code (i.e., creating a new StoreEntry object from store metadata whenever possible) is fine. Just add a source code comment for the reader who might be tempted to optimize this code by using Transients::get() alone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, release() sometimes did not cause broadcasting in my tests (including the old code). I see the reason in Store::Controller::unlink(): it does not care about Transients, calling transientsAbandon().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the lack of a transients->unlink() call in Controller::unlink() looks wrong, especially since Controller::markForUnlink() has the expected transients->markForUnlink() call.

However, I cannot say, without doing research, whether changing that method alone would guarantee broadcasts during release(). Please make sure release() broadcasts (directly or indirectly) whenever the state changes in such a way that a broadcast is required, regardless of whether Controller::unlink() has been called. Locating the best place(s) to (check state and) initiate that broadcast is not easy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I cannot say, without doing research, whether changing that method alone would guarantee broadcasts during release()

No, it is not guaranteed: there is another problem. Transients::unlink() does broadcasting only for 'writers', i.e., when "e.mem_obj->xitTable.io == MemObject::ioWriting". That was the initial design: only writers can broadcast, but now, evidently, we need to broadcast from any worker, which received DELETE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to broadcast from any worker, which received DELETE

To be more precise:

  • from any worker which changed the shared entry state
  • to any worker reading that shared entry

This PR focus is on DELETE-triggered changes, and we can broaden these narrow rules if needed to ease implementation (e.g., broadcast to all workers), but we should keep these rules in mind.

Please note that the rules have not changed. What changed is our understanding of their impact: We now know that DELETEs (and such) may require initiating broadcasts from non-writers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The related fixes for this thread discussion at 2765f0b.

src/enums.h Outdated
ENTRY_REVALIDATE_ALWAYS,
DELAY_SENDING,
RELEASE_REQUEST,
RELEASE_REQUEST, ///< makes private key permanent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the flag itself does not really make anything, I would rephrase as "prohibits making the key public".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at c0a470d.

src/enums.h Outdated
typedef enum {
SWAPOUT_NONE,
SWAPOUT_WRITING,
SWAPOUT_NONE, ///< the store entry is in a 'disconnected' from disk state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disconnected can be easily misinterpreted as "was connected but not longer connected". How about "StoreEntry is currently not associated with any disk store entry". And, to make it more symmetric with other definitions:

/// StoreEntry is currently not associated with any disk store entry.
/// Does not guarantee (or preclude!) a matching disk store entry existence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at c0a470d.

src/enums.h Outdated
SWAPOUT_WRITING,
SWAPOUT_NONE, ///< the store entry is in a 'disconnected' from disk state
SWAPOUT_WRITING, ///< the store entry is being stored on a disk
/// The store entry has been stored on a disk and still is 'connected'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest:

/// StoreEntry is associated with a complete (i.e., fully swapped out) disk store entry.
/// Guarantees the disk store entry existence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at c0a470d.

src/enums.h Outdated
SWAPOUT_NONE,
SWAPOUT_WRITING,
SWAPOUT_NONE, ///< the store entry is in a 'disconnected' from disk state
SWAPOUT_WRITING, ///< the store entry is being stored on a disk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest:

/// StoreEntry is being swapped out to the associated disk store entry.
/// Guarantees the disk store entry existence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at c0a470d.

src/store.cc Outdated
expireNow();
if (!shareable) {
// This object should not be reused
expireNow();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to expire entries when making them private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makePrivate(false) is called when the decision is 'reuseNot': we can't share the entry even with already collapsed clients. E.g., we got a header forbidding any caching/sharing (smth. like CC:no-store).

Copy link
Contributor

@rousskov rousskov Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to expire entries when making them private and unshareable? You are explaining why unshareable entry cannot be shared, but I am asking why it has to expire.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this sounds vague, but to my mind 'expiration' prevents sharing shareable but not reusable entries. I.e., it makes difference between doNotCacheButShare and reuseNot states.
I am not insist this is [the only] correct way for doing this. This is how it is implemented and working now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'expiration' prevents sharing shareable but not reusable entries

True, but should not be relevant because we are talking about unshareable entries.

I.e., it makes difference between doNotCacheButShare and reuseNot states.

My argument that it should not make a difference. I do not know whether it does. Unshareable entries should be unshareable regardless of their expiration time. If unsharable entries are shareable, there is a bug we should fix.

This is how it is implemented and working now.

I know but that is largely irrelevant to my question. If the current code works correctly (in some cases) because of a bad hack, and we are fixing a very related aspect of that code because of a very related bug (in other cases), then we should remove the hack instead of splitting the code into two parts, one that continues to rely on a hack and one that is no longer affected by the bugs that hack caused.

If you remove that if statement and expireNow() call, will the code break in some cases?

  • If yes, then the code relies on a weak expiration check to prevent sharing of unsharable entries. Such code must be fixed IMO.
  • If no, then the code is already OK and the expireNow() call is simply not needed (after all your other changes/fixes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no, then the code is already OK and the expireNow() call is simply not needed (after all your other changes/fixes).

I think you are right with this. I overlooked our StoreEntry::mayStartHitting(), developed to distinguish such shareable and unshareable entries. So there should be no need to do extra work with expireNow().

will the code break in some cases

It is difficult to say. I can state only that the major code paths (clientReplyContext::cacheHit() or clientReplyContext::handleIMSReply()) already use mayStartHitting() and there is no need there for protection with 'expiration'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can state only that the major code paths (clientReplyContext::cacheHit() or clientReplyContext::handleIMSReply()) already use mayStartHitting() and there is no need there for protection with 'expiration'.

The undocumented StoreEntry::validToSend() method also checks for "released" entries and, hence, does not rely on the expiration date alone (although I suspect it should check mayStartHitting() instead).

Finally, setting StoreEntry::expires is a rather weak protection because that setting can be ignored based on max-stale directives (at least). Look for FRESH_REQUEST_MAX_STALE_... Unlike the expiration date, private keys and shareableWhenPrivate should never be ignored.

Store::Root().unlinkByKeyIfFound(key);
// Are there local collapsed clients we should notify?
// If yes, get transients index and use it later.
const auto transientsIndex = entry->hasTransients() && entry->locked() ? entry->mem_obj->xitTable.index : -1;
Copy link
Contributor

@rousskov rousskov Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that you call locked() here because Controller::get() might return an unlocked entry, and the transientsIndex is not going to survive StoreEntry::release() of an unlocked entry. I think we should make this code safer and simpler by always locking/unlocking the entry returned by get(). Doing so will remove this complex locked() logic and make post-release() entry manipulation safe. As a positive side-effect, it may also delete unlocked intransit entries that some other code might leave behind.

if (auto entry = Store::Root().get(key)) {
    entry->lock("purgeEntriesByUrl");
    ...

    // entry->release() notifies other waiting workers but
    // XXX: does not abort local collapsed Store clients if needed.
    entry->release(true);
    // Work around the above XXX.
    // TODO: Move into release() after making invokeHandlers() asynchronous.
    if (entry->hasTransients())
        Store::Root().syncCollapsed(entry->mem_obj->xitTable.index);

    entry->unlock("purgeEntriesByUrl");
} else ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at fdcfaa3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid I applied your improvement in hurry: I forgot about one reason why I decided to implement without locks. When we get from the Store, we could create a 'fake' entry (when store_table misses it). We need this entry only for notification purpose in release(). The syncCollapsed() method should handle local 'detached' entries, but it should not handle(abort for example) our 'fake' entry! That is why my previous version destroyed this fake entry first(after receiving required index) and then called syncCollapsed().
So I would suggest to undo that fdcfaa3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we get from the Store, we could create a 'fake' entry (when store_table misses it).

Why do you call that entry "fake"? It is a regular/valid entry AFAICT. Does it have some special properties that are important in this context?

We need this entry only for notification purpose in release().

Yes, and, until that XXX is fixed, for the synchronization purposes in syncCollapsed().

The syncCollapsed() method should handle local 'detached' entries,

Yes.

but it should not handle(abort for example) our 'fake' entry!

Why not?

Copy link
Contributor Author

@eduard-bagdasaryan eduard-bagdasaryan Aug 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you call that entry "fake"?

I call this 'fake' because of my understanding of its usage. It is not used by clients. It is a 'helper' way to notify others. We could avoid using it at all, if, for example, transients->unlinkByKeyIfFound() had a mechanism to notify all workers(including this one) in case Transients has this entry. That was my initial suggestion you did not like.

Why not?

My suggestion prevents this extra work done on this entry in syncCollapsed(), including probably unnecessary(and misleading) logging. BTW, the abort() method says: "Someone wants to abort this transfer." What "transfer" do we have with our "fake" entry? If there is no local entry, we would return after transients->findCollapsed(xitIndex).

Copy link
Contributor Author

@eduard-bagdasaryan eduard-bagdasaryan Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we must guarantee that Transients::unlinkByKeyIfFound() is not called when there is a local StoreEntry object.

Or, equivalently, if Transients table exists, for each local StoreEntry hasTransients() must be true. destroyMemObject() disconnects Transients from the StoreEntry. I see two scenarios to consider:

  1. Transients was disconnected while deleting StoreEntry, i.e., destroyStoreEntry() is called. In this case we firstly disconnect it from Transients and secondly remove it from store_table. Is it possible that anybody attaches to the already disconnected, but not yet deleted entry? Looks like impossible that this code became interrupted.

  2. Transients was disconnected while purging memory (handleIdleEntry(), discussed above, or storeGetMemSpace() methods). As I noted above, there could be cases when such memobj-less StoreEntries keep existing in the local store_table. What is the purpose of leaving such half-baked local entries? Optimization or anything else? If it is a bug, it should be fixed and the invariant becomes correct. If not, the invariant does not work.
    Also, one thing is known that such transient-less and memobj-less entries should be also unlocked, meaning that there are no clients attached to it. If this consideration is correct, the invariant should be valid without any additional fixes: we should not worry about such 'stand-alone' entries, because they stay apart from collapsing, notifying, etc. Also, we can assert that entry.locked()==false, before calling unlinkByKeyIfFound() inside Transients::markForUnlink().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: since we are dealing with a multi-process/thread environment any process may cause a kernel interrupt and do something with the shared parts unless the Transients disconnect part of the sequence is wrapped in proper atomic locks and the reader/writer counts checked inside the atomic lock (our StoreEntry locking does not count as thread-safety).

[ I have not checked the code to see if that is present, but you dont mention it as the reason for being "impossible", so though I should mention it. ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yadij, AFAICT, when talking about "impossible" and "code interrupts" in item 1, Eduard is actually talking about things happening inside one Squid process. Here, "code interrupts" means AsyncCalls and such rather than process interrupts . This particular concern is unrelated to SMP, and Eduard is right that it is impossible for a new Store client to attach to a being destroyed StoreEntry object (i.e., attach during destroyStoreEntry()).

Copy link
Contributor Author

@eduard-bagdasaryan eduard-bagdasaryan Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eduard is actually talking about things happening inside one Squid process.

That is correct. No other clients can attach during destruction process, because no async calls start processing during it. Also my supposition was that if Squid worker is suspended by OS at the beginning of StoreEntry destruction, it will resume executing from that very point of code, finishing the destruction process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alex: it is impossible for a new Store client to attach to a being destroyed StoreEntry object (i.e., attach during destroyStoreEntry()).

Eduard: No other clients can attach during destruction process

For the record, both statements above talk about necessarily local (i.e., same-process) clients attaching to the StoreEntry object (which is always local!). Store clients in other workers (i.e., "remote" clients) may attach to the (shared) Transients entry and/or other StoreEntry objects in their processes, but that is a different issue.

@rousskov rousskov changed the title SMP Squid with SMP-aware caches sometimes does not purge cache entries. SMP caches sometimes do not purge cache entries Aug 30, 2017
@eduard-bagdasaryan
Copy link
Contributor Author

@rousskov , I addressed last discussed comments, but I can't set it to S-waiting-for-reviewer.

@rousskov rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Aug 30, 2017
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a couple of problems. The rest is polishing. I hope we are getting close, but more changes are needed.

Also, if possible, please assert that RELEASE_FLAG is not set inside clearPrivate(), confirming that the private key setting is permanent.

// there will be no more updates from us after this, so we must prevent
// future readers from joining
map->freeEntry(e.mem_obj->xitTable.index); // just marks the locked entry
map->closeForWriting(e.mem_obj->xitTable.index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above is still correct, but instead of freeing the entry, we are now using complete() test to prevent new readers from joining. Let's clarify what is going on and document a dependency on our get() logic:

        // There will be no more updates from us after this, so we must prevent 
        // future readers from joining. Making the entry complete() is sufficient
        // because Transients::get() does not return completed entries.
        map->closeForWriting(e.mem_obj->xitTable.index);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at a40bbf6.

{
unlink(e);
assert(e.key);
e.hasTransients() ? unlink(e) :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use tertiary operator as an if-statement replacement. Use an if-statement. A tertiary operator imposes certain requirements on its second and third arguments, and ignoring its result may trigger compiler warnings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A tertiary operator imposes certain requirements on its second and third arguments, and ignoring its result may trigger compiler warnings.

In this particular case these 'requirements' should be satisfied and no warnings should be generated. Both arguments result is 'void' so the result of the operator is also 'void'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know, but the return type(s) may change. The current void types are not a part of some natural interface invariant for those methods. The if statement would not be affected by such changes. A tertiary operator offers no advantages here, so I am puzzled why you are spending time to defend its use.

Copy link
Contributor Author

@eduard-bagdasaryan eduard-bagdasaryan Aug 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know, but the return type(s) may change.

And in this case we would need to fix our if/else, prefixing methods with (void) methods ...

I am puzzled why you are spending time to defend its use.

I am not going to defend this minor issue anymore. In general (if everything is equal), I would prefer one line of the ternary operator to a clumsy equivalent four lines of if/else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And (if the return types change), we would need to fix our if/else, prefixing methods with (void) methods ...

Not necessarily. The if statement will continue to work and read correctly if the method(s) gain a an optional return value. If the return value is required/important, then the if statement will need to be changed, but the changes are very likely to be much smaller in this case (because the two methods are too different to return the same kind of a required/important result).

In general, I prefer using separate lines for conditions and actions, which results in a one line difference between a tertiary operator and an if statement. Removing that extra else line is not worth the problems that tertiary operator brings.

AFAICT, the tertiary operator is meant for obtaining a result from either of the two different algorithms. The code which does not care about the result should not use the operator that does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at a40bbf6.

Transients::unlink(StoreEntry &e)
{
if (e.mem_obj && e.mem_obj->xitTable.io == MemObject::ioWriting)
if (e.mem_obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks unfinished. The old code was (indirectly) protecting abandon() from entries that do not have xitTable.index set. The new code does not and may result in asserts if unlink() is called without a hasTransients() check. I see two correct models:

  1. All callers must check hasTransients() before calling unlink().
  2. Unlink() must check hasTransients().

AFAICT, this PR is using approach 1 right now. Following that approach means not checking e.mem_obj either! If you agree and this PR continues to follow approach 1 (see below), then please remove that check as well. Otherwise, unlink() has to check hasTransients() and call markForUnlink() if there are none (approach 2).

I am not sure which approach is the best, but it bothers me that unlink() is using approach 1 while markForUnlink() is using approach 2. Approach 2 seems to be required for markForUnlink() because we must mark the entry even if it is being created (i.e., locked for writing) right now, correct? Should not the same logic apply to unlink()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, unlink() has to check hasTransients() and call markForUnlink() if there are none (approach 2).

After these changes I noticed that unlink() and markForUnlink() became identical. So I made unlink() private(which now only calls markForUnlink()) and use markForUnlink() in that single place instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at a40bbf6.

debugs(47, 5, e);
map->freeEntry(e.swap_filen);
assert(e.key);
e.hasDisk() ? map->freeEntry(e.swap_filen) :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use an if statement instead of the tertiary operator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at a40bbf6.

virtual uint64_t currentCount() const override { return n_disk_objects; }
virtual ConfigOption *getOptionTree() const override;
virtual bool smpAware() const override { return false; }
virtual bool hasReadableEntry(const StoreEntry &) const override { return false; };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should document why this method always returns false? Something like this, perhaps:

/// as long as ufs relies on the global store_table to index entries,
/// it is wrong to ask individual ufs cache_dirs whether they have an entry
virtual bool hasReadableEntry(const StoreEntry &) const override { return false; }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at a40bbf6.

debugs(20, 3, "aborting unsyncable " << *collapsed);
collapsed->abort();
} else { // the entry is still not in one of the caches
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this:

    if (inSync) {
        debugs(20, 5, "synced " << *collapsed);
        collapsed->invokeHandlers();
        return;
    } 

    if (found) { // unrecoverable problem syncing this entry
        debugs(20, 3, "aborting unsyncable " << *collapsed);
        collapsed->abort();
        return;
    }

    // the entry is still not in one of the caches
    debugs(20, 7, "waiting " << *collapsed);

if (e.hasDisk(i))
dir(i).unlink(e);
else
dir(i).unlinkByKeyIfFound(reinterpret_cast<const cache_key*>(e.key));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As already discussed elsewhere, with "approach 2", this inner if-statement would be replaced with just dir(i).unlink(e); and each cache dir will do either mark the (locked) entry for unlinking or unlink by key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at a40bbf6.
Following it, I removed a hasDisk() check from storeSwapOutFileClosed() because the surrounding code assumes that the entry should be attached to a disk: it uses e->disk() without check. Probably there should be an assert(e->hasDisk()) near existing assert(e->swappingOut())?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not add a hasDisk() assert to protect disk() calls because those calls have that assert().

Since storeSwapOutFileClosed() does not manipulate raw disk indexes/pointers itself (only dumps them in stale debugs() messages), I think we are OK without an assert protecting such raw manipulations.

I consider this change request addressed.

virtual void markForUnlink(StoreEntry &e) = 0;

/// remove the entry from the store
/// Remove the entry from the store if possible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the entry/the matching entry/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at a40bbf6.


/// Remove the entry from the store if possible
/// or mark it as waiting to be freed otherwise.
/// The entry must be present in the store.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With "approach 2", the "must be present" part will be gone.


if (Store::Root().markedForDeletionAndAbandoned(*this)) {
debugs(20, 3, "marked for deletion and abandoned");
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be MemObject::SwapOut::swImpossible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should. I think this bug is a consequence of code duplication throughout this method: swImpossible state is always followed by returning 'false'. I would suggest to fix this with something like:

static bool CreateImpossibleDecision(StoreEntry *e, const char *msg)
{
    debugs(20, 3, msg << " " << *e);
    swapOutDecision(MemObject::SwapOut::swImpossible);
    return false;
}

and the above block(and other similar checks) would transform into:

if (Store::Root().markedForDeletionAndAbandoned(*this))
   return CreateImpossibleDecision(this, "marked for deletion and abandoned");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not polish this now. I am concerned that either something went wrong in this method (before this PR) or the method itself needs to be redesigned again (outside of this PR) because false always means swImpossible. Addressing that concern is not trivial and is outside this PR scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done without polishing at a40bbf6.

@rousskov rousskov added S-waiting-for-author author action is expected (and usually required) S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box S-waiting-for-author author action is expected (and usually required) labels Aug 31, 2017
@rousskov
Copy link
Contributor

I approved the changes I requested. I do not like the final unlink()/markForUnlink() state. I think we should polish that split. However, that polishing can be done separately, and a separate polish may reduce overheads. @eduard-bagdasaryan, please proceed with rebasing the branch and doing the (hopefully) last batch of comprehensive tests.

@yadij, please approve the changes you requested if Eduard has indeed addressed them per #46 (comment)

@eduard-bagdasaryan
Copy link
Contributor Author

Also, if possible, please assert that RELEASE_FLAG is not set inside clearPrivate(), confirming that the private key setting is permanent.

s/RELEASE_FLAG/RELEASE_REQUEST/
This is possible, I think: clearPrivate() usage implies that the flags at the moment if its call is either already checked with the same assertion (StoreEntry::setPublicKey()), or checked that the entry is not private (StoreEntry::clearPublicKeyScope()). The entry must not have RELEASE_REQUEST without being KEY_PRIVATE.
Done at 3935345.

@eduard-bagdasaryan
Copy link
Contributor Author

@rousskov, please review the latest revision.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a couple of more problems and added a few questions and polishing remarks as well. Please see the code comments for details.

}
#endif
entry->release();
// entry->release() notifies other waiting workers, including this one
Copy link
Contributor

@rousskov rousskov Sep 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/other waiting workers/waiting workers/

Also, the opening curly brace should go on the same line as the if-statement (it does in the original code). Please check that you have run the source code formatting script on these changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

const auto idx = entry.mem_obj->xitTable.index;
const auto &anchor = collapsedWriter(entry) ?
map->writeableEntry(idx) : map->readableEntry(idx);
aborted = EBIT_TEST(anchor.basics.flags, ENTRY_ABORTED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anchor.basics.flags is not atomic but may be updated by writer's StoreMap::abortWriting() calls, so it is wrong to read it without holding an exclusive lock. For example, when collapsedWriter() is false, we do not have an exclusive lock and might try to read anchor.basics.flags when another worker (writer) is doing StoreMap::abortWriting(). To properly fix this, we would probably have to add a second dedicated atomic flag to StoreMapAnchor, like waitingToBeFreed, and use it to mark and detect aborted map entries. The ENTRY_ABORTED flag in StoreMapAnchor will be unused (like it was before these changes). Any better ideas?

Can we place that second aborted flag so that sizeof(StoreMapAnchor) does not change on 32bit and 64bit platforms?

The rest of StoreMapAnchor::basics was and remains constant when readers are allowed AFAIK, but this PR started changing flags...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any better ideas?

Having two duplicated flags looks not very well. Moreover, what if in future we would need to read one more 'flag' from anchor.basics.flags in another reader? Again add such 'atomic' duplicate?

I would like to suggest another approach:

aborted = false;
// Since a shared entry gets ENTRY_ABORTED only within StoreMap::abortWriting(),
// (where anchor.writing() becomes false), we should not check this value before.
// Also note that trying to read ENTRY_ABORTED for being written shared entry
// may cause an unpredictable result, because anchor.basics.flags is not 'atomic'.
if (!anchor.writing() && EBIT_TEST(anchor.basics.flags, ENTRY_ABORTED))
      aborted = true;
waitingToBeFreed = anchor.waitingToBeFreed;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having two duplicated flags looks not very well

I think we can avoid any duplication of flags in my proposal: The new atomic flag will be used without any relationship or reference to the unused(?) same-name bit in basics.flags. And we can name the new atomic flag writerGone or something like that to minimize the implication that the two are flags are duplicated.

Moreover, there is actually some extra danger in using ENTRY_ABORTED like we do in the current PR code. I suspect that some higher level code may assert if it gets an ENTRY_ABORTED entry from the cache. Our code probably copies shared basic.flags into local StoreEntry objects blindly, which (in rare cases) might result in a cache reader getting a ENTRY_ABORTED StoreEntry hit and asserting. The code should handle later hit aborts correctly (because they can always happen on incomplete entries), but finding an already aborted hit in the cache, especially for a "complete" entry, may be problematic. This concern alone may be a sufficient reason to not use ENTRY_ABORTED for our purposes, even if the atomicity issue does not exist!

Finally, we should consider removing basics.flags as effectively unused, but probably not in this PR because that decision is not trivial.

// Since a shared entry gets ENTRY_ABORTED only within StoreMap::abortWriting(),
// (where anchor.writing() becomes false), we should not check this value before.
if (!anchor.writing() && EBIT_TEST(anchor.basics.flags, ENTRY_ABORTED))
      aborted = true;

This trick misses the (rare) case where the flag has been set (before the check) but the write lock has not been released (until after the check). This problem is not critical because the affected reading worker should receive another notification later, when the writer finishes updating its state. At that time, the reader will detect the aborted entry correctly. Thus, we can get away with this by adding an XXX.

The question is: Why should we use tricks and add XXXs when we can use an atomic flag instead? I do not think we should if the earlier discussion addresses your "duplicated flags" concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at 8cd8303. Does it meet your expectation?

Can we place that second aborted flag so that sizeof(StoreMapAnchor) does not change on 32bit and 64bit platforms?

I placed it just after waitingToBeFreed. I assume that since sizeof(waitintToBeFreed) == sizeof (writerHalted) == 1, these fields are not aligned. The following data field (uint64_t key[2]) should be aligned with 3-byte offset before changes and with 2-byte offset after changes. The sizeof(StoreMapAnchor) should be the same in both cases, I think. In my environment it is 96 bytes.

Store::Controller::syncCollapsedWriter(StoreEntry &collapsed)
{
assert(transients);
bool aborted = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: This change request is probably not relevant because syncCollapsed() restructuring discussed in another change request should make these polishing touches unnecessary.


s/aborted/abortedByUs/ to restore symmetry with syncCollapsedReader().

Also, please add a comment to confirm and explain why this variable is unused. For example:

// keeps Transients::status() API simple; unused because we already
// know whether we are aborted (so this flag does not need syncing)
bool abortedByUs = false;

You can write know we are not aborted instead because syncCollapsed() checks that, but I would not introduce that unnecessary implied dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped as irrelevant.

debugs(20, 7, "not SMP-syncing not-transient " << xitIndex);
return;
if (waitingToBeFreed) {
debugs(20, 3, "releasing writing " << collapsed << " due to waitingToBeFreed shared status");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot really release a being-written entry so I would

s/releasing/marking/
or
s/releasing/will release/

I would also drop "shared status" for brevity sake.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped as irrelevant.

return;
if (waitingToBeFreed) {
debugs(20, 3, "releasing writing " << collapsed << " due to waitingToBeFreed shared status");
collapsed.release(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check and release() call are missing for the syncCollapsedReader() case. There is a similar check calling abort() there, but that check does not cover the non-aborting cases AFAICT.

When I imagine the resulting (fixed) code, it feels like we should restructure Store::Controller::syncCollapsed() to reduce the chance of similar future bugs and to reduce code duplication. A single method like this may work better:

...
debugs(20, 7, "syncing " << *collapsed);

bool abortedByWriter = false;
bool waitingToBeFreed = false;
transients->status(collapsed, abortedByWriter, waitingToBeFreed);

if (waitingToBeFreed) {
    debugs(20, 3, "will release " << collapsed << " due to waitingToBeFreed");
    collapsed.release(true); // may already be marked
}

if (transients->collapsedWriter(*collapsed))
    return; // readers can only change our waitingToBeFreed flag

assert(transients->collapsedReader(*collapsed));

if (abortedByWriter) {
    debugs(20, 3, "aborting " << collapsed << " because its writer has aborted");
    collapsed.abort();
    return;
}

... the rest of the current syncCollapsedReader() code follows ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// free the entry if possible or mark it as waiting to be freed if not
void freeEntry(const sfileno fileno);
/// \param stateChanged when provided, true if the entry state changes, false otherwise
void freeEntry(const sfileno fileno, bool *stateChanged = nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest a simpler interface:

/// free the entry if possible or mark it as waiting to be freed if not
/// \returns whether the entry was neither empty nor marked
bool freeEntry(const sfileno fileno);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did similarly at first, but almost all (~15) freeEntry() calls end up with ignoring its return value. We would have to prefix them with (void), which looks ugly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost all freeEntry() calls end up with ignoring its return value. We would have to prefix them with (void)

I do not think we have to or should prefix those calls with (void). That prefix is useful or needed when you want to emphasize that a usually important return value is ignored on purpose rather than by accident. In this case, the return value is usually unimportant (because it does not indicate a success or failure -- the method cannot fail) and, hence, can be silently ignored by most callers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

freeChain(fileno, s, false);
else
s.waitingToBeFreed = true; // mark to free it later
if (stateChanged)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid returning true for the already marked and/or already free entries, we can do something like this:

if (s.lock.lockExclusive()) {
    const bool result = !(s.waitingToBeFreed || s.empty());
    freeChain(...);
    return result;
}

// mark to free the locked entry later (if not already marked)
bool expected = false;
return s.waitingToBeFreed.compare_exchange_strong(expected, true);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
I used !s.waitingToBeFreed && !s.empty() instead, which looks simpler for me.

{
debugs(20, 3, storeKeyText(key));

if (markedForDeletion(key))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a debugs() here, to match the other lines in this method and to provide an explanation to admins trying to understand why the previous cached object does not produce hits:

if (markedForDeletion(key)) {
    debugs(20, 3, "ignoring marked " << storeKeyText(key));
    return nullptr;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

debugs(20, 3, "got in-transit entry: " << *entry);
entry->release(true);
// The entry should be locked (but see find()). Return here because the
// release of a locked entry should take care of all its stores.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we care whether the entry is locked here? StoreEntry::release() takes care of all its stores regardless of whether the entry is locked, right? If this comment is stale or just wrong, I would do:

entry->release(true); // takes cares of all its stores if any
return;

However, the "all of its stores" logic implies that release() may ignore Store entries that are not attached to the entry object. On the other hand, Controller::unlinkByKeyIfFound() should mark all matching store entries in existence, even those not currently associated with any StoreEntry object. In other words:

  • The above if-statement takes care of the (matching) store entries attached to the matching local StoreEntry object.
  • The code below that if-statement takes care of the matching store entries when there is no matching local StoreEntry object.
  • Who will take care of the matching store entries unattached to the matching local StoreEntry object? I believe such entries may exist because, for example, one worker could be writing a response to disk while another worker may be writing the same-key response to the memory cache. Can Transients guarantee that these two independent concurrent writes can never happen? I do not think so: Transients may not be able to create a shared entry to coordinate those two workers because, for example, the required Transients slot in shared memory may already be locked by another key due to a benign hash collision. And there may be cases other than two concurrent different-worker writes where a local matching StoreEntry coexists with matching but unattached store entries.

If my concern is valid, then we cannot return from the above if-statement. We have to continue to the code below. We still need to if-statement to keep any local StoreEntry objects in sync with Store changes, of course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, the "all of its stores" logic implies that release() may ignore Store entries that are not attached to the entry object.

Looks like this comment is not precise: release() should mark not attached entry for deletion from storages via Controller::markForUnlink(), and corresponding markForUnlink() storage methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, Controller::unlinkByKeyIfFound() is only called from purgeEntriesByUrl() with precondition that Store::Root().get() returned nil. That makes me thinking that the "if" block in Controller::unlinkByKeyIfFound() is not necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code: release(true); // takes cares of all its stores if any

Alex: the "all of its stores" logic implies that release() may ignore Store entries that are not attached to the entry object

Eduard: release() should mark not attached entry for deletion from storages via Controller::markForUnlink

Whether release() should unlink unattached entries is actually debatable, but, for this discussion, let's assume that release() does unlink unattached entries.

Eduard: Looks like [Alex] comment is not precise

My comment seems OK, but the source code comment implies something that it should not imply. In other words, if release() unlinks unattached entries, then we need to fix the comment. Something like this may work better:

// unlink all Store entries with entry->key, even unattached ones
entry->release(true);
return;

And to minimize similar doubts in the future, we should improve the release() description as well. For example, something like this would be better than the current text IMO (we are running rather thin on Store-related terminology, but let's not try to fix that here, in a method description):

/// Removes all unlocked (and marks for eventual removal all locked) Store
/// entries, including attached and unattached entries that have our key.
/// Also destroys us if we are unlocked or makes us private otherwise.
virtual void release(const bool shareable = false);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Controller::unlinkByKeyIfFound() is only called from purgeEntriesByUrl() with precondition that Store::Root().get() returned nil. That makes me thinking that the "if" block in Controller::unlinkByKeyIfFound() is not necessary.

AFAICT, that thinking does not match the code because Root().get() can return nil when store_table has a matching entry. See the markedForDeletion() check in Controller::find().

And, more importantly, we should not make nil get() a hidden precondition for calling Root().unlinkByKeyIfFound() because some caller will eventually forget to check that. Your comment and several similar earlier discussions all suggest to me that we need to revise the unlink/markForUnlink API. I hope we can do that after this PR. For example, it seems likely that a chunk of the current purgeEntriesByUrl() logic should go directly inside Controller::unlinkByKeyIfFound() instead of appearing as preconditions for (or alternative actions to) that method call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: fixed comments, the code is the same.


entry->unlock("purgeEntriesByUrl");
} else {
Store::Root().unlinkByKeyIfFound(key); // does not broadcast but there are no waiting workers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument is valid IMO, but please s/are/were/ to make it a bit more accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rousskov rousskov added the S-waiting-for-author author action is expected (and usually required) label Sep 13, 2017
@eduard-bagdasaryan
Copy link
Contributor Author

@rousskov , I addressed your latest remarks.There is a couple of not yet fully discussed questions.

return result;
}

// TODO: check whether 'empty' anchors can be still locked.
Copy link
Contributor

@rousskov rousskov Sep 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, any anchor/entry can be locked. Locking does not depend on the entry state (other than the state of the lock itself, of course, but the lock can/should be viewed as being "outside" the entry itself).


entry->unlock("purgeEntriesByUrl");
} else {
Store::Root().unlinkByKeyIfFound(key); // does not broadcast but there were no waiting workers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is still misleading or even wrong. There could be workers working on an entry with key, of course. What we are trying to say here is that we should not worry about ignoring them. Why? I believe we can safely ignore such workers because either the entry has already been marked for deletion (and so another notification would make no difference) or Controller::find() failed to create new or join an existing Transients entry (and so we would not be able to notify anybody anyway). If my logic is correct, then let's replace this comment with something like this:

// The call below does not notify workers but that is OK here because nil
// Store::Root().get() guarantees that either the matching entry has already
// been marked (so new notifications are useless) OR get() could not create
// (or join) a Transients entry (so notifications are impossible).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// The call below does not notify workers but that is OK here because nil
// Store::Root().get() guarantees that either the matching entry has already
// been marked (so new notifications are useless) OR get() could not create
// (or join) a Transients entry (so notifications are impossible).

I do not mind this in general, however note that if get() returns nil that only means that "it could join a Transients entry" (as well as any other of shared or local storages), we don't know whether the entry was deleted or have not been created. Also, AFAIK, get() does not create Transients entries: it is done by Controller::allowCollapsing() and Transients::startWriting().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if get() returns nil that only means that "it could join a Transients entry"

I assume you meant "could not join a Transients entry".

(as well as any other of shared or local storages)

Right, but those other stores do not matter as far as notifications are concerned, and the comment we are discussing is about notifications.

get() does not create Transients entries

Good point! It only creates StoreEntries from existing Transient entries.

Another problem with that comment is that it is actually possible to broadcast a notification even if we could not join an existing Transients record -- broadcast messages just need xitTable.index value which we could compute. Such broadcasts would be useless because if we could not join then nobody else could, and so there are no readers to broadcast to (and the Transients writer does not care about the delete broadcasts, right?).

Here is a fixed(?) comment:

// The call below does not notify workers but that is OK here because nil
// Store::Root().get() guarantees that the matching Transients entry does
// not exist (so there are no readers to notify), could not be joined yet
// (ditto), or has already been marked (so new notifications are useless).

However, all the conditions in the above comment were true only at the get() time. They may no longer be true at the unlinkByKeyIfFound() time! What happens if a new reader comes and manages to attach to the Transients entry after our failed get() but before our unlinkByKeyIfFound() call? Will that reader have to wait for the next store write to realize that it can never attach to the stores and has to abandon the current transaction? If yes, is there a simple way to improve this?

Perhaps unlinkByKeyIfFound() should broadcast using raw Transients index but it feels like we should just reorder the markings instead:

  1. Mark stores (not Transients!) by key.
  2. Root().get().
  3. If we got an entry, do the usual entry->release(), with Broadcast().
  4. Otherwise, do nothing.

What do you think?

Copy link
Contributor Author

@eduard-bagdasaryan eduard-bagdasaryan Sep 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a new reader comes and manages to attach to the Transients entry after our failed get() but before our unlinkByKeyIfFound() call?

BTW, similar logic should be applied to the writer itself: an entry may be locked for Transients writing (without appending) at the time of get() but become 'appending' (and available for another get()) just before unlinkByKeyIfFound(). Then the skipped broadcasting would not release() that local writer entry (Controller::syncCollapsed()).

but it feels like we should just reorder the markings instead

What is the need of (1)? If Root.get() works for us(Transients is not marked) why it will not work for others? I.e., If shared Transients entry is unavailable at (2) but becomes available just before (3) wont we face same problems?

Perhaps unlinkByKeyIfFound() should broadcast using raw Transients index

This would be more reliable solution, I assume.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar logic should be applied to the writer itself: an entry may be locked for Transients writing (without appending) at the time of get() but become 'appending' (and available for another get()) just before unlinkByKeyIfFound(). Then the skipped broadcasting would not release() that local writer entry (Controller::syncCollapsed()).

Agreed. The writer problem cannot be (easily) solved by broadcasting AFAICT. Instead, right before switching to appending mode, the writer must check waitingToBeFreed and release() its entry. If that code is missing, please add it. I think there is already similar code somewhere.

What is the need of (1)?

Step 1 calls unlinkByKeyIfFound() for memory and disk stores (if any).

If Root.get() works for us(Transients is not marked) why it will not work for others?

Root.get() will work for others as well, but they will get our notification and will be fine. They will not attach themselves to stores because we marked those first (in step 1).

If shared Transients entry is unavailable at (2) but becomes available just before (3) wont we face same problems?

We will! Glad you found this bug. I also realized that my step 4 was missing unlinkByKeyIfFound() for Transients, but adding that would not fix this bug.

Also, I really do not want to broadcast based on unlocked Transient index values because key:index mapping is (or will be) unreliable (if StoreMap supports hash collision resolution).

I do not have any good solutions to offer right now. Back to the drawing board... If you can offer something that addresses all these problems, please do!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested latest adjustments with our Daft test. One problem occasionally happens: when several clients request a disk-cached entry, Squid returns unexpected "503 Service Unavailable" responses instead of "200" hits. Cache.log discovery showed that the reason is Transients entry creation collision: two workers simultaneously find the entry in Rock::SwapDir::get(), create StoreEntry objects and try to create Transients entry, locking it for writing. Unexpected 503s looks like the test imperfection, in reality we will get unexpected cache misses in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad our tests are capable of exposing such hidden problems!

In general, it is acceptable to return a miss when we cannot guarantee reliable hit delivery. A hit test must either accommodate/tolerate/retry such misses or avoid them.

However, we should also avoid making hit delivery unreliable without a good reason. In the context of your "concurrent hits" example, I do not see such a reason (from an independent reviewer point of view; ignoring internal problems of our own making). There are two different cases to consider here:

  1. A complete() store entry. AFAICT, such an entry does not need a Transients entry. There is no need to signal other workers, even when handling DELETE or similar purging events.
  2. An in-complete() store entry. Such an entry must already have a Transients entry so there should be no creation conflicts at hit time.

If the above analysis is correct, then we should relax the earlier "every active Store entry must have a corresponding Transients entry" rule to "every Store entry being written must have a corresponding Transients entry", right? A store reader must still attach to the existing Transients entry (if any), of course, but it should not try to create that Transients entry.

If you agree with the above analysis, please make the necessary code adjustments. Otherwise, please discuss what I have missed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A complete() store entry. AFAICT, such an entry does not need a Transients entry. There is no need to signal other workers, even when handling DELETE or similar purging events.

I am thinking about possible problems with scenario:

  1. There is a disk-cached entry.
  2. Two clients request it at the same time, only one of them manages to create Transients entry.
  3. The two clients start receiving cache hit.
  4. DELETE comes on another worker, marking the the shared cache and releasing only the first StoreEntry(via notification).
  5. The first client finishes getting the response and the corresponding Transients entry is freed. Our Store::Controller::markedForDeletion() protection stops working since now.
  6. Another client hits the the second worker, thus accessing its local still active StoreEntry (and ignoring previous DELETE request).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Two clients request an existing disk-cached entry at the same time, only one of them manages to create Transients entry.

With the adjusted rule, none of the reading workers will try to create a Transients entry.

  1. Another client hits the the second worker, thus accessing its local still active StoreEntry (and ignoring previous DELETE request).

This concern is valid (and we do not even need two clients in step 2 to trigger this problem!).

I see two ways to fix this problem:

A. Go back to the original "all active store entries have Transients entries" rule and accept the increased probability of rejecting perfectly valid cache hits.

B. Stay with the reduced "store writers must create Transients entries" rule but adjust the existing Controller::markedForDeletion() method to return true for keys that have local StoreEntry objects reading store entries marked for deletion. Please note that such StoreEntry objects will have a lock on the store entries they are reading.

I think solution B is preferable. Does it miss any use cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stay with the reduced "store writers must create Transients entries" rule

be9511bb implements this suggestion. While re-running the test, the above problem has gone, but a new, similar one appeared. I found the source of that new problem and posted the fix. After that, the test succeeded for both shared memory and disk cache.

@rousskov
Copy link
Contributor

There is a couple of not yet fully discussed questions.

I believe I answered all the pending questions. I also added a few additional (minor) change requests. I think we are getting really close to being done here. The ball is on your side.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice choice for the new flag name -- much better than writerGone I have suggested.

I added a few minor change requests for the last commit but it looks good to me overall.

The only big issue remaining is the purgeEntriesByUrl() race condition discussed a bit earlier: #46 (comment)


public:
mutable ReadWriteLock lock; ///< protects slot data below
std::atomic<uint8_t> waitingToBeFreed; ///< may be accessed w/o a lock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not change this type unless we have to for some reason. The correct type here may actually be std::atomic_bool, but this PR is not about improving that type and such improvements are not trivial.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still open.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this old problem to the new review: Please do not change this type unless we have to for some reason. The correct type here may actually be std::atomic_bool, but this PR is not about improving that type and such improvements are not trivial.

mutable ReadWriteLock lock; ///< protects slot data below
std::atomic<uint8_t> waitingToBeFreed; ///< may be accessed w/o a lock
std::atomic<bool> waitingToBeFreed; ///< may be accessed w/o a lock
/// true when StoreMap::abortWriting() was called but there were still readers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify because we do not need to document the free entry state:

std::atomic<uint8_t> writerHalted; ///< whether abortWriting() has been called

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still open.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this old problem to the new review: I think we can simplify because we do not need to document the free entry state:

std::atomic<uint8_t> writerHalted; ///< whether abortWriting() has been called

const auto idx = entry.mem_obj->xitTable.index;
const auto &anchor = collapsedWriter(entry) ?
map->writeableEntry(idx) : map->readableEntry(idx);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this empty line: All the lines above and below it are about extracting those two flags.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still open. Please also remove the new empty line in Transients::copyFromShm().

(http://lists.squid-cache.org/pipermail/squid-dev/2017-July/009220.html):

* MemStore::updateCollapsed(): restored old behavior with returning
  true when the given collapsed entry is disconnected from he cache.
* unlinkByKeyIfFound(): do nothing if the 'map' is null.
* Corrected several methods documentation and code comments.

Still not addressed:

* Whether(and how) remove RELEASE_REQUEST flag direct manipulation for
  a just created StoreEntry.
Before this change, the code abused RELEASE_REQUEST flag setting,
violating general 'releaseRequest()' API.

For now, RELEASE_REQUEST flag means an (locked) entry marked for
removal. As a consequence, such StoreEntry becomes not reachable for new
store clients. On the other hand, if a StoreEntry is private, it is not
usable by new store clients also. This patch aggregates these two
related notions: all work of making StoreEntry 'unusable for new store
clients' is performed now inside StoreEntry::setPrivateKey().
RELEASE_REQUEST flag now gets a proper meaning: it means a private
StoreEntry which can't become public anymore. In other words, this new
meaning implies that after setting RELEASE_REQUEST (making the
StoreEntry 'permanently' private), the other code must not clear
that flag (and change the key from private to public).  Some existing
EBIT_CLR(RELEASE_REQUEST) lines were in the way. It was carefully
analysed that removing them is safe (i.e., there are no scenarios when
StoreEntry has the flag set at these points).
Keeping to this new meaning introduced at 9cfa044, simplified with a new
StoreEntry::permanentlyPrivate().
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am done fixing and polishing this PR branch. I will freshen up the description and am asking @eduard-bagdasaryan to review my commits, but if there are no surprises, then this code can be finally committed.

@rousskov rousskov changed the title SMP caches sometimes do not purge cache entries Bug 4505: SMP caches sometimes do not purge entries Jan 3, 2018
@rousskov
Copy link
Contributor

rousskov commented Jan 3, 2018

Some [Jenkins] checks were not successful

After checking a few, I am going to assume that all build failures were those mysterious "reference is not a tree" Jenkins plugin(?) failures that I keep asking @kinkie about. I am going to ignore them in hope that the next build will magically avoid them.

@rousskov rousskov removed the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label Jan 3, 2018
@yadij
Copy link
Contributor

yadij commented Jan 17, 2018

There are some real build errors remaining to be fixed in the bb533f0 test results. Please can one of you attend to those and get this merged into master ASAP. This is now not going into the Jan v4 beta, but I would really like it to have as long a period of use as possible for the Feb beta.

@eduard-bagdasaryan
Copy link
Contributor Author

Looks like these errors were caused by the fact that only a couple of SwapDir virtual members are overridden. We should override all such methods or none of them.

@rousskov
Copy link
Contributor

@eduard-bagdasaryan, please remove those inconsistent overrides while you are reviewing the changes.

* Removed unused EntryGuard::releaseLocked().

* Protected EntryGuard destructor from the throwing code inside of it.
... and honor this rule in Transients::evictCached().
@eduard-bagdasaryan
Copy link
Contributor Author

eduard-bagdasaryan commented Jan 26, 2018

Checking merge bot, attempt 2.

The removing of writing state from Transients was incorrect: there are
several places where writers and readers behave differently. For
example, broadcasting should only work when there are readers and should
not work when there is only writer.  Another example is checking for
transient readers (giving incorrect results) inside
cheCheckQuickAbortIsReasonable().
* Fixed Transients::addEntry() comment
* Fixed wrong argument passing to StoreEntry::release()
* Removed wrong statement in Controller::transientsCompleteWriting()
Before this change (and after 5c09e58) a Transients entry sometimes
could not be properly evicted using Transients::evictCached() because
the StoreEntry had already been disconnected from the Transients cache,
while evicting shared memory and destroying memory object.
To overcome this, we need to evict the entry from Transients before
destroying memory object.
It may seem that doing so would violate the "transients invariant"
(every public StoreEntry must have the corresponding Transients entry),
for a very short period of time. However this is not a problem,
because the StoreEntry is not accessible via Controller interface
after it has been evicted form Transients.
@rousskov
Copy link
Contributor

Status update: @eduard-bagdasaryan is investigating one potential problem discovered during his last review. Once he addresses that concern, I plan to merge these changes.

@rousskov
Copy link
Contributor

rousskov commented Feb 2, 2018

Status update: Problem confirmed. The fix is being tested.

StoreEntry::release() (during disk rebuild) and storeGetMemSpace() were
purging unlocked entries from the memory cache without abandon()ing
them. Purging an entry from a memory cache destroys mem_obj and, hence,
its Transients (if any). Such silent Transients loss violates the
public=hasTransients invariant and leads to bugs, especially if
StoreEntry was not cached in memory and, hence, did not broadcast its
eviction.

Also do not memoryEvictCached() in StoreEntry::release() because all
memoryEvictCached() calls are dangerous, and that call was not needed as
releaseRequest() yields to a (safe and broader) evictCached() call.

Dangerous purgeMem() is gone, and memoryEvictCached() is now private.

Also possibly fixed purging of incomplete on-disk entries that managed
to lose their mem_obj (in non-SMP caching environment):
Controller::handleIdleEntry() was not releasing such entries because it
relied on purgeMem() that did nothing without a mem_obj.
@rousskov
Copy link
Contributor

rousskov commented Feb 2, 2018

Tests failed with the usual reference is not a tree Jenkins plugin error. I suspect Jenkins GitHub integration plugin (or something related to it) has a bug where target branch changes (at certain sensitive stages of the test execution) result in these failures. Resyncing the branch and retesting...

I plan to merge this PR if the tests are successful.

@rousskov
Copy link
Contributor

rousskov commented Feb 2, 2018

Three out of ten Jenkins tests failed with the usual reference is not a tree error (again).

  1. the error is not related to Squid code
  2. the same tests have all passed earlier without significant branch code changes
  3. other (random) tests have failed earlier for the same non-Squid reason
  4. successful tests cover both compilers in diverse environments
  5. GitHub is having trouble with this huge PR -- often unicorning this Conversation page; I suspect GitHub may have similar troubles when the PR is accessed by the Jenkins plugin using GitHub API.

Given the combination of the above facts, I plan to ignore similar future failures for this PR and merge this PR in a few hours.

P.S. I tried to restart just the failed Jenkins test runs, but could not find a way to do it. I then tried to delete the failed tests in hope that Jenkins will report an overall success to GitHub, but it looks like Jenkins deleted the whole build 312 (all tests) instead. Sigh.

@rousskov rousskov merged commit 4310f8b into squid-cache:master Feb 2, 2018
@rousskov
Copy link
Contributor

rousskov commented Feb 2, 2018

After triggering another round of tests, I forced-merged ignoring several Jenkins test failures. All failures were due to the same reference is not a tree error.

I expect these changes to trigger new and expose old bugs: While we did perform several rounds of fairly extensive lab testing, the vast scope and complexity of these changes virtually guarantee problems in some untested use cases.

The next big step is to fix Squid bug 4579 mentioned as an important XXX in this PR commit message. We also need to officially record a dozen out-of-scope problems discovered while working on this PR.

@rousskov rousskov removed the S-waiting-for-author author action is expected (and usually required) label Feb 2, 2018
squidadm pushed a commit to squidadm/squid that referenced this pull request Feb 13, 2018
When Squid finds a requested entry in the memory cache, it does not
check whether the same entry is also stored in a cache_dir. The
StoreEntry object may become associated with its store entry in the
memory cache but not with its store entry on disk. This inconsistency
causes two known problems:

1. Squid may needlessly swap out the memory hit to disk, either
   overwriting an existing (and identical) disk entry or, worse,
   creating a duplicate entry on another disk. In the second case, the
   two disk entries are not synchronized and may eventually start to
   differ if one of them is removed or updated.

2. Squid may not delete a stale disk entry when needed, violating
   various HTTP MUSTs, and eventually serving stale [disk] cache entries
   to clients.

Another purging problem is not caused by the above inconsistency:

3. A DELETE request or equivalent may come for the entry which is still
   locked for writing. Squid fails to get a lock for such an entry (in
   order to purge it) and the entry remains in disk and/or memory cache.

To solve the first two problems:

* StoreEntry::mayStartSwapout() now avoids needless swapouts by checking
  whether StoreEntry was fully loaded, is being loaded, or could have
  been loaded from disk. To be able to reject swapouts in the last case,
  we now require that the newer (disk) entries explicitly delete their
  older variants instead of relying on the Store to overwrite the older
  (unlocked) variant. That explicit delete should already be happening
  in higher-level code (that knows which entry is newer and must mark
  any stale entries for deletion anyway).

To fix problem squid-cache#3:

* A new Store::Controller::evictIfFound(key) method purges (or marks for
  deletion if purging is impossible) all the matching store entries,
  without loading the StoreEntry information from stores. Avoiding
  StoreEntry creation reduces waste of resources (the StoreEntry object
  would have to be deleted anyway) _and_ allows us to mark being-created
  entries (that are locked for writing and, hence, cannot be loaded into
  a StoreEntry object).

XXX: SMP cache purges may continue to malfunction when the Transients
table is missing. Currently, Transients are created only when the
collapsed_forwarding is on. After Squid bug 4579 is fixed, every public
StoreEntry will have the corresponding Transients entry and vice versa,
extending these fixes to all SMP environments.

Note that even if Squid properly avoids storing duplicate disk entries,
some cache_dir manipulations by humans and Squid crashes may lead to
such duplicates being present.  This patch leaves dealing with potential
duplicates out of scope except it guarantees that if an entry is
deleted, then all [possible] duplicates are deleted as well.

Fixing the above problems required (and/or benefited from) many related
improvements, including some Store API changes. It is impractical to
detail each change here, but several are highlighted below.

To propagate DELETEs across workers, every public StoreEntry now has a
Transients entry.

Prevented concurrent cache readers from aborting when their entry is
release()d. Unlike abort, release should not affect current readers.

Fixed store.log code to avoid "Bug: Missing MemObject::storeId value".

Removed Transients extras used to initialize MemObject StoreID/method in
StoreEntry objects created by Transients::get() for collapsed requests.
Controlled::get() and related Controller APIs do not _require_ setting
those MemObject details: get() methods for all cache stores return
StoreEntry objects without them (because entry basics lack Store ID and
request method). The caller is responsible for cache key collision
detection. Controlled::get() parameters could include Store ID and
request method for early cache key collision detection, but adding a
StoreQuery class and improving collision detection code is outside this
project scope (and requires many changes).

Found more cases where release() should not prevent sharing.
Remaining cases need further analysis as discussed in master 39fe14b.

Greatly simplified UFS store rebuilding, possibly fixing subtle bug(s).

Clarified RELEASE_REQUEST flag meaning, becoming 'a private StoreEntry
which can't become public anymore'. Refactored the related code,
combining two related notions: 'a private entry' and 'an entry marked
for removal'.

Do not abort collapsed StoreEntries during syncing just because the
corresponding being stored shared entry was marked for deletion. Abort
them if the shared entry has been also aborted.

Added StoreEntry helper methods to prevent direct manipulation of
individual disk-related data members (swap_dirn, swap_filen, and
swap_status). These methods help keep these related data members in a
coherent state and minimize code duplication.
Conflicts:
	src/MemObject.cc
squidadm pushed a commit to squidadm/squid that referenced this pull request Feb 13, 2018
When Squid finds a requested entry in the memory cache, it does not
check whether the same entry is also stored in a cache_dir. The
StoreEntry object may become associated with its store entry in the
memory cache but not with its store entry on disk. This inconsistency
causes two known problems:

1. Squid may needlessly swap out the memory hit to disk, either
   overwriting an existing (and identical) disk entry or, worse,
   creating a duplicate entry on another disk. In the second case, the
   two disk entries are not synchronized and may eventually start to
   differ if one of them is removed or updated.

2. Squid may not delete a stale disk entry when needed, violating
   various HTTP MUSTs, and eventually serving stale [disk] cache entries
   to clients.

Another purging problem is not caused by the above inconsistency:

3. A DELETE request or equivalent may come for the entry which is still
   locked for writing. Squid fails to get a lock for such an entry (in
   order to purge it) and the entry remains in disk and/or memory cache.

To solve the first two problems:

* StoreEntry::mayStartSwapout() now avoids needless swapouts by checking
  whether StoreEntry was fully loaded, is being loaded, or could have
  been loaded from disk. To be able to reject swapouts in the last case,
  we now require that the newer (disk) entries explicitly delete their
  older variants instead of relying on the Store to overwrite the older
  (unlocked) variant. That explicit delete should already be happening
  in higher-level code (that knows which entry is newer and must mark
  any stale entries for deletion anyway).

To fix problem squid-cache#3:

* A new Store::Controller::evictIfFound(key) method purges (or marks for
  deletion if purging is impossible) all the matching store entries,
  without loading the StoreEntry information from stores. Avoiding
  StoreEntry creation reduces waste of resources (the StoreEntry object
  would have to be deleted anyway) _and_ allows us to mark being-created
  entries (that are locked for writing and, hence, cannot be loaded into
  a StoreEntry object).

XXX: SMP cache purges may continue to malfunction when the Transients
table is missing. Currently, Transients are created only when the
collapsed_forwarding is on. After Squid bug 4579 is fixed, every public
StoreEntry will have the corresponding Transients entry and vice versa,
extending these fixes to all SMP environments.

Note that even if Squid properly avoids storing duplicate disk entries,
some cache_dir manipulations by humans and Squid crashes may lead to
such duplicates being present.  This patch leaves dealing with potential
duplicates out of scope except it guarantees that if an entry is
deleted, then all [possible] duplicates are deleted as well.

Fixing the above problems required (and/or benefited from) many related
improvements, including some Store API changes. It is impractical to
detail each change here, but several are highlighted below.

To propagate DELETEs across workers, every public StoreEntry now has a
Transients entry.

Prevented concurrent cache readers from aborting when their entry is
release()d. Unlike abort, release should not affect current readers.

Fixed store.log code to avoid "Bug: Missing MemObject::storeId value".

Removed Transients extras used to initialize MemObject StoreID/method in
StoreEntry objects created by Transients::get() for collapsed requests.
Controlled::get() and related Controller APIs do not _require_ setting
those MemObject details: get() methods for all cache stores return
StoreEntry objects without them (because entry basics lack Store ID and
request method). The caller is responsible for cache key collision
detection. Controlled::get() parameters could include Store ID and
request method for early cache key collision detection, but adding a
StoreQuery class and improving collision detection code is outside this
project scope (and requires many changes).

Found more cases where release() should not prevent sharing.
Remaining cases need further analysis as discussed in master 39fe14b.

Greatly simplified UFS store rebuilding, possibly fixing subtle bug(s).

Clarified RELEASE_REQUEST flag meaning, becoming 'a private StoreEntry
which can't become public anymore'. Refactored the related code,
combining two related notions: 'a private entry' and 'an entry marked
for removal'.

Do not abort collapsed StoreEntries during syncing just because the
corresponding being stored shared entry was marked for deletion. Abort
them if the shared entry has been also aborted.

Added StoreEntry helper methods to prevent direct manipulation of
individual disk-related data members (swap_dirn, swap_filen, and
swap_status). These methods help keep these related data members in a
coherent state and minimize code duplication.
Conflicts:
	src/MemObject.cc
yadij pushed a commit that referenced this pull request Feb 14, 2018
When Squid finds a requested entry in the memory cache, it does not
check whether the same entry is also stored in a cache_dir. The
StoreEntry object may become associated with its store entry in the
memory cache but not with its store entry on disk. This inconsistency
causes two known problems:

1. Squid may needlessly swap out the memory hit to disk, either
   overwriting an existing (and identical) disk entry or, worse,
   creating a duplicate entry on another disk. In the second case, the
   two disk entries are not synchronized and may eventually start to
   differ if one of them is removed or updated.

2. Squid may not delete a stale disk entry when needed, violating
   various HTTP MUSTs, and eventually serving stale [disk] cache entries
   to clients.

Another purging problem is not caused by the above inconsistency:

3. A DELETE request or equivalent may come for the entry which is still
   locked for writing. Squid fails to get a lock for such an entry (in
   order to purge it) and the entry remains in disk and/or memory cache.

To solve the first two problems:

* StoreEntry::mayStartSwapout() now avoids needless swapouts by checking
  whether StoreEntry was fully loaded, is being loaded, or could have
  been loaded from disk. To be able to reject swapouts in the last case,
  we now require that the newer (disk) entries explicitly delete their
  older variants instead of relying on the Store to overwrite the older
  (unlocked) variant. That explicit delete should already be happening
  in higher-level code (that knows which entry is newer and must mark
  any stale entries for deletion anyway).

To fix problem #3:

* A new Store::Controller::evictIfFound(key) method purges (or marks for
  deletion if purging is impossible) all the matching store entries,
  without loading the StoreEntry information from stores. Avoiding
  StoreEntry creation reduces waste of resources (the StoreEntry object
  would have to be deleted anyway) _and_ allows us to mark being-created
  entries (that are locked for writing and, hence, cannot be loaded into
  a StoreEntry object).

XXX: SMP cache purges may continue to malfunction when the Transients
table is missing. Currently, Transients are created only when the
collapsed_forwarding is on. After Squid bug 4579 is fixed, every public
StoreEntry will have the corresponding Transients entry and vice versa,
extending these fixes to all SMP environments.

Note that even if Squid properly avoids storing duplicate disk entries,
some cache_dir manipulations by humans and Squid crashes may lead to
such duplicates being present.  This patch leaves dealing with potential
duplicates out of scope except it guarantees that if an entry is
deleted, then all [possible] duplicates are deleted as well.

Fixing the above problems required (and/or benefited from) many related
improvements, including some Store API changes. It is impractical to
detail each change here, but several are highlighted below.

To propagate DELETEs across workers, every public StoreEntry now has a
Transients entry.

Prevented concurrent cache readers from aborting when their entry is
release()d. Unlike abort, release should not affect current readers.

Fixed store.log code to avoid "Bug: Missing MemObject::storeId value".

Removed Transients extras used to initialize MemObject StoreID/method in
StoreEntry objects created by Transients::get() for collapsed requests.
Controlled::get() and related Controller APIs do not _require_ setting
those MemObject details: get() methods for all cache stores return
StoreEntry objects without them (because entry basics lack Store ID and
request method). The caller is responsible for cache key collision
detection. Controlled::get() parameters could include Store ID and
request method for early cache key collision detection, but adding a
StoreQuery class and improving collision detection code is outside this
project scope (and requires many changes).

Found more cases where release() should not prevent sharing.
Remaining cases need further analysis as discussed in master 39fe14b.

Greatly simplified UFS store rebuilding, possibly fixing subtle bug(s).

Clarified RELEASE_REQUEST flag meaning, becoming 'a private StoreEntry
which can't become public anymore'. Refactored the related code,
combining two related notions: 'a private entry' and 'an entry marked
for removal'.

Do not abort collapsed StoreEntries during syncing just because the
corresponding being stored shared entry was marked for deletion. Abort
them if the shared entry has been also aborted.

Added StoreEntry helper methods to prevent direct manipulation of
individual disk-related data members (swap_dirn, swap_filen, and
swap_status). These methods help keep these related data members in a
coherent state and minimize code duplication.
Conflicts:
	src/MemObject.cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants