Bug 4505: SMP caches sometimes do not purge entries #46
Conversation
yadij
left a comment
Looks mostly okay to me. Please fix the compile issues in the stubs. The rest are optional polish requests; I will approve with just the compile ones fixed.
    (swap_dirn < 0) == (swap_status == SWAPOUT_NONE) &&
    (swap_dirn < 0 || swap_dirn < Config.cacheSwap.n_configured);

    if (!ok) {

This seems to be missing a WARNING label, unless it is not really that important and deserves a level 3+ debugs instead.
Done with an 'ERROR' label, since it is an 'error' rather than a 'warning' here.
@@ -1766,19 +1778,22 @@ neighborsHtcpClear(StoreEntry * e, const char *uri, HttpRequest * req, const Htt
    char buf[128];

    for (p = Config.peers; p; p = p->next) {

This function can be polished up a bit more. Please consider defining the p variable as part of the loop, as neighborsHtcpClearNeeded() does, and making buf a static local inside the new if-condition.
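For illustration, a hedged sketch of the suggested polish; the per-peer condition is abbreviated as a hypothetical needsHtcpClear(p), not an existing function:

    for (auto *p = Config.peers; p; p = p->next) {
        if (needsHtcpClear(p)) { // hypothetical stand-in for the new if-condition
            static char buf[128]; // built only when actually needed
            ...
        }
    }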
src/store/Controller.h
Outdated
    virtual bool smpAware() const override;

    /// Whether the entry with the key exists and was marked for removal
    /// some time ago; get(key) will return NULL in such case.

Please reference 'nil' or nullptr, whichever is more descriptive, instead of NULL.
src/store/Controller.h
Outdated
    virtual void sync() override;
    virtual void maintain() override;
    virtual void markForUnlink(StoreEntry &) override;
    virtual void unlinkByKeyIfFound(const cache_key *key) override;

Please follow the updated coding style in this file: do not mention parameter names unless they are needed to document the parameter better than the type or method name can. StoreEntry and cache_key are already descriptive enough types to convey the necessary detail.
Please also consider using the new style in the other new functions and methods. I am not asking for those as a review item because it is polish and they are situated in places not using the new style yet.
Done. Also fixed some other new methods.
src/tests/stub_MemStore.cc
Outdated
    int64_t MemStore::maxObjectSize() const STUB_RETVAL(0)
    bool MemStore::dereference(StoreEntry &) STUB_RETVAL(false)
    void MemStore::markForUnlink(StoreEntry&) STUB
    void MemStore::unlinkByKeyIfFound(const cache_key *key) STUB

Please remove the parameter name here; it leads to an 'unused variable' compile error on strict compilers. Same in the new stub_store.cc and TestSwapDir.h lines.
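A minimal sketch of the requested fix, simply dropping the parameter name from the quoted stub line:

    void MemStore::unlinkByKeyIfFound(const cache_key *) STUB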
Done.
BTW, there are currently lots of unused variables in the Squid sources. I assume those should be fixed before we start testing new patches with 'strict' compilers.
@yadij, I addressed your review comments and updated the PR.
rousskov
left a comment
I may have discovered a new release() problem, stated my recommendations regarding the syncCollapsed() TODO, and added a few minor polishing requests.
src/store/Controller.cc
Outdated
    assert(collapsed->mem_obj);
    assert(collapsed->mem_obj->smpCollapsed);
    // TODO: confirm whether this is reasonable:
    // assert(collapsed->store_status == STORE_PENDING);

Eduard: What to do with the STORE_PENDING check? I suspect that store_status == STORE_PENDING should always hold inside Controller::syncCollapsed(), because syncing should be performed on a not-yet-completed (i.e., not STORE_OK) entry.
Ignoring the trivial case of findCollapsed() returning nil, syncCollapsed() can be called for any transient entry. To assert that any transient entry is STORE_PENDING, we must establish two conditions:
1. syncCollapsed() relies on the transient entry being STORE_PENDING.
2. A transient entry cannot be STORE_OK.
For item 1: I do not see why syncCollapsed() would need the transient entry to always be in the STORE_PENDING state. Do you? Sure, there are places inside syncCollapsed() where STORE_PENDING is needed (e.g., when we call abort()), but there are also places where it is irrelevant (e.g., when the entry is inSync).
For item 2: AFAICT, a StoreEntry object becomes STORE_OK when it is fully stored in disk or memory cache OR when it is aborted. I see no code that would guarantee that a StoreEntry is always removed from Transients when it becomes STORE_OK. Did I miss it?
Eduard: There is another reason indirectly confirming that: StoreEntry::abort() asserts if store_status!=STORE_PENDING and this method is invoked throughout Controller::syncCollapsed().
This is indeed an important observation -- syncCollapsed() must not abort() aborted or completed StoreEntry objects. However, that makes STORE_PENDING a precondition for the syncCollapsed() code calling abort(). It does not imply that the entire syncCollapsed() code has such a precondition.
If you do not find flaws in the above reasoning, then I suggest removing this TODO comment. More related same-method changes are requested below.
then I suggest removing this TODO comment.
Done at c0a470d.
syncCollapsed() must not abort() aborted or completed StoreEntry objects.
The related c0a470d change handles the 'aborted' case. I am not sure what was specifically suggested for the 'completed' case, so that is left as is for now. Currently, abort() will assert there if the entry is STORE_OK.
Alex: This is indeed an important observation -- syncCollapsed() must not abort() aborted or completed StoreEntry objects.
Eduard: Not sure what was specifically suggested for 'completed' case
Nothing was suggested in that phrase, I was just retelling what you said (not STORE_PENDING means aborted or completed StoreEntry). If the current syncCollapsed() code never calls abort() for aborted or completed StoreEntry objects, then we are all set as far as this "remove TODO" change request is concerned.
src/store/Controller.cc
Outdated
    debugs(20, 7, "syncing " << *collapsed);

    bool abandoned = transients->abandoned(*collapsed);
    bool aborted = false;
s/aborted/abortedByWriter/ to clarify what kind of abort we are dealing with here
src/store/Controller.cc
Outdated
    bool waitingToBeFreed = false;
    transients->status(*collapsed, aborted, waitingToBeFreed);
    if (aborted) {
        debugs(20, 3, "aborting " << *collapsed << " due to aborted shared status");
s/due to aborted shared status/because its writer has aborted/
        collapsed->abort();
        return;
    }
    bool found = false;
Let's add an empty line here -- we are done processing the aborted case. The rest of the method can assume that the shared entry is not aborted.
src/store/Controller.cc
Outdated
    bool aborted = false;
    bool waitingToBeFreed = false;
    transients->status(*collapsed, aborted, waitingToBeFreed);
    if (aborted) {
Let's add an empty line above the if statement to better isolate the aborted case.
src/store/Controller.cc
Outdated
{
    if (StoreEntry *entry = static_cast<StoreEntry*>(hash_lookup(store_table, key))) {
        debugs(20, 3, "got in-transit entry: " << *entry);
        entry->release();
If this entry is collapsed but still detached, will its local client(s) get stuck for a while, waiting for a syncCollapsed() call (that this particular method does not trigger for local clients)? And then, when that syncCollapsed() call finally comes, we will call abort() because we cannot attach this entry?
AFAICT, collapsed but still detached entries in other workers will not get stuck because StoreEntry::release() eventually calls Transients::abandon() that calls CollapsedForwarding::Broadcast(). Please double check the current code and correct me if I am wrong.
If my concerns are at least partially valid, then we could add more code to StoreEntry::release(), but I am worried that calling abort() from release() would create dangerous reentrant code (i.e., the object calling release() will be synchronously called back to be notified about the abort and then will resume running with an aborted entry). While abort.callback is asynchronous, invokeHandlers() (also called by StoreEntry::abort()) is not!
To address this, perhaps we should do something like this instead:

    // entry->release() notifies other waiting workers but
    // XXX: it does not abort local collapsed Store clients if needed.
    entry->release();

    // Work around the above XXX.
    // TODO: Move into release() after making invokeHandlers() asynchronous.
    if (... has transient index ...)
        syncCollapsed(...);

Any better ideas?
AFAICT, collapsed but still detached entries in other workers will not get stuck because StoreEntry::release() eventually calls Transients::abandon() that calls CollapsedForwarding::Broadcast().
While experimenting, I saw that this does not work properly with our changes. Currently, 'DELETE' causes purgeEntriesByUrl(), Controller::unlinkByKeyIfFound() and Transients::unlinkByKeyIfFound() (for the USE_HTCP off scenario). These were our 'innovations' to optimize 'heavy' StoreEntry loading. I.e., instead of StoreEntry::release() and eventually Transients::abandon(), we now have unlinkByKeyIfFound() and no abandon(). So clients in other workers keep 'hanging' on a detached entry that has been deleted.
To fix this, we can either:
1. revert our changes in purgeEntriesByUrl() and load StoreEntry, or
2. teach Transients::unlinkByKeyIfFound() to do CollapsedForwarding::Broadcast(). A suitable API would have to be added for this, though.
What do you think?
revert our changes in purgeEntriesByUrl() and load StoreEntry
IIRC, loading StoreEntry is not an option in general because Controller::find() returns nil for not-yet-readable entries. See your PR description near "mark being-created entries". Our changes were not simply an optimization. They addressed a problem with marking unreadable entries for deletion. We need to fix/improve our changes, not revert them.
teach Transients::unlinkByKeyIfFound() to do CollapsedForwarding::Broadcast().
unlinkByKeyIfFound() or a related method should broadcast (in some cases): A worker must broadcast whenever it changes the state of a shared being-read entry -- the broadcast is the mechanism for readers in other workers to learn about those changes. Marking a transient entry for deletion requires broadcasting (when there are other entry readers).
I am not sure this broadcast should be initiated by the Transients::unlinkByKeyIfFound() method, by the corresponding Controller method, or by similar methods in the cache Stores. I hope you can find and defend the answer to that question. Ideally, there should be at most one broadcast per change. See below for more clues.
A suitable API would have to be added for this, though.
I would check whether calling Transients::get() and broadcasting the resulting entry (or not broadcasting at all if it returns nil) is sufficient (and a good idea) in this case. That would still be different than relying on Controller::find() for deletion. Here, we are relying on (fast) get() to broadcast the change, not to mark the store entries for deletion.
Note that, AFAICT, we do not need to broadcast if the transients entry has already been marked for deletion (which means we changed nothing) or when the transients entry is locked for writing (which means it cannot have any readers yet).
Also, if there is a local StoreEntry object already, we should probably use that instead of calling Transients::get(). Perhaps the code already has this optimization.
Combining all of the above, you might arrive at something like this (entry locking/unlocking not shown and may be missing):
    if (auto entry = transients->get(key))
        entry->release(); // broadcasts to inform other readers if needed
    else
        Controller::unlinkByKeyIfFound(key); // does not broadcast but there are no readers
As always, please check my reasoning. The overarching "broadcast changes when there are readers" principle should be correct, but I could miss various nuances/details.
IIRC, loading StoreEntry is not an option in general because Controller::find() returns nil for not-yet-readable entries. See your PR description near "mark being-created entries". Our changes were not simply an optimization. They addressed a problem with marking unreadable entries for deletion. We need to fix/improve our changes, not revert them.
To clarify: my suggestion (1) did not assume completely reverting purgeEntriesByUrl(): the unlinkByKeyIfFound() method should stay, of course, to handle the situation when it is impossible (or not yet possible) to 'get' the entry from the store but it is still possible to unlink it by key. I suggested keeping the entry lookup where we can (as the old code did). This would be a simple, but not a perfect/complete, solution though.
My sketch is similar to what you are suggesting then, except it does not pay the relatively high price of creating a new StoreEntry object when transients do not have a usable StoreEntry object already.
If the optimization in my sketch causes trouble, then reusing the old code (i.e., creating a new StoreEntry object from store metadata whenever possible) is fine. Just add a source code comment for the reader who might be tempted to optimize this code by using Transients::get() alone.
As we discussed, release() sometimes did not cause broadcasting in my tests (including with the old code). I see the reason in Store::Controller::unlink(): it does not take care of Transients (e.g., by calling transientsAbandon()).
I agree that the lack of a transients->unlink() call in Controller::unlink() looks wrong, especially since Controller::markForUnlink() has the expected transients->markForUnlink() call.
However, I cannot say, without doing research, whether changing that method alone would guarantee broadcasts during release(). Please make sure release() broadcasts (directly or indirectly) whenever the state changes in such a way that a broadcast is required, regardless of whether Controller::unlink() has been called. Locating the best place(s) to (check state and) initiate that broadcast is not easy.
However, I cannot say, without doing research, whether changing that method alone would guarantee broadcasts during release()
No, it is not guaranteed: there is another problem. Transients::unlink() does broadcasting only for 'writers', i.e., when "e.mem_obj->xitTable.io == MemObject::ioWriting". That was the initial design: only writers can broadcast, but now, evidently, we need to broadcast from any worker that received a DELETE.
we need to broadcast from any worker, which received DELETE
To be more precise:
- from any worker which changed the shared entry state
- to any worker reading that shared entry
This PR focus is on DELETE-triggered changes, and we can broaden these narrow rules if needed to ease implementation (e.g., broadcast to all workers), but we should keep these rules in mind.
Please note that the rules have not changed. What changed is our understanding of their impact: We now know that DELETEs (and such) may require initiating broadcasts from non-writers.
The related fixes for this thread are at 2765f0b.
src/enums.h
Outdated
    ENTRY_REVALIDATE_ALWAYS,
    DELAY_SENDING,
    RELEASE_REQUEST,
    RELEASE_REQUEST, ///< makes private key permanent

Since the flag itself does not really make anything, I would rephrase as "prohibits making the key public".
src/enums.h
Outdated
typedef enum {
    SWAPOUT_NONE,
    SWAPOUT_WRITING,
    SWAPOUT_NONE, ///< the store entry is in a 'disconnected' from disk state

"disconnected" can easily be misinterpreted as "was connected but no longer connected". How about "StoreEntry is currently not associated with any disk store entry"? And, to make it more symmetric with the other definitions:

    /// StoreEntry is currently not associated with any disk store entry.
    /// Does not guarantee (or preclude!) a matching disk store entry existence.
src/enums.h
Outdated
    SWAPOUT_WRITING,
    SWAPOUT_NONE, ///< the store entry is in a 'disconnected' from disk state
    SWAPOUT_WRITING, ///< the store entry is being stored on a disk
    /// The store entry has been stored on a disk and still is 'connected'

I suggest:

    /// StoreEntry is associated with a complete (i.e., fully swapped out) disk store entry.
    /// Guarantees the disk store entry existence.
src/enums.h
Outdated
    SWAPOUT_NONE,
    SWAPOUT_WRITING,
    SWAPOUT_NONE, ///< the store entry is in a 'disconnected' from disk state
    SWAPOUT_WRITING, ///< the store entry is being stored on a disk

I suggest:

    /// StoreEntry is being swapped out to the associated disk store entry.
    /// Guarantees the disk store entry existence.
src/store.cc
Outdated
    expireNow();
    if (!shareable) {
        // This object should not be reused
        expireNow();
Why do we need to expire entries when making them private?
This makePrivate(false) is called when the decision is 'reuseNot': we can't share the entry even with already collapsed clients. E.g., we got a header forbidding any caching/sharing (something like CC: no-store).
Why do we need to expire entries when making them private and unshareable? You are explaining why unshareable entry cannot be shared, but I am asking why it has to expire.
Probably this sounds vague, but to my mind 'expiration' prevents sharing shareable but not reusable entries. I.e., it makes a difference between the doNotCacheButShare and reuseNot states.
I do not insist this is [the only] correct way of doing this. This is how it is implemented and working now.
'expiration' prevents sharing shareable but not reusable entries
True, but should not be relevant because we are talking about unshareable entries.
I.e., it makes difference between doNotCacheButShare and reuseNot states.
My argument is that it should not make a difference. I do not know whether it does. Unshareable entries should be unshareable regardless of their expiration time. If unsharable entries are shareable, there is a bug we should fix.
This is how it is implemented and working now.
I know but that is largely irrelevant to my question. If the current code works correctly (in some cases) because of a bad hack, and we are fixing a very related aspect of that code because of a very related bug (in other cases), then we should remove the hack instead of splitting the code into two parts, one that continues to rely on a hack and one that is no longer affected by the bugs that hack caused.
If you remove that if statement and expireNow() call, will the code break in some cases?
- If yes, then the code relies on a weak expiration check to prevent sharing of unsharable entries. Such code must be fixed IMO.
- If no, then the code is already OK and the expireNow() call is simply not needed (after all your other changes/fixes).
If no, then the code is already OK and the expireNow() call is simply not needed (after all your other changes/fixes).
I think you are right about this. I overlooked our StoreEntry::mayStartHitting(), developed to distinguish such shareable and unshareable entries. So there should be no need to do extra work with expireNow().
will the code break in some cases
It is difficult to say. I can state only that the major code paths (clientReplyContext::cacheHit() or clientReplyContext::handleIMSReply()) already use mayStartHitting() and there is no need there for protection with 'expiration'.
I can state only that the major code paths (clientReplyContext::cacheHit() or clientReplyContext::handleIMSReply()) already use mayStartHitting() and there is no need there for protection with 'expiration'.
The undocumented StoreEntry::validToSend() method also checks for "released" entries and, hence, does not rely on the expiration date alone (although I suspect it should check mayStartHitting() instead).
Finally, setting StoreEntry::expires is a rather weak protection because that setting can be ignored based on max-stale directives (at least). Look for FRESH_REQUEST_MAX_STALE_... Unlike the expiration date, private keys and shareableWhenPrivate should never be ignored.
src/client_side_reply.cc
Outdated
    Store::Root().unlinkByKeyIfFound(key);
    // Are there local collapsed clients we should notify?
    // If yes, get transients index and use it later.
    const auto transientsIndex = entry->hasTransients() && entry->locked() ? entry->mem_obj->xitTable.index : -1;
I understand that you call locked() here because Controller::get() might return an unlocked entry, and the transientsIndex is not going to survive StoreEntry::release() of an unlocked entry. I think we should make this code safer and simpler by always locking/unlocking the entry returned by get(). Doing so will remove this complex locked() logic and make post-release() entry manipulation safe. As a positive side-effect, it may also delete unlocked intransit entries that some other code might leave behind.
    if (auto entry = Store::Root().get(key)) {
        entry->lock("purgeEntriesByUrl");
        ...
        // entry->release() notifies other waiting workers but
        // XXX: does not abort local collapsed Store clients if needed.
        entry->release(true);
        // Work around the above XXX.
        // TODO: Move into release() after making invokeHandlers() asynchronous.
        if (entry->hasTransients())
            Store::Root().syncCollapsed(entry->mem_obj->xitTable.index);
        entry->unlock("purgeEntriesByUrl");
    } else ...
I am afraid I applied your improvement in a hurry: I forgot about one reason why I decided to implement this without locks. When we get from the Store, we could create a 'fake' entry (when store_table misses it). We need this entry only for notification purposes in release(). The syncCollapsed() method should handle local 'detached' entries, but it should not handle (abort, for example) our 'fake' entry! That is why my previous version destroyed this fake entry first (after receiving the required index) and then called syncCollapsed().
So I would suggest undoing that fdcfaa3.
When we get from the Store, we could create a 'fake' entry (when store_table misses it).
Why do you call that entry "fake"? It is a regular/valid entry AFAICT. Does it have some special properties that are important in this context?
We need this entry only for notification purpose in release().
Yes, and, until that XXX is fixed, for the synchronization purposes in syncCollapsed().
The syncCollapsed() method should handle local 'detached' entries,
Yes.
but it should not handle(abort for example) our 'fake' entry!
Why not?
Why do you call that entry "fake"?
I call this 'fake' because of my understanding of its usage. It is not used by clients. It is a 'helper' way to notify others. We could avoid using it at all if, for example, transients->unlinkByKeyIfFound() had a mechanism to notify all workers (including this one) in case Transients has this entry. That was my initial suggestion, which you did not like.
Why not?
My suggestion prevents this extra work done on this entry in syncCollapsed(), including probably unnecessary (and misleading) logging. BTW, the abort() method says: "Someone wants to abort this transfer." What "transfer" do we have with our "fake" entry? If there is no local entry, we would return after transients->findCollapsed(xitIndex).
I think we must guarantee that Transients::unlinkByKeyIfFound() is not called when there is a local StoreEntry object.
Or, equivalently: if the Transients table exists, then hasTransients() must be true for each local StoreEntry. destroyMemObject() disconnects Transients from the StoreEntry. I see two scenarios to consider:
- Transients was disconnected while deleting the StoreEntry, i.e., destroyStoreEntry() is called. In this case we first disconnect it from Transients and then remove it from store_table. Is it possible that anybody attaches to the already disconnected, but not yet deleted, entry? It looks impossible for this code to be interrupted.
- Transients was disconnected while purging memory (handleIdleEntry(), discussed above, or storeGetMemSpace()). As I noted above, there could be cases when such memobj-less StoreEntries keep existing in the local store_table. What is the purpose of leaving such half-baked local entries? An optimization or something else? If it is a bug, it should be fixed and the invariant becomes correct. If not, the invariant does not hold.
Also, it is known that such transient-less and memobj-less entries should also be unlocked, meaning that there are no clients attached to them. If this consideration is correct, the invariant should be valid without any additional fixes: we should not worry about such 'stand-alone' entries, because they stay apart from collapsing, notifying, etc. Also, we can assert that entry.locked() == false before calling unlinkByKeyIfFound() inside Transients::markForUnlink().
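For illustration, a hedged sketch of the proposed assertion; the surrounding markForUnlink() logic is elided and partly assumed:

    void
    Transients::markForUnlink(StoreEntry &e)
    {
        if (!e.hasTransients()) {
            assert(!e.locked()); // proposed invariant: transient-less entries have no clients
            ... // unlink the shared slot by key, if any
            return;
        }
        ... // existing handling for entries with a Transients slot
    }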
FYI: since we are dealing with a multi-process/thread environment, any process may cause a kernel interrupt and do something with the shared parts unless the Transients disconnect part of the sequence is wrapped in proper atomic locks and the reader/writer counts are checked inside the atomic lock (our StoreEntry locking does not count as thread-safety).
[ I have not checked the code to see whether that is present, but you don't mention it as the reason for being "impossible", so I thought I should mention it. ]
@yadij, AFAICT, when talking about "impossible" and "code interrupts" in item 1, Eduard is actually talking about things happening inside one Squid process. Here, "code interrupts" means AsyncCalls and such rather than process interrupts. This particular concern is unrelated to SMP, and Eduard is right that it is impossible for a new Store client to attach to a being destroyed StoreEntry object (i.e., attach during destroyStoreEntry()).
Eduard is actually talking about things happening inside one Squid process.
That is correct. No other clients can attach during the destruction process, because no async calls start processing during it. Also, my supposition was that if a Squid worker is suspended by the OS at the beginning of StoreEntry destruction, it will resume executing from that very point in the code, finishing the destruction process.
Alex: it is impossible for a new Store client to attach to a being destroyed StoreEntry object (i.e., attach during destroyStoreEntry()).
Eduard: No other clients can attach during destruction process
For the record, both statements above talk about necessarily local (i.e., same-process) clients attaching to the StoreEntry object (which is always local!). Store clients in other workers (i.e., "remote" clients) may attach to the (shared) Transients entry and/or other StoreEntry objects in their processes, but that is a different issue.
@rousskov, I addressed the last discussed comments, but I can't set it to S-waiting-for-reviewer.
rousskov
left a comment
I found a couple of problems. The rest is polishing. I hope we are getting close, but more changes are needed.
Also, if possible, please assert that RELEASE_FLAG is not set inside clearPrivate(), confirming that the private key setting is permanent.
src/Transients.cc
Outdated
    // there will be no more updates from us after this, so we must prevent
    // future readers from joining
    map->freeEntry(e.mem_obj->xitTable.index); // just marks the locked entry
    map->closeForWriting(e.mem_obj->xitTable.index);
The comment above is still correct, but instead of freeing the entry, we are now using the complete() test to prevent new readers from joining. Let's clarify what is going on and document a dependency on our get() logic:

    // There will be no more updates from us after this, so we must prevent
    // future readers from joining. Making the entry complete() is sufficient
    // because Transients::get() does not return completed entries.
    map->closeForWriting(e.mem_obj->xitTable.index);
src/Transients.cc
Outdated
{
    unlink(e);
    assert(e.key);
    e.hasTransients() ? unlink(e) :
Please do not use the ternary operator as an if-statement replacement. Use an if-statement. A ternary operator imposes certain requirements on its second and third arguments, and ignoring its result may trigger compiler warnings.
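For illustration, a hedged if-statement equivalent; the second branch is elided in the quoted diff, so the unlinkByKeyIfFound() call below is an assumption:

    if (e.hasTransients())
        unlink(e);
    else
        unlinkByKeyIfFound(reinterpret_cast<const cache_key*>(e.key)); // assumed else-branch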
A ternary operator imposes certain requirements on its second and third arguments, and ignoring its result may trigger compiler warnings.
In this particular case, these 'requirements' should be satisfied and no warnings should be generated. Both arguments' results are 'void', so the result of the operator is also 'void'.
Yes, I know, but the return type(s) may change. The current void types are not a part of some natural interface invariant for those methods. The if statement would not be affected by such changes. A ternary operator offers no advantages here, so I am puzzled why you are spending time to defend its use.
Yes, I know, but the return type(s) may change.
And in this case we would need to fix our if/else, prefixing the methods with (void) ...
I am puzzled why you are spending time to defend its use.
I am not going to defend this minor issue anymore. In general (all else being equal), I would prefer one line with the ternary operator to an equivalent, clumsier four lines of if/else.
And (if the return types change) we would need to fix our if/else, prefixing the methods with (void) ...
Not necessarily. The if statement will continue to work and read correctly if the method(s) gain an optional return value. If the return value is required/important, then the if statement will need to be changed, but the changes are very likely to be much smaller in this case (because the two methods are too different to return the same kind of a required/important result).
In general, I prefer using separate lines for conditions and actions, which results in a one line difference between a ternary operator and an if statement. Removing that extra else line is not worth the problems that the ternary operator brings.
AFAICT, the ternary operator is meant for obtaining a result from either of two different algorithms. Code that does not care about the result should not use the operator that does.
src/Transients.cc
Outdated
Transients::unlink(StoreEntry &e)
{
    if (e.mem_obj && e.mem_obj->xitTable.io == MemObject::ioWriting)
    if (e.mem_obj)
This change looks unfinished. The old code was (indirectly) protecting abandon() from entries that do not have xitTable.index set. The new code does not, and may result in asserts if unlink() is called without a hasTransients() check. I see two correct models:
1. All callers must check hasTransients() before calling unlink().
2. unlink() must check hasTransients().
AFAICT, this PR is using approach 1 right now. Following that approach means not checking e.mem_obj either! If you agree and this PR continues to follow approach 1 (see below), then please remove that check as well. Otherwise, unlink() has to check hasTransients() and call markForUnlink() if there is none (approach 2).
I am not sure which approach is the best, but it bothers me that unlink() is using approach 1 while markForUnlink() is using approach 2. Approach 2 seems to be required for markForUnlink() because we must mark the entry even if it is being created (i.e., locked for writing) right now, correct? Should not the same logic apply to unlink()?
Otherwise, unlink() has to check hasTransients() and call markForUnlink() if there is none (approach 2).
After these changes I noticed that unlink() and markForUnlink() became identical. So I made unlink() private (it now only calls markForUnlink()) and used markForUnlink() in that single place instead.
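For illustration, a hedged sketch of that final shape (not the actual PR code):

    /// private: kept only as a thin forwarder now that the two methods are identical
    void
    Transients::unlink(StoreEntry &e)
    {
        markForUnlink(e);
    }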
src/fs/rock/RockSwapDir.cc
Outdated
    debugs(47, 5, e);
    map->freeEntry(e.swap_filen);
    assert(e.key);
    e.hasDisk() ? map->freeEntry(e.swap_filen) :
Please use an if statement instead of the ternary operator.
src/fs/ufs/UFSSwapDir.h
Outdated
    virtual uint64_t currentCount() const override { return n_disk_objects; }
    virtual ConfigOption *getOptionTree() const override;
    virtual bool smpAware() const override { return false; }
    virtual bool hasReadableEntry(const StoreEntry &) const override { return false; };
Do you think we should document why this method always returns false? Something like this, perhaps:

    /// as long as ufs relies on the global store_table to index entries,
    /// it is wrong to ask individual ufs cache_dirs whether they have an entry
    virtual bool hasReadableEntry(const StoreEntry &) const override { return false; }
src/store/Controller.cc
Outdated
        debugs(20, 3, "aborting unsyncable " << *collapsed);
        collapsed->abort();
    } else { // the entry is still not in one of the caches
    } else {
I meant this:

    if (inSync) {
        debugs(20, 5, "synced " << *collapsed);
        collapsed->invokeHandlers();
        return;
    }

    if (found) { // unrecoverable problem syncing this entry
        debugs(20, 3, "aborting unsyncable " << *collapsed);
        collapsed->abort();
        return;
    }

    // the entry is still not in one of the caches
    debugs(20, 7, "waiting " << *collapsed);
src/store/Disks.cc
Outdated
    if (e.hasDisk(i))
        dir(i).unlink(e);
    else
        dir(i).unlinkByKeyIfFound(reinterpret_cast<const cache_key*>(e.key));
As already discussed elsewhere, with "approach 2", this inner if-statement would be replaced with just dir(i).unlink(e), and each cache dir will either mark the (locked) entry for unlinking or unlink it by key.
Done at a40bbf6.
Following it, I removed a hasDisk() check from storeSwapOutFileClosed() because the surrounding code assumes that the entry should be attached to a disk: it uses e->disk() without a check. Should there perhaps be an assert(e->hasDisk()) near the existing assert(e->swappingOut())?
I would not add a hasDisk() assert to protect disk() calls because those calls have that assert().
Since storeSwapOutFileClosed() does not manipulate raw disk indexes/pointers itself (only dumps them in stale debugs() messages), I think we are OK without an assert protecting such raw manipulations.
I consider this change request addressed.
src/store/Storage.h
Outdated
    virtual void markForUnlink(StoreEntry &e) = 0;

    /// remove the entry from the store
    /// Remove the entry from the store if possible

s/the entry/the matching entry/
src/store/Storage.h
Outdated
    /// Remove the entry from the store if possible
    /// or mark it as waiting to be freed otherwise.
    /// The entry must be present in the store.

With "approach 2", the "must be present" part will be gone.
    if (Store::Root().markedForDeletionAndAbandoned(*this)) {
        debugs(20, 3, "marked for deletion and abandoned");
        return false;

Should this be MemObject::SwapOut::swImpossible?
Yes, it should. I think this bug is a consequence of code duplication throughout this method: the swImpossible state is always followed by returning 'false'. I would suggest fixing this with something like:

    static bool CreateImpossibleDecision(StoreEntry *e, const char *msg)
    {
        debugs(20, 3, msg << " " << *e);
        swapOutDecision(MemObject::SwapOut::swImpossible);
        return false;
    }

and the above block (and other similar checks) would transform into:

    if (Store::Root().markedForDeletionAndAbandoned(*this))
        return CreateImpossibleDecision(this, "marked for deletion and abandoned");
Let's not polish this now. I am concerned that either something went wrong in this method (before this PR) or the method itself needs to be redesigned again (outside of this PR) because false always means swImpossible. Addressing that concern is not trivial and is outside this PR scope.
I approved the changes I requested. I do not like the final unlink()/markForUnlink() state. I think we should polish that split. However, that polishing can be done separately, and a separate polish may reduce overheads. @eduard-bagdasaryan, please proceed with rebasing the branch and doing the (hopefully) last batch of comprehensive tests. @yadij, please approve the changes you requested if Eduard has indeed addressed them per #46 (comment)
s/RELEASE_FLAG/RELEASE_REQUEST/
rousskov
left a comment
I found a couple more problems and added a few questions and polishing remarks as well. Please see the code comments for details.
src/client_side_reply.cc
Outdated
    }
#endif
    entry->release();
    // entry->release() notifies other waiting workers, including this one
s/other waiting workers/waiting workers/
Also, the opening curly brace should go on the same line as the if-statement (it does in the original code). Please check that you have run the source code formatting script on these changes.
src/Transients.cc
Outdated
    const auto idx = entry.mem_obj->xitTable.index;
    const auto &anchor = collapsedWriter(entry) ?
        map->writeableEntry(idx) : map->readableEntry(idx);
    aborted = EBIT_TEST(anchor.basics.flags, ENTRY_ABORTED);
anchor.basics.flags is not atomic but may be updated by writer's StoreMap::abortWriting() calls, so it is wrong to read it without holding an exclusive lock. For example, when collapsedWriter() is false, we do not have an exclusive lock and might try to read anchor.basics.flags when another worker (writer) is doing StoreMap::abortWriting(). To properly fix this, we would probably have to add a second dedicated atomic flag to StoreMapAnchor, like waitingToBeFreed, and use it to mark and detect aborted map entries. The ENTRY_ABORTED flag in StoreMapAnchor will be unused (like it was before these changes). Any better ideas?
Can we place that second aborted flag so that sizeof(StoreMapAnchor) does not change on 32bit and 64bit platforms?
The rest of StoreMapAnchor::basics was and remains constant when readers are allowed AFAIK, but this PR started changing flags...
Any better ideas?
Having two duplicated flags does not look very good. Moreover, what if in the future we need to read one more 'flag' from anchor.basics.flags in another reader? Add such an 'atomic' duplicate again?
I would like to suggest another approach:

    aborted = false;
    // Since a shared entry gets ENTRY_ABORTED only within StoreMap::abortWriting()
    // (where anchor.writing() becomes false), we should not check this value before that.
    // Also note that trying to read ENTRY_ABORTED for a being-written shared entry
    // may cause an unpredictable result, because anchor.basics.flags is not 'atomic'.
    if (!anchor.writing() && EBIT_TEST(anchor.basics.flags, ENTRY_ABORTED))
        aborted = true;
    waitingToBeFreed = anchor.waitingToBeFreed;
Having two duplicated flags does not look very good.
I think we can avoid any duplication of flags in my proposal: The new atomic flag will be used without any relationship or reference to the unused(?) same-name bit in basics.flags. And we can name the new atomic flag writerGone or something like that to minimize the implication that the two flags are duplicated.
Moreover, there is actually some extra danger in using ENTRY_ABORTED like we do in the current PR code. I suspect that some higher level code may assert if it gets an ENTRY_ABORTED entry from the cache. Our code probably copies shared basic.flags into local StoreEntry objects blindly, which (in rare cases) might result in a cache reader getting an ENTRY_ABORTED StoreEntry hit and asserting. The code should handle later hit aborts correctly (because they can always happen on incomplete entries), but finding an already aborted hit in the cache, especially for a "complete" entry, may be problematic. This concern alone may be a sufficient reason not to use ENTRY_ABORTED for our purposes, even if the atomicity issue did not exist!
Finally, we should consider removing basics.flags as effectively unused, but probably not in this PR because that decision is not trivial.

    // Since a shared entry gets ENTRY_ABORTED only within StoreMap::abortWriting()
    // (where anchor.writing() becomes false), we should not check this value before that.
    if (!anchor.writing() && EBIT_TEST(anchor.basics.flags, ENTRY_ABORTED))
        aborted = true;

This trick misses the (rare) case where the flag has been set (before the check) but the write lock has not been released (until after the check). This problem is not critical because the affected reading worker should receive another notification later, when the writer finishes updating its state. At that time, the reader will detect the aborted entry correctly. Thus, we can get away with this by adding an XXX.
The question is: Why should we use tricks and add XXXs when we can use an atomic flag instead? I do not think we should, if the earlier discussion addresses your "duplicated flags" concern.
Done at 8cd8303. Does it meet your expectations?
Can we place that second aborted flag so that sizeof(StoreMapAnchor) does not change on 32bit and 64bit platforms?
I placed it just after waitingToBeFreed. I assume that since sizeof(waitingToBeFreed) == sizeof(writerHalted) == 1, these fields are not padded individually. The following data field (uint64_t key[2]) should be preceded by 3 bytes of padding before the change and by 2 bytes after it. The sizeof(StoreMapAnchor) should be the same in both cases, I think. In my environment it is 96 bytes.
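For illustration, a minimal layout sketch (hypothetical field names, not the real StoreMapAnchor) of why the extra one-byte atomic flag should be absorbed by the padding in front of the 8-byte-aligned key on common ABIs:

    #include <atomic>
    #include <cstdint>

    struct AnchorBefore {
        std::atomic<bool> waitingToBeFreed;
        uint64_t key[2];
    };
    struct AnchorAfter {
        std::atomic<bool> waitingToBeFreed;
        std::atomic<bool> writerHalted; // the added flag
        uint64_t key[2];
    };
    static_assert(sizeof(AnchorBefore) == sizeof(AnchorAfter),
                  "the extra flag fits into existing padding");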
src/store/Controller.cc
Outdated
Store::Controller::syncCollapsedWriter(StoreEntry &collapsed)
{
    assert(transients);
    bool aborted = false;
Edit: This change request is probably not relevant because the syncCollapsed() restructuring discussed in another change request should make these polishing touches unnecessary.
s/aborted/abortedByUs/ to restore symmetry with syncCollapsedReader().
Also, please add a comment to confirm and explain why this variable is unused. For example:

    // keeps Transients::status() API simple; unused because we already
    // know whether we are aborted (so this flag does not need syncing)
    bool abortedByUs = false;

You can write "know we are not aborted" instead because syncCollapsed() checks that, but I would not introduce that unnecessary implied dependency.
Skipped as irrelevant.
src/store/Controller.cc
Outdated
    debugs(20, 7, "not SMP-syncing not-transient " << xitIndex);
    return;
    if (waitingToBeFreed) {
        debugs(20, 3, "releasing writing " << collapsed << " due to waitingToBeFreed shared status");
We cannot really release a being-written entry, so I would
s/releasing/marking/
or
s/releasing/will release/
I would also drop "shared status" for brevity's sake.
Skipped as irrelevant.
src/store/Controller.cc
Outdated
        return;
    if (waitingToBeFreed) {
        debugs(20, 3, "releasing writing " << collapsed << " due to waitingToBeFreed shared status");
        collapsed.release(true);
This check and release() call are missing for the syncCollapsedReader() case. There is a similar check calling abort() there, but that check does not cover the non-aborting cases AFAICT.
When I imagine the resulting (fixed) code, it feels like we should restructure Store::Controller::syncCollapsed() to reduce the chance of similar future bugs and to reduce code duplication. A single method like this may work better:
    ...
    debugs(20, 7, "syncing " << *collapsed);

    bool abortedByWriter = false;
    bool waitingToBeFreed = false;
    transients->status(collapsed, abortedByWriter, waitingToBeFreed);

    if (waitingToBeFreed) {
        debugs(20, 3, "will release " << collapsed << " due to waitingToBeFreed");
        collapsed.release(true); // may already be marked
    }

    if (transients->collapsedWriter(*collapsed))
        return; // readers can only change our waitingToBeFreed flag

    assert(transients->collapsedReader(*collapsed));

    if (abortedByWriter) {
        debugs(20, 3, "aborting " << collapsed << " because its writer has aborted");
        collapsed.abort();
        return;
    }

    ... the rest of the current syncCollapsedReader() code follows ...
src/ipc/StoreMap.h
Outdated
    /// free the entry if possible or mark it as waiting to be freed if not
    void freeEntry(const sfileno fileno);
    /// \param stateChanged when provided, true if the entry state changes, false otherwise
    void freeEntry(const sfileno fileno, bool *stateChanged = nullptr);
I suggest a simpler interface:

    /// free the entry if possible or mark it as waiting to be freed if not
    /// \returns whether the entry was neither empty nor marked
    bool freeEntry(const sfileno fileno);
I did it similarly at first, but almost all (~15) freeEntry() calls end up ignoring its return value. We would have to prefix them with (void), which looks ugly.
almost all freeEntry() calls end up with ignoring its return value. We would have to prefix them with (void)
I do not think we have to or should prefix those calls with (void). That prefix is useful or needed when you want to emphasize that a usually important return value is ignored on purpose rather than by accident. In this case, the return value is usually unimportant (because it does not indicate a success or failure -- the method cannot fail) and, hence, can be silently ignored by most callers.
src/ipc/StoreMap.cc
Outdated
        freeChain(fileno, s, false);
    else
        s.waitingToBeFreed = true; // mark to free it later
    if (stateChanged)
To avoid returning true for the already marked and/or already free entries, we can do something like this:

    if (s.lock.lockExclusive()) {
        const bool result = !(s.waitingToBeFreed || s.empty());
        freeChain(...);
        return result;
    }

    // mark to free the locked entry later (if not already marked)
    bool expected = false;
    return s.waitingToBeFreed.compare_exchange_strong(expected, true);
Done.
I used !s.waitingToBeFreed && !s.empty() instead, which looks simpler to me.
src/store/Controller.cc
Outdated
{
    debugs(20, 3, storeKeyText(key));

    if (markedForDeletion(key))
Please add a debugs() here, to match the other lines in this method and to provide an explanation to admins trying to understand why the previous cached object does not produce hits:

    if (markedForDeletion(key)) {
        debugs(20, 3, "ignoring marked " << storeKeyText(key));
        return nullptr;
    }
src/store/Controller.cc
Outdated
    debugs(20, 3, "got in-transit entry: " << *entry);
    entry->release(true);
    // The entry should be locked (but see find()). Return here because the
    // release of a locked entry should take care of all its stores.
Why do we care whether the entry is locked here? StoreEntry::release() takes care of all its stores regardless of whether the entry is locked, right? If this comment is stale or just wrong, I would do:

    entry->release(true); // takes care of all its stores if any
    return;
However, the "all of its stores" logic implies that release() may ignore Store entries that are not attached to the entry object. On the other hand, Controller::unlinkByKeyIfFound() should mark all matching store entries in existence, even those not currently associated with any StoreEntry object. In other words:
- The above if-statement takes care of the (matching) store entries attached to the matching local StoreEntry object.
- The code below that if-statement takes care of the matching store entries when there is no matching local StoreEntry object.
- Who will take care of the matching store entries unattached to the matching local StoreEntry object? I believe such entries may exist because, for example, one worker could be writing a response to disk while another worker may be writing the same-key response to the memory cache. Can Transients guarantee that these two independent concurrent writes can never happen? I do not think so: Transients may not be able to create a shared entry to coordinate those two workers because, for example, the required Transients slot in shared memory may already be locked by another key due to a benign hash collision. And there may be cases other than two concurrent different-worker writes where a local matching StoreEntry coexists with matching but unattached store entries.
If my concern is valid, then we cannot return from the above if-statement. We have to continue to the code below. We still need to if-statement to keep any local StoreEntry objects in sync with Store changes, of course.
However, the "all of its stores" logic implies that release() may ignore Store entries that are not attached to the entry object.
Looks like this comment is not precise: release() should mark a not-attached entry for deletion from the storages via Controller::markForUnlink() and the corresponding markForUnlink() storage methods.
BTW, Controller::unlinkByKeyIfFound() is only called from purgeEntriesByUrl() with the precondition that Store::Root().get() returned nil. That makes me think that the "if" block in Controller::unlinkByKeyIfFound() is not necessary.
Code:

    release(true); // takes care of all its stores if any

Alex: the "all of its stores" logic implies that release() may ignore Store entries that are not attached to the entry object
Eduard: release() should mark a not-attached entry for deletion from the storages via Controller::markForUnlink()
Whether release() should unlink unattached entries is actually debatable, but, for this discussion, let's assume that release() does unlink unattached entries.
Eduard: Looks like [Alex's] comment is not precise
My comment seems OK, but the source code comment implies something that it should not imply. In other words, if release() unlinks unattached entries, then we need to fix the comment. Something like this may work better:

    // unlink all Store entries with entry->key, even unattached ones
    entry->release(true);
    return;

And to minimize similar doubts in the future, we should improve the release() description as well. For example, something like this would be better than the current text IMO (we are running rather thin on Store-related terminology, but let's not try to fix that here, in a method description):

    /// Removes all unlocked (and marks for eventual removal all locked) Store
    /// entries, including attached and unattached entries that have our key.
    /// Also destroys us if we are unlocked or makes us private otherwise.
    virtual void release(const bool shareable = false);
Controller::unlinkByKeyIfFound() is only called from purgeEntriesByUrl() with the precondition that Store::Root().get() returned nil. That makes me think that the "if" block in Controller::unlinkByKeyIfFound() is not necessary.
AFAICT, that thinking does not match the code because Root().get() can return nil when store_table has a matching entry. See the markedForDeletion() check in Controller::find().
And, more importantly, we should not make nil get() a hidden precondition for calling Root().unlinkByKeyIfFound() because some caller will eventually forget to check that. Your comment and several similar earlier discussions all suggest to me that we need to revise the unlink/markForUnlink API. I hope we can do that after this PR. For example, it seems likely that a chunk of the current purgeEntriesByUrl() logic should go directly inside Controller::unlinkByKeyIfFound() instead of appearing as preconditions for (or alternative actions to) that method call.
Done: fixed comments, the code is the same.
src/client_side_reply.cc
Outdated
        entry->unlock("purgeEntriesByUrl");
    } else {
        Store::Root().unlinkByKeyIfFound(key); // does not broadcast but there are no waiting workers
The argument is valid IMO, but please s/are/were/ to make it a bit more accurate.
src/ipc/StoreMap.cc
Outdated
        return result;
    }

    // TODO: check whether 'empty' anchors can be still locked.
Yes, any anchor/entry can be locked. Locking does not depend on the entry state (other than the state of the lock itself, of course, but the lock can/should be viewed as being "outside" the entry itself).
src/client_side_reply.cc
Outdated
        entry->unlock("purgeEntriesByUrl");
    } else {
        Store::Root().unlinkByKeyIfFound(key); // does not broadcast but there were no waiting workers
This comment is still misleading or even wrong. There could be workers working on an entry with this key, of course. What we are trying to say here is that we should not worry about ignoring them. Why? I believe we can safely ignore such workers because either the entry has already been marked for deletion (and so another notification would make no difference) or Controller::find() failed to create a new or join an existing Transients entry (and so we would not be able to notify anybody anyway). If my logic is correct, then let's replace this comment with something like this:

    // The call below does not notify workers but that is OK here because nil
    // Store::Root().get() guarantees that either the matching entry has already
    // been marked (so new notifications are useless) OR get() could not create
    // (or join) a Transients entry (so notifications are impossible).
    // The call below does not notify workers but that is OK here because nil
    // Store::Root().get() guarantees that either the matching entry has already
    // been marked (so new notifications are useless) OR get() could not create
    // (or join) a Transients entry (so notifications are impossible).

I do not mind this in general; however, note that if get() returns nil, that only means that "it could join a Transients entry" (as well as any other of the shared or local storages); we don't know whether the entry was deleted or has not been created. Also, AFAIK, get() does not create Transients entries: that is done by Controller::allowCollapsing() and Transients::startWriting().
if get() returns nil that only means that "it could join a Transients entry"
I assume you meant "could not join a Transients entry".
(as well as any other of shared or local storages)
Right, but those other stores do not matter as far as notifications are concerned, and the comment we are discussing is about notifications.
get() does not create Transients entries
Good point! It only creates StoreEntries from existing Transient entries.
Another problem with that comment is that it is actually possible to broadcast a notification even if we could not join an existing Transients record -- broadcast messages just need the xitTable.index value, which we could compute. Such broadcasts would be useless because if we could not join then nobody else could, and so there are no readers to broadcast to (and the Transients writer does not care about the delete broadcasts, right?).
Here is a fixed(?) comment:
// The call below does not notify workers but that is OK here because nil
// Store::Root().get() guarantees that the matching Transients entry does
// not exist (so there are no readers to notify), could not be joined yet
// (ditto), or has already been marked (so new notifications are useless).
However, all the conditions in the above comment were true only at the get() time. They may no longer be true at the unlinkByKeyIfFound() time! What happens if a new reader comes and manages to attach to the Transients entry after our failed get() but before our unlinkByKeyIfFound() call? Will that reader have to wait for the next store write to realize that it can never attach to the stores and has to abandon the current transaction? If yes, is there a simple way to improve this?
Perhaps unlinkByKeyIfFound() should broadcast using the raw Transients index, but it feels like we should just reorder the markings instead:
1. Mark stores (not Transients!) by key.
2. Root().get().
3. If we got an entry, do the usual entry->release(), with Broadcast().
4. Otherwise, do nothing.
What do you think?
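For concreteness, here is a minimal sketch of that reordering; markNonTransientStoresForDeletion() is a hypothetical helper standing in for step 1, and the lock()/release()/unlock() sequence mirrors the existing purgeEntriesByUrl() code:

```cpp
// Hypothetical sketch only -- not the actual Squid code.
static void
purgeByKeySketch(const cache_key *key)
{
    // step 1: mark memory and disk stores (but not Transients) first so that
    // readers attaching after our get() below still see the deletion marks
    Store::Root().markNonTransientStoresForDeletion(key); // hypothetical helper

    // step 2: the usual lookup
    if (StoreEntry *entry = Store::Root().get(key)) {
        // step 3: release() marks the entry and Broadcast()s to other workers
        entry->lock("purgeByKeySketch");
        entry->release();
        entry->unlock("purgeByKeySketch");
    }
    // step 4: no entry means nobody joined Transients, so there is nobody to notify
}
```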
There was a problem hiding this comment.
What happens if a new reader comes and manages to attach to the Transients entry after our failed get() but before our unlinkByKeyIfFound() call?
BTW, similar logic should be applied to the writer itself: an entry may be locked for Transients writing (without appending) at the time of get() but become 'appending' (and available for another get()) just before unlinkByKeyIfFound(). Then the skipped broadcasting would not release() that local writer entry (Controller::syncCollapsed()).
but it feels like we should just reorder the markings instead
What is the need for (1)? If Root.get() works for us (Transients is not marked), why will it not work for others? I.e., if the shared Transients entry is unavailable at (2) but becomes available just before (3), won't we face the same problems?
Perhaps unlinkByKeyIfFound() should broadcast using raw Transients index
This would be a more reliable solution, I assume.
There was a problem hiding this comment.
similar logic should be applied to the writer itself: an entry may be locked for Transients writing (without appending) at the time of get() but become 'appending' (and available for another get()) just before unlinkByKeyIfFound(). Then the skipped broadcasting would not release() that local writer entry (Controller::syncCollapsed()).
Agreed. The writer problem cannot be (easily) solved by broadcasting, AFAICT. Instead, right before switching to appending mode, the writer must check waitingToBeFreed and, if it is set, release() its entry. If that code is missing, please add it. I think there is already similar code somewhere.
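For illustration only, a rough sketch of such a pre-appending check; the guard's name and exact placement are hypothetical, while Controller::markedForDeletion() is the method added by this PR:

```cpp
// Hypothetical sketch: ask before switching the Transients writer to
// "appending" mode; returns true when appending may proceed.
static bool
mayStartAppendingSketch(StoreEntry &e)
{
    // e.key is the public cache key of the entry being written
    if (Store::Root().markedForDeletion(e.key)) {
        e.release(); // honor the purge that arrived while we were not yet appending
        return false;
    }
    return true;
}
```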
What is the need for (1)?
Step 1 calls unlinkByKeyIfFound() for memory and disk stores (if any).
If Root.get() works for us (Transients is not marked), why will it not work for others?
Root.get() will work for others as well, but they will get our notification and will be fine. They will not attach themselves to stores because we marked those first (in step 1).
If the shared Transients entry is unavailable at (2) but becomes available just before (3), won't we face the same problems?
We will! Glad you found this bug. I also realized that my step 4 was missing unlinkByKeyIfFound() for Transients, but adding that would not fix this bug.
Also, I really do not want to broadcast based on unlocked Transient index values because key:index mapping is (or will be) unreliable (if StoreMap supports hash collision resolution).
I do not have any good solutions to offer right now. Back to the drawing board... If you can offer something that addresses all these problems, please do!
There was a problem hiding this comment.
I tested the latest adjustments with our Daft test. One problem occasionally happens: when several clients request a disk-cached entry, Squid returns unexpected "503 Service Unavailable" responses instead of "200" hits. Cache.log analysis showed that the reason is a Transients entry creation collision: two workers simultaneously find the entry in Rock::SwapDir::get(), create StoreEntry objects, and try to create the Transients entry, locking it for writing. The unexpected 503s look like a test imperfection; in reality, we would get unexpected cache misses in this case.
There was a problem hiding this comment.
Glad our tests are capable of exposing such hidden problems!
In general, it is acceptable to return a miss when we cannot guarantee reliable hit delivery. A hit test must either accommodate/tolerate/retry such misses or avoid them.
However, we should also avoid making hit delivery unreliable without a good reason. In the context of your "concurrent hits" example, I do not see such a reason (from an independent reviewer point of view; ignoring internal problems of our own making). There are two different cases to consider here:
- A complete() store entry. AFAICT, such an entry does not need a Transients entry. There is no need to signal other workers, even when handling DELETE or similar purging events.
- An in-complete() store entry. Such an entry must already have a Transients entry so there should be no creation conflicts at hit time.
If the above analysis is correct, then we should relax the earlier "every active Store entry must have a corresponding Transients entry" rule to "every Store entry being written must have a corresponding Transients entry", right? A store reader must still attach to the existing Transients entry (if any), of course, but it should not try to create that Transients entry.
If you agree with the above analysis, please make the necessary code adjustments. Otherwise, please discuss what I have missed.
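To make that relaxed rule concrete, a hypothetical sketch of the reader-side logic (hasWriter() and addReader() are placeholder names for whatever Transients lookup/join primitives end up being used; the writer path keeps creating Transients entries as before):

```cpp
// Hypothetical sketch only: readers join an existing Transients entry but
// never create one.
static void
joinTransientsIfAnySketch(Transients &transients, StoreEntry &e, const cache_key *key)
{
    if (transients.hasWriter(key)) {   // placeholder: an incomplete entry is being written
        transients.addReader(e, key);  // placeholder: join to receive abort/purge news
        return;
    }
    // A complete() disk or memory hit has no writer and needs no Transients
    // entry, so we do not create one just to read the cached object.
}
```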
There was a problem hiding this comment.
A complete() store entry. AFAICT, such an entry does not need a Transients entry. There is no need to signal other workers, even when handling DELETE or similar purging events.
I am thinking about possible problems with this scenario:
1. There is a disk-cached entry.
2. Two clients request it at the same time; only one of them manages to create a Transients entry.
3. The two clients start receiving the cache hit.
4. A DELETE comes on another worker, marking the shared cache and releasing only the first StoreEntry (via notification).
5. The first client finishes getting the response, and the corresponding Transients entry is freed. Our Store::Controller::markedForDeletion() protection stops working from this point on.
6. Another client hits the second worker, thus accessing its local, still-active StoreEntry (and ignoring the previous DELETE request).
There was a problem hiding this comment.
- Two clients request an existing disk-cached entry at the same time; only one of them manages to create a Transients entry.
With the adjusted rule, none of the reading workers will try to create a Transients entry.
- Another client hits the second worker, thus accessing its local, still-active StoreEntry (and ignoring the previous DELETE request).
This concern is valid (and we do not even need two clients in step 2 to trigger this problem!).
I see two ways to fix this problem:
A. Go back to the original "all active store entries have Transients entries" rule and accept the increased probability of rejecting perfectly valid cache hits.
B. Stay with the reduced "store writers must create Transients entries" rule but adjust the existing Controller::markedForDeletion() method to return true for keys that have local StoreEntry objects reading store entries marked for deletion. Please note that such StoreEntry objects will have a lock on the store entries they are reading.
I think solution B is preferable. Does it miss any use cases?
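A minimal sketch of solution B, under the assumption that the Controller keeps its existing Transients check and adds a scan for local readers; hasLocalReaderOfMarkedEntry() is a hypothetical helper name:

```cpp
// Hypothetical sketch of solution B -- not the committed implementation.
bool
Store::Controller::markedForDeletion(const cache_key *key) const
{
    // the existing check: the shared Transients slot carries the mark
    if (transients && transients->markedForDeletion(key))
        return true;

    // solution B: a local StoreEntry may still be reading a store entry that
    // has already been marked; such readers keep the marked entry locked
    return hasLocalReaderOfMarkedEntry(key); // hypothetical helper
}
```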
There was a problem hiding this comment.
Stay with the reduced "store writers must create Transients entries" rule
be9511bb implements this suggestion. While re-running the test, the above problem was gone, but a new, similar one appeared. I found the source of that new problem and posted the fix. After that, the test succeeded for both the shared memory and the disk cache.
I believe I answered all the pending questions. I also added a few additional (minor) change requests. I think we are getting really close to being done here. The ball is in your court. |
rousskov
left a comment
There was a problem hiding this comment.
Nice choice for the new flag name -- much better than the writerGone I had suggested.
I added a few minor change requests for the last commit but it looks good to me overall.
The only big issue remaining is the purgeEntriesByUrl() race condition discussed a bit earlier: #46 (comment)
|
|
||
| public: | ||
| mutable ReadWriteLock lock; ///< protects slot data below | ||
| std::atomic<uint8_t> waitingToBeFreed; ///< may be accessed w/o a lock |
There was a problem hiding this comment.
Please do not change this type unless we have to for some reason. The correct type here may actually be std::atomic_bool, but this PR is not about improving that type and such improvements are not trivial.
There was a problem hiding this comment.
Adding this old problem to the new review: Please do not change this type unless we have to for some reason. The correct type here may actually be std::atomic_bool, but this PR is not about improving that type and such improvements are not trivial.
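As background for the type discussion, a small self-contained example showing why std::atomic<bool> (std::atomic_bool) fits a flag that "may be accessed w/o a lock", assuming the platform provides a lock-free specialization (a practical requirement for flags living in shared memory):

```cpp
#include <atomic>
#include <cassert>

struct AnchorSketch {
    std::atomic<bool> waitingToBeFreed{false}; // set by one worker, read by many
};

int main()
{
    AnchorSketch a;
    // a shared-memory flag must not silently fall back to a mutex-based atomic
    assert(a.waitingToBeFreed.is_lock_free());
    a.waitingToBeFreed.store(true, std::memory_order_release);
    return a.waitingToBeFreed.load(std::memory_order_acquire) ? 0 : 1;
}
```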
src/ipc/StoreMap.h
Outdated
| mutable ReadWriteLock lock; ///< protects slot data below | ||
| std::atomic<uint8_t> waitingToBeFreed; ///< may be accessed w/o a lock | ||
| std::atomic<bool> waitingToBeFreed; ///< may be accessed w/o a lock | ||
| /// true when StoreMap::abortWriting() was called but there were still readers |
There was a problem hiding this comment.
I think we can simplify because we do not need to document the free entry state:
std::atomic<uint8_t> writerHalted; ///< whether abortWriting() has been called
There was a problem hiding this comment.
Adding this old problem to the new review: I think we can simplify because we do not need to document the free entry state:
std::atomic<uint8_t> writerHalted; ///< whether abortWriting() has been called
src/Transients.cc
Outdated
| const auto idx = entry.mem_obj->xitTable.index; | ||
| const auto &anchor = collapsedWriter(entry) ? | ||
| map->writeableEntry(idx) : map->readableEntry(idx); | ||
|
|
There was a problem hiding this comment.
I would remove this empty line: All the lines above and below it are about extracting those two flags.
There was a problem hiding this comment.
Still open. Please also remove the new empty line in Transients::copyFromShm().
(http://lists.squid-cache.org/pipermail/squid-dev/2017-July/009220.html): * MemStore::updateCollapsed(): restored the old behavior of returning true when the given collapsed entry is disconnected from the cache. * unlinkByKeyIfFound(): do nothing if the 'map' is null. * Corrected the documentation and code comments of several methods. Still not addressed: * Whether (and how) to remove the direct RELEASE_REQUEST flag manipulation for a just-created StoreEntry.
Before this change, the code abused RELEASE_REQUEST flag setting, violating the general 'releaseRequest()' API. Until now, the RELEASE_REQUEST flag meant a (locked) entry marked for removal. As a consequence, such a StoreEntry becomes unreachable for new store clients. On the other hand, if a StoreEntry is private, it is also not usable by new store clients. This patch aggregates these two related notions: all work of making a StoreEntry 'unusable for new store clients' is now performed inside StoreEntry::setPrivateKey(). The RELEASE_REQUEST flag now gets a proper meaning: it means a private StoreEntry which can't become public anymore. In other words, this new meaning implies that after setting RELEASE_REQUEST (making the StoreEntry 'permanently' private), other code must not clear that flag (and change the key from private to public). Some existing EBIT_CLR(RELEASE_REQUEST) lines were in the way. Careful analysis showed that removing them is safe (i.e., there are no scenarios in which a StoreEntry has the flag set at these points).
Keeping to the new meaning introduced in 9cfa044, simplified the code with a new StoreEntry::permanentlyPrivate().
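To illustrate the combined notion described above, a rough sketch of how the flag and the key switch could be tied together; the exact signatures committed may differ:

```cpp
// Illustrative sketch only: making an entry unusable for new store clients is
// centralized in setPrivateKey(); a "permanent" switch also sets RELEASE_REQUEST.
void
StoreEntry::setPrivateKey(const bool permanent)
{
    if (permanent)
        EBIT_SET(flags, RELEASE_REQUEST); // this entry can never become public again
    // ... replace the public cache key with a private one, as before ...
}

bool
StoreEntry::permanentlyPrivate() const
{
    return EBIT_TEST(flags, RELEASE_REQUEST); // marked as never-public-again
}
```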
rousskov
left a comment
There was a problem hiding this comment.
I am done fixing and polishing this PR branch. I will freshen up the description and am asking @eduard-bagdasaryan to review my commits, but if there are no surprises, then this code can be finally committed.
After checking a few, I am going to assume that all build failures were those mysterious "reference is not a tree" Jenkins plugin(?) failures that I keep asking @kinkie about. I am going to ignore them in hope that the next build will magically avoid them. |
|
There are some real build errors remaining to be fixed in the bb533f0 test results. Please can one of you attend to those and get this merged into master ASAP. This is now not going into the Jan v4 beta, but I would really like it to have as long a period of use as possible for the Feb beta. |
|
Looks like these errors were caused by the fact that only a couple of SwapDir virtual members are overridden. We should override all such methods or none of them. |
|
@eduard-bagdasaryan, please remove those inconsistent overrides while you are reviewing the changes. |
* Removed unused EntryGuard::releaseLocked(). * Protected EntryGuard destructor from the throwing code inside of it.
... and honor this rule in Transients::evictCached().
|
Checking merge bot, attempt 2. |
... this is what previous 5c09e58 missed.
Removing the writing state from Transients was incorrect: there are several places where writers and readers behave differently. For example, broadcasting should only work when there are readers and should not work when there is only a writer. Another example is checking for transient readers (giving incorrect results) inside CheckQuickAbortIsReasonable().
* Fixed Transients::addEntry() comment * Fixed wrong argument passing to StoreEntry::release() * Removed wrong statement in Controller::transientsCompleteWriting()
Before this change (and after 5c09e58), a Transients entry sometimes could not be properly evicted using Transients::evictCached() because the StoreEntry had already been disconnected from the Transients cache while evicting shared memory and destroying its memory object. To overcome this, we need to evict the entry from Transients before destroying the memory object. It may seem that doing so would violate the "transients invariant" (every public StoreEntry must have the corresponding Transients entry) for a very short period of time. However, this is not a problem because the StoreEntry is not accessible via the Controller interface after it has been evicted from Transients.
|
Status update: @eduard-bagdasaryan is investigating one potential problem discovered during his last review. Once he addresses that concern, I plan to merge these changes. |
|
Status update: Problem confirmed. The fix is being tested. |
StoreEntry::release() (during disk rebuild) and storeGetMemSpace() were purging unlocked entries from the memory cache without abandon()ing them. Purging an entry from a memory cache destroys mem_obj and, hence, its Transients (if any). Such silent Transients loss violates the public=hasTransients invariant and leads to bugs, especially if the StoreEntry was not cached in memory and, hence, did not broadcast its eviction. Also, do not memoryEvictCached() in StoreEntry::release() because all memoryEvictCached() calls are dangerous, and that call was not needed as releaseRequest() yields to a (safe and broader) evictCached() call. The dangerous purgeMem() is gone, and memoryEvictCached() is now private. Also possibly fixed purging of incomplete on-disk entries that managed to lose their mem_obj (in a non-SMP caching environment): Controller::handleIdleEntry() was not releasing such entries because it relied on purgeMem(), which did nothing without a mem_obj.
|
Tests failed with the usual "reference is not a tree" error. I plan to merge this PR if the tests are successful. |
|
Three out of ten Jenkins tests failed with the usual "reference is not a tree" error.
Given the combination of the above facts, I plan to ignore similar future failures for this PR and merge this PR in a few hours. P.S. I tried to restart just the failed Jenkins test runs, but could not find a way to do it. I then tried to delete the failed tests in hope that Jenkins will report an overall success to GitHub, but it looks like Jenkins deleted the whole build 312 (all tests) instead. Sigh. |
|
After triggering another round of tests, I forced-merged, ignoring several Jenkins test failures. All failures were due to the same "reference is not a tree" error. I expect these changes to trigger new and expose old bugs: While we did perform several rounds of fairly extensive lab testing, the vast scope and complexity of these changes virtually guarantee problems in some untested use cases. The next big step is to fix Squid bug 4579, mentioned as an important XXX in this PR commit message. We also need to officially record a dozen out-of-scope problems discovered while working on this PR. |
When Squid finds a requested entry in the memory cache, it does not check whether the same entry is also stored in a cache_dir. The StoreEntry object may become associated with its store entry in the memory cache but not with its store entry on disk. This inconsistency causes two known problems: 1. Squid may needlessly swap out the memory hit to disk, either overwriting an existing (and identical) disk entry or, worse, creating a duplicate entry on another disk. In the second case, the two disk entries are not synchronized and may eventually start to differ if one of them is removed or updated. 2. Squid may not delete a stale disk entry when needed, violating various HTTP MUSTs, and eventually serving stale [disk] cache entries to clients. Another purging problem is not caused by the above inconsistency: 3. A DELETE request or equivalent may come for the entry which is still locked for writing. Squid fails to get a lock for such an entry (in order to purge it) and the entry remains in disk and/or memory cache. To solve the first two problems: * StoreEntry::mayStartSwapout() now avoids needless swapouts by checking whether StoreEntry was fully loaded, is being loaded, or could have been loaded from disk. To be able to reject swapouts in the last case, we now require that the newer (disk) entries explicitly delete their older variants instead of relying on the Store to overwrite the older (unlocked) variant. That explicit delete should already be happening in higher-level code (that knows which entry is newer and must mark any stale entries for deletion anyway). To fix problem squid-cache#3: * A new Store::Controller::evictIfFound(key) method purges (or marks for deletion if purging is impossible) all the matching store entries, without loading the StoreEntry information from stores. Avoiding StoreEntry creation reduces waste of resources (the StoreEntry object would have to be deleted anyway) _and_ allows us to mark being-created entries (that are locked for writing and, hence, cannot be loaded into a StoreEntry object). XXX: SMP cache purges may continue to malfunction when the Transients table is missing. Currently, Transients are created only when the collapsed_forwarding is on. After Squid bug 4579 is fixed, every public StoreEntry will have the corresponding Transients entry and vice versa, extending these fixes to all SMP environments. Note that even if Squid properly avoids storing duplicate disk entries, some cache_dir manipulations by humans and Squid crashes may lead to such duplicates being present. This patch leaves dealing with potential duplicates out of scope except it guarantees that if an entry is deleted, then all [possible] duplicates are deleted as well. Fixing the above problems required (and/or benefited from) many related improvements, including some Store API changes. It is impractical to detail each change here, but several are highlighted below. To propagate DELETEs across workers, every public StoreEntry now has a Transients entry. Prevented concurrent cache readers from aborting when their entry is release()d. Unlike abort, release should not affect current readers. Fixed store.log code to avoid "Bug: Missing MemObject::storeId value". Removed Transients extras used to initialize MemObject StoreID/method in StoreEntry objects created by Transients::get() for collapsed requests. 
Controlled::get() and related Controller APIs do not _require_ setting those MemObject details: get() methods for all cache stores return StoreEntry objects without them (because entry basics lack Store ID and request method). The caller is responsible for cache key collision detection. Controlled::get() parameters could include Store ID and request method for early cache key collision detection, but adding a StoreQuery class and improving collision detection code is outside this project scope (and requires many changes). Found more cases where release() should not prevent sharing. Remaining cases need further analysis as discussed in master 39fe14b. Greatly simplified UFS store rebuilding, possibly fixing subtle bug(s). Clarified RELEASE_REQUEST flag meaning, becoming 'a private StoreEntry which can't become public anymore'. Refactored the related code, combining two related notions: 'a private entry' and 'an entry marked for removal'. Do not abort collapsed StoreEntries during syncing just because the corresponding being stored shared entry was marked for deletion. Abort them if the shared entry has been also aborted. Added StoreEntry helper methods to prevent direct manipulation of individual disk-related data members (swap_dirn, swap_filen, and swap_status). These methods help keep these related data members in a coherent state and minimize code duplication. Conflicts: src/MemObject.cc
When Squid finds a requested entry in the memory cache, it does not
check whether the same entry is also stored in a cache_dir. The
StoreEntry object may become associated with its store entry in the
memory cache but not with its store entry on disk. This inconsistency
causes two known problems:
1. Squid may needlessly swap out the memory hit to disk, either
overwriting an existing (and identical) disk entry or, worse,
creating a duplicate entry on another disk. In the second case, the
two disk entries are not synchronized and may eventually start to
differ if one of them is removed or updated.
2. Squid may not delete a stale disk entry when needed, violating
various HTTP MUSTs, and eventually serving stale [disk] cache entries
to clients.
Another purging problem is not caused by the above inconsistency:
3. A DELETE request or equivalent may come for the entry which is still
locked for writing. Squid fails to get a lock for such an entry (in
order to purge it) and the entry remains in disk and/or memory cache.
To solve the first two problems:
* StoreEntry::mayStartSwapout() now avoids needless swapouts by checking
whether StoreEntry was fully loaded, is being loaded, or could have
been loaded from disk. To be able to reject swapouts in the last case,
we now require that the newer (disk) entries explicitly delete their
older variants instead of relying on the Store to overwrite the older
(unlocked) variant. That explicit delete should already be happening
in higher-level code (that knows which entry is newer and must mark
any stale entries for deletion anyway).
To fix problem #3:
* A new Store::Controller::evictIfFound(key) method purges (or marks for
deletion if purging is impossible) all the matching store entries,
without loading the StoreEntry information from stores. Avoiding
StoreEntry creation reduces waste of resources (the StoreEntry object
would have to be deleted anyway) and allows us to mark being-created
entries (that are locked for writing and, hence, cannot be loaded into
a StoreEntry object).
Note that even if Squid properly avoids storing duplicate disk entries,
some cache_dir manipulations by humans and Squid crashes may lead to
such duplicates being present. This patch leaves dealing with potential
duplicates out of scope except it guarantees that if an entry is
deleted, then all [possible] duplicates are deleted as well.
Fixing the above problems required (and/or benefited from) many related
improvements, including some Store API changes. It is impractical to
detail each change here, but several are highlighted below.
To propagate DELETEs across workers, every public StoreEntry now has a
Transients entry.
Prevented concurrent cache readers from aborting when their entry is
release()d. Unlike abort, release should not affect current readers.
Fixed store.log code to avoid "Bug: Missing MemObject::storeId value".
Removed Transients extras used to initialize MemObject StoreID/method in
StoreEntry objects created by Transients::get() for collapsed requests.
Controlled::get() and related Controller APIs do not require setting
those MemObject details: get() methods for all cache stores return
StoreEntry objects without them (because entry basics lack Store ID and
request method). The caller is responsible for cache key collision
detection. Controlled::get() parameters could include Store ID and
request method for early cache key collision detection, but adding a
StoreQuery class and improving collision detection code is outside this
project scope (and requires many changes).
Found more cases where release() should not prevent sharing.
Remaining cases need further analysis as discussed in master 39fe14b.
Greatly simplified UFS store rebuilding, possibly fixing subtle bug(s).
Clarified RELEASE_REQUEST flag meaning, becoming 'a private StoreEntry
which can't become public anymore'. Refactored the related code,
combining two related notions: 'a private entry' and 'an entry marked
for removal'.
Do not abort collapsed StoreEntries during syncing just because the
corresponding being stored shared entry was marked for deletion. Abort
them if the shared entry has been also aborted.
Added StoreEntry helper methods to prevent direct manipulation of
individual disk-related data members (swap_dirn, swap_filen, and
swap_status). These methods help keep these related data members in a
coherent state and minimize code duplication.
XXX: SMP cache purges may continue to malfunction when the Transients
table is missing. Currently, Transients are created only when the
collapsed_forwarding is on. After Squid bug 4579 is fixed, every public
StoreEntry will have the corresponding Transients entry and vice versa,
extending these fixes to all SMP environments.
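For orientation, a hedged sketch of how the new Store::Controller::evictIfFound(key) can fan out to the individual stores without instantiating a StoreEntry; the memStore, swapDir, and transients members follow the Controller naming, but the exact body is illustrative:

```cpp
// Illustrative sketch of the key-based eviction entry point; each store frees
// its matching slot or, if the slot is locked for writing, marks it for deletion.
void
Store::Controller::evictIfFound(const cache_key *key)
{
    if (memStore)
        memStore->evictIfFound(key);   // shared memory cache

    if (swapDir)
        swapDir->evictIfFound(key);    // all configured cache_dirs

    if (transients)
        transients->evictIfFound(key); // lets attached workers learn about the purge

    // note: no StoreEntry object is created or loaded at any point above
}
```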