Bug 4616: store_client.cc:92: "mem" assertion#50
rousskov
left a comment
Please check whether we actually need both createMemObject() and ensureMemObject() methods. In other words, do existing createMemObject() callers overwrite the old URL/method by choice or by accident?
For example, why would purgeFoundObject() want to overwrite the request method (or the URL) of the found cache entry? And overwrite the request method with what, the DELETE method?!
Looks like this 'example' is the only one where using createMemObject() is controversial, because the other use cases have a freshly created StoreEntry. When purging (a "PURGE" request), we first get the corresponding StoreEntry from storage (using METHOD_GET for that) and then reset the method to the actual one, i.e., METHOD_PURGE. I assume this is needed, e.g., for correct logging into store.log (if enabled via the cache_store_log parameter).
If true, then purgeFoundObject() is currently the only possible justification for having both createMemObject() and ensureMemObject() methods.
Your assumption sounds strange to me because the same StoreEntry object may be shared by many Store clients -- modifying it using a particular client information seems wrong. Each transaction may log its to-Squid method ( It is possible that the old code is buggy in this area. If so: Fixing that bug is probably outside this PR scope, but we should add an XXX to mark the bug and note that one of the methods should be removed once that bug is fixed. It may be a good idea to mark the "wrong" or "extra" method name with an XXX so that others are not tempted to use it. |
The old code already has a TODO inside clientReplyContext::purgeRequestFindObjectToPurge(): I am not yet sure what are 'wrong' methods which should be removed, but to fix these problems we probably need to rework the last code block inside clientReplyContext::purgeRequest() with purgeAllCached(). |
That feels out of scope to me. Edit: Sorry, I was thinking about the other pending PR. This PR is still small, but fixing method iteration in purgeRequestFindObjectToPurge() sounds completely out of this PR scope. |
@eduard-bagdasaryan, looks like the basic Jenkins build test is working. Please switch to the full "matrix" tests when you get a chance. |
FYI: as you can see, the context for the Jenkins check is 'default'. Jenkins allows configuring this, but the option works incorrectly, confusing GitHub with wrong statuses. There is an open bug report describing this.
I did not notice that earlier, but I do see now that GitHub refers to the check as "default" instead of something more reasonable like "Jenkins build test" or "basic build test".
If the label "default" works, we can continue using that uninformative label until the Jenkins bug is fixed. If we add another independent/parallel test (job?) to Jenkins (e.g., Co-Advisor or Coverity), will GitHub get confused with two Jenkins jobs having the same "default" label? In other words, does this bug limit us to using just one Jenkins job when integrating with GitHub CI? |
I found a workaround for this bug: we should configure this context for each job independently. This can be done inside "Trigger Setup" within the job configuration. Moreover, I think this is the only correct way to manage our configuration, since we will have several independent jobs with different names, like "Jenkins(build test)", "Jenkins(Co-Advisor test)", etc.
I configured our Jenkins "5-pr-test" job, but the "default" status is still stuck there (due to that bug). I checked on another (test) project that new pull requests will not have this "default" status, only the proper "Jenkins(build test)" (and other) build statuses.
GitHub is not confused by the same "default" label. Though all related jobs are triggered via Webhook in this case, GitHub displays only one status, for one of the completed jobs (probably the last one to finish). That is not a problem once we configure a specific 'build context' for each job (as I described above).
rousskov
left a comment
Refreshing the review because the author thought that all change requests have been addressed, and I disagree: Please avoid adding a second mem_obj initialization method/logic if the code can be adjusted to use a single method/logic.
You have argued that (only) the entry purging code may need a special method/logic, but there were serious doubts about the validity of that argument.
Please note that I just rephrased what the code does. These facts were not some secret knowledge that I magically possessed before writing that comment. Discovering that knowledge required manual labor, but I am sure you could have done it as well or better.
AFAICT, we should not separate the two URIs from the method -- those three pieces always go together. No valid StoreEntry user may want to change just one of them. Please correct me if I am wrong. If I am right, then to make all three constant, we would have to provide all three at MemObject construction time. I think that is doable, but I did not check carefully. If you think it would work well, please try to implement it.
|
@rousskov, I made Jenkins do complete build tests with test_builds.sh, so you can mark this check as 'required' if needed. |
I would not be surprised if The current primitive Whether we should ignore errors from out-of-sync builds or not test out-of-sync builds is an open question. |
Done. IIRC, you have added stable Jenkins nodes to that test. When you get a chance, please rename that job from |
src/MemStore.cc
Outdated
| // store the response for the Root().get() callers to be happy because they | ||
| // expect IN_MEMORY entries to already have the response headers and body. | ||
| e->makeMemObject(); | ||
| e->ensureMemObject(nullptr, nullptr, HttpRequestMethod()); |
Something does not add up here. This ensureMemObject(nullptr, nullptr...) call will always trigger the (moved) Bug: Missing MemObject::storeId warning, right? I am sure you can adjust the code further to avoid that warning in this specific case, but does not this problem tell us that we cannot make the URIs/method trio constant?! AFAICT, here we are creating a StoreEntry with MemObject that lacks any URIs/method information. After this PR, this StoreEntry will remain in this half-baked state because nobody will be able to reset URIs/method. Would not exposing Store clients to such half-baked StoreEntry/MemObject objects create all sorts of problems?
I am guessing that before this PR the half-baked StoreEntry/MemObject were updated with URIs/method information as soon as that info was loaded from the memory cache by/in MemStore::copyFromShm() or perhaps shortly after that.
If my suspicions are correct, then either we have to go back to mutable URIs/method trio (at least for the specific case where the original trio was empty) or we need to reshape this code so that it does not need the MemObject object until it knows what the actual URIs/method trio is. The latter approach would address the above XXX comment, so it is better in general, but I suspect it may not be feasible without revamping a lot of code. In that case, we would have to allow trio updates, at least from the empty to non-empty state.
There are other small problems with the recent changes, but let's address this big issue first.
rousskov
left a comment
I really do not like how the semi-repeated calls in this PR iteration look. Please see the code comments for a specific fix suggestion.
src/MemObject.cc
Outdated
| vary_headers(nullptr) | ||
| vary_headers(nullptr), | ||
| storeId_(aStoreId), | ||
| logUri_((aLogUri == aStoreId) ? nullptr : aLogUri) |
Until the trio becomes constant, I do not think it is a good idea to silently duplicate this tricky trio initialization logic (and the URL_CHECKSUM_DEBUG hack). If the sketch described in another change request is implemented, then the old constructor may continue to work because setUris() will only be called with a set/non-nil trio.
If you do need to add a MemObject(trio) constructor for some reason, then the new constructor can probably call the old MemObject(void) constructor (C++11 allows constructor reuse) and then call setUris().
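For illustration, the constructor reuse mentioned above could look roughly like the following. This is only a minimal sketch under stated assumptions, not Squid code: MemObjectSketch is a hypothetical stand-in for MemObject, std::string replaces the real URI and method types, and treating a non-empty store ID as "trio is set" is an assumption made for the example.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical stand-in for Squid's MemObject (not the real class).
class MemObjectSketch
{
public:
    // "Old" default constructor: leaves the URIs/method trio unset.
    MemObjectSketch() = default;

    // Hypothetical trio constructor: delegates to the default constructor
    // (C++11 delegating constructor) and then reuses setUris(), so the
    // trio initialization logic lives in one place only.
    MemObjectSketch(const std::string &aStoreId, const std::string &aLogUri, const std::string &aMethod):
        MemObjectSketch()
    {
        setUris(aStoreId, aLogUri, aMethod);
    }

    void setUris(const std::string &aStoreId, const std::string &aLogUri, const std::string &aMethod)
    {
        storeId_ = aStoreId;
        // Keep a separate log URI only when it differs from the store ID.
        logUri_ = (aLogUri == aStoreId) ? std::string() : aLogUri;
        method_ = aMethod;
    }

    // Assumption for this sketch: a non-empty store ID means the trio is set.
    bool hasUris() const { return !storeId_.empty(); }

    const std::string &storeId() const { return storeId_; }
    // Falls back to the store ID when no separate log URI was kept.
    const std::string &logUri() const { return logUri_.empty() ? storeId_ : logUri_; }
    const std::string &method() const { return method_; }

private:
    std::string storeId_;
    std::string logUri_;
    std::string method_;
};
```

With this shape, MemObjectSketch("http://example.com/", "http://example.com/", "GET") ends up in the same state as a default-constructed object followed by a setUris() call, without duplicating the initialization logic.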
Undone previous changes, postponed creating MemObject(trio) until it is required by other changes.
src/MemObject.h
Outdated
| /// sets store ID, log URI, and request method; TODO: find a better name | ||
| /// XXX: remove this method and make corresponding URI fields constant | ||
| /// when another XXX within MemStore::get() is addressed. |
Please s/another XXX within /the XXX in/
src/MemObject.h
Outdated
| /// whether setUris() has been called | ||
| /// Whether entry StoreID was provided. | ||
| /// TODO: probably misnamed. |
If the sketch described in another change request is implemented, then the old setUri() documentation will remain valid because setUris() will only be called with a set/non-nil trio.
src/Store.h
Outdated
| MemObject *makeMemObject(); | ||
|
|
||
| /// initialize mem_obj member (if needed) and supply URI-related info | ||
| /// initialize mem_obj (if needed) |
If the sketch described in another change request is implemented, then this documentation will need to be adjusted further. Please check other affected methods as well.
src/client_side_reply.cc
Outdated
| if (entry) { | ||
| entry->createMemObject(url, http->log_uri, http->request->method); | ||
| entry->mem_obj->setUris(url, http->log_uri, http->request->method); |
This repetition and similar code elsewhere in this PR look really bad. We need to find a better way to express what we know. AFAICT, we need to handle three or four distinct cases (from the caller point of view):
1. The caller knows that the StoreEntry object does not have a MemObject. The caller needs to create the MemObject but the caller does not have the trio information yet. This case is related to the old MemStore problem that we are not going to fix in this PR (and that makes it impossible to make the trio members constant).
2. The caller knows that the StoreEntry object does not have a MemObject. The caller needs to create the MemObject and set its trio information.
3. The caller knows that the StoreEntry object has a MemObject. The caller needs to update the MemObject trio if it is unset. I am not sure these cases actually exist.
4. The caller does not know whether the StoreEntry object has a MemObject member. If the StoreEntry does not have MemObject, then the caller needs to create one (with the caller-supplied trio). Otherwise, the caller needs to update the MemObject trio if it is unset.
I suggest using the following APIs to support all of the above cases:
- MemObject::MemObject(void): Old MemObject constructor. No changes here.
- MemObject::setUris(trio): Old method adjusted to do nothing when hasUris() is already true and to assert that hasUris() becomes true otherwise.
- StoreEntry::ensureMemObject(trio): Creates MemObject if needed and always calls setUris().
- StoreEntry::createMemObject(trio): Asserts that there is no MemObject and calls ensureMemObject().
- StoreEntry::createMemObject(void): Asserts that there is no MemObject and creates one.
When/if we no longer need to update the trio, MemObject(void) and setUris() will disappear, with their code merged into MemObject(trio). The createMemObject(void) method will disappear as well as unneeded.
The use cases listed earlier can use the following APIs:
1. createMemObject(void)
2. createMemObject(trio)
3. assert(mem_obj); mem_obj->setUris(trio)
4. ensureMemObject(trio)
Since each case gets its own API, it would be easier to find and adjust them later. Again, I hope that case 3 does not actually exist.
If you do not have a better idea, please see whether the above sketch results in a better looking but still correct code.
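To make the proposed API split easier to picture, here is a minimal sketch of how the pieces might fit together. It is not the actual Squid code: Trio, MemObjectSketch, and StoreEntrySketch are hypothetical simplified stand-ins, std::string replaces the real URI and method types, and a non-empty store ID is assumed to mean "trio is set".

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical bundle for the store ID, log URI, and request method.
struct Trio {
    std::string storeId;
    std::string logUri;
    std::string method;
};

// Simplified stand-in for MemObject.
class MemObjectSketch
{
public:
    // Does nothing when the trio is already known; otherwise records it.
    void setUris(const Trio &aTrio)
    {
        if (hasUris())
            return; // never clobber a known trio
        trio_ = aTrio;
        assert(hasUris()); // callers are expected to supply a non-empty trio
    }

    // Assumption for this sketch: a non-empty store ID means the trio is set.
    bool hasUris() const { return !trio_.storeId.empty(); }
    const Trio &trio() const { return trio_; }

private:
    Trio trio_;
};

// Simplified stand-in for StoreEntry.
class StoreEntrySketch
{
public:
    // Case 1: no MemObject yet and no trio information available.
    void createMemObject()
    {
        assert(!mem_obj);
        mem_obj.reset(new MemObjectSketch);
    }

    // Case 2: no MemObject yet; the caller supplies the trio.
    void createMemObject(const Trio &aTrio)
    {
        assert(!mem_obj);
        ensureMemObject(aTrio);
    }

    // Case 4: the caller does not know whether a MemObject exists.
    void ensureMemObject(const Trio &aTrio)
    {
        if (!mem_obj)
            mem_obj.reset(new MemObjectSketch);
        mem_obj->setUris(aTrio); // no-op if the trio is already set
    }

    std::unique_ptr<MemObjectSketch> mem_obj;
};
```

In this sketch, a second ensureMemObject() call with a different trio leaves the first trio intact, which is the no-clobbering behavior the cases above aim for. A later comment in this conversation reports that the setUris() exit assertion was eventually removed, so the assert() inside setUris() reflects only the sketch as originally proposed.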
I decided to implement your sketch "as is".
4d33a73 to
0d4baf8
Compare
|
I removed the setUris() exit assertion, added a few polishing touches, rebased, and squashed. You can still see individual changes at SQUID-314-bug-4616-mem-assertion.bak @eduard-bagdasaryan, if the tests pass, do you have any objections to merging Edit: I had to force push a trivial change in hope to awake SemaphoreCI that got stuck/confused. That did not help so I will try to find another trick to unstuck it. |
This bug was probably caused by Bug 2833 feature/fix (1a210de).

The primary fix here is limited to clientReplyContext::processExpired(): Collapsed forwarding code must ensure StoreEntry::mem_obj existence. It was missing for cache hits purged from (or never admitted into) the memory cache. Most storeClientListAdd() callers either have similar code or call storeCreateEntry() which also creates StoreEntry::mem_obj.

Also avoided clobbering known StoreEntry URIs/method in some cases. The known effect of this change is fixed store.log URI and method fields when a hit transaction did not match the stored entry exactly (e.g., a HEAD hit for a GET cached entry), but this improvement may have even more important consequences: The original method is used by possibly still-running entry filling code (e.g., determining the end of the incoming response, validating the entry length, finding vary markers, etc.). Changing the method affects those actions, essentially corrupting the entry state. The same argument may apply to store ID and log URI.

We even tried to make URIs/method constant, but that is impractical w/o addressing an XXX in MemStore::get(), which is outside this issue scope. To facilitate that future fix, the code now distinguishes these cases:

* createMemObject(void): Buggy callers that create a new memory object but do not know what URIs/method the hosting StoreEntry was based on. Once these callers are fixed, we can make the URIs/method constant.
* createMemObject(trio): Callers that create a new memory object with URIs/method that match the hosting StoreEntry.
* ensureMemObject(trio): Callers that are not sure whether StoreEntry has a memory object but have URIs/method to create one if needed.
0d4baf8 to
4da97c7
Compare
|
For the record, closing and reopening this PR unstuck Semaphore CI. Please note that not everybody has the right to reopen and that pushing into the branch of a closed PR may force you to jump through additional hoops later. |
I do not mind merging it. |
This bug was probably caused by Bug 2833 feature/fix (39fe14b).
Collapsed forwarding code must ensure StoreEntry::mem_obj existence. It
is missing for hits purged from (or never admitted into) the memory
cache. Most storeClientListAdd() callers either have similar code or
call storeCreateEntry() which also creates StoreEntry::mem_obj.
Also: eliminated code duplication with StoreEntry::createMemObject().
XXX: another (related) problem was discovered: MemObject URIs/method
should not be overwritten. For example, it is wrong to change
MemObject::method when another request, with a different method, hits
this entry, because the originally stored method is used for many purposes
(determining size of the incoming response, validating entry length,
finding vary markers, etc.). We tried to fix this problem, making
MemObject URIs/method constant and removing MemObject::setUris(). This
attempt was rejected, because of a negative effect for shared memory
cache: URIs/method would stay uninitialized. Since this is the only
place where MemObject is created without providing URIs/method
information, we decided to postpone these changes until the related XXX
within MemStore::get() is fixed.
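As a rough illustration of the primary fix described in these commit messages, the sketch below shows a client-registration helper that requires a memory object and a caller that ensures one exists first. All names here (MemObjectSketch, StoreEntrySketch, addStoreClient, joinExistingEntry) are hypothetical simplified stand-ins rather than Squid's real classes and functions, and the assert() only approximates the kind of "mem" check named in the bug title.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for MemObject: the trio is reduced to two strings.
struct MemObjectSketch {
    std::string storeId;
    std::string method;
    std::vector<std::string> clients; // names of registered store clients
};

// Simplified stand-in for StoreEntry.
struct StoreEntrySketch {
    std::unique_ptr<MemObjectSketch> mem_obj;

    // Creates the memory object if it is missing; keeps an existing one
    // (and its URIs/method) untouched.
    void ensureMemObject(const std::string &aStoreId, const std::string &aMethod)
    {
        if (mem_obj)
            return;
        mem_obj.reset(new MemObjectSketch);
        mem_obj->storeId = aStoreId;
        mem_obj->method = aMethod;
    }
};

// Hypothetical stand-in for a storeClientListAdd()-like helper:
// it requires the entry to already have a memory object.
void addStoreClient(StoreEntrySketch &entry, const std::string &clientName)
{
    assert(entry.mem_obj); // fails for hits that lack a memory object
    entry.mem_obj->clients.push_back(clientName);
}

// Hypothetical stand-in for the fixed caller: ensure the memory object
// exists before registering as a client of the found entry.
void joinExistingEntry(StoreEntrySketch &entry, const std::string &url,
                       const std::string &method, const std::string &clientName)
{
    entry.ensureMemObject(url, method);
    addStoreClient(entry, clientName);
}
```

Calling joinExistingEntry() on an entry without a memory object creates one first, so the assertion holds; calling it again with a different method (e.g., a HEAD hit for a GET entry) registers the second client without changing the stored method.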