Skip to content

Bug 4616: store_client.cc:92: "mem" assertion#50

Merged
rousskov merged 1 commit intosquid-cache:masterfrom
measurement-factory:SQUID-314-bug-4616-mem-assertion
Sep 15, 2017
Merged

Bug 4616: store_client.cc:92: "mem" assertion#50
rousskov merged 1 commit intosquid-cache:masterfrom
measurement-factory:SQUID-314-bug-4616-mem-assertion

Conversation

@eduard-bagdasaryan
Copy link
Contributor

@eduard-bagdasaryan eduard-bagdasaryan commented Aug 21, 2017

This bug was probably caused by Bug 2833 feature/fix (39fe14b).

Collapsed forwarding code must ensure StoreEntry::mem_obj existence. It
is missing for hits purged from (or never addmitted into) the memory
cache. Most storeClientListAdd() callers either have similar code or
call storeCreateEntry() which also creates StoreEntry::mem_obj.

Also: eliminated code duplication with StoreEntry::createMemObject().

XXX: another(related) problem was discovered: MemObject URIs/method
should not be overwritten. For example, it is wrong to change
MemObject::method when another request, having different method, hits
this entry, because the originally stored method is used for many purposes
(determining size of the incoming response, validating entry length,
finding vary markers, etc.). We tried to fix this problem, making
MemObject URIs/method constant and removing MemObject::setUris(). This
attempt was rejected, because of a negative effect for shared memory
cache: URIs/method would stay uninitialized. Since this is the only
place where MemObject is created without providing URIs/method
information, we decided to postpone these changes until the related XXX
within MemStore::get() is fixed.

@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

3 similar comments
@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@yadij yadij added the S-waiting-for-committer privileged action is expected (and usually required) label Aug 21, 2017
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check whether we actually need both createMemObject() and ensureMemObject() methods. In other words, do existing createMemObject() callers overwrite the old URL/method by choice or by accident?

For example, why would purgeFoundObject() want to overwrite the request method (or the URL) of the found cache entry? And overwrite the request method with what, the DELETE method?!

@rousskov rousskov added S-waiting-for-author author action is expected (and usually required) and removed S-waiting-for-committer privileged action is expected (and usually required) labels Aug 21, 2017
@eduard-bagdasaryan
Copy link
Contributor Author

For example, why would purgeFoundObject() want to overwrite the request method (or the URL) of the found cache entry? And overwrite the request method with what, the DELETE method?!

Looks like this 'example' is the only one where using createMemObject() is controversial, because other usecases have a StoreEntry just created. When purging (a "PURGE" request), we get first the corresponding StoreEntry from storage ( using METHOD_GET for that) and then reset the method to the actual, i.e., METHOD_PURGE. I assume this is needed, e.g., to do correct logging into store.log (if enabled via cache_store_log parameter).

@rousskov
Copy link
Contributor

rousskov commented Aug 24, 2017

Looks like purgeFoundObject() is the only one where using createMemObject() is controversial, because other usecases have a StoreEntry just created.

If true, then purgeFoundObject() is currently the only possible justification for having both createMemObject() and ensureMemObject() methods.

When purging (a "PURGE" request), we get first the corresponding StoreEntry from storage ( using METHOD_GET for that) and then reset the method to the actual, i.e., METHOD_PURGE. I assume this is needed, e.g., to do correct logging into store.log (if enabled via cache_store_log parameter).

Your assumption sounds strange to me because the same StoreEntry object may be shared by many Store clients -- modifying it using a particular client information seems wrong. Each transaction may log its to-Squid method (%>rm) and/or the original from-Squid method (%<rm), but not the method of the last request that found the StoreEntry object:

[http::]rm    Request method (GET/POST etc)
[http::]>rm  Request method from client
[http::]<rm  Request method sent to server or peer

It is possible that the old code is buggy in this area. If so: Fixing that bug is probably outside this PR scope, but we should add an XXX to mark the bug and note that one of the methods should be removed once that bug is fixed. It may be a good idea to mark the "wrong" or "extra" method name with an XXX so that others are not tempted to use it.

@eduard-bagdasaryan
Copy link
Contributor Author

It is possible that the old code is buggy in this area. If so: Fixing that bug is probably outside this PR scope, but we should add an XXX to mark the bug and note that one of the methods should be removed once that bug is fixed. It may be a good idea to mark the "wrong" or "extra" method name with an XXX so that others are not tempted to use it.

The old code already has a TODO inside clientReplyContext::purgeRequestFindObjectToPurge():

// TODO: can we use purgeAllCached() here instead of doing the
// getPublicByRequestMethod() dance?

I am not yet sure what are 'wrong' methods which should be removed, but to fix these problems we probably need to rework the last code block inside clientReplyContext::purgeRequest() with purgeAllCached().

@rousskov
Copy link
Contributor

rousskov commented Aug 24, 2017

to fix these problems we probably need to rework the last code block inside clientReplyContext::purgeRequest() with purgeAllCached().

That feels out of scope to me. The PR is already huge. Let's not make it even bigger if we do not have to.

Edit: Sorry, I was thinking about the other pending PR. This PR is still small, but fixing method iteration in purgeRequestFindObjectToPurge() sounds completely out of this PR scope.

@squid-prbot
Copy link
Collaborator

Jenkins test

@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

2 similar comments
@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@rousskov
Copy link
Contributor

@eduard-bagdasaryan, looks like the basic Jenkins build test is working. Please switch to the full "matrix" tests when you get a chance.

@eduard-bagdasaryan
Copy link
Contributor Author

looks like the basic Jenkins build test is working

FYI: as you can see, the context is 'default' for Jenkins check. Jenkins allows to configure this, but this option works incorrectly, confusing Github with wrong statuses. There is an open bug issue, describing this.

@rousskov
Copy link
Contributor

as you can see, the context is 'default' for Jenkins check.

I did not notice that earlier, but I do see now that GitHub refers to the check as "default" instead of something more reasonable like "Jenkins build test" or "basic build test".

Jenkins allows to configure this, but this option works incorrectly, confusing Github with wrong statuses. There is an open bug issue, describing this.

If the label "default" works, we can continue using that uninformative label until the Jenkins bug is fixed.

If we add another independent/parallel test (job?) to Jenkins (e.g., Co-Advisor or Coverity), will GitHub get confused with two Jenkins jobs having the same "default" label? In other words, does this bug limit us to using just one Jenkins job when integrating with GitHub CI?

@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@eduard-bagdasaryan
Copy link
Contributor Author

Jenkins allows to configure this, but this option works incorrectly, confusing Github with wrong statuses.

I found a workaround for this bug: we should configure this context for each job independently. This can be done inside "Trigger Setup" withing job configuration. Moreover, I think this is the only correct way to manage our configuration, since we will have several independent jobs with different names, like "Jenkins(build test)", "Jenkins(Co-Advisor test), etc.

instead of something more reasonable like "Jenkins build test" or "basic build test"

I configured our Jenkins "5-pr-test" job, but the "default" status is stuck still there (due to that bug). I checked on other(test) project that new pull requests will not have this "default", only proper "Jenkins(build test)" (and other) build statuses.

If we add another independent/parallel test (job?) to Jenkins (e.g., Co-Advisor or Coverity), will GitHub get confused with two Jenkins jobs having the same "default" label?

Github is not confused with the same "default". Though all related jobs are triggered via Webhook in this case, Github displays only one status for one of the completed jobs (probably the last finished). That is not a problem, once we configure a specific 'build context' for each job (as I described above).

@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@rousskov rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Aug 28, 2017
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refreshing the review because the author thought that all change requests have been addressed, and I disagree: Please avoid adding a second mem_obj initialization method/logic if the code can be adjusted to use a single method/logic.

You have argued that (only) the entry purging code may need a special method/logic, but there were serious doubts in the validity of that argument.

@rousskov rousskov added S-waiting-for-author author action is expected (and usually required) and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Aug 28, 2017
@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@rousskov
Copy link
Contributor

rousskov commented Sep 1, 2017

Thanks for describing this in detail

Please note that I just rephrased what the code does. These facts were not some secret knowledge that I magically possessed before writing that comment. Discovering that knowledge required manual labor, but I am sure you could have done it as well or better.

remove third parameter from MemObject::setUris()

AFAICT, we should not separate the two URIs from the method -- those three pieces always go together. No valid StoreEntry user may want to change just one them. Please correct me if I am wrong.

If I am right, then to make all three constant, we would have to provide all three at MemObject construction time. I think that is doable, but I did not check carefully. If you think it would work well, please try to implement it.

@eduard-bagdasaryan
Copy link
Contributor Author

Though SemaphoreCI reported about whitespace errors, it is unrelated to this PR and caused by 5107d2c and 70cfe22 master revisions.

@eduard-bagdasaryan
Copy link
Contributor Author

@rousskov, I made Jenkins do complete build tests with test_builds.sh, so you can mark this check as 'required' if needed.

@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@rousskov
Copy link
Contributor

rousskov commented Sep 9, 2017

Though SemaphoreCI reported about whitespace errors, it is unrelated to this PR and caused by 5107d2c and 70cfe22 master revisions.

I would not be surprised if git diff --check may report false errors, but it is probably pointless to discuss them until the branch is in sync with master because the diff for an an out-of-sync branch is different that a diff for an in-sync branch. This is one more case where the check should be done on the fully merged and about-to-be-committed code rather than some approximation of that.

The current primitive git diff --check on Semaphore CI essentially takes whatever branch GitHub/Semaphore offers it and compares it to the guessed branching point in master. That is not the right approach. The right approach is to do all git manipulations that an auto-merging plugin/bot would do up to "git commit" and then run git diff --check.

Whether we should ignore errors from out-of-sync builds or not test out-of-sync builds is an open question.

@rousskov
Copy link
Contributor

rousskov commented Sep 9, 2017

I made Jenkins do complete build tests with test_builds.sh, so you can mark this check as 'required' if needed.

Done. IIRC, you have added stable Jenkins nodes to that test. When you get a chance, please rename that job from Jenkins(build test) to Comprehensive build test.

src/MemStore.cc Outdated
// store the response for the Root().get() callers to be happy because they
// expect IN_MEMORY entries to already have the response headers and body.
e->makeMemObject();
e->ensureMemObject(nullptr, nullptr, HttpRequestMethod());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something does not add up here. This ensureMemObject(nullptr, nullptr...) call will always trigger the (moved) Bug: Missing MemObject::storeId warning, right? I am sure you can adjust the code further to avoid that warning in this specific case, but does not this problem tell us that we cannot make the URIs/method trio constant?! AFAICT, here we are creating a StoreEntry with MemObject that lacks any URIs/method information. After this PR, this StoreEntry will remain in this half-baked state because nobody will be able to reset URIs/method. Would not exposing Store clients to such half-baked StoreEntry/MemObject objects create all sorts of problems?

I am guessing that before this PR the half-baked StoreEntry/MemObject were updated with URIs/method information as soon as that info was loaded from the memory cache by/in MemStore::copyFromShm() or perhaps shortly after that.

If my suspicions are correct, then either we have to go back to mutable URIs/method trio (at least for the specific case where the original trio was empty) or we need to reshape this code so that it does not need the MemObject object until it knows what the actual URIs/method trio is. The latter approach would address the above XXX comment, so it is better in general, but I suspect it may not be feasible without revamping a lot of code. In that case, we would have to allow trio updates, at least from the empty to non-empty state.

There are other small problems with the recent changes, but let's address this big issue first.

@eduard-bagdasaryan
Copy link
Contributor Author

rebuild

@eduard-bagdasaryan
Copy link
Contributor Author

@rousskov , please review e400119.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really do not like how the semi-repeated calls in this PR iteration look. Please see the code comments for a specific fix suggestion.

src/MemObject.cc Outdated
vary_headers(nullptr)
vary_headers(nullptr),
storeId_(aStoreId),
logUri_((aLogUri == aStoreId) ? nullptr : aLogUri)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until the trio becomes constant, I do not think it is a good idea to silently duplicate this tricky trio initialization logic (and the URL_CHECKSUM_DEBUG hack). If the sketch described in another change request is implemented, then the old constructor may continue to work because setUris() will only be called with a set/non-nil trio.

If you do need to add a MemObject(trio) constructor for some reason, then the new constructor can probably call the old MemObject(void) constructor (C++11 allows constructor reuse) and then call setUris().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undone previous changes, postponed creating MemObject(trio) until it is required by other changes.

src/MemObject.h Outdated

/// sets store ID, log URI, and request method; TODO: find a better name
/// XXX: remove this method and make corresponding URI fields constant
/// when another XXX within MemStore::get() is addressed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please s/another XXX within /the XXX in/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/MemObject.h Outdated

/// whether setUris() has been called
/// Whether entry StoreID was provided.
/// TODO: probably misnamed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the sketch described in another change request is implemented, then the old setUri() documentation will remain valid because setUris() will only be called with a set/non-nil trio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/Store.h Outdated
MemObject *makeMemObject();

/// initialize mem_obj member (if needed) and supply URI-related info
/// initialize mem_obj (if needed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the sketch described in another change request is implemented, then this documentation will need to be adjusted further. Please check other affected methods as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


if (entry) {
entry->createMemObject(url, http->log_uri, http->request->method);
entry->mem_obj->setUris(url, http->log_uri, http->request->method);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This repetition and similar code elsewhere in this PR look really bad. We need to find a better way to express what we know. AFAICT, we need to handle three or four distinct cases (from the caller point of view):

  1. The caller knows that the StoreEntry object does not have a MemObject. The caller needs to create the MemObject but the caller does not have the trio information yet. This case is related to the old MemStore problem that we are not going to fix in this PR (and that makes it impossible to make the trio members constant).

  2. The caller knows that the StoreEntry object does not have a MemObject. The caller needs to create the MemObject and set its trio information.

  3. The caller knows that the StoreEntry object has a MemObject. The caller needs to update the MemObject trio if it is unset. I am not sure these cases actually exist.

  4. The caller does not know whether the StoreEntry object has a MemObject member. If the StoreEntry does not have MemObject, then the caller needs to create one (with the caller-supplied trio). Otherwise, the caller needs to update the MemObject trio if it is unset.

I suggest using the following APIs to support all of the above cases:

  • MemObject::MemObject(void): Old MemObject constructor. No changes here.
  • MemObject::setUris(trio): Old method adjusted to do nothing when hasUris() is already true and to assert that hasUris() becomes true otherwise.
  • `StoreEntry::ensureMemObject(trio): Creates MemObject if needed and always calls setUri().
  • `StoreEntry::createMemObject(trio): Asserts that there is no MemObject and calls ensureMemObject().
  • `StoreEntry::createMemObject(void): Asserts that there is no MemObject and creates one.

When/if we no longer need to update the trio, MemObject(void) and setUris() will disappear, with their code merged into MemObject(trio). The createMemObject(void) method will disappear as well as unneeded.

The use cases listed earlier can use the following APIs:

  1. createMemObject(void)
  2. createMemObject(trio)
  3. assert(mem_obj); mem_obj->setUris(trio)
  4. ensureMemObject(trio)

Since each case gets its own API, it would be easier to find and adjust them later. Again, I hope that case 3 does not actually exist.

If you do not have a better idea, please see whether the above sketch results in a better looking but still correct code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to implement your sketch "as is".

@eduard-bagdasaryan
Copy link
Contributor Author

@rousskov , I addressed latest review comments.

@rousskov rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Sep 14, 2017
@rousskov rousskov force-pushed the SQUID-314-bug-4616-mem-assertion branch from 4d33a73 to 0d4baf8 Compare September 15, 2017 01:03
@rousskov
Copy link
Contributor

rousskov commented Sep 15, 2017

I removed the setUris() exit assertion, added a few polishing touches, rebased, and squashed. You can still see individual changes at SQUID-314-bug-4616-mem-assertion.bak

@eduard-bagdasaryan, if the tests pass, do you have any objections to merging 0d4baf8 4da97c7?

Edit: I had to force push a trivial change in hope to awake SemaphoreCI that got stuck/confused. That did not help so I will try to find another trick to unstuck it.

@rousskov rousskov added S-waiting-for-author author action is expected (and usually required) and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Sep 15, 2017
This bug was probably caused by Bug 2833 feature/fix (1a210de).

The primary fix here is limited to clientReplyContext::processExpired():
Collapsed forwarding code must ensure StoreEntry::mem_obj existence. It
was missing for cache hits purged from (or never admitted into) the
memory cache. Most storeClientListAdd() callers either have similar code
or call storeCreateEntry() which also creates StoreEntry::mem_obj.

Also avoided clobbering known StoreEntry URIs/method in some cases. The
known effect of this change is fixed store.log URI and method fields
when a hit transaction did not match the stored entry exactly (e.g., a
HEAD hit for a GET cached entry), but this improvement may have even
more important consequences: The original method is used by possibly
still-running entry filling code (e.g., determining the end of the
incoming response, validating the entry length, finding vary markers,
etc.). Changing the method affects those actions, essentially corrupting
the entry state. The same argument may apply to store ID and log URI.

We even tried to make URIs/method constant, but that is impractical w/o
addressing an XXX in MemStore::get(), which is outside this issue scope.
To facilitate that future fix, the code now distinguishes these cases:

* createMemObject(void): Buggy callers that create a new memory object
  but do not know what URIs/method the hosting StoreEntry was based on.
  Once these callers are fixed, we can make the URIs/method constant.

* createMemObject(trio): Callers that create a new memory object with
  URIs/method that match the hosting StoreEntry.

* ensureMemObject(trio): Callers that are not sure whether StoreEntry
  has a memory object but have URIs/method to create one if needed.
@rousskov rousskov force-pushed the SQUID-314-bug-4616-mem-assertion branch from 0d4baf8 to 4da97c7 Compare September 15, 2017 03:22
@rousskov rousskov closed this Sep 15, 2017
@rousskov rousskov reopened this Sep 15, 2017
@rousskov
Copy link
Contributor

For the record, closing and reopening this PR unstuck Semaphore CI. Please note that not everybody has the right to reopen and that pushing into the branch of a closed PR may force you to jump through additional hoops later.

@eduard-bagdasaryan
Copy link
Contributor Author

do you have any objections to merging

I do not mind merging it.

@rousskov rousskov merged commit 76d6111 into squid-cache:master Sep 15, 2017
@rousskov rousskov deleted the SQUID-314-bug-4616-mem-assertion branch September 15, 2017 14:13
squidadm pushed a commit to squidadm/squid that referenced this pull request Nov 27, 2017
This bug was probably caused by Bug 2833 feature/fix (1a210de).

The primary fix here is limited to clientReplyContext::processExpired():
Collapsed forwarding code must ensure StoreEntry::mem_obj existence. It
was missing for cache hits purged from (or never admitted into) the
memory cache. Most storeClientListAdd() callers either have similar code
or call storeCreateEntry() which also creates StoreEntry::mem_obj.

Also avoided clobbering known StoreEntry URIs/method in some cases. The
known effect of this change is fixed store.log URI and method fields
when a hit transaction did not match the stored entry exactly (e.g., a
HEAD hit for a GET cached entry), but this improvement may have even
more important consequences: The original method is used by possibly
still-running entry filling code (e.g., determining the end of the
incoming response, validating the entry length, finding vary markers,
etc.). Changing the method affects those actions, essentially corrupting
the entry state. The same argument may apply to store ID and log URI.

We even tried to make URIs/method constant, but that is impractical w/o
addressing an XXX in MemStore::get(), which is outside this issue scope.
To facilitate that future fix, the code now distinguishes these cases:

* createMemObject(void): Buggy callers that create a new memory object
  but do not know what URIs/method the hosting StoreEntry was based on.
  Once these callers are fixed, we can make the URIs/method constant.

* createMemObject(trio): Callers that create a new memory object with
  URIs/method that match the hosting StoreEntry.

* ensureMemObject(trio): Callers that are not sure whether StoreEntry
  has a memory object but have URIs/method to create one if needed.
yadij pushed a commit that referenced this pull request Nov 27, 2017
This bug was probably caused by Bug 2833 feature/fix (1a210de).

The primary fix here is limited to clientReplyContext::processExpired():
Collapsed forwarding code must ensure StoreEntry::mem_obj existence. It
was missing for cache hits purged from (or never admitted into) the
memory cache. Most storeClientListAdd() callers either have similar code
or call storeCreateEntry() which also creates StoreEntry::mem_obj.

Also avoided clobbering known StoreEntry URIs/method in some cases. The
known effect of this change is fixed store.log URI and method fields
when a hit transaction did not match the stored entry exactly (e.g., a
HEAD hit for a GET cached entry), but this improvement may have even
more important consequences: The original method is used by possibly
still-running entry filling code (e.g., determining the end of the
incoming response, validating the entry length, finding vary markers,
etc.). Changing the method affects those actions, essentially corrupting
the entry state. The same argument may apply to store ID and log URI.

We even tried to make URIs/method constant, but that is impractical w/o
addressing an XXX in MemStore::get(), which is outside this issue scope.
To facilitate that future fix, the code now distinguishes these cases:

* createMemObject(void): Buggy callers that create a new memory object
  but do not know what URIs/method the hosting StoreEntry was based on.
  Once these callers are fixed, we can make the URIs/method constant.

* createMemObject(trio): Callers that create a new memory object with
  URIs/method that match the hosting StoreEntry.

* ensureMemObject(trio): Callers that are not sure whether StoreEntry
  has a memory object but have URIs/method to create one if needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-for-author author action is expected (and usually required)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants