-
Notifications
You must be signed in to change notification settings - Fork 87
Proposal for run-time lock validation #805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jiridanek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| //// | ||
|
|
||
| = Mutex Locking Validation | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose a header for RIPs, consisting of the following metadata.
Authors: Jane Doe
Status: Submitted | Superseded | Implemented
Last updated: 2020-04-05
(I am taking this from https://docs.google.com/document/d/1ef7_drjTl4NnzfAmGPzC4_R9vGHkpIzUrlGhxPqTbwg/edit)
| //// | ||
|
|
||
| = Mutex Locking Validation | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a README.adoc that explains what the RIPs are. I think that the following phrase (or a variant of it) should be part of it
It’s possible that designs change as they are implemented in practice. The published design documents capture the initial design, and not the ongoing changes as designs are implemented.
Always go to the documentation for descriptions of current Qpid Dispatch functionality.
(this is from https://bazel.build/designs/)
| In order to track lock nesting within a thread each thread will | ||
| maintain a per-thread stack of active locks. Taking a new lock will | ||
| result in that lock being pushed onto the stack. Attempting to push a | ||
| lock in the wrong priority order will result in calling abort(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you intend to check that the lock being unlocked is the one on top of the lock stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example would be really helpful. Assign some priorities to existing locks (connection work lock, log lock, python lock, etc.) and see what and what isn't allowed.
|
|
||
| A lock with a priority of zero is an exclusive lock. A thread must | ||
| never hold another lock when taking a priority 0 lock. While holding a | ||
| priority 0 lock it is an error to take any other lock. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any example of a lock that would get priority 0? Is this actually good for anything?
I was thinking of possibly using MAX_INT instead of 0, to avoid a special case. But there still would be a special case, because more than one lock may need to have this priority (and the rest of the proposal afaik disallows having multiple locks with the same priority).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this feels inverted and it will be difficult to choose an appropriate priority for your lock.
Instead of saying 0 can't be acquired while another lock is held, could we say that when a 0 is held, no other locks may be acquired? This would suggest a default priority of 0, which would apply to locks that are acquired and released in-line while data structures are manipulated and no functions are called.
This would mean that a lock's priority would be based on the priorities of other locks that may be recursively acquired, establishing a lock hierarchy. Locks like the Python lock should have a relatively high priority since they are held over long-running operations.
| //// | ||
|
|
||
| = Mutex Locking Validation | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That README.adoc should explain what RIP stands for ;) I actually don't know. (But it sounds really cool.)
| while a thread is running within the Python interpreter. Occasionally | ||
| the Python interpreter will invoke native C functions. These | ||
| functions may take addional locks, such as the global logging lock | ||
| (log_lock). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offtopic: is it necessary to have logging lock? There is lock-free logging in Java, so maybe C has it too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log locking is used to keep the written log in strict chronological order. gettimeofday, vsnprintf, and write_log are the items that need to be locked but the code locks only a little more than that.
It would be bad to have the logs out of order with respect to time of day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant something like http://logging.apache.org/log4j/2.x/manual/async.html, this keeps a queue of in-flight log messages and adding to the queue does not require holding a lock.
| //// | ||
|
|
||
| = Mutex Locking Validation | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposal files could be numbered, so possibly 001-lock-validation.adoc or maybe dated, 2020-07-31-lock-validation.adoc.
| state. The locking API is defined in | ||
| include/qpid/dispatch/threading.h | ||
|
|
||
| In general lock nesting (e.g. taking a lock while holding another) is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this statement is a bit strong. There are a number of cases where locks are nested in the code base.
|
|
||
| Every lock will be assigned a fixed priority on creation. The | ||
| priority will be a simple positive integer. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you proposing that the sys_mutex function be modified to include a lock priority? Alternatively, are you planning to add an alternate constructor (sys_mutex_priority or similar) and having the existing one default to a particular value? If so, what's the default?
ted-ross
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks like a valuable enhancement. We still need to answer some questions and hammer out some details.
|
I propose putting these under /docs/rips/ (as a peer to notes/, man/, and so on). |
|
Somebody built a tool that does a lot of what's proposed here http://manpages.ubuntu.com/manpages/bionic/man8/deadlock_detector-bpfcc.8.html. TSan is of course another such tool. |
No description provided.