[py] Improve handling of py interpreter errors#290
Conversation
In this specific case we could investigate who did reset the error state but not sure it's worth it, the point is by the moment we release the lock we should assume anything can happen.
It will, but working with the embedded interpreter in an async fashion like we do will be always error prone since bugs are only exposed under particular circumstances.
there's a disclaimer already but feel free to make it more dramatic 😄 If it helps to avoid the confusion with this specific issue, we could make |
|
Thanks @masci for the review, I made |
There was a problem hiding this comment.
Not sure if it's worth it but we could make this thread safe with an atomic write.
UnlockOSThread is a noop if called again, and PyGILState_Release is safe to call passing an UNLOCKED state, see https://github.com/python/cpython/blob/master/Python/pystate.c#L951, if we add an atomic write we can safely state unlock is thread safe and operates as a noop when called on an unlocked stickylock
|
One small comment but I confirm my 👍 |
Changing locks between a python call that sets the error indicator on the interpreter and a fetch of that error can result in the error indicator not being there anymore when the fetch is attempted. I'm not sure exactly why this happens. The fact that it happens "randomly" makes me think that it's related to how the goroutines are scheduled on OS threads by the go runtime, and to how this can affect the state of the python interpreter.
* Use `getPythonError` in py's `check.Run` to fetch errors from the python interpreter. We should use it wherever possible. * Improvements on `getPythonError`: * check that an error occurred before trying to fetch it (improves quality of error that's returned by the function) * normalize the python error (recommended by the python C API docs) * use the string representation of `pvalue` (improves error message)
`getPythonError` is now a method of the `stickyLock` struct for clarity
b4a5f0c to
96778b0
Compare
|
|
Bumps [github.com/aws/aws-sdk-go-v2/service/secretsmanager](https://github.com/aws/aws-sdk-go-v2) from 1.40.3 to 1.40.5. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json) - [Commits](aws/aws-sdk-go-v2@service/efs/v1.40.3...service/sfn/v1.40.5) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/service/secretsmanager dependency-version: 1.40.5 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
What does this PR do?
2 things:
getPythonError()(see commit description of 392287c)Motivation
Specifically, changing locks between a python call that sets the error indicator
on the interpreter and a fetch of that error can result in the
error indicator not being there anymore when the fetch is attempted. When that happens
we "lose" the error and have no idea what the py interpreter error was.
Generally, this change makes the py loader and runner more reliable, which is pretty
critical when troubleshooting issues on the agent.
Additional Notes
Generally, I'm a bit concerned about how the goroutine thread scheduling made by the go runtime can affect the python interpreter state. Touching anything in the
pypackage is pretty error-prone... Having "system" tests on the bindings will definitely help. We could also add a big warning in the README of thepypackage :)