As part of https://bugs.python.org/issue38644, many internal functions gained overhead from added calls to _PyThreadState_GET().
The effect was pervasive (over 200 occurrences throughout the core) and hit many time-critical code paths. For example, in commit 1726909094, type_call() became slower for every single object instantiation. The cost depends on the cost of memory fencing and on how the compiler implements atomic loads. It doesn't look like these changes had any discussion, review, tests, or demonstrable user benefits.
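
To make the kind of overhead concrete, here is a minimal sketch of the two calling patterns. It is not CPython's code: demo_tstate, demo_current, and the demo_* functions are hypothetical names, and it assumes a C11 compiler with <stdatomic.h>. Pattern A re-fetches the current state through an atomic load on every call; Pattern B uses a pointer the caller already holds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread interpreter state. */
typedef struct {
    int recursion_depth;
} demo_tstate;

/* Hypothetical global slot that holds the current thread state. */
static _Atomic(demo_tstate *) demo_current = NULL;

static void demo_set_current(demo_tstate *ts)
{
    atomic_store_explicit(&demo_current, ts, memory_order_relaxed);
}

/* Each call performs an atomic load, even with relaxed ordering. */
static demo_tstate *demo_get_current(void)
{
    return atomic_load_explicit(&demo_current, memory_order_relaxed);
}

/* Pattern A: the callee looks the state up itself on every call. */
static int demo_enter_lookup(void)
{
    demo_tstate *ts = demo_get_current();
    assert(ts != NULL);
    return ++ts->recursion_depth;
}

/* Pattern B: the caller passes a state pointer it already has,
   so no atomic load happens inside the callee. */
static int demo_enter_param(demo_tstate *ts)
{
    return ++ts->recursion_depth;
}
```

Both functions do the same work; the only difference is the extra load in demo_enter_lookup(). How much that load costs on a hot path depends on the platform and compiler, so it needs to be measured rather than assumed.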