-
Notifications
You must be signed in to change notification settings - Fork 113
Sessions leaking from Pool #134
Description
We are experiencing an issue where sessions are not being constrained. During our stress testing we are making 4 requests in parallel all by primary keys, 50k times.
During initial test runs of 2000 records, the sessions count grew by 711. Effectively was we scaled it up in our sandbox we quickly ran out of all 10k sessions in minutes, locking the database.
The session count is being confirmed with gcloud spanner databases sessions list --database=users --instance=users-test | grep "name" | wc -l.
We looked at the previous pool leak fix and confirmed we were using that code but it did not solve the issue. (#91)
Environment details
- OS: Mac OS high sierra
- Node.js version: v8.9.0
- npm version: v5.6.0
- @google-cloud/spanner version: v1.1.0
Steps to reproduce
The queries are simple run requests, here is a code sample where we can reproduce.
This code is obviously poor practice, but we needed queries to take a longer than near immediate response to force error manifestation.
const spanner = require('@google-cloud/spanner');
const Promise = require('bluebird');
const crypto = require('crypto');
const database = spanner({ projectId: 'test-project' }).instance('test-instance').database('users',
{
acquireTimeout: 30000,
keepAlive: 1,
max: 10,
}
);
function getOffset() {
return Math.floor(Math.random() * 10000)
}
function getReadPromise() {
return database.run({
sql:
'SELECT * FROM users limit 1 offset @offset',
params: {
offset: getOffset(),
},
});
}
function start() {
console.log('Starting');
const arr = new Array(5).fill(1);
return Promise.map(arr, () => Promise.all([
getReadPromise(),
getReadPromise(),
getReadPromise(),
]));
}
setTimeout(() => {
start().then(start).then(start).then(start).then(start);
}, 3000);
The issue doesn’t manifest itself in version 0.8.2, which 0.8.3 is when the pooling was updated.
Environment details
OS: Mac OS High sierra
Node.js version: v8.9.0
npm version: v5.6.0
version: v1.1.0
We have a solution and will make the PR soon.
Issue Analysis
Here is our analysis of the issue after investigation:
In the following code, the code removes the session either from reads_ or writes_ array.
getNextAvailableSession_ method is called by getSession_ which is called by acquireSession_. getNextAvailableSession_ removes the session from reads_ or writes_ array and acquireSession_ adds the session to borrowed_ array as shown in the code below.
This code is asynchronous, which means it is creating a race condition. It may end up with a session being borrowed from reads or writes and it has also been yet to be marked as borrowed. This has the effect of not being counted towards the maximum amount of sessions available to the session. Which is ultimately the error issue causes excess sessions to be created and these sessions are not accounted for such that it will eventually consume all of the available sessions for the spanner instance.
Proposed Solution
We can fix the problem by making sure that when we remove a session from reads_ or writes_ array, we should mark the session as burrowed synchronously.


