The recent instability in service today seems to be mostly due to continuous stop-the-world GC.
I know this because each time there was a failure, I checked the output of:
jstat -gcutil
jmap -heap
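For a quick in-process cross-check of what jstat -gcutil and jmap -heap report, the JVM exposes similar counters through the standard java.lang.management API. The sketch below is a hypothetical helper (the class name GcSnapshot and its methods are invented for illustration). It prints cumulative collection counts and times per collector plus current heap usage. Sampling total GC time twice and dividing the difference by the window gives a rough share of wall-clock time spent in GC, which continuous full collections would push toward 100%. It measures only the JVM it runs in, so in practice it would have to live inside the app (for example behind an admin page) rather than run as a separate process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: a rough in-process analogue of `jstat -gcutil` / `jmap -heap`.
// It only sees the JVM it is running in.
public class GcSnapshot {

    /** Sum of accumulated collection time (ms) across all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;  // -1 means "undefined" for this collector, so skip it
        }
        return total;
    }

    /** Percentage of a sampling window spent in GC, from two cumulative GC-time samples. */
    static double gcPercent(long gcBeforeMs, long gcAfterMs, long windowMs) {
        if (windowMs <= 0) throw new IllegalArgumentException("window must be positive");
        return 100.0 * (gcAfterMs - gcBeforeMs) / windowMs;
    }

    /** Print per-collector counts/times and current heap usage in MB. */
    static void printSnapshot() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-25s count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() may be -1 if undefined; the shift leaves it as -1
        System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }

    public static void main(String[] args) throws InterruptedException {
        long windowMs = 5_000;  // assumed sampling window; tune as needed
        long before = totalGcTimeMillis();
        Thread.sleep(windowMs);
        long after = totalGcTimeMillis();
        printSnapshot();
        System.out.printf("GC time over last %d ms: %.1f%%%n",
                windowMs, gcPercent(before, after, windowMs));
    }
}
```

A gcPercent that stays close to 100 across consecutive windows is roughly the same symptom as the O (old generation) column in jstat -gcutil sitting near 100 while FGC keeps climbing.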
Increasing the heap size helped, but the pauses are still happening.
Suggested next steps: look at the access log and run a memory profiler on the app, looking for large object hierarchies that are created and never freed on the most common pages, such as the homepage.
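To pick which pages to profile first, a small pass over the access log is enough. The sketch below is hypothetical (TopPages is an invented name) and assumes the log is in the Apache/nginx common or combined format, where each line contains a quoted request such as "GET /index.html HTTP/1.1". It strips query strings so variants of one page are counted together, then prints the ten most-requested paths.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: rank request paths by hit count from an access log
// assumed to be in Apache/nginx common or combined log format.
public class TopPages {

    // Matches the quoted request line, e.g. "GET /index.html?x=1 HTTP/1.1", capturing the path.
    private static final Pattern REQUEST = Pattern.compile("\"[A-Z]+ (\\S+) HTTP/[0-9.]+\"");

    /** Count hits per path; lines without a recognizable request are ignored. */
    static Map<String, Long> countPaths(Stream<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        lines.forEach(line -> {
            Matcher m = REQUEST.matcher(line);
            if (m.find()) {
                String path = m.group(1);
                int q = path.indexOf('?');
                if (q >= 0) path = path.substring(0, q);  // count "/search?q=a" as "/search"
                counts.merge(path, 1L, Long::sum);
            }
        });
        return counts;
    }

    /** Return the n paths with the highest counts, most-requested first. */
    static List<Map.Entry<String, Long>> top(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the access log (pass your own location)
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            for (Map.Entry<String, Long> e : top(countPaths(lines), 10)) {
                System.out.printf("%8d  %s%n", e.getValue(), e.getKey());
            }
        }
    }
}
```

After compiling, run it with the log file as the only argument. The top entries are the natural pages to hit repeatedly under the memory profiler while watching for object graphs that survive across requests.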