PubSub locks up Node process at 100% CPU #2572

@ehacke

Environment details

  • OS: Linux Mint 18.2 (dev)
  • Node.js version: 8.3.0
  • npm version: 5.3.0
  • @google-cloud/pubsub version: 0.13.2 and 0.14

Steps to reproduce

1. Try to read from a PubSub subscription (even if it's completely empty)
2. Node process goes to 100% of CPU and eventually is killed by K8S
3. Apologize for not having any good reproduction steps

So we've been having intermittent issues with reading from subscriptions attached to one specific topic using Node. The same code will run fine for weeks, and then suddenly every Node process attached to that subscription is pegged to 100% CPU until the process is killed externally. The crazy part is that even after the subscription is completely drained, this continues to happen. There is nothing to read, and yet it's completely locked up. It will continue to lock up every process attached to it for days, then suddenly stop.

I completely agree that this sounds like an implementation issue on my side, but hear me out.

Important details:

  • It does not seem to matter if there is anything in the subscription. An empty subscription also locks up the reading process.
  • Running a flamegraph analysis (attached) shows that 97.18% of the time, a function called Node::MakeCallback is on the top of the stack. In my mind this points to the gRPC library being involved in the problem.
  • The same application code functions fine using Kafka as the queue, with the only difference being the very simple read and write operations.
  • Other Node applications, written significantly differently but also attached to subscriptions on that topic, fail in the same way, but at different times. This suggests to me that the issue is not the content of the subscriptions (they can even be empty), but the operation of reading from the subscription itself.
  • We have many many other Node processes, using the same reading and writing code, connected to different topics, that never experience this issue.
  • If it helps, another unique detail is that the source of this topic is our only Dataflow process.

Is there anything else I can do to try to trace this problem?

I accept that I haven't given enough info for this to be resolvable on your side, but I'm at a loss on how to debug this further.

[Attached image: flamegraph, "makecallback"]

Labels

  • api: pubsub (Issues related to the Pub/Sub API.)
  • priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
