Consul SD causing 100% cpu usage if the service is down #627

@Xzya

Hey,

It seems that if the Consul service is down, the Instancer retries fetching the list of services in a tight loop, causing 100% CPU usage until the service is back up. There should probably be a backoff mechanism so that it only retries after some delay. Here is how I'm using the client:

var client consulsd.Client
{
	consulConfig := consulapi.DefaultConfig()
	consulConfig.WaitTime = 10 * time.Second
	consulConfig.Address = config.Consul.Address
	consulClient, err := consulapi.NewClient(consulConfig)
	if err != nil {
		return Endpoints{}, err
	}
	client = consulsd.NewClient(consulClient)
}

var (
	retryMax     = 1
	retryTimeout = 2 * time.Second
	tags         = []string{environment}
	passingOnly  = true
	endpoints    = Endpoints{}
	instancer    = consulsd.NewInstancer(client, logger, ServiceName, tags, passingOnly)
)
{
	factory := MainDBClientFactory(MakeRegisterEndpoint)
	endpointer := sd.NewEndpointer(instancer, factory, logger)
	balancer := lb.NewRoundRobin(endpointer)
	retry := lb.Retry(retryMax, retryTimeout, balancer)
	endpoints.RegisterEndpoint = retry
}
...

Terminal output:
ts=2017-10-26T11:08:33.168160872Z caller=instancer.go:69 service=maindb tags=[local] err="Get http://192.168.99.100:8500/v1/health/service/maindb?passing=1&tag=local&wait=10000ms: dial tcp 192.168.99.100:8500: getsockopt: connection refused"
ts=2017-10-26T11:08:33.171253831Z caller=instancer.go:69 service=maindb tags=[local] err="Get http://192.168.99.100:8500/v1/health/service/maindb?passing=1&tag=local&wait=10000ms: dial tcp 192.168.99.100:8500: getsockopt: connection refused"
ts=2017-10-26T11:08:33.175242518Z caller=instancer.go:69 service=maindb tags=[local] err="Get http://192.168.99.100:8500/v1/health/service/maindb?passing=1&tag=local&wait=10000ms: dial tcp 192.168.99.100:8500: getsockopt: connection refused"
ts=2017-10-26T11:08:33.176126961Z caller=instancer.go:69 service=maindb tags=[local] err="Get http://192.168.99.100:8500/v1/health/service/maindb?passing=1&tag=local&wait=10000ms: dial tcp 192.168.99.100:8500: getsockopt: connection refused"
...
