as per @itaiy suggested will close then try to connect. #3669
nishantmonu51 merged 4 commits into apache:master from
| Thread.currentThread().interrupt(); | ||
| } else if (e instanceof SocketException) { | ||
| pickledGraphite.close(); | ||
| log.info("had exception [%s] trying to re-connect to graphite server", e.getMessage()); |
The exception is already logged a few lines above; maybe this logging statement should just say "trying to re-connect.."?
| @@ -175,11 +173,12 @@ public void run() | |||
| if (e instanceof InterruptedException) { | |||
| Thread.currentThread().interrupt(); | |||
What practically happens after this statement? We continue with the next iteration in the loop?
If it is interrupted, it will terminate the current thread, right?
@b-slim as far as I know, it will just set the interrupted flag and continue executing as normal: https://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#interrupt() The next blocking operation may fail immediately with InterruptedException.
Oh, I see. You are right, thanks for the catch.
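To illustrate the point above, here is a minimal sketch of what `Thread.currentThread().interrupt()` does on its own. The class and method names are made up for this example and are not part of the PR. It only shows that the call sets a flag, that execution continues, and that the next blocking call fails immediately, which is why a loop needs an explicit `break` or `return` if it should stop.

```java
final class InterruptFlagDemo {
    /** Records what happens after the current thread interrupts itself. */
    static String run() {
        StringBuilder trace = new StringBuilder();

        // Only sets the interrupted flag; it does not stop the thread.
        Thread.currentThread().interrupt();
        trace.append("still-running;");
        trace.append("flag=").append(Thread.currentThread().isInterrupted()).append(';');

        try {
            // A blocking call sees the flag and throws right away instead of sleeping.
            Thread.sleep(10_000);
            trace.append("slept;");
        } catch (InterruptedException e) {
            // Throwing InterruptedException clears the flag again.
            trace.append("interrupted;flag=").append(Thread.currentThread().isInterrupted());
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running `main` prints `still-running;flag=true;interrupted;flag=false`.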
| if (e instanceof InterruptedException) { | ||
| Thread.currentThread().interrupt(); | ||
| } else if (e instanceof SocketException) { | ||
| pickledGraphite.close(); |
The general Closeable contract in Java is that after close() the object is useless: it either doesn't react to any subsequent calls or throws runtime exceptions. If this is not so for PickledGraphite, maybe that should be commented.
|
@leventov thanks for looking at this, I have updated the PR |
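The close-then-reconnect idea discussed in this thread can be sketched roughly as follows. The `Sender` interface here is hypothetical and only stands in for a client that can be reconnected after `close()`; it is not the PickledGraphite API, and the retry limit is an assumption added for the example.

```java
import java.io.IOException;
import java.net.SocketException;

final class ReconnectSketch {
    /** Hypothetical reconnectable sender, used only for this illustration. */
    interface Sender {
        void connect() throws IOException;
        void send(String metric) throws IOException;
        void close() throws IOException;
    }

    /**
     * Tries to send one metric. On a SocketException it closes the sender,
     * reconnects and retries, up to maxAttempts. Returns the attempt that succeeded.
     */
    static int sendWithReconnect(Sender sender, String metric, int maxAttempts)
            throws IOException {
        for (int attempt = 1; ; attempt++) {
            try {
                sender.send(metric);
                return attempt;
            } catch (SocketException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the caller decide what to do
                }
                sender.close();   // drop the broken connection first
                sender.connect(); // then open a fresh one before retrying
            }
        }
    }
}
```

Whether reuse after `close()` is safe depends on the concrete class, which is the point raised in the review comment above.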
| } | ||
| } | ||
| } | ||
| pickledGraphite.flush(); |
@b-slim - why did you remove flush()?
I might be wrong, but it seems to me that if you have some metrics in pickledGraphite that weren't written to the OutputStream in the while loop, they will never be sent (as pickledGraphite.send() only writes to the OutputStream when the size of the metrics list is equal to or greater than the batch size, see PickledGraphite.send())
|
@itaiy close will do the flush as well. |
|
@b-slim - you mean |
| public void run() | ||
| { | ||
| try { | ||
| try (PickledGraphite pickledGraphite = new PickledGraphite( |
@itaiy this try-with-resources will call close() once run() is done.
|
👍 |
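The flush-on-close behaviour discussed in this thread can be illustrated with a small sketch. `BatchingSender` is a hypothetical class written for this example, not the real PickledGraphite implementation. It assumes, as described in the comments above, that `send()` only writes once a batch is full and that `close()` flushes whatever is left.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

final class BatchingSender implements Closeable {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    // Stands in for the socket's OutputStream in this sketch.
    private final List<String> written = new ArrayList<>();

    BatchingSender(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Buffers a metric and writes only when a full batch has accumulated. */
    void send(String metric) {
        pending.add(metric);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Writes out everything that is still buffered. */
    void flush() {
        written.addAll(pending);
        pending.clear();
    }

    /** Flushes the partial batch so that closing does not lose metrics. */
    @Override
    public void close() {
        flush();
    }

    List<String> written() {
        return written;
    }

    public static void main(String[] args) {
        BatchingSender sender = new BatchingSender(3);
        try (BatchingSender s = sender) {
            s.send("a");
            s.send("b"); // still below the batch size, nothing written yet
        } // try-with-resources calls close(), which flushes "a" and "b"
        System.out.println(sender.written());
    }
}
```

With this shape, an explicit `flush()` at the end of `run()` is redundant as long as the sender is always closed, which is what the try-with-resources block guarantees.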
* as per @itaiy suggested will close then try to connect.
* use close instead of flush
* git fix comments
* break the loop in case of interrupted
|
Will this fix be backported to the 0.9.2 branch? At the moment this fix is not included when pulling down the extension with pull-deps through tools. |
|
@sata No, it will be a part of the upcoming 0.10.0 release. |
|
I tried this fix by backporting it myself, and I'm finding that it doesn't work for the scenario I'm seeing. HostedGraphite seems to have very aggressive timeouts; for longer periods they close their end after roughly a minute and a few seconds. Our interval window sends once per minute, if we have enough metrics to send. The metrics library that the extension uses drops the metrics if it cannot write to the socket. If this happens regularly, we lose metrics. I have seen this happen for a full day on several occasions. It seems this issue isn't new either [1]. I tried to leverage TCP keep-alives on the socket [2] in conjunction with these kernel settings:
And it didn't help. Since I don't know what HostedGraphite's firewalls look like, it's quite hard to reason about. One thing I didn't investigate is whether our probes were actually acknowledged. The approach I tried first, which did work but is less elegant, was to send a garbage byte as payload to the endpoint [3]. This does the job, but it isn't pretty. An application-layer keep-alive would be preferable, but that would require a change to Carbon. Has anyone else encountered this problem with HostedGraphite? I would prefer to avoid maintaining a fork if possible. [1] dropwizard/metrics#669 |
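For readers unfamiliar with the TCP keep-alive approach mentioned above, this is a minimal sketch of turning it on for a plain `java.net.Socket`. The class and helper names are made up for this example; how the extension or the metrics library actually creates its socket is not shown in this thread. The option only enables probes, and their timing is governed by operating system settings.

```java
import java.net.Socket;
import java.net.SocketException;

final class KeepAliveSketch {
    /** Enables SO_KEEPALIVE on the given socket and returns it for chaining. */
    static Socket enableKeepAlive(Socket socket) throws SocketException {
        socket.setKeepAlive(true);
        return socket;
    }

    public static void main(String[] args) throws Exception {
        // An unconnected socket is enough to show the option being set.
        try (Socket socket = new Socket()) {
            enableKeepAlive(socket);
            System.out.println("SO_KEEPALIVE enabled: " + socket.getKeepAlive());
        }
    }
}
```

As the comment above reports, enabling keep-alive did not help in this scenario, so this is shown only to make the attempted approach concrete.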
fix to #2952 (comment)