as per @itaiy suggested will close then try to connect. #3669

Merged
nishantmonu51 merged 4 commits into apache:master from b-slim:fixing_reconnenct_graphite
Dec 13, 2016

Conversation

@b-slim
Contributor

@b-slim b-slim commented Nov 8, 2016

Thread.currentThread().interrupt();
} else if (e instanceof SocketException) {
pickledGraphite.close();
log.info("had exception [%s] trying to re-connect to graphite server", e.getMessage());
Member

The exception is already logged a few lines above, so maybe this logging statement should just say "trying to re-connect.."?

Contributor Author

ok

@@ -175,11 +173,12 @@ public void run()
if (e instanceof InterruptedException) {
Thread.currentThread().interrupt();
Member

What practically happens after this statement? We continue with the next iteration in the loop?

Contributor Author

If it is interrupted, it will terminate the current thread, right?

Member

@leventov leventov Nov 8, 2016

@b-slim As far as I know, it will just set the interrupted flag and continue executing as normal; see https://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#interrupt() for details. The next blocking operation may fail immediately with InterruptedException.

Contributor Author

Oh, I see. You are right, thanks for the catch.
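
The behavior discussed in this thread can be checked with a small standalone program. The sketch below is illustrative only and is not code from this PR; the class and method names are hypothetical. It shows that interrupt() on the current thread only sets a flag, that the next blocking call (Thread.sleep here) then throws InterruptedException, and that a loop needs an explicit break to stop, which is what the later "break the loop in case of interrupted" commit addresses.

```java
public class InterruptFlagSketch {
    /** Returns true if the flag is set and execution simply carries on after interrupt(). */
    public static boolean flagIsSetButExecutionContinues() {
        Thread.currentThread().interrupt();              // only sets the interrupted flag
        boolean flag = Thread.currentThread().isInterrupted();
        Thread.interrupted();                            // clear the flag so callers are unaffected
        return flag;
    }

    /** Returns true if a blocking call made after interrupt() throws InterruptedException. */
    public static boolean nextBlockingCallThrows() {
        Thread.currentThread().interrupt();
        try {
            Thread.sleep(1000);                          // throws at once because the flag is set
            return false;
        } catch (InterruptedException e) {
            return true;                                 // sleep() cleared the flag when it threw
        }
    }

    /** Counts loop iterations when the handler restores the flag and breaks out. */
    public static int iterationsBeforeBreak() {
        int iterations = 0;
        while (iterations < 5) {
            iterations++;
            try {
                if (iterations == 2) {
                    Thread.currentThread().interrupt();  // simulate an interrupt arriving
                }
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();      // restore the flag for the caller
                break;                                   // without this, the loop would keep going
            }
        }
        Thread.interrupted();                            // clear the flag for this demo only
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(flagIsSetButExecutionContinues());
        System.out.println(nextBlockingCallThrows());
        System.out.println(iterationsBeforeBreak());
    }
}
```

Without the break, the catch block would restore the flag and the loop would spin, with every subsequent blocking call failing immediately.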

if (e instanceof InterruptedException) {
Thread.currentThread().interrupt();
} else if (e instanceof SocketException) {
pickledGraphite.close();
Member

The general Closeable contract in Java is that after close() the object is useless: it either doesn't react to any subsequent calls or throws runtime exceptions. If this is not so for PickledGraphite, maybe it should be commented.

Contributor Author

ok
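
For readers unfamiliar with the convention described above, here is a minimal sketch using a JDK class rather than PickledGraphite. It only illustrates the usual expectation that a closed resource rejects further use; it makes no claim about how PickledGraphite behaves after close(). The class and method names are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class ClosedWriterSketch {
    /** Returns true if writing to a BufferedWriter after close() is rejected. */
    public static boolean writeAfterCloseFails() {
        BufferedWriter writer = new BufferedWriter(new StringWriter());
        try {
            writer.write("before close");   // fine: the writer is still open
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            writer.write("after close");    // the writer checks that it is open first
            return false;
        } catch (IOException e) {
            return true;                    // rejected because the stream is closed
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAfterCloseFails());
    }
}
```

A class that can be reused after close(), for example by reconnecting, departs from this expectation, which is why the reviewer asks for a comment.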

@b-slim
Contributor Author

b-slim commented Nov 8, 2016

@leventov Thanks for looking at this. I have updated the PR.

Member

@leventov leventov left a comment

LGTM

}
}
}
pickledGraphite.flush();

@b-slim - why did you remove flush()?
I might be wrong, but it seems to me that if you have some metrics in pickledGraphite that weren't written to the OutputStream in the while loop, they will never be sent, because pickledGraphite.send() only writes to the OutputStream when the size of the metrics list is equal to or greater than the batch size (see PickledGraphite.send()).

@b-slim
Contributor Author

b-slim commented Nov 9, 2016

@itaiy close() will do the flush as well.

@itaiy

itaiy commented Nov 10, 2016

@b-slim - you mean GraphiteEmitter.close()?
I think it will still cause you to lose metrics: you create a new PickledGraphite each time ConsumerRunnable.run() is invoked, so metrics left in the metrics list inside the PickledGraphite won't be sent unless PickledGraphite.flush() is called.

public void run()
{
try {
try (PickledGraphite pickledGraphite = new PickledGraphite(
Contributor Author

@itaiy This try-with-resources block will call close() once run() is done.

Right, I missed it, thanks :)
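
The point settled in this exchange can be sketched with a toy batching client. BatchingSender below is a hypothetical stand-in, not the real PickledGraphite: it assumes, as described in the thread, that send() only writes once a full batch has accumulated and that close() flushes whatever is left. The list named wire stands in for the socket's OutputStream.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushOnCloseSketch {
    /** Hypothetical batching client; the real PickledGraphite may differ in detail. */
    public static class BatchingSender implements AutoCloseable {
        private final int batchSize;
        private final List<String> pending = new ArrayList<>();
        private final List<String> wire;    // stands in for the OutputStream

        public BatchingSender(int batchSize, List<String> wire) {
            this.batchSize = batchSize;
            this.wire = wire;
        }

        /** Buffers a metric and writes only when a full batch has accumulated. */
        public void send(String metric) {
            pending.add(metric);
            if (pending.size() >= batchSize) {
                flush();
            }
        }

        /** Writes everything that is still buffered. */
        public void flush() {
            wire.addAll(pending);
            pending.clear();
        }

        /** Assumed behavior from the thread: close() flushes the remainder. */
        @Override
        public void close() {
            flush();
        }
    }

    /** Sends the metrics inside try-with-resources and returns what reached the wire. */
    public static List<String> sendAll(int batchSize, String... metrics) {
        List<String> wire = new ArrayList<>();
        try (BatchingSender sender = new BatchingSender(batchSize, wire)) {
            for (String metric : metrics) {
                sender.send(metric);
            }
        }                                   // close() runs here, even on an exception
        return wire;
    }

    public static void main(String[] args) {
        // With a batch size of 2, the third metric only goes out because close() flushes.
        System.out.println(sendAll(2, "a", "b", "c"));
    }
}
```

If close() did not flush, the partial batch would be lost whenever run() returned, which is the concern raised above.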

@b-slim b-slim added the Bug label Nov 11, 2016
@b-slim b-slim added this to the 0.9.3 milestone Nov 11, 2016
@fjy
Contributor

fjy commented Dec 13, 2016

👍

Member

@nishantmonu51 nishantmonu51 left a comment

👍

@nishantmonu51 nishantmonu51 merged commit 7b18fb7 into apache:master Dec 13, 2016
dgolitsyn pushed a commit to metamx/druid that referenced this pull request Feb 14, 2017
* as per @itaiy suggested will close then try to connect.

* use close instead of flush

* git fix comments

* break the loop in case of interrupted
@sata

sata commented Feb 20, 2017

Will this fix be backported to 0.9.2 branch? At the moment this fix is not included when pulling down the extension with pull-deps through tools.

@fjy
Contributor

fjy commented Feb 20, 2017

@sata No, it will be a part of the upcoming 0.10.0 release.

@sata

sata commented Feb 28, 2017

I tried this fix by backporting it myself, and I'm finding that it does not work for the scenario I'm seeing.

For long periods, HostedGraphite seems to apply very aggressive timeouts, closing its end of the connection after roughly a minute and a few seconds. Our interval window sends every minute, if we have enough metrics to send.

The metrics library that the extension uses drops the metrics if it cannot write to the socket. If this happens regularly, we lose metrics. I have seen this happen for a full day on several occasions.

It seems this issue isn't new either [1].

I tried to leverage TCP keep alives on the socket[2] in conjunction with these kernel settings:

net.ipv4.tcp_keepalive_time=20
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=5

And it didn't help. Since I don't know what HostedGraphite's firewalls look like, it's quite hard to reason about it. One thing I didn't investigate is whether our probes were actually acknowledged.
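
As background for the settings above: the net.ipv4.tcp_keepalive_* values only take effect on sockets that have SO_KEEPALIVE enabled. The snippet below is a minimal sketch of enabling it on a java.net.Socket. It is not the code from the linked commit [2], and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

public class KeepAliveSketch {
    /** Enables SO_KEEPALIVE on an unconnected socket and reports whether it is set. */
    public static boolean enableKeepAlive() {
        try (Socket socket = new Socket()) {    // unconnected; no Graphite host is needed here
            socket.setKeepAlive(true);          // ask the OS to send keep-alive probes when idle
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(enableKeepAlive());
    }
}
```

Probe timing is then governed by the operating system settings, so a capture of the probes and their acknowledgements would be needed to tell whether the remote side or an intermediate firewall is dropping them.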

The approach I tried first, which did work but is less elegant, was to send a garbage byte as payload to the endpoint [3].

This does the job but it isn't pretty. Preferably an application layer keep alive would be better but that would require a change to Carbon.

Has anyone else encountered this problem with HostedGraphite?

I would preferably avoid maintaining a fork if that's possible.

[1] dropwizard/metrics#669
[2] GameAnalytics/metrics@abedd44
[3] GameAnalytics/metrics@312e5b5

lastres pushed a commit to GameAnalytics/druid that referenced this pull request Dec 5, 2017
@b-slim b-slim deleted the fixing_reconnenct_graphite branch April 26, 2018 01:28
miquel14 pushed a commit to smadex/incubator-druid that referenced this pull request Feb 7, 2019
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Feb 25, 2022