
Conversation

jnturton (Contributor) commented Sep 7, 2022

DRILL-8295: Probable resource leak in the HTTP storage plugin

Description

Adds close() calls in a number of places where HTTP requests are made in the HTTP storage plugin.
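
The fix follows the standard okhttp3 pattern of treating each Response as a closeable resource. A minimal sketch of that pattern (the URL and handling code here are illustrative, not the plugin's actual code):

import java.io.IOException;

import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class ResponseCloseSketch {
  public static void main(String[] args) throws IOException {
    OkHttpClient client = new OkHttpClient();
    Request request = new Request.Builder()
        .url("https://example.com/api")  // placeholder URL
        .build();

    // Response implements Closeable, so try-with-resources releases
    // the connection even if reading the body throws.
    try (Response response = client.newCall(request).execute()) {
      if (response.isSuccessful() && response.body() != null) {
        System.out.println(response.body().string());
      }
    }
  }
}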

Documentation

N/A

Testing

Existing unit tests.

jnturton added the bug and backport-to-stable (this bug fix is applicable to the latest stable release and should be considered for inclusion there) labels on Sep 7, 2022
pjfanning (Member) commented:
@jnturton there are also similar issues in

  • org.apache.drill.exec.store.http.util.SimpleHttp
  • org.apache.drill.exec.store.http.oauth.OAuthUtils

jnturton (Contributor, Author) commented Sep 7, 2022

> @jnturton there are also similar issues in
>
>   • org.apache.drill.exec.store.http.util.SimpleHttp
>   • org.apache.drill.exec.store.http.oauth.OAuthUtils

@pjfanning thanks, I picked up a couple of extra instances.

rowWriter.start();
if (jsonLoader.parser().next()) {
  rowWriter.save();
} else {
cgivre (Contributor) commented:

Should we explicitly close the results InputStream here as well? Would you mind testing this on a query that produces multiple batches?

jnturton (Contributor, Author) replied:

@cgivre

  1. The JsonLoader closes the input streams it has been reading from when it is itself closed, so I don't think so. A sketch of this ownership pattern follows below.
  2. Multi-batch datasets do not work with these UDFs yet, from what I recall. I think @vdiravka is still working on that; perhaps he can comment on the closing of the JsonLoader here.
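
A minimal sketch of the ownership pattern from point 1, using a hypothetical loader class rather than Drill's actual JsonLoader: the loader owns the InputStream it was constructed with, so callers close only the loader.

import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical loader that owns its input stream and closes it when
// the loader itself is closed.
class StreamOwningLoader implements Closeable {
  private final InputStream input;

  StreamOwningLoader(InputStream input) {
    this.input = input;
  }

  boolean next() throws IOException {
    return input.read() != -1;  // stand-in for real parsing work
  }

  @Override
  public void close() throws IOException {
    input.close();  // the caller never touches the stream directly
  }

  public static void main(String[] args) throws IOException {
    try (StreamOwningLoader loader =
        new StreamOwningLoader(new ByteArrayInputStream("{}".getBytes()))) {
      while (loader.next()) {
        // consume input
      }
    } // closing the loader closes the underlying stream too
  }
}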

cgivre (Contributor) replied:

From my recollection, this function does handle multiple batches. It was convert_fromJSON that @vdiravka was working on.

jnturton (Contributor, Author) commented Sep 7, 2022:

@cgivre yes, you're right. I tried a couple of things. First I fed http_get a JSON response that would normally produce 64k+1 rows, but it looked to me like it was being handled in a single batch since, I guess, the row count of a query based on VALUES(1) is still 1. I then wrote a query to SELECT http_get(<some simple JSON>) from a mock table containing 64k+1 rows, but that overwhelmed the okhttp3 mock server and failed with a timeout. I'm not sure if there is some other test to try here?
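
For context, the okhttp3 mock server referred to here is MockWebServer from the com.squareup.okhttp3:mockwebserver artifact. A minimal sketch of serving one canned JSON body (the body and path are placeholders, not the actual test fixture):

import java.io.IOException;

import okhttp3.HttpUrl;
import okhttp3.mockwebserver.MockResponse;
import okhttp3.mockwebserver.MockWebServer;

public class MockServerSketch {
  public static void main(String[] args) throws IOException {
    MockWebServer server = new MockWebServer();
    // Each enqueued response answers exactly one request, so a driver
    // table with 64k+1 rows means 64k+1 requests against the mock.
    server.enqueue(new MockResponse().setBody("{\"foo\": 1}"));
    server.start();

    HttpUrl url = server.url("/json");  // placeholder path
    System.out.println("Mock endpoint: " + url);

    server.shutdown();
  }
}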

jnturton (Contributor, Author) replied:

I fed the http_get function a string containing 50 million little JSON objects from the sequence {"foo": 1} {"foo": 2} {"foo": 3}... and it got through it (it took about 45 seconds). I just don't know if that answers the right question.
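
For reference, a quick way to generate that kind of input (the object format matches the sequence above; the helper itself is just an illustration):

// Builds a string of count concatenated JSON objects:
// {"foo": 1} {"foo": 2} {"foo": 3} ...
public class JsonSequenceBuilder {
  public static String build(int count) {
    StringBuilder sb = new StringBuilder();
    for (int i = 1; i <= count; i++) {
      sb.append("{\"foo\": ").append(i).append("} ");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // 50 million objects yields a very large string; sanity-check
    // with a small count first.
    System.out.println(build(3));
  }
}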

cgivre (Contributor) replied:

That works for me :-)

cgivre (Contributor) left a review:

LGTM +1


Labels

bug, backport-to-stable (this bug fix is applicable to the latest stable release and should be considered for inclusion there)
