Skip to content

Move GCP to a core extension#6953

Merged
gianm merged 17 commits intoapache:masterfrom
drcrallen:gcp/core
Mar 27, 2019
Merged

Move GCP to a core extension#6953
gianm merged 17 commits intoapache:masterfrom
drcrallen:gcp/core

Conversation

@drcrallen
Copy link
Copy Markdown
Contributor

@drcrallen drcrallen commented Jan 30, 2019

#6910

Special changes:

  • Move google extension to a core extension
  • Create aws-common and gcp-common. Put them in server
  • Move ec2 autoscaler to its own extension
  • Fix up default extension inclusions
  • Fix up documentation

As a side note, it seems like the EC2 autoscaling stuff is not actually documented anywhere.

@drcrallen drcrallen requested a review from gianm January 30, 2019 20:47
Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drcrallen, apologies for not having a chance to get to this til now. Would definitely love to see it happen.

{
GoogleCredential credential = GoogleCredential.getApplicationDefault(transport, factory);
if (credential.createScopedRequired()) {
credential = credential.createScoped(Collections.singleton("https://www.googleapis.com/auth/cloud-platform"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any value in allowing the URL to be configurable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since people can now override HttpRequestInitializer in an extension, they can do so with a custom extension. This is more laborious than a simple config change though. For a simple config change it could make sense but I don't have a specific use case in mind and would rather leave such a feature add to someone who understands the value of such a feature better.

@Override
public void configure(Binder binder)
{

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would be nice to have a comment that this is intentionally blank, like // Nothing to do. (Whenever reading empty methods I always wonder if it was intentional or not.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread server/pom.xml
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-gcp-common</artifactId>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we didn't have to include gcp-common in the standard lib directory, even for people that don't use GCP. It feels like continuing the sins of our forebears (aws-common, that is). Which / how many jars does this end up adding? Trying to convince myself to be okay with this and not fight it :)

I think you said you tried doing a transitive dependency but had issues with it. For sake of getting it written down - what'd you try & what went wrong?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get a list of dependencies via the maven dependency tree here.

As far as transitive dependency, this goes back to "an extension can't depend on another extension" problem, where the implementation in an extension classloader is not accessible from any other extension. So if I put the AWS injectables in an extension, there is no other extension that can use those. All the overrides must be in the same extension.

As written elsewhere, the only place this does not apply is for things used in the hadoop world, where everything is mangled onto the same ClassLoader (often leading to all kinds of version problems). Since things in the hadoop world get mangled all together, they all can reach each other in the same classloader.

The only "real" solution I can think of for this is to have an Extension Class Loader, whereby an extension can declare a class as "exported" and have that classloader make sure any other extensions can use that common exported class from a single classloader.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to add aws-common and gcp-common as dependencies for each extension individually, not as their own extensions? So we aren't doing extension-depends-on-extension, but are instead loading the aws/gcp common jars, separately, into each extension classloader?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they only provided libraries and no bindings it would. But with that approach it does not allow me to make a third party extension to inject new overrides to the HttpInitializer (for example).

And since things are bound guice makes the classloader isolation items weird for classes named the same thing but in different classloaders.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I recall it fails with a "you bound HttpIntializer multiple times" kind of error

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the extension view point, each implementation of DruidModule is unique in its classloader, so its bindings fire independently.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That makes sense and is a bummer. But thanks for explaining it. I also guess that doing it the way I suggested would mean every extension gets its own GCP client, which is not ideal. OK, I give up, let's do it this way and hopefully someone has a flash of inspiration around making extensions able to depend on other extensions.

I built a tarball to see what new stuff this would add to lib/, and got:

./lib/checker-qual-2.5.7.jar
./lib/druid-gcp-common-0.15.0-incubating-SNAPSHOT.jar
./lib/google-api-client-1.22.0.jar
./lib/google-http-client-1.22.0.jar
./lib/google-http-client-jackson2-1.22.0.jar
./lib/google-oauth-client-1.22.0.jar
./lib/jackson-module-guice-2.6.7.jar

Not too bad! I was afraid it would pull in a bunch of extra deps, but, it doesn't.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm super careful dude :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just paranoid - thanks for indulging :)

Comment thread pom.xml
<module>extensions-core/protobuf-extensions</module>
<module>extensions-core/lookups-cached-global</module>
<module>extensions-core/lookups-cached-single</module>
<module>extensions-core/ec2-extensions</module>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a nice change. The EC2 autoscaling stuff is better suited to an extension than to core. Hopefully it opens the door to removing Druid's dependency on aws-common.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Mar 7, 2019

@drcrallen in extensions-core/google-extensions/pom.xml, the groupId's gotta be org.apache.druid.extensions

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Mar 7, 2019

@drcrallen Just saw you push a commit; the groupId in extensions-core/google-extensions/pom.xml still needs adjustment to org.apache.druid.extensions.

@drcrallen
Copy link
Copy Markdown
Contributor Author

@gianm green now

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Also hopeful that someone thinks of a clever way to have extensions depend on each other 🤞

@drcrallen
Copy link
Copy Markdown
Contributor Author

I still need to update the docs before merging

@drcrallen
Copy link
Copy Markdown
Contributor Author

Related: #7218

@drcrallen
Copy link
Copy Markdown
Contributor Author

Ping to keep open. need to resolve conflicts and change docs. Trying to get that done soon

@drcrallen drcrallen removed the Discuss label Mar 26, 2019
Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍. I think we should probably make follow-up issues about creating docs for druid-ec2-extensions and finding a way to share common extensions with other extensions after this is merged.

|druid-parquet-extensions|Support for data in Apache Parquet data format. Requires druid-avro-extensions to be loaded.|[link](../development/extensions-core/parquet.html)|
|druid-protobuf-extensions| Support for data in Protobuf data format.|[link](../development/extensions-core/protobuf.html)|
|druid-s3-extensions|Interfacing with data in AWS S3, and using S3 as deep storage.|[link](../development/extensions-core/s3.html)|
|druid-ec2-extensions|Interfacing with AWS EC2 for autoscaling middle managers|UNDOCUMENTED|
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is I guess sort of documented at ../../configuration/index.html#autoscaler, but I think it would probably be better in the long run to make a doc for this extension and link those autoscaler docs to it. We should also mention in the existing autoscaler docs in the giant configuration page that the druid-ec2-extensions need to be loaded for it to work, but I don't really consider either of these blockers for this PR to get merged since it isn't documented super well in the first place.

@gianm gianm added this to the 0.15.0 milestone Mar 27, 2019
@gianm gianm merged commit eeb3dbe into apache:master Mar 27, 2019
@drcrallen drcrallen deleted the gcp/core branch March 27, 2019 16:07
@jihoonson jihoonson mentioned this pull request Mar 27, 2019
@clintropolis clintropolis mentioned this pull request Mar 27, 2019
justinborromeo pushed a commit to justinborromeo/incubator-druid that referenced this pull request Apr 1, 2019
* Move GCP to a core extension

* Don't provide druid-core >.<

* Keep AWS and GCP modules separate

* Move AWSModule to its own module

* Add aws ec2 extension and more modules in more places

* Fix bad imports

* Fix test jackson module

* Include AWS and GCP core in server

* Add simple empty method comment

* Update version to 15

* One more 0.13.0-->0.15.0 change

* Fix multi-binding problem

* Grep for s3-extensions and update docs

* Update extensions.md
@jihoonson
Copy link
Copy Markdown
Contributor

Hi @drcrallen, I'm writing a draft of 0.15.0 release notes. Is there anything should be mentioned in the release notes other than that druid-google-extensions extension is now a core extension and bundled in the binary distribution by default?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants