Skip to content

Fixing json creator for s3 storage connector provider#12948

Merged
abhishekagarwal87 merged 2 commits intoapache:masterfrom
cryptoe:STORAGE_CONNECTOR_FIX
Aug 25, 2022
Merged

Fixing json creator for s3 storage connector provider#12948
abhishekagarwal87 merged 2 commits intoapache:masterfrom
cryptoe:STORAGE_CONNECTOR_FIX

Conversation

@cryptoe
Copy link
Copy Markdown
Contributor

@cryptoe cryptoe commented Aug 23, 2022

Description

While trying out the new MSQ task based ingestion #12918 , found a bug in the storage connector where Guice was not finding the 'S3OutputConfig`.

DurableStorageConfiguration: Durable storage mode can only be enabled when druid.msq.intermediate.storage.enable is set to true and the connector is configured correctly. Check the documentation on how to enable durable storage mode. If you want to still query without durable storage mode, set durableShuffleStorage to false in the query context. Got error com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Could not find a suitable constructor in org.apache.druid.storage.s3.output.S3OutputConfig. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
  at org.apache.druid.storage.s3.output.S3OutputConfig.class(S3OutputConfig.java:53)
  while locating org.apache.druid.storage.s3.output.S3OutputConfig
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule


Fixed the bug ...


Key changed/added classes in this PR
  • S3StorageConnectorProvider

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch LGTM. However there should be a test of some kind, at some point. Is it possible to add a unit test for this? Or were you thinking of adding one later as an integration test?

@cryptoe
Copy link
Copy Markdown
Contributor Author

cryptoe commented Aug 23, 2022

I was planning to add one integration test once #12918 is merged and the new IT framework is fully merged

@cryptoe
Copy link
Copy Markdown
Contributor Author

cryptoe commented Aug 24, 2022

@gianm managed to cook up a UT.

@rohangarg
Copy link
Copy Markdown
Member

rohangarg commented Aug 24, 2022

A doubt I had was that is it possible to do the StorageConnector injection via PolyBind instead of JsonConfigurator? PolyBind provides support to add multiple implementations of an interface via a MapBinder and also provides to choose between them using a user property via createChoice method.
If that's possible, then the StorageConnector interface could be implemented as a simple guice injectable object in the extensions. The StorageConnectorProvider then would be replaced by ConfiggedProvider which is an implementation detail of PolyBind.

@cryptoe
Copy link
Copy Markdown
Contributor Author

cryptoe commented Aug 24, 2022

IIRC I tried using the polybind first.
Where it fell short was I wanted "namespaced" config providers.
If you check out the test case, I registered a custom extension with a custom namespace.
Now you can have multiple such namespaces on the same JVM.
That means each "choice" has a custom prefix with the name of the extension.

If there is a better way, I am all ears.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Aug 24, 2022

LGTM still after CI passes.

About PolyBind, as far as I know it can only work with a single property. Since we want multiple connected properties here, I think we need the JsonConfigurator.

@rohangarg
Copy link
Copy Markdown
Member

Thanks for the explanation!

Now you can have multiple such namespaces on the same JVM.
That means each "choice" has a custom prefix with the name of the extension.

Can we provide a named annotated choice to PolyBind using something like you've used in test (Key.get(StorageConnectorProvider.class, Names.named(CUSTOM_NAMESPACE)))? Then for each (property, extension) pair, the PolyBind will have different choices where the property is passed in the createChoice method as is and the extension is passed as a part of the Key object for interfaceKey parameter.

@rohangarg
Copy link
Copy Markdown
Member

@gianm @cryptoe : I tried some experiment with PolyBind locally and wasn't able to make it work out of the box due to errors with multiple bindings via multiple choices to the same interface.
I added a new method to do named binding in PolyBind and was able to construct a test : https://gist.github.com/rohangarg/838489720d0ef5093fe60df7f7b922d7

Can you please take a look and see if it seems right and that fits the StorageConnector usecase? One thing I'm not sure right now is of the Singleton nature of objects created by the ConfiggedProvider.

BTW, the investigation need not block the current changes in this PR from my side.

@cryptoe
Copy link
Copy Markdown
Contributor Author

cryptoe commented Aug 25, 2022

Thanks for the experimentation. I guess it still won't work. We wanna define the choice on a suffix of the property type and bind it to the choice provider only once.

xxxx.type="choiceA"
xxxx.choiceA_path
xxxx.choiceA_base

Similarly in another extension, we can have

yyyy.type="choiceB"
yyyy.choiceB_dir
yyyy.choiceB_count
zzzz.type="choiceA"
zzzz.choiceA_path
zzzz.choiceA_base

from your example, looks like you are binding the choice multiple times for each namespace and prefix.

@rohangarg
Copy link
Copy Markdown
Member

The example only allows one binding per namespace and prefix. If more than one binding is tried for the same namespace, the guice injector will throw a binding error.
BTW, the current example works on simple Provider interface where the instance creator doesn't accept the property_prefix to create objects with different properties at runtime. For such case where the provider is meant to be more like a factory, we'd could try to use FactoryProvider from guice or do a semantic implementation of factory using Provider ourselves.

Copy link
Copy Markdown
Member

@rohangarg rohangarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change LGTM since it is necessary to try out s3 durable storage for MSQ. The discussion around the implementation can happen independently.

@abhishekagarwal87 abhishekagarwal87 merged commit 31db3be into apache:master Aug 25, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants