
Make QueryComponentSupliers independent from test classes#16275

Merged
rohangarg merged 26 commits into apache:master from kgyrtkirk:extract-suppliers
Apr 25, 2024

Conversation

@kgyrtkirk
Member

  • the core change is to remove implements QueryComponentSupplier, PlannerComponentSupplier from BaseCalciteQueryTest, and to turn those methods into overrides in a descendant of StandardQueryComponentSupplier
  • the supplier will be loaded by the SqlFrameworkConfig from an annotation

this fixes the tangled dependency between the test class and the framework it creates: after this change, the framework can be created without the test class.
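
A minimal sketch of loading a supplier from an annotation, so that the framework never needs a test instance. Every name here (ComponentSupplier, StandardSupplier, UseSupplier, SupplierLoader, CustomTest) is a hypothetical stand-in rather than a class from this PR, and the real supplier interface has many more methods.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical, heavily simplified supplier interface.
interface ComponentSupplier
{
  String name();
}

// Hypothetical default supplier that customizations extend.
class StandardSupplier implements ComponentSupplier
{
  @Override
  public String name()
  {
    return "standard";
  }
}

// Hypothetical annotation naming the supplier class a test wants.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UseSupplier
{
  Class<? extends ComponentSupplier> value();
}

// The customization lives in a nested supplier, not in the test class itself.
@UseSupplier(CustomTest.CustomSupplier.class)
class CustomTest
{
  static class CustomSupplier extends StandardSupplier
  {
    @Override
    public String name()
    {
      return "custom";
    }
  }
}

class SupplierLoader
{
  // Needs only the Class object, so no test instance has to exist.
  static ComponentSupplier load(Class<?> testClass)
  {
    UseSupplier annotation = testClass.getAnnotation(UseSupplier.class);
    if (annotation == null) {
      return new StandardSupplier();
    }
    try {
      return annotation.value().getDeclaredConstructor().newInstance();
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + annotation.value(), e);
    }
  }
}
```

Because the loader works from the class alone, anything that can name a supplier class could build the framework, which is the decoupling described above.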

I would recommend turning on "ignore whitespace" to see the real changes, because the new inner classes add an extra level of indentation

@github-actions github-actions bot added the Area - Batch Ingestion, Area - Querying, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) labels Apr 12, 2024
Member

@rohangarg rohangarg left a comment

Thanks a lot for the changes @kgyrtkirk 👍
Looks good overall to me, have left some comments.
Please let me know your thoughts on them

public boolean skipVectorize = false;

private QueryComponentSupplier baseComponentSupplier;
public PlannerComponentSupplier basePlannerComponentSupplier = new StandardPlannerComponentSupplier();
Member

doubt: I don't understand how we removed the implementation of the PlannerComponentSupplier interface but still have this field in the base Calcite class. How would a test class (either in open source implementations or custom forks) implement a custom planner component supplier now?

Member Author

there was a separate PR to change this; however, I've come to the conclusion that whatever this PlannerComponentSupplier was meant to be, it never really got used.

it's not used in the current master; providing the default implementation like this is enough.

but... if it needs to be used somewhere, maybe the QueryComponentSupplier could just have a new interface method which returns the PlannerComponentSupplier stuff

Member Author

made the following changes:

  • added a QueryComponentSupplier#getPlannerComponentSupplier method
  • added StandardComponentSupplier#buildPlannerComponentSupplier, so that an implementation extending StandardComponentSupplier only builds it once
  • SqlTestFramework#plannerFixture no longer needs the PlannerComponentSupplier to be passed separately (as it's defined implicitly by the QueryComponentSupplier)

I think this may make it possible to retain control over these things - custom implementations could move their custom PlannerComponentSupplier into their custom QueryComponentSupplier
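
A rough sketch of the shape these changes describe: a getter on the query supplier plus a protected build method that runs only once. The type and method names come from this thread, but the bodies, the caching strategy, and CustomComponentSupplier are assumptions for illustration, not the code in this PR.

```java
// Hypothetical one-method stand-in; the real interface is larger.
interface PlannerComponentSupplier
{
  String describe();
}

// Simplified: only the method relevant to this discussion.
interface QueryComponentSupplier
{
  PlannerComponentSupplier getPlannerComponentSupplier();
}

class StandardComponentSupplier implements QueryComponentSupplier
{
  private PlannerComponentSupplier plannerComponentSupplier;

  // Builds lazily on first use and then returns the cached instance (assumed strategy).
  @Override
  public synchronized PlannerComponentSupplier getPlannerComponentSupplier()
  {
    if (plannerComponentSupplier == null) {
      plannerComponentSupplier = buildPlannerComponentSupplier();
    }
    return plannerComponentSupplier;
  }

  // Extension point: subclasses override this instead of the getter.
  protected PlannerComponentSupplier buildPlannerComponentSupplier()
  {
    return () -> "standard planner components";
  }
}

// Hypothetical custom supplier carrying its own planner components.
class CustomComponentSupplier extends StandardComponentSupplier
{
  int builds = 0;

  @Override
  protected PlannerComponentSupplier buildPlannerComponentSupplier()
  {
    builds++;
    return () -> "custom planner components";
  }
}
```

With this shape, a framework only needs the QueryComponentSupplier and can ask it for the planner components when required.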

}


public File newTempFolder()
Member

do we really need these? or could we just expose the temporaryFolder itself via a method?

Member Author

unfortunately yes - it's kinda against the rules to use rules from other rules....
I feel like breaking such a thing for temporary folders is not worth it... if it were some bigger, more expensive and configurable object I would possibly try to work on it more.

we could have it in a utility class or something - maybe it could be moved into the ProjectPathUtils the other patch introduces... not sure

Member Author

altered how this is being handled by:

  • replaced temporaryFolder with TempFolderProducer
  • inlined these methods to invoke tempFolderProducer.newTempFolder

protected File createTempFolder(String prefix)
{
File tempDir = FileUtils.createTempDir(prefix);
Runtime.getRuntime().addShutdownHook(new Thread()
Member

do we really need this? can this be replaced via a temporary folder or deleteOnExit?

Member Author

unfortunately deleteOnExit is kinda equivalent to an unlink call - so it's not recursive, and a directory that still has contents at exit is left behind
https://stackoverflow.com/questions/11165253/deleting-a-directory-on-exit-in-java
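
For illustration, a small sketch of the recursive alternative: walk the tree and delete children before their parents, registered in a shutdown hook. RecursiveDelete and its method names are hypothetical and are not the helper used in this PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class RecursiveDelete
{
  // Deletes root and everything below it; does nothing if root is already gone.
  static void deleteRecursively(Path root)
  {
    if (!Files.exists(root)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(root)) {
      // Reverse order puts children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(path -> {
        try {
          Files.delete(path);
        }
        catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Creates a temp directory whose whole tree is removed when the JVM exits.
  static Path createTempDirDeletedAtExit(String prefix)
  {
    try {
      Path dir = Files.createTempDirectory(prefix);
      Runtime.getRuntime().addShutdownHook(new Thread(() -> deleteRecursively(dir)));
      return dir;
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```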

public void beforeAll(ExtensionContext context) throws Exception
{
Class<?> testClass = context.getTestClass().get();
SqlTestFramework.SqlTestFrameWorkModule moduleAnnotation = getModuleAnnotationFor(testClass);
Member

doubt: could there be a way in the future to build the test framework without annotations as well? While it has worked very cleanly so far, I'm not sure how manageable it will be if more things keep getting added to the config.
Maybe annotations are the only way to deal with this, but I'm not sure.

Member Author

note: this SqlTestFramework.SqlTestFrameWorkModule annotation class will be removed in 2 patches - but it was necessary to make this patch a standalone change

in a followup the QueryComponentSupplier will be part of the SqlTestFrameWorkConfig and around the same time there will be new ways to do it as well: most importantly it will be possible to specify it in the URI as well - so quidem tests could decide which one they want to use

Contributor

@imply-cheddar imply-cheddar left a comment

I think it overall looks generally good. But I'm scared about the lifecycle management of the Temporary folders. Lifecycle management is one of those things that comes back and bites hard in interesting ways, so I think I'd feel a lot more comfortable if we addressed that first. After that, I'm good to approve.

import com.fasterxml.jackson.databind.SerializationFeature;
import com.google.common.collect.ImmutableList;
import com.google.inject.Injector;
import org.apache.druid.compressedbigdecimal.CompressedBigDecimalSqlAggregatorTestBase.CompressedBigDecimalComponentSupplier;
Contributor

This is mostly just a curiosity, but isn't this importing a sub-class from this same file? That doesn't seem like it should require an import?

Member Author

yes - it's an inner class; I think at some point there could be classes shared between multiple test classes

adding such imports was more-or-less just the consequence of some mechanical work needed to be done to add all these annotations

there will be a followup later which will merge this module annotation into the SqlTestFrameworkConfig ; I could try to remove them at that point.

private TempDirProducer()
{
tempDir = FileUtils.createTempDir("druid-tempdir");
Runtime.getRuntime().addShutdownHook(new Thread()
Contributor

Doing the cleanup via a shutdown hook means that unfortunate usage of this class in a JVM that's running all tests for the entire test suite could end up generating tons and tons and tons of little directory turds that don't really get cleaned up as they go along.

Also, registering an object with a shutdown hook in the constructor is kinda a sad pattern. It would be much nicer if the test framework could perhaps instantiate one of these and then clean it up as part of its own lifecycle instead of relying on a singleton instance and a shutdown hook for cleanup.

Is there a reason that we cannot create one of these and close it more cleanly than a shutdown hook?

Member Author

forgot to leave a comment for this: we've discussed that there is no real need to retain this class as a singleton:

  • every Supplier can have its own directory
  • adding Closeable makes it possible to move the cleanup to close
  • Suppliers can be closeable too

these changes were done :)
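
A minimal sketch of the non-singleton arrangement summarized above, assuming each supplier owns one producer and removes its directory tree in close(). TempDirProducerSketch and SupplierSketch are hypothetical names and this is not the code merged in this PR.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical producer: one root directory per instance, removed on close().
class TempDirProducerSketch implements Closeable
{
  private final Path root;

  TempDirProducerSketch(String prefix)
  {
    try {
      root = Files.createTempDirectory(prefix);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Every folder handed out lives under this producer's root.
  File newTempFolder(String prefix)
  {
    try {
      return Files.createTempDirectory(root, prefix).toFile();
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public void close()
  {
    if (!Files.exists(root)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(root)) {
      // Children first; individual delete failures are ignored in this sketch.
      paths.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

// Hypothetical supplier that owns its producer and is closeable itself.
class SupplierSketch implements Closeable
{
  final TempDirProducerSketch tempDirProducer = new TempDirProducerSketch("supplier");

  @Override
  public void close()
  {
    tempDirProducer.close();
  }
}
```

Whoever creates the supplier can then close it at the end of its own lifecycle, for example with try-with-resources, so directories are removed as tests finish rather than at JVM exit.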

Member

@rohangarg rohangarg left a comment

Thanks for the fixups @kgyrtkirk - the changes LGTM! 👍
Approving the change; I will wait some time for Eric's approval as well before merging.

Contributor

@imply-cheddar imply-cheddar left a comment

Approving.

One Philosophical soap-box comment. The TempDirProducer class follows a lifecycle of

  1. Constructor does some things that require closing
  2. Object gets closed

Generally speaking, this pattern of getting resources in the constructor can lead to problems sometimes when you want one bit of code to create objects and another bit of code to actually use them. The lifecycle of

  1. Object is constructed
  2. Object is started
  3. Object is closed

is a cleaner lifecycle. For the TempDirProducer it's not really a big deal, which is why I'm leaving this as a passing comment on the approval PR, but if this was a true business logic object, it would be better to have 3 steps.

If it is common to want to construct AND start the object at the same time, a

public static ObjectToConstructAndStart makeAndStart(Object ...)

style static method that calls .start() before returning the object can preserve call-site simplicity.
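
A small sketch of that three-step lifecycle with a makeAndStart-style factory. ManagedResource and its State enum are hypothetical names, and the "resource" here is only a state flag standing in for whatever start() would really acquire.

```java
import java.io.Closeable;

class ManagedResource implements Closeable
{
  enum State { CONSTRUCTED, STARTED, CLOSED }

  private final String name;
  private State state = State.CONSTRUCTED;

  // Step 1: construction only records configuration and acquires nothing.
  ManagedResource(String name)
  {
    this.name = name;
  }

  // Step 2: resources would be acquired here, by whichever code uses the object.
  void start()
  {
    if (state != State.CONSTRUCTED) {
      throw new IllegalStateException(name + " cannot start from state " + state);
    }
    state = State.STARTED;
  }

  // Step 3: release whatever start() acquired.
  @Override
  public void close()
  {
    state = State.CLOSED;
  }

  State state()
  {
    return state;
  }

  // Convenience factory for call sites that construct and start together.
  static ManagedResource makeAndStart(String name)
  {
    ManagedResource resource = new ManagedResource(name);
    resource.start();
    return resource;
  }
}
```

Code that only wires objects together can call the constructor and leave start() to the eventual user, while simple call sites keep a one-liner through makeAndStart.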

@rohangarg rohangarg merged commit 9c0bd56 into apache:master Apr 25, 2024
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024


4 participants