Skip to content

Make tempStorageDirectory configuration optional and rely on task dir instead#17015

Merged
adarshsanjeev merged 16 commits intoapache:masterfrom
adarshsanjeev:temp-dir-config
Oct 29, 2024
Merged

Make tempStorageDirectory configuration optional and rely on task dir instead#17015
adarshsanjeev merged 16 commits intoapache:masterfrom
adarshsanjeev:temp-dir-config

Conversation

@adarshsanjeev
Copy link
Copy Markdown
Contributor

@adarshsanjeev adarshsanjeev commented Sep 6, 2024

Currently, durable storage and export both require configuring a temporary directory to be used using druid.export.storage.<connectorType>.tempLocalDir and druid.msq.intermediate.storage.tempDir.

Tasks on middle manager already have a configured temporary directory. This PR aims to reduce the configuration required by using the task directory as a default if it is not explicitly configured, thus reducing the number of configs that a user has to set.

Please note that preference would be given to the user configured, druid.*.storage.temp*Dir, on the tasks. If that is not configured, we then use the configured temporary directory.

Overlord and brokers also require storage connector configurations (for the durableStorageCleanerOverlordDuty and to fetch results of async queries respectively), but do not have a default temporary task directory. The configuration is still required for these services.


Release notes

druid.export.storage.<google/s3>.tempLocalDir and druid.msq.intermediate.storage.tempDir are no longer required configurations. If not configured, the task defaults to using the task temp directory.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions Bot added Area - Batch Ingestion Area - Querying Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Sep 6, 2024
@LakshSingla
Copy link
Copy Markdown
Contributor

Overlord and brokers also require storage connector configurations (for the durabelStorageCleanerOverlordDuty and to fetch results of async queries respectively), but do not have a default temporary task directory

Should Druid rely on java.io.tmpDir in that case? FileUtils has multiple options to create temp directories. InputSourceSampler in overlord uses one of those methods to create a temp directory.

public interface StorageConnectorProvider extends Provider<StorageConnector>
public interface StorageConnectorProvider
{
StorageConnector createStorageConnector(File tempDir);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add java docs for this method ?
The current implementation just ignores this tempDir if the property is already set which might confuse the caller.

Is it possible to not change this interface and have the implementation take the "tempFIle" as the constructor param. Based on that, we can set the correct tempDir values.

Copy link
Copy Markdown
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything else seems cool to me.

@cryptoe
Copy link
Copy Markdown
Contributor

cryptoe commented Oct 16, 2024

@adarshsanjeev I updated the description. Please check.

@adarshsanjeev
Copy link
Copy Markdown
Contributor Author

@cryptoe could you please take a look at the changes I made now to MSQDurableStorageModule?

@adarshsanjeev adarshsanjeev merged commit b7c661b into apache:master Oct 29, 2024
jtuglu1 pushed a commit to jtuglu1/druid that referenced this pull request Nov 20, 2024
…r instead (apache#17015)

Currently, durable storage and export both require configuring a temporary directory to be used using druid.export.storage.<connectorType>.tempLocalDir and druid.msq.intermediate.storage.tempDir.

Tasks on middle manager already have a configured temporary directory. This PR aims to reduce the configuration required by using the task directory as a default if it is not explicitly configured, thus reducing the number of configs that a user has to set.

Please note that preference would be given to the user configured, druid.*.storage.temp*Dir, on the tasks. If that is not configured, we then use the configured temporary directory.

Overlord and brokers also require storage connector configurations (for the durableStorageCleanerOverlordDuty and to fetch results of async queries respectively), but do not have a default temporary task directory. The configuration is still required for these services.
@adarshsanjeev adarshsanjeev added this to the 32.0.0 milestone Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Batch Ingestion Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants