Allow users to pass task payload via deep storage instead of environment variable#14887
Conversation
@georgew5656
@cryptoe I think the way it is implemented now, behind the feature flag, is better. For most use cases it is much better to just pass the task.json directly; it is much faster. Also, for our customers we use a deep storage that is slower and less featured than S3, so this would be a feature we would only turn on if necessary. Additionally, our deep storage provider won't allow us to do batch deletes, so the cleaner approach to removing the task files is not great. I know it's a special case for us, but it is always better to pass something directly if possible than to use indirection. Disregarding our use case: LGTM overall, but one feature request I would like to make is that AbstractTask delete the task.json file at the end of the task, rather than leaving it up to the cleaner. Quite a few folks have their own k8s clusters which they launch in their own datacenters; while this works for cloud providers, it might not work for everyone else. This can be done in another PR, or I can do one after this PR is merged if needed.
I didn't really want to break anyone who doesn't want to use deep storage for task payloads, for whatever reason. I think later on, if this gets used in production a bit more and the performance is okay, we could consider flipping it on as the default.
Thanks @churromorales and @georgew5656 for the responses. I was more worried about adding another config which the end users need to set.
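As a rough illustration of the follow-up churromorales requests above (deleting the pushed task.json when the task finishes instead of leaving it to the cleaner), here is a minimal sketch. The `deleteTaskPayload` method is hypothetical; it does not exist in `TaskLogs` or in this PR and would have to be added by that follow-up.

```java
import org.apache.druid.tasklogs.TaskLogs;

// Hedged sketch only: `deleteTaskPayload` is a hypothetical extension of TaskLogs,
// not an API that exists in this PR.
interface PayloadDeletingTaskLogs extends TaskLogs
{
  void deleteTaskPayload(String taskId);
}

class TaskPayloadCleanupSketch
{
  private final PayloadDeletingTaskLogs taskLogs;

  TaskPayloadCleanupSketch(PayloadDeletingTaskLogs taskLogs)
  {
    this.taskLogs = taskLogs;
  }

  // Would be called from the task's cleanup path (e.g. AbstractTask) once the task
  // reaches a terminal state, so the payload is removed without waiting for the cleaner.
  void onTaskCompletion(String taskId)
  {
    try {
      taskLogs.deleteTaskPayload(taskId); // hypothetical method, see interface above
    }
    catch (RuntimeException e) {
      // Best effort: the periodic log cleaner would still remove any leftovers.
    }
  }
}
```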
YongGang
left a comment
LGTM generally, left some new comments along with my previous one #14887 (comment)
I have concerns about the config change as well. In a single PR, adding one config doesn't seem like much, but after a year's worth of work you suddenly realize that the feature has become very complex to tune and use. Is there a way to check whether the deep storage supports storing the payload and, if not, fall back to the environment variable? Because then we can change the default easily.
Why not just check the size of the task.json: if it's larger than MAX_SIZE, put it in deep storage; if it's smaller, just use the env variable? I agree 100% that Druid in general has too many configuration options, which makes it hard to remember everything you need to include. When creating this feature, the whole goal I had in mind was to have this work with as few configuration options as possible.
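For illustration, a minimal sketch of the size-based fallback suggested above; this is not how the PR is implemented, and all names below (MAX_ENV_PAYLOAD_BYTES, the two helper methods) are placeholders.

```java
import java.nio.charset.StandardCharsets;

// Hedged sketch of the suggestion above: keep using the TASK_JSON env variable when the
// serialized task fits, and only fall back to deep storage when it would be too large.
class TaskPayloadDispatchSketch
{
  // Placeholder threshold chosen to stay well under the ~128 KB per-variable limit on Linux.
  private static final int MAX_ENV_PAYLOAD_BYTES = 100_000;

  void passTaskPayload(String taskJson)
  {
    final byte[] payload = taskJson.getBytes(StandardCharsets.UTF_8);
    if (payload.length <= MAX_ENV_PAYLOAD_BYTES) {
      setTaskJsonEnvVariable(payload);        // current behaviour: env variable on the peon pod
    } else {
      pushTaskPayloadToDeepStorage(payload);  // behaviour this PR adds behind a config flag
    }
  }

  private void setTaskJsonEnvVariable(byte[] payload)
  {
    // placeholder: would set TASK_JSON (compressed) on the peon pod spec
  }

  private void pushTaskPayloadToDeepStorage(byte[] payload)
  {
    // placeholder: would call TaskLogs#pushTaskPayload with a temp file
  }
}
```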
Path file = Files.createTempFile(taskId.getOriginalTaskId(), "task.json");
try {
  FileUtils.writeStringToFile(file.toFile(), mapper.writeValueAsString(task), Charset.defaultCharset());
  taskLogs.pushTaskPayload(task.getId(), file.toFile());
Is it possible that a log cleanup job removes the task payload from deep storage while the task is still in progress? How are these payloads cleaned up from deep storage?
The task reads the file to disk on startup, so I wouldn't be worried about the log cleanup job cleaning it up that soon. We are relying on the log cleanup job to clean up deep storage.
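For context, here is a minimal self-contained sketch of the push-then-rely-on-the-cleaner flow discussed in this thread. It uses `TaskLogs#pushTaskPayload` as introduced in this PR (import path assumed); the temp-file handling is illustrative and not the PR's exact code.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.druid.tasklogs.TaskLogs;

class TaskPayloadPushSketch
{
  // Writes the serialized task to a local temp file, pushes it to deep storage, and removes
  // only the local copy; the deep-storage copy is left for the log cleanup job to delete.
  void pushPayload(TaskLogs taskLogs, String taskId, String taskJson) throws Exception
  {
    final Path file = Files.createTempFile(taskId, "task.json");
    try {
      Files.write(file, taskJson.getBytes(StandardCharsets.UTF_8));
      taskLogs.pushTaskPayload(taskId, file.toFile());
    }
    finally {
      Files.deleteIfExists(file); // local temp copy only
    }
  }
}
```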
  DruidNode node,
- ObjectMapper mapper
+ ObjectMapper mapper,
+ TaskLogs taskLogs
The name of this class has been confusing since it does so much more than deal with task logs. Could be fixed in some other PR someday.
Thank you for addressing the comments, @georgew5656. Looks good to me except for the exception handling in some places.
{
  com.google.common.base.Optional<InputStream> taskBody = taskLogs.streamTaskPayload(getTaskId(from).getOriginalTaskId());
  if (!taskBody.isPresent()) {
    throw InternalError.exception("Could not load task payload for job [%s]", from.getMetadata().getName());
Is there an action you can associate with this error message? For example, should they verify that the Overlord is successfully uploading task JSONs to deep storage?
Updated the message.
{
  Map<String, String> annotations = from.getSpec().getTemplate().getMetadata().getAnnotations();
  if (annotations == null) {
    throw new IOE("No annotations found on pod spec for job [%s]", from.getMetadata().getName());
Can this be replaced with DruidException.defensive()?
abhishekagarwal87
left a comment
Almost there. Can you look into the test failures?
…a/org/apache/druid/k8s/overlord/taskadapter/K8sTaskAdapter.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
I think the only failing tests are for coverage now, so we should be good.
…ent variable (#14887) This change is meant to fix an issue where passing too large a task payload to the mm-less task runner causes the peon to fail to start up, because the payload is passed (compressed) as an environment variable (TASK_JSON). On Linux systems the limit for an environment variable is commonly 128 KB, and on Windows systems it is even lower. Setting an env variable longer than this results in a bunch of "Argument list too long" errors.
Implement pushTaskPayload/streamTaskPayload as introduced in #14887 for HDFS storage to allow larger mm-less ingestion payloads when using HDFS as the deep storage location.
This change is meant to fix an issue where passing too large a task payload to the mm-less task runner causes the peon to fail to start up, because the payload is passed (compressed) as an environment variable (TASK_JSON). On Linux systems the limit for an environment variable is commonly 128 KB, and on Windows systems it is even lower. Setting an env variable longer than this results in a bunch of "Argument list too long" errors.
Description
(1) Problem
The goal of this patch is to prevent larger tasks from failing with mm-less ingestion due to the TASK_JSON being too large as described above.
(2) Solution
Part 1. Optionally stop setting TASK_JSON
To address the immediate problem (setting environment variables that are too large), I added an additional config for the KubernetesTaskRunner (druid.indexer.runner.taskPayloadAsEnvVariable) that defaults to true but can optionally be set to false. Setting this config to false will cause the K8s adapters to not set the task payload as the TASK_JSON env variable. This prevents the Jobs from failing to come up.
Part 2. We still need to pass the task.json payload to the peons somehow. I explored three options for this, and ended up going with the below solution.
Push the task payload into task logs deep storage and have the peon read the payload.
I ended up going with this option because it was the simplest to implement and the most future-proof (no worry about task payloads getting larger than 1 MB). The task logs killer will automatically delete the task.json in deep storage alongside the task logs whenever it is run.
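The peon-side counterpart, sketched below, streams the payload back out of deep storage on startup and materializes it as a local task.json before injection. `TaskLogs#streamTaskPayload` is the method added in this PR and returns a Guava Optional<InputStream>, as in the review snippet above; the import path and target file location are illustrative assumptions.

```java
import com.google.common.base.Optional;
import java.io.File;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import org.apache.druid.tasklogs.TaskLogs;

class TaskPayloadFetchSketch
{
  // Reads the pushed payload from deep storage and writes it to the location where the
  // peon expects task.json, failing fast if the Overlord never uploaded it.
  void fetchPayload(TaskLogs taskLogs, String taskId, File taskFile) throws Exception
  {
    final Optional<InputStream> payload = taskLogs.streamTaskPayload(taskId);
    if (!payload.isPresent()) {
      throw new IllegalStateException("No task payload found in deep storage for task " + taskId);
    }
    try (InputStream in = payload.get()) {
      Files.copy(in, taskFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```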
(3) Alternative solutions to passing the task.json payload
Using k8s configmaps to store the task payload and then mounting them onto the created peon pods.
I decided not to go with this option because configmaps still have a 1MB size limit and I was concerned with the KubernetesTaskRunner having to manage a bunch of configmaps in addition to jobs. Having this many configmaps also pollutes K8s metadata, making it hard to see anything else going on when you're looking at configmaps.
Updating CliPeon to use the getTaskPayload endpoint on the overlord to pull the task.json payload on startup. This didn't work because we currently have a guice injector in the peon that requires the task.json be available at injection time. In order to pull the task.json from the overlord, we need to use the ServiceLocator class which is only available once the peon lifecycle has already started (after injection). Changing this would have required many changes to the code so I didn't want to do it. Additionally, I would have had to deprecate the toTask interface on the overlords since there would be no way for the overlord to turn a K8s Job into a task definition.
Key changed/added classes in this PR
CliPeon, KubernetesPeonLifecycle, PodTemplateTaskAdapter, K8sTaskAdapter, S3TaskLogs, TaskLogs

I can add some more documentation to this PR later but I wanted to get some feedback on this approach before doing so.