
[Backport] Druid quickstart: Update task memory #13570

Closed
findingrish wants to merge 34 commits into apache:master from
findingrish:backport_update_task_memory

Conversation

@findingrish
Contributor

Backports #13563

kfaraz and others added 30 commits November 21, 2022 20:39
* Fix web-console snapshots

* Revert changes to package and package-lock.json
* Add sketch fetching framework

* Refactor code to support sequential merge

* Update worker sketch fetcher

* Refactor sketch fetcher

* Refactor sketch fetcher

* Add context parameter and threshold to trigger sequential merge

* Fix test

* Add integration test for non-sequential merge

* Address review comments

* Address review comments

* Address review comments

* Resolve maxRetainedBytes

* Add new classes

* Renamed key statistics information class

* Rename fetchStatisticsSnapshotForTimeChunk function

* Address review comments

* Address review comments

* Update documentation and add comments

* Resolve build issues

* Resolve build issues

* Change worker APIs to async

* Address review comments

* Resolve build issues

* Add null time check

* Update integration tests

* Address review comments

* Add log messages and comments

* Resolve build issues

* Add unit tests

* Add unit tests

* Fix timing issue in tests
* Backport firehose PR 12981

* Update migrate-from-firehose-ingestion.md
* Suppress jackson-databind CVE-2022-42003 and CVE-2022-42004
(cherry picked from commit 1f4d892)
* Suppress CVEs
(cherry picked from commit ed55baa)
* Suppress vulnerabilities from druid-website package
(cherry picked from commit c0fb364)
* Add more suppressions for website package
(cherry picked from commit 9bba569)

Co-authored-by: Rohan Garg <7731512+rohangarg@users.noreply.github.com>
…e#13438)

* fixes BlockLayoutColumnarLongs close method to nullify internal buffer.

* fixes other BlockLayoutColumnar supplier close methods to nullify internal buffers.

* fix spotbugs

(cherry picked from commit b091b32)
apache#13421)

* we can read where we want to
we can leave your bounds behind
'cause if the memory is not there
we really don't care
and we'll crash this process of mine
* Update and document experimental features
(cherry picked from commit ccbf3ab)
* Updated
(cherry picked from commit d7b8fae)
* Update experimental-features.md
* Updated after review
(cherry picked from commit 975ae24)
* Updated
(cherry picked from commit eb8268e)
* Update materialized-view.md
(cherry picked from commit 53c3bde)
* Update experimental-features.md
(cherry picked from commit 77148f7)
* Update nested columns docs

* Update nested-columns.md
…13445)

Detects self-redirects, redirect loops, long redirect chains, and redirects to unknown servers.
Treat all of these cases as an unavailable service, retrying if the retry policy allows it.

Previously, some of these cases would lead to a prompt, unretryable error. This caused
clients contacting an Overlord during a leader change to fail with error messages like:

org.apache.druid.rpc.RpcException: Service [overlord] redirected too many times

Additionally, a slight refactor of callbacks in ServiceClientImpl improves readability of
the flow through onSuccess.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
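The redirect handling described above can be sketched as follows. This is a hypothetical illustration, not the actual ServiceClientImpl code: the function names, the `next_hop` callback, and the `known_servers` set are all assumptions made for the example. The key point it demonstrates is that every bad-redirect case raises the same *retryable* error instead of a terminal one.

```python
class ServiceUnavailable(Exception):
    """Retryable: the caller's retry policy decides whether to try again."""

def resolve_redirects(start_url, next_hop, known_servers, max_hops=5):
    """Follow redirects, treating self-redirects, loops, overly long
    chains, and redirects to unknown servers as an unavailable service.

    next_hop(url) returns the redirect target for url, or None if the
    response is not a redirect.
    """
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        target = next_hop(url)
        if target is None:
            return url                      # final, non-redirected location
        if target == url:                   # self-redirect
            raise ServiceUnavailable(f"self-redirect at {url}")
        if target in seen:                  # redirect loop
            raise ServiceUnavailable(f"redirect loop via {target}")
        if target not in known_servers:     # redirect to unknown server
            raise ServiceUnavailable(f"unknown server {target}")
        seen.add(target)
        url = target
    raise ServiceUnavailable("redirected too many times")  # long chain
```

Under this scheme, a client that hits a redirect loop between two Overlord candidates during a leader change sees a retryable failure rather than an immediate `RpcException`.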
…s to parse exception in MSQ (apache#13366) (apache#13454)

* initial commit

* fix test

* push the json changes

* reduce the area of the try..catch

* Trigger Build

* review
…ache#13459) (apache#13464)

* Fix an issue with WorkerSketchFetcher not terminating on shutdown

* Change threadpool name
* add ability to make inputFormat part of the example datasets (apache#13402)

* Web console: Index spec dialog (apache#13425)

* add index spec dialog

* add snapshot

* Web console: be more robust to aux queries failing and improve kill tasks (apache#13431)

* be more robust to aux queries failing

* feedback fixes

* remove empty block

* fix spelling

* remove killAllDataSources from the console

* don't render duration if aggregated (apache#13455)
* Update LDAP configuration docs

(cherry picked from commit e74bd89)

* Updated after review

(cherry picked from commit 882e0b2)

* Update auth-ldap.md

Updated.

(cherry picked from commit d4f0797)

* Update auth-ldap.md

(cherry picked from commit fbec7b2)

* Updated spelling file

(cherry picked from commit ef5316b)

* Update docs/operations/auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
(cherry picked from commit 1a9b42a)

* Update docs/operations/auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
(cherry picked from commit 1018d9a)

* Update docs/operations/auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
(cherry picked from commit dd81b3f)

* Update auth-ldap.md

(cherry picked from commit f0655cf)
) (apache#13493)

In a cluster with a large number of streaming tasks (~1000), SegmentAllocateActions
on the Overlord can often take a very long time to finish, causing spikes
in the `task/action/run/time` metric. This may result in lag building up while a task
waits for a segment to be allocated.

The root causes are:
- large number of metadata calls made to the segments and pending segments tables
- `giant` lock held in `TaskLockbox.tryLock()` to acquire task locks and allocate segments

Since the contention typically arises when several tasks of the same datasource try
to allocate segments for the same interval/granularity, the allocation run times can be
improved by batching the requests together.

Changes
- Add flags
   - `druid.indexer.tasklock.batchSegmentAllocation` (default `false`)
   - `druid.indexer.tasklock.batchAllocationMaxWaitTime` (in millis) (default `1000`)
- Add methods `canPerformAsync` and `performAsync` to `TaskAction`
- Submit each allocate action to a `SegmentAllocationQueue`, and add to correct batch
- Process batch after `batchAllocationMaxWaitTime`
- Acquire `giant` lock just once per batch in `TaskLockbox`
- Reduce metadata calls by batching statements together and updating query filters
- Except for batching, retain the whole behaviour (order of steps, retries, etc.)
- Respond to leadership changes and fail items in queue when not leader
- Emit batch and request level metrics
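As a sketch, enabling the batching behavior described above would look roughly like this in the Overlord's runtime properties, using the property names and defaults introduced in this change (the values shown for illustration are assumptions, not recommendations):

```properties
# Enable batched segment allocation on the Overlord (default: false)
druid.indexer.tasklock.batchSegmentAllocation=true

# Max time (millis) an allocate action may wait to be batched (default: 1000)
druid.indexer.tasklock.batchAllocationMaxWaitTime=500
```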
…he#13495)

* Update docs for useBatchedSegmentSampler
* Update docs for round-robin assignment
* Update to native ingestion doc

(cherry picked from commit aba83f2)

* Update native-batch.md

* Update native-batch.md
…verview.type=http (apache#13499) (apache#13515)

* fix issue with http server inventory view blocking data node http server shutdown with long polling

* adjust

* fix test inspections
…pache#13517)

Changes:
- Limit max batch size in `SegmentAllocationQueue` to 500
- Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual
wait time may exceed this configured value.
- Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`
… (apache#13529)

* Remove stray reference to fix OOM while merging sketches

* Update future to add result from executor service

* Update tests and address review comments

* Address review comments

* Moved mock

* Close threadpool on teardown

* Remove worker task cancel
…ache#13537) (apache#13542)

The planner sets sqlInsertSegmentGranularity in its context when using
PARTITIONED BY, which sets it on every native query in the stack (as all
native queries for a SQL query typically have the same context).
QueryKit would interpret that as a request to configure bucketing for
all native queries. This isn't useful, as bucketing is only used for
the penultimate stage in INSERT / REPLACE.

So, this patch modifies QueryKit to only look at sqlInsertSegmentGranularity
on the outermost query.

As an additional change, this patch switches the static ObjectMapper to
use the process-wide ObjectMapper for deserializing Granularities. This saves
an ObjectMapper instance, and ensures that if there are any special
serdes registered for Granularity, we'll pick them up.

(cherry picked from commit 5581488)

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
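The fix above boils down to honoring a context parameter only at one level of a nested query stack. A minimal sketch of that idea, with hypothetical names (this is not QueryKit's actual code, and the `dict`-based query shape is an assumption for the example):

```python
# A SQL planner may copy sqlInsertSegmentGranularity into the context of
# every native query in the stack. Bucketing should only be configured from
# the outermost query, so inner stages must ignore the parameter.

def bucketing_granularity(query, is_outermost):
    """Return the granularity to bucket by, or None if no bucketing applies.

    query: a dict with a "context" dict, mirroring a native query context.
    """
    if not is_outermost:
        return None  # inner queries ignore sqlInsertSegmentGranularity
    return query.get("context", {}).get("sqlInsertSegmentGranularity")
```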
* Web console: add arrayOfDoublesSketch and other small fixes (apache#13486)
* add padding and keywords
* add arrayOfDoubles
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* partition int
* fix docs
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Web console: improve compaction status display (apache#13523)
* improve compaction status display
* even more accurate
* fix snapshot
* MSQ: Improve TooManyBuckets error message, improve error docs. (apache#13525)
1) Edited the TooManyBuckets error message to mention PARTITIONED BY
   instead of segmentGranularity.
2) Added error-code-specific anchors in the docs.
3) Add information to various error codes in the docs about common
   causes and solutions.
* update error anchors (apache#13527)
* update snapshot
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
…attening (apache#13519) (apache#13546)

* add protobuf flattener, direct to plain java conversion for faster flattening, nested column tests
findingrish and others added 4 commits December 13, 2022 11:31
* Zero-copy local deep storage.

This is useful for local deep storage, since it reduces disk usage and
makes Historicals able to load segments instantaneously.

Two changes:

1) Introduce "druid.storage.zip" parameter for local storage, which defaults
   to false. This changes default behavior from writing an index.zip to writing
   a regular directory. This is safe to do even during a rolling update, because
   the older code actually already handled unzipped directories being present
   on local deep storage.

2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links
   instead of copies when possible. (Generally this is possible when the
   source and destination directory are on the same filesystem.)

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
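The hard-link-when-possible behavior in change 2 can be sketched like this. This is an assumed illustration of the technique, not the actual LocalDataSegmentPusher code; the function name is hypothetical:

```python
import os
import shutil

def push_segment_file(src, dest):
    """Place src at dest, preferring a zero-copy hard link.

    A hard link shares the same inode, so it uses no extra disk space and
    completes instantly; it only works when src and dest are on the same
    filesystem, so fall back to a real copy otherwise.
    """
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    try:
        os.link(src, dest)       # zero-copy: same inode, no data written
    except OSError:              # cross-device link, or links unsupported
        shutil.copy2(src, dest)  # fall back to an ordinary copy
    return dest
```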
…pache#13567)

* Add validation checks to worker chat handler apis

* Merge things and polishing the error messages.

* Minor error message change

* Fixing race and adding some tests

* Fixing controller fetching stats from wrong workers
* Fixing race
* Changing default mode to Parallel
* Adding logging
* Fixing exceptions not propagated properly

* Changing to kernel worker count

* Added a better logic to figure out assigned worker for a stage.

* Nits

* Moving to existing kernel methods

* Adding more coverage

Co-authored-by: cryptoe <karankumar1100@gmail.com>
(cherry picked from commit 2b605aa)

Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>
Changes:
* Use 80% of memory specified for running services (versus 50% earlier).
* Tasks get either 512m / 1024m or 2048m now (versus 512m or 2048m earlier). 
* Add direct memory for router.
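The arithmetic behind these changes can be illustrated as follows. This is only a sketch of the sizing logic described above; the budget-to-size thresholds are hypothetical and do not reproduce the quickstart scripts' exact rules:

```python
def usable_memory_mb(total_mb, fraction=0.8):
    """Memory handed to Druid services: 80% of the machine's memory now,
    versus 50% before this change."""
    return int(total_mb * fraction)

def task_heap_mb(per_task_budget_mb):
    """Pick a task size from the new 512m / 1024m / 2048m ladder
    (previously only 512m or 2048m). The mapping from budget to size
    shown here is an assumed example, not the scripts' actual rule."""
    for size in (2048, 1024, 512):
        if per_task_budget_mb >= size:
            return size
    return 512
```

For example, a 16 GiB machine now contributes about 13 GiB to Druid services instead of 8 GiB, and a per-task budget that previously fell back to 512m can land on the intermediate 1024m size.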


9 participants