[Backport] Druid quickstart: Update task memory #13570
Closed
findingrish wants to merge 34 commits into apache:master from
Conversation
* Fix web-console snapshots
* Revert changes to package and package-lock.json
* Add sketch fetching framework
* Refactor code to support sequential merge
* Update worker sketch fetcher
* Refactor sketch fetcher
* Refactor sketch fetcher
* Add context parameter and threshold to trigger sequential merge
* Fix test
* Add integration test for non sequential merge
* Address review comments
* Address review comments
* Address review comments
* Resolve maxRetainedBytes
* Add new classes
* Renamed key statistics information class
* Rename fetchStatisticsSnapshotForTimeChunk function
* Address review comments
* Address review comments
* Update documentation and add comments
* Resolve build issues
* Resolve build issues
* Change worker APIs to async
* Address review comments
* Resolve build issues
* Add null time check
* Update integration tests
* Address review comments
* Add log messages and comments
* Resolve build issues
* Add unit tests
* Add unit tests
* Fix timing issue in tests
* Backport firehose PR 12981
* Update migrate-from-firehose-ingestion.md
* Suppress jackson-databind CVE-2022-42003 and CVE-2022-42004 (cherry picked from commit 1f4d892)
* Suppress CVEs (cherry picked from commit ed55baa)
* Suppress vulnerabilities from druid-website package (cherry picked from commit c0fb364)
* Add more suppressions for website package (cherry picked from commit 9bba569)

Co-authored-by: Rohan Garg <7731512+rohangarg@users.noreply.github.com>
apache#13421)

* we can read where we want to
  we can leave your bounds behind
  'cause if the memory is not there
  we really don't care
  and we'll crash this process of mine
* Update and document experimental features (cherry picked from commit ccbf3ab)
* Updated (cherry picked from commit d7b8fae)
* Update experimental-features.md
* Updated after review (cherry picked from commit 975ae24)
* Updated (cherry picked from commit eb8268e)
* Update materialized-view.md (cherry picked from commit 53c3bde)
* Update experimental-features.md (cherry picked from commit 77148f7)
* Update nested columns docs
* Update nested-columns.md
…inputRow map instead of eagerly copying (apache#13406) (apache#13447)
…13445)

Detects self-redirects, redirect loops, long redirect chains, and redirects to unknown servers. Treats all of these cases as an unavailable service, retrying if the retry policy allows it. Previously, some of these cases would lead to a prompt, unretryable error. This caused clients contacting an Overlord during a leader change to fail with error messages like:

org.apache.druid.rpc.RpcException: Service [overlord] redirected too many times

Additionally, a slight refactor of callbacks in ServiceClientImpl improves readability of the flow through onSuccess.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
…s to parse exception in MSQ (apache#13366) (apache#13454)

* initial commit
* fix test
* push the json changes
* reduce the area of the try..catch
* Trigger Build
* review
…ache#13459) (apache#13464)

* Fix an issue with WorkerSketchFetcher not terminating on shutdown
* Change threadpool name
* add ability to make inputFormat part of the example datasets (apache#13402)
* Web console: Index spec dialog (apache#13425)
  * add index spec dialog
  * add snapshot
* Web console: be more robust to aux queries failing and improve kill tasks (apache#13431)
  * be more robust to aux queries failing
  * feedback fixes
  * remove empty block
  * fix spelling
  * remove killAllDataSources from the console
* don't render duration if aggregated (apache#13455)
(cherry picked from commit 994d7c2)
* Update LDAP configuration docs (cherry picked from commit e74bd89)
* Updated after review (cherry picked from commit 882e0b2)
* Update auth-ldap.md. Updated. (cherry picked from commit d4f0797)
* Update auth-ldap.md (cherry picked from commit fbec7b2)
* Updated spelling file (cherry picked from commit ef5316b)
* Update docs/operations/auth-ldap.md (cherry picked from commit 1a9b42a)
* Update docs/operations/auth-ldap.md (cherry picked from commit 1018d9a)
* Update docs/operations/auth-ldap.md (cherry picked from commit dd81b3f)
* Update auth-ldap.md (cherry picked from commit f0655cf)

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
) (apache#13493)

In a cluster with a large number of streaming tasks (~1000), SegmentAllocateActions on the Overlord can often take very long intervals of time to finish, thus causing spikes in the `task/action/run/time` metric. This may result in lag building up while a task waits for a segment to get allocated.

The root causes are:
- large number of metadata calls made to the segments and pending segments tables
- `giant` lock held in `TaskLockbox.tryLock()` to acquire task locks and allocate segments

Since the contention typically arises when several tasks of the same datasource try to allocate segments for the same interval/granularity, the allocation run times can be improved by batching the requests together.

Changes:
- Add flags `druid.indexer.tasklock.batchSegmentAllocation` (default `false`) and `druid.indexer.tasklock.batchAllocationMaxWaitTime` (in millis, default `1000`)
- Add methods `canPerformAsync` and `performAsync` to `TaskAction`
- Submit each allocate action to a `SegmentAllocationQueue`, and add it to the correct batch
- Process each batch after `batchAllocationMaxWaitTime`
- Acquire the `giant` lock just once per batch in `TaskLockbox`
- Reduce metadata calls by batching statements together and updating query filters
- Except for batching, retain the whole behaviour (order of steps, retries, etc.)
- Respond to leadership changes and fail items in the queue when not leader
- Emit batch- and request-level metrics
…he#13495)

* Update docs for useBatchedSegmentSampler
* Update docs for round robin assignment
* Update to native ingestion doc (cherry picked from commit aba83f2)
* Update native-batch.md
* Update native-batch.md
…verview.type=http (apache#13499) (apache#13515)

* fix issue with http server inventory view blocking data node http server shutdown with long polling
* adjust
* fix test inspections
…pache#13517)

Changes:
- Limit max batch size in `SegmentAllocationQueue` to 500
- Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime`, since the actual wait time may exceed this configured value
- Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`
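This change pairs with the batched segment allocation feature described earlier in this list. As a hedged sketch (the property names come from the commit messages above; the values are illustrative, not recommendations), enabling it in the Overlord's runtime properties might look like:

```properties
# Enable batching of segment allocation actions on the Overlord (default: false)
druid.indexer.tasklock.batchSegmentAllocation=true
# Time in millis a batch waits before being processed; the actual wait may
# exceed this value (renamed from batchAllocationMaxWaitTime in this change)
druid.indexer.tasklock.batchAllocationWaitTime=1000
```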
… (apache#13529)

* Remove stray reference to fix OOM while merging sketches
* Update future to add result from executor service
* Update tests and address review comments
* Address review comments
* Moved mock
* Close threadpool on teardown
* Remove worker task cancel
…ache#13537) (apache#13542)

The planner sets sqlInsertSegmentGranularity in its context when using PARTITIONED BY, which sets it on every native query in the stack (as all native queries for a SQL query typically have the same context). QueryKit would interpret that as a request to configure bucketing for all native queries. This isn't useful, as bucketing is only used for the penultimate stage in INSERT / REPLACE. So, this patch modifies QueryKit to only look at sqlInsertSegmentGranularity on the outermost query.

As an additional change, this patch switches the static ObjectMapper to use the process-wide ObjectMapper for deserializing Granularities. This saves an ObjectMapper instance, and ensures that if there are any special serdes registered for Granularity, we'll pick them up.

(cherry picked from commit 5581488)
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
* Web console: add arrayOfDoublesSketch and other small fixes (apache#13486)
  * add padding and keywords
  * add arrayOfDoubles
  * Update docs/development/extensions-core/datasketches-tuple.md
  * Update docs/development/extensions-core/datasketches-tuple.md
  * Update docs/development/extensions-core/datasketches-tuple.md
  * Update docs/development/extensions-core/datasketches-tuple.md
  * Update docs/development/extensions-core/datasketches-tuple.md
  * partition int
  * fix docs
* Web console: improve compaction status display (apache#13523)
  * improve compaction status display
  * even more accurate
  * fix snapshot
* MSQ: Improve TooManyBuckets error message, improve error docs. (apache#13525)
  1) Edited the TooManyBuckets error message to mention PARTITIONED BY instead of segmentGranularity.
  2) Added error-code-specific anchors in the docs.
  3) Added information to various error codes in the docs about common causes and solutions.
* update error anchors (apache#13527)
* update snapshot

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
…attening (apache#13519) (apache#13546)

* add protobuf flattener, direct to plain java conversion for faster flattening, nested column tests
Cherry-picked from commit 4ebdfe2
* Zero-copy local deep storage.

This is useful for local deep storage, since it reduces disk usage and makes Historicals able to load segments instantaneously.

Two changes:

1) Introduce the "druid.storage.zip" parameter for local storage, which defaults to false. This changes default behavior from writing an index.zip to writing a regular directory. This is safe to do even during a rolling update, because the older code actually already handled unzipped directories being present on local deep storage.

2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links instead of copies when possible. (Generally this is possible when the source and destination directory are on the same filesystem.)

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
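A hedged sketch of what a local deep storage configuration might look like with the new parameter (`druid.storage.type` and `druid.storage.storageDirectory` are standard local-storage settings; the path is illustrative):

```properties
druid.storage.type=local
druid.storage.storageDirectory=/tmp/druid/deepStorage
# New parameter from this change; false (the default) writes segments as plain
# directories instead of index.zip, allowing hard links and instant loads
druid.storage.zip=false
```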
…pache#13567)

* Add validation checks to worker chat handler apis
* Merge things and polishing the error messages.
* Minor error message change
* Fixing race and adding some tests
* Fixing controller fetching stats from wrong workers. Fixing race. Changing default mode to Parallel. Adding logging. Fixing exceptions not propagated properly.
* Changing to kernel worker count
* Added a better logic to figure out assigned worker for a stage.
* Nits
* Moving to existing kernel methods
* Adding more coverage

Co-authored-by: cryptoe <karankumar1100@gmail.com>
(cherry picked from commit 2b605aa)
Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>
Changes:
* Use 80% of memory specified for running services (versus 50% earlier).
* Tasks get either 512m / 1024m or 2048m now (versus 512m or 2048m earlier).
* Add direct memory for router.
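The sizing rules above can be sketched as a small function. This is a hypothetical illustration of the tiering logic, not the actual quickstart script; the thresholds for choosing among the 512m / 1024m / 2048m task tiers are assumptions.

```python
def quickstart_memory(total_mb: int) -> dict:
    """Sketch of the revised quickstart sizing: services get 80% of the
    configured memory (previously 50%), and each task gets one of three
    tiers: 512m, 1024m, or 2048m (previously only 512m or 2048m)."""
    usable_mb = int(total_mb * 0.8)  # 80% of memory for running services
    # Tier thresholds below are illustrative assumptions, not from the script
    if usable_mb >= 16384:
        task_mb = 2048
    elif usable_mb >= 8192:
        task_mb = 1024
    else:
        task_mb = 512
    return {"usable_mb": usable_mb, "task_mb": task_mb}
```

For example, a machine configured with 4 GiB would land in the smallest task tier under these assumed thresholds.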
Backports #13563