| This section contains important information about new and existing features. | ||
| ### Compaction on MSQ |
There was a problem hiding this comment.
Let's rename this headline to Compaction Features, and then list:
- Compaction scheduler with greater flexibility and control over when and what to compact
- MSQ Based Compaction for performant compaction jobs
- Concurrent compaction is now GA
No need to list all the nitty-gritty details as you have done right now. They can just move to a different section or into the docs.
There was a problem hiding this comment.
updated - asked @317brian to add the detail to the compaction docs.
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
| - Fixed an issue with `ScanQueryFrameProcessor` cursor build not adjusting intervals [#17168](https://github.com/apache/druid/pull/17168) | ||
| - Improved worker cancellation for the MSQ task engine to prevent race conditions [#17046](https://github.com/apache/druid/pull/17046) | ||
| - Improved memory management to better support multi-threaded workers [#17057](https://github.com/apache/druid/pull/17057) | ||
| - Reduced memory usage when transferring sketches between the MSQ task engine controller and worker [#16269](https://github.com/apache/druid/pull/16269) |
There was a problem hiding this comment.
This is duplicated from line 245. Also, a better way to word it might be "Add new format for serialization of sketches between MSQ controller and worker to reduce memory usage".
| - Improved memory management to better support multi-threaded workers [#17057](https://github.com/apache/druid/pull/17057) | ||
| - Reduced memory usage when transferring sketches between the MSQ task engine controller and worker [#16269](https://github.com/apache/druid/pull/16269) | ||
| - Improved error handling when retrieving Avro schemas from registry [#16684](https://github.com/apache/druid/pull/16684) | ||
| - Fixed issues related to partitioning boundaries in the MSQ task engine's window functions [#16729](https://github.com/apache/druid/pull/16729) |
There was a problem hiding this comment.
This is duplicated from line 247.
Also, it's a nit but a better message might be: Fixed issues related to partitioning boundaries for window functions in the MSQ task engine
| ##### Other streaming ingestion improvements | ||
| [#16358](https://github.com/apache/druid/pull/16358) | ||
| #### Other SQL-based ingestion improvements |
There was a problem hiding this comment.
Is this file expected to contain all PRs marked with milestone 31.0.0? For example, I don't see #16804 mentioned, is that expected?
There was a problem hiding this comment.
Hey Akshat, we typically don't include bug fixes unless there's a specific reason to. It's just new features/improvements. There are currently some fixes in there that I'll remove as part of the final cleanup.
It looks like 16804 and 17141 didn't have the bug label applied. Was that intentional?
There was a problem hiding this comment.
I see, thanks for the info!
It looks like 16804 and 17141 didn't have the bug label applied. Was that intentional?
Nope. I don't have the access to update PR labels, but yes both those PRs are bug-fixes.
| - Reduced memory usage when transferring sketches between the MSQ task engine controller and worker [#16269](https://github.com/apache/druid/pull/16269) | ||
| - Improved error handling when retrieving Avro schemas from registry [#16684](https://github.com/apache/druid/pull/16684) | ||
| - Fixed issues related to partitioning boundaries in the MSQ task engine's window functions [#16729](https://github.com/apache/druid/pull/16729) | ||
| - Fixed a boost column issue causing quantile sketches to incorrectly estimate the number of output partitions to create [#17141](https://github.com/apache/druid/pull/17141) |
There was a problem hiding this comment.
Nit: This is MSQ window function specific, so we can maybe add that to the message: Fixed a boost column issue causing quantile sketches to incorrectly estimate the number of output partitions to create for window functions in the MSQ task engine
Also, I see this PR also mentioned in the Other querying improvements section - is that expected?
There was a problem hiding this comment.
Nope, it should not be duplicated. Will remove
LakshSingla
left a comment
There was a problem hiding this comment.
#16887 is not added in the release notes. A line item somewhere would be good.
@LakshSingla It doesn't look like it's in the milestone. Should I add it to the milestone too? |
| ### Projections (experimental) | ||
| Druid 31.0.0 includes experimental support for projections in segments. Like materialized views, projections can improve the performance of queries by optimizing the route the query takes when it executes. |
There was a problem hiding this comment.
ok, i gave this a shot, also included some instruction on how to use the feature since it isn't documented yet
Druid 31.0.0 includes experimental support for a new feature called projections. Projections are grouped pre-aggregates of a segment that are automatically used at query time to optimize execution for any query that 'fits' the shape of the projection. They reduce both computation and I/O cost by reducing the number of rows that need to be processed. Projections are contained within the segments of a datasource and do increase segment size, but they can share data, such as the value dictionaries of dictionary-encoded columns, with the columns of the base segment.
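To make the idea of a grouped pre-aggregate concrete, here is a toy sketch in Python. It is not Druid code: the sample rows and the helper names `build_projection` and `sum_added_by_channel` are hypothetical, and it only illustrates why a query that fits the projection's shape can read fewer rows and still get the same answer.

```python
from collections import defaultdict

# Hypothetical base rows: (hour, channel, page, added)
BASE_ROWS = [
    (0, "#en", "Druid", 10),
    (0, "#en", "Druid", 5),
    (0, "#en", "Kafka", 7),
    (1, "#en", "Druid", 3),
    (1, "#fr", "Druid", 2),
    (1, "#fr", "Druid", 8),
]


def build_projection(rows):
    """Group rows by (hour, channel, page) and pre-sum `added`."""
    sums = defaultdict(int)
    for hour, channel, page, added in rows:
        sums[(hour, channel, page)] += added
    # Keep the result sorted by the grouping columns.
    return sorted(key + (total,) for key, total in sums.items())


def sum_added_by_channel(rows):
    """A query that fits the projection: group by channel, sum `added`."""
    totals = defaultdict(int)
    for _hour, channel, _page, added in rows:
        totals[channel] += added
    return dict(totals)


PROJECTION = build_projection(BASE_ROWS)
```

Running `sum_added_by_channel` over `PROJECTION` gives the same totals as running it over `BASE_ROWS`, while scanning fewer rows.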
As an experimental feature, projections are not well documented yet, but they can be defined for streaming ingestion and 'classic' batch ingestion as part of the `dataSchema`. For example, using the standard wikipedia example:
```json
"dataSchema": {
  "granularitySpec": {
    ...
  },
  "dataSource": ...,
  "timestampSpec": {
    ...
  },
  "dimensionsSpec": {
    ...
  },
  "projections": [
    {
      "type": "aggregate",
      "name": "channel_page_hourly_distinct_user_added_deleted",
      "groupingColumns": [
        {
          "type": "long",
          "name": "__gran"
        },
        {
          "type": "string",
          "name": "channel"
        },
        {
          "type": "string",
          "name": "page"
        }
      ],
      "virtualColumns": [
        {
          "type": "expression",
          "expression": "timestamp_floor(__time, 'PT1H')",
          "name": "__gran",
          "outputType": "LONG"
        }
      ],
      "aggregators": [
        {
          "type": "HLLSketchBuild",
          "name": "distinct_users",
          "fieldName": "user",
          "round": true
        },
        {
          "type": "longSum",
          "name": "sum_added",
          "fieldName": "added"
        },
        {
          "type": "longSum",
          "name": "sum_deleted",
          "fieldName": "deleted"
        }
      ]
    },
    ...
  ]
},
...
```
The `groupingColumns` define the order in which data is sorted in the projection. Instead of being set explicitly as it is for the base table, granularity is defined through a virtual column: during ingestion, the processing logic finds the 'finest' granularity virtual column that is a `timestamp_floor` expression and uses it as the `__time` column for the projection. Projections do not need to have a time column defined, in which case they can still match queries that are not grouping on time.
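The selection rule described above can be sketched in Python as follows. This is only an illustration of the rule as stated in this comment, not Druid's actual implementation: the helper names, the regular expression, and the approximate unit lengths are all assumptions, and only simple periods such as `PT1H` or `P1D` are handled.

```python
import re

# Approximate seconds per ISO 8601 period unit (assumed values for comparison).
# Date and time units are kept apart because "M" means months or minutes.
_DATE_UNITS = {"Y": 365 * 86400, "M": 30 * 86400, "W": 7 * 86400, "D": 86400}
_TIME_UNITS = {"H": 3600, "M": 60, "S": 1}

# Matches expressions of the form timestamp_floor(__time, '<period>').
_FLOOR_RE = re.compile(r"timestamp_floor\(\s*__time\s*,\s*'(P[^']+)'\s*\)")


def period_seconds(period):
    """Approximate length in seconds of a simple period like 'PT1H'."""
    date_part, _, time_part = period[1:].partition("T")
    total = 0
    for part, units in ((date_part, _DATE_UNITS), (time_part, _TIME_UNITS)):
        for count, unit in re.findall(r"(\d+)([A-Z])", part):
            total += int(count) * units[unit]
    return total


def finest_time_column(virtual_columns):
    """Name of the finest timestamp_floor virtual column, or None."""
    candidates = []
    for column in virtual_columns:
        match = _FLOOR_RE.fullmatch(column.get("expression", "").strip())
        if match:
            candidates.append((period_seconds(match.group(1)), column["name"]))
    return min(candidates)[1] if candidates else None
```

Given the `virtualColumns` from the example spec, `finest_time_column` picks `__gran`; with no `timestamp_floor` virtual column it returns `None`, which corresponds to a projection without a time column.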
Projections can currently only be defined by classic ingestion, but they can still be used by queries using MSQ or the new Dart engine. Future development will allow projections to be created as part of MSQ-based ingestion as well.
There are a few new query context flags which have been added to aid in experimentation with projections.
- `useProjection` accepts a specific projection name and instructs the query engine that it must use that projection; the query fails if the projection does not match the query
- `forceProjections` accepts `true` or `false` and instructs the query engine that it must use a projection; the query fails if it cannot find a matching projection
- `noProjections` accepts `true` or `false` and instructs the query engines to not use any projections
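As a rough sketch of how these flags might be passed, the Python snippet below builds request bodies in the `query` plus `context` shape that the Druid SQL API accepts. The helper `build_sql_request`, the datasource name `wikipedia`, and the sample SQL are assumptions for illustration; whether this particular query actually matches the example projection is not something this sketch verifies.

```python
import json


def build_sql_request(sql, use_projection=None, force_projections=None,
                      no_projections=None):
    """Build a SQL request body, setting only the flags that were given."""
    context = {}
    if use_projection is not None:
        context["useProjection"] = use_projection
    if force_projections is not None:
        context["forceProjections"] = force_projections
    if no_projections is not None:
        context["noProjections"] = no_projections
    return {"query": sql, "context": context}


# Hypothetical query against the example datasource.
SQL = 'SELECT channel, SUM(added) FROM "wikipedia" GROUP BY channel'

# Require one specific projection (name taken from the example spec).
pinned = build_sql_request(
    SQL, use_projection="channel_page_hourly_distinct_user_added_deleted"
)
# Disable projections to get a baseline for comparison.
baseline = build_sql_request(SQL, no_projections=True)

print(json.dumps(pinned, indent=2))
```

Running the same query once with `noProjections` and once with `useProjection` is one way to compare results and timings while experimenting.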
We have a lot of plans to continue to improve this feature in the coming releases, but are excited to get it out there so users can begin experimentation since projections can dramatically improve query performance.
There was a problem hiding this comment.
i'm still working on the writeup for a design proposal for this, another option would be to link to that from this since it should contain some of this information
There was a problem hiding this comment.
I split this up. Part of it is in the highlight section and the details are in the Querying section. Also instead of including the JSON, I linked to it.
Release and upgrade notes for Druid 31.0.0
This PR has: