Better index for segments and pendingSegments table #4936
yunjing wants to merge 1 commit into apache:master from smyte:yunjing.sql.index
leventov left a comment:
Thanks for the contribution. Please sign the CLA here: http://druid.io/community/cla.html
tableName, getPayloadType(), getQuoteString()
)
),
StringUtils.format("CREATE INDEX idx_%1$s_sequence_name ON %1$s(sequence_name)", tableName)
- Could you use the simpler syntax `%s` with two arguments: `tableName, tableName`?
- Please add a comment in the code explaining why this index is needed.
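The two format styles contrasted in this review comment can be sketched side by side. This is only an illustration: `String.format` stands in for Druid's `StringUtils.format` (assumed here to follow the same `java.util.Formatter` syntax), and the class name and the table name `druid_pendingSegments` are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch; String.format stands in for Druid's StringUtils.format.
class IndexDdlSketch {
    // Positional form used in the PR: %1$s reuses the first argument twice.
    static String positional(String tableName) {
        return String.format(
            Locale.ENGLISH,
            "CREATE INDEX idx_%1$s_sequence_name ON %1$s(sequence_name)",
            tableName
        );
    }

    // Simpler form suggested in review: plain %s, with the table name passed twice.
    // The reviewer also asks for a comment like this one explaining the index:
    // per the PR description, metadata SELECTs slow down as pending segments grow.
    static String simple(String tableName) {
        return String.format(
            Locale.ENGLISH,
            "CREATE INDEX idx_%s_sequence_name ON %s(sequence_name)",
            tableName,
            tableName
        );
    }

    public static void main(String[] args) {
        String table = "druid_pendingSegments"; // hypothetical table name
        System.out.println(positional(table));
        System.out.println(simple(table));
    }
}
```

Both methods produce the same DDL string; the difference is only in readability of the format string.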
),
StringUtils.format("CREATE INDEX idx_%1$s_datasource ON %1$s(dataSource)", tableName),
StringUtils.format("CREATE INDEX idx_%1$s_used ON %1$s(used)", tableName)
StringUtils.format("CREATE INDEX idx_%1$s_datasource_used_time ON %1$s(dataSource,used,start,end)", tableName)
+ ")",
tableName, getPayloadType(), getQuoteString()
),
StringUtils.format("CREATE INDEX idx_%1$s_datasource ON %1$s(dataSource)", tableName),
Why are these indexes removed?
The index for datasource is removed because the new index's prefix covers it already. As for the used index, correct me if I am wrong, it does not help any known queries by itself.
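The prefix argument relies on the usual behaviour of ordered (B-tree style) indexes, assumed here for whichever metadata store is in use: a composite index can serve lookups on any leftmost prefix of its column list. A toy check of that rule, with a hypothetical class and method name:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the leftmost-prefix rule; not Druid code.
class IndexPrefixSketch {
    // Returns true if `columns` (in index order) is a leftmost prefix
    // of the composite index's column list.
    static boolean coveredByPrefix(List<String> compositeIndex, List<String> columns) {
        return columns.size() <= compositeIndex.size()
            && compositeIndex.subList(0, columns.size()).equals(columns);
    }

    public static void main(String[] args) {
        List<String> composite = Arrays.asList("dataSource", "used", "start", "end");
        // idx_..._datasource is a prefix of the new index, so it is redundant.
        System.out.println(coveredByPrefix(composite, Arrays.asList("dataSource"))); // true
        // `used` alone is not a prefix; its removal rests on the separate claim
        // that no known query benefits from it.
        System.out.println(coveredByPrefix(composite, Arrays.asList("used"))); // false
    }
}
```

Under this rule only the `dataSource` index is made redundant by the new composite index; dropping the `used` index depends on the second argument in the comment above.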
Probably fixed by #5149.
@jihoonson can you please confirm whether it is fixed? Then we can either merge or close this.
This PR addresses the problem that metastore queries may become slow when many pendingSegments exist, but I think there are actually two different issues here. One is the growing pendingSegments table, and the other is slow query speed on pendingSegments. The first issue is fixed in #5149. So I think this PR is still worthwhile if it yields a noticeable performance benefit. I'm not sure how slow the queries currently are.
SELECT queries used by the Kafka supervisor and index workers become very slow as the number of segments grows. These new indexes would help reduce the metadata query time.
We are experiencing a performance issue in production when a single index worker has to cover hourly segments for more than a few days (due to data backfill), because the total metadata query time grows with the number of (pending) segments. This is not an issue when only indexing realtime data.