Better index for segments and pendingSegments table #4936

Closed
yunjing wants to merge 1 commit into apache:master from smyte:yunjing.sql.index

yunjing commented Oct 10, 2017

The SELECT queries used by the Kafka supervisor and index workers become very slow as the number of segments grows. These new indexes would help reduce the metadata query time.

We are experiencing a performance issue in production when a single index worker has to cover hourly segments for more than a few days (due to data backfill), because the total metadata query time grows with the number of (pending) segments. This is not an issue when only indexing realtime data.

Member

leventov left a comment

Thanks for the contribution. Please sign the CLA here: http://druid.io/community/cla.html

tableName, getPayloadType(), getQuoteString()
)
),
StringUtils.format("CREATE INDEX idx_%1$s_sequence_name ON %1$s(sequence_name)", tableName)
Member

  • Could you use the simpler %s syntax with two arguments: tableName, tableName?
  • Please add a comment in the code explaining why this index is needed.
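
For reference, a minimal sketch of the two format styles discussed in this thread. It uses plain `String.format` rather than Druid's `StringUtils.format`, and the class name, method names, and table name are made up for illustration. It only aims to show that the positional `%1$s` form and the plain `%s` form with a repeated argument produce the same DDL string.

```java
import java.util.Locale;

public class IndexDdlFormatSketch {
    // Positional form, as in the diff: a single argument reused through %1$s.
    static String positional(String tableName) {
        return String.format(
            Locale.ENGLISH,
            "CREATE INDEX idx_%1$s_sequence_name ON %1$s(sequence_name)",
            tableName
        );
    }

    // Simpler form suggested in the review: plain %s, with the argument passed twice.
    static String simple(String tableName) {
        return String.format(
            Locale.ENGLISH,
            "CREATE INDEX idx_%s_sequence_name ON %s(sequence_name)",
            tableName,
            tableName
        );
    }

    public static void main(String[] args) {
        String table = "druid_pendingSegments"; // illustrative table name only
        System.out.println(positional(table));
        System.out.println(simple(table));
        // Both styles are expected to yield the same statement.
        System.out.println(positional(table).equals(simple(table)));
    }
}
```

The trade-off is mostly readability: `%s` is more familiar, while `%1$s` avoids repeating the argument.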

Member

@yunjing could you fix this?

),
StringUtils.format("CREATE INDEX idx_%1$s_datasource ON %1$s(dataSource)", tableName),
StringUtils.format("CREATE INDEX idx_%1$s_used ON %1$s(used)", tableName)
StringUtils.format("CREATE INDEX idx_%1$s_datasource_used_time ON %1$s(dataSource,used,start,end)", tableName)
Member

Same as above

+ ")",
tableName, getPayloadType(), getQuoteString()
),
StringUtils.format("CREATE INDEX idx_%1$s_datasource ON %1$s(dataSource)", tableName),
Member

Why are these indexes removed?

Author

The index on dataSource is removed because the prefix of the new index already covers it. As for the used index, correct me if I am wrong, but it does not help any known query by itself.
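
The prefix argument above can be illustrated with a small sketch. It assumes the common leftmost-prefix behavior of B-tree composite indexes, where an index on (a, b, c) can generally serve lookups on a or on (a, b), but not on b alone. Actual planner behavior varies by database, so this is a simplification, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class IndexPrefixSketch {
    // Returns true when `columns` matches the leading columns of `indexColumns`, in order.
    static boolean isLeftmostPrefix(List<String> indexColumns, List<String> columns) {
        return columns.size() <= indexColumns.size()
            && indexColumns.subList(0, columns.size()).equals(columns);
    }

    public static void main(String[] args) {
        // Column list of the composite index proposed in this PR.
        List<String> composite = Arrays.asList("dataSource", "used", "start", "end");

        // A lookup on dataSource alone matches the leading column of the composite index.
        System.out.println(isLeftmostPrefix(composite, Arrays.asList("dataSource"))); // true

        // A lookup on used alone does not match a leading prefix.
        System.out.println(isLeftmostPrefix(composite, Arrays.asList("used"))); // false
    }
}
```

Under that assumption, a separate single-column index on dataSource is redundant, while dropping the used index is justified only if no query filters on used alone.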

@jihoonson
Contributor

Probably fixed by #5149.

@b-slim
Contributor

b-slim commented Jan 24, 2018

@jihoonson can you please confirm whether it is fixed? Then we can either merge or close this.

@jihoonson
Contributor

This PR addresses the problem that queries on the metadata store may become slow when a lot of pendingSegments exist, but I think there are actually two different issues here. One is the growing pendingSegments table, and the other is slow queries on pendingSegments.

The first issue is fixed in #5149. So I think this PR is still worthwhile if we can get a noticeable performance benefit. I'm not sure how slow the queries actually are.

@gianm
Contributor

gianm commented Oct 15, 2018

I think between #5149, #6348, and #6356 this issue is addressed by other means. But thank you @yunjing for your interest in contributing!

gianm closed this Oct 15, 2018