Better index for segments and pendingSegments table by yunjing · Pull Request #4936 · apache/druid

yunjing · 2017-10-10T19:15:27Z

SELECT queries used by the Kafka supervisor and index workers become very slow as the number of segments grows. These new indexes would help reduce the metadata query time.

We are experiencing a performance issue in production when a single index worker has to cover hourly segments spanning more than a few days (due to data backfill), because the total metadata query time grows with the number of (pending) segments. This is not an issue when indexing only realtime data.

leventov

Thanks for the contribution. Please sign the CLA here: http://druid.io/community/cla.html

leventov · 2017-10-10T22:58:37Z

                tableName, getPayloadType(), getQuoteString()
-            )
+            ),
+            StringUtils.format("CREATE INDEX idx_%1$s_sequence_name ON %1$s(sequence_name)", tableName)


Could you use the simpler %s syntax with two arguments: tableName, tableName?
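For reference, the two format styles in question can be compared side by side. This is a minimal sketch that uses plain `java.lang.String.format` with `Locale.ENGLISH` as a stand-in for `StringUtils.format` (an assumption; the helper class and the table name below are hypothetical and not part of the PR):

```java
import java.util.Locale;

final class IndexDdlFormat {
    // Positional style from the diff: %1$s reuses the single argument in both places.
    static String positional(String tableName) {
        return String.format(
            Locale.ENGLISH,
            "CREATE INDEX idx_%1$s_sequence_name ON %1$s(sequence_name)",
            tableName
        );
    }

    // Sequential style suggested in the review: each %s consumes the next argument,
    // so tableName has to be passed twice.
    static String sequential(String tableName) {
        return String.format(
            Locale.ENGLISH,
            "CREATE INDEX idx_%s_sequence_name ON %s(sequence_name)",
            tableName,
            tableName
        );
    }

    public static void main(String[] args) {
        String table = "druid_pendingSegments"; // hypothetical table name
        System.out.println(positional(table));
        // Both styles should yield the same statement.
        System.out.println(positional(table).equals(sequential(table)));
    }
}
```

Both variants produce the same SQL string; the difference is only whether the argument is repeated in the call or referenced by position in the format string.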

Please add a comment in the code explaining why this index is needed.

@yunjing could you fix?

leventov · 2017-10-10T22:58:43Z

            ),
-            StringUtils.format("CREATE INDEX idx_%1$s_datasource ON %1$s(dataSource)", tableName),
-            StringUtils.format("CREATE INDEX idx_%1$s_used ON %1$s(used)", tableName)
+            StringUtils.format("CREATE INDEX idx_%1$s_datasource_used_time ON %1$s(dataSource,used,start,end)", tableName)


Same as above

leventov · 2017-10-10T22:59:13Z

                + ")",
                tableName, getPayloadType(), getQuoteString()
            ),
-            StringUtils.format("CREATE INDEX idx_%1$s_datasource ON %1$s(dataSource)", tableName),


Why are these indexes removed?

The dataSource index is removed because it is a prefix of the new composite index, which already covers it. As for the used index, correct me if I am wrong, but it does not help any known query by itself.
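The prefix argument can be illustrated with a small check. This is a simplified sketch of the common "leftmost prefix" rule for composite B-tree indexes, not how any particular database planner decides; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

final class IndexPrefixCheck {
    // Returns true when filterColumns matches the leading columns of the index, in order.
    // Simplification: real planners also consider predicate types, statistics, and more.
    static boolean isLeadingPrefix(List<String> indexColumns, List<String> filterColumns) {
        if (filterColumns.isEmpty() || filterColumns.size() > indexColumns.size()) {
            return false;
        }
        return indexColumns.subList(0, filterColumns.size()).equals(filterColumns);
    }

    public static void main(String[] args) {
        // Column list of the composite index proposed in the diff.
        List<String> composite = Arrays.asList("dataSource", "used", "start", "end");

        // A filter on dataSource alone is a leading prefix of the composite index.
        System.out.println(isLeadingPrefix(composite, Arrays.asList("dataSource")));
        // A filter on used alone is not, so the composite index does not replace
        // a standalone index on used under this rule.
        System.out.println(isLeadingPrefix(composite, Arrays.asList("used")));
    }
}
```

Under this rule the standalone dataSource index is redundant, while dropping the used index is justified only by the claim that no known query filters on used alone.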

jihoonson · 2018-01-10T03:40:47Z

Probably fixed by #5149.

b-slim · 2018-01-24T19:47:33Z

@jihoonson can you please confirm whether it is fixed? Then we can either merge or close this.

jihoonson · 2018-01-24T19:55:55Z

This PR addresses the problem that metastore queries may become slow when a lot of pendingSegments exist, but I think there are actually two different issues here. One is the growing pendingSegments table, and the other is slow queries on pendingSegments.

The first issue is fixed in #5149. So, I think this PR is still worthwhile if we can get a noticeable performance benefit. I'm not sure how slow the queries actually are.

gianm · 2018-10-15T21:34:11Z

I think between #5149, #6348, and #6356 this issue is addressed by other means. But thank you @yunjing for your interest in contributing!

Commit b7a018c: Better index for segments and pendingSegments table

leventov reviewed Oct 10, 2017

gianm closed this Oct 15, 2018

yunjing wants to merge 1 commit into apache:master from smyte:yunjing.sql.index
