
SegmentAllocateAction (fixes #1515) #1896

Merged
fjy merged 2 commits into apache:master from gianm:allocate-segment on Nov 19, 2015

Conversation

@gianm
Contributor

@gianm gianm commented Oct 30, 2015

This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).

The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.

The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
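To make the allocation idea concrete, here is a minimal, self-contained sketch of the pendingSegments behavior. It models the metadata-store table as an in-memory map, and all names are illustrative rather than the PR's actual API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Minimal model of the "pendingSegments" idea: allocation is keyed on
// (sequenceName, previousSegmentId, interval), so tasks that make the same
// request get the same segment id back instead of a fresh one.
class PendingSegments
{
  private final Map<String, String> allocations = new HashMap<>();
  private int partitionNum = 0;

  public synchronized String allocate(String sequenceName, String previousSegmentId, String interval)
  {
    final String key = sequenceName + "|" + Objects.toString(previousSegmentId, "<start>") + "|" + interval;
    return allocations.computeIfAbsent(key, k -> interval + "_partition_" + partitionNum++);
  }
}
```

Calling allocate twice with the same arguments returns the same id; that idempotence is what lets N replicated tasks agree on segment ids without talking to each other.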

@gianm gianm closed this Oct 31, 2015
@gianm gianm reopened this Oct 31, 2015
Contributor


s/pendingSegmentTable/pendingSegmentsTable just for consistency

Contributor Author


sure, this looks like a typo

@himanshug
Contributor

Say a task T allocates a new segment and hands it off, then receives another event in the same interval. This time the task is supposed to use the "previousId" so that a new segment gets allocated. For this, is the task supposed to retain a list of all the segments it has handed off in its lifetime? What happens if/when the task is restored (do we keep that list maintained on disk as well)?

@gianm
Contributor Author

gianm commented Nov 3, 2015

@himanshug it needs to remember the last segment that has been allocated in its sequence (last overall, not last per interval, so just one segment).
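Sketching that bookkeeping on top of the PendingSegments model above (again illustrative, not the PR's API), the only sequence-related state a task must persist across restarts is one id:

```java
// The task's only sequence state: the last segment id it allocated, overall.
// On restore, this single string is all that needs to come back from disk.
class SequenceState
{
  private String previousSegmentId = null; // null marks the start of the sequence

  public String allocateNext(PendingSegments pending, String sequenceName, String interval)
  {
    final String id = pending.allocate(sequenceName, previousSegmentId, interval);
    previousSegmentId = id; // last overall, not last per interval
    return id;
  }
}
```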

@gianm gianm force-pushed the allocate-segment branch 2 times, most recently from 6a5f119 to 07996e6 on November 3, 2015 22:29
@gianm
Contributor Author

gianm commented Nov 3, 2015

bad merge from master, will fix shortly.

@himanshug
Contributor

@gianm I'm not sure how the following case works.
Say T1 and T2 are replicated tasks (with the same group id).
T1 receives event e, allocates a segment id for it, and hands it off at some point.
T2 is slow and only now receives event e; will it get a new segment id and hand it off successfully, resulting in duplicated data?

@himanshug
Contributor

@gianm trying to answer my own question: I guess T2 will receive the same segment id that T1 "created" and will eventually try to hand off the same segment, but it will fail because the commit metadata wouldn't match.

Contributor


Note: this assumes tasks in the same replication set "share" the lock, so that both will be able to get a lock on the same interval if they receive events with the same or close-enough timestamps.

@gianm gianm force-pushed the allocate-segment branch 2 times, most recently from f604300 to 6ad58ff on November 9, 2015 15:59
@gianm
Contributor Author

gianm commented Nov 9, 2015

@himanshug yes, that's what would happen: T1/T2 get the same segment id because they see the same events in the same order, and they're allocating ids from the same sequence.

IMO in that case the slow task shouldn't fail; it should realize that the faster task handed off that segment first (based on inspecting the commit metadata when its attempt to commit fails) and either continue on or skip forward.
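A sketch of that decision, assuming segment publishing does a compare-and-swap on commit metadata (think Kafka offsets committed alongside segments); the three-way outcome is the point, the names are made up:

```java
import java.util.Objects;

// Compare-and-swap publish: succeed on a clean swap, treat "someone else
// already committed exactly what I was about to commit" as success, and
// fail only on genuine divergence.
class CommitMetadataStore
{
  private String committed = null;

  public synchronized boolean publish(String expected, String updated)
  {
    if (Objects.equals(committed, expected)) {
      committed = updated; // normal case: this task's commit wins
      return true;
    } else if (Objects.equals(committed, updated)) {
      return true;         // a faster replica already handed off this segment: skip forward
    } else {
      return false;        // diverged for real: the task should stop
    }
  }
}
```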

@gianm
Contributor Author

gianm commented Nov 9, 2015

Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 600.39 sec <<< FAILURE! - in io.druid.server.namespace.cache.NamespaceExtractionCacheManagerExecutorsTest
testConcurrentAddDelete(io.druid.server.namespace.cache.NamespaceExtractionCacheManagerExecutorsTest)  Time elapsed: 600.007 sec  <<< ERROR!
java.lang.Exception: test timed out after 600000 milliseconds
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at io.druid.server.namespace.cache.NamespaceExtractionCacheManagerExecutorsTest.testConcurrentAddDelete(NamespaceExtractionCacheManagerExecutorsTest.java:312)

@gianm gianm closed this Nov 9, 2015
@gianm gianm reopened this Nov 9, 2015
@gianm gianm closed this Nov 11, 2015
@gianm gianm reopened this Nov 11, 2015
@himanshug
Contributor

👍

@fjy fjy closed this Nov 18, 2015
@fjy fjy reopened this Nov 18, 2015
@fjy
Contributor

fjy commented Nov 18, 2015

👍

fjy added a commit that referenced this pull request Nov 19, 2015
@fjy fjy merged commit 21c84b5 into apache:master Nov 19, 2015
@fjy fjy modified the milestone: 0.9.0 Feb 4, 2016
```java
return e != null && (e instanceof SQLTransientException
                     || e instanceof SQLRecoverableException
                     || e instanceof UnableToObtainConnectionException
                     || e instanceof UnableToExecuteStatementException);
```
Contributor


@gianm this causes lots of retries for duplicate primary key entry errors that will never be transient unless someone manually cleans the DB

Contributor Author


#2573 should fix this
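For reference, the shape of such a fix might look like the sketch below: stop treating integrity violations (such as duplicate primary keys) as retryable. This is only an illustration of the idea, not the actual change in #2573:

```java
import java.sql.SQLIntegrityConstraintViolationException;
import java.sql.SQLRecoverableException;
import java.sql.SQLTransientException;

class RetryPolicy
{
  // Duplicate-key errors are integrity violations: permanent by nature,
  // so retrying them can never succeed and only delays the real failure.
  static boolean isWorthRetrying(Throwable e)
  {
    if (e == null) {
      return false;
    }
    if (e instanceof SQLIntegrityConstraintViolationException) {
      return false; // e.g. duplicate primary key entry
    }
    return e instanceof SQLTransientException
           || e instanceof SQLRecoverableException
           || isWorthRetrying(e.getCause()); // unwrap e.g. JDBI wrapper exceptions
  }
}
```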
