Flush oldest sink when total rows in all sinks exceeds maxRowsInMemory#2459

Closed
navis wants to merge 6 commits into apache:master from navis:flush-oldest-sink

Conversation

@navis
Contributor

@navis navis commented Feb 12, 2016

Currently, maxRowsInMemory in RealtimeIndexTask is applied per sink, which can cause an OOM if, for some reason, too many sinks are created that each stay below the threshold. This patch periodically checks the total row count across all currently active sinks and persists the least recently accessed sink if the total exceeds the threshold.
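The idea can be sketched roughly as follows. This is an illustrative simplification, not Druid's actual code: the class and method names are hypothetical, and an access-ordered LinkedHashMap stands in for the task's sink bookkeeping so the least recently accessed sink can be found cheaply.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of "flush the oldest sink when the total row count
// across all sinks exceeds maxRowsInMemory". Not Druid's actual classes.
public class SinkTracker {
  private final int maxRowsInMemory;
  // accessOrder = true: iteration starts at the least recently used entry
  private final LinkedHashMap<String, Integer> rowsPerSink =
      new LinkedHashMap<>(16, 0.75f, true);
  private int totalRows = 0;

  public SinkTracker(int maxRowsInMemory) {
    this.maxRowsInMemory = maxRowsInMemory;
  }

  // Records one row appended to the given sink. Returns the key of the sink
  // that was flushed (standing in for persisting it to disk), or null if the
  // total is still under the limit.
  public synchronized String add(String sinkKey) {
    rowsPerSink.merge(sinkKey, 1, Integer::sum);
    totalRows++;
    if (totalRows > maxRowsInMemory) {
      Map.Entry<String, Integer> oldest = rowsPerSink.entrySet().iterator().next();
      totalRows -= oldest.getValue();
      rowsPerSink.remove(oldest.getKey());
      return oldest.getKey();
    }
    return null;
  }

  public synchronized int getTotalRows() {
    return totalRows;
  }

  public static void main(String[] args) {
    SinkTracker tracker = new SinkTracker(3);
    tracker.add("hour-00");
    tracker.add("hour-01");
    tracker.add("hour-02");
    // The 4th row pushes the total over the limit; "hour-00" is the least
    // recently accessed sink, so it is the one flushed.
    String flushed = tracker.add("hour-03");
    System.out.println("flushed: " + flushed + ", total: " + tracker.getTotalRows());
  }
}
```

This mirrors the PR's behavior of bounding memory by total rows across sinks rather than per sink, which is what prevents many small sinks from accumulating unbounded.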

@navis navis changed the title flush oldest sink on when excceding maxRowsInMemory Flush oldest sink on when excceding maxRowsInMemory Feb 12, 2016
@navis navis changed the title Flush oldest sink on when excceding maxRowsInMemory Flush oldest sink when total rows in all sinks exceeds maxRowsInMemory Feb 13, 2016
Member

It's already volatile; do we really need to take the swap lock here?

Contributor Author

If index can be null, it's not safe to do something like return index != null && index.canAppendRow(). I thought the locking overhead would be minimal. But then volatile would need to be removed.
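The hazard discussed above can be shown in a small sketch (names are illustrative, not Druid's actual fields): a compound check like `index != null && index.canAppendRow()` reads the volatile field twice, so a concurrent swap to null between the two reads can throw NullPointerException. Holding a lock fixes it; so does reading the volatile once into a local, if nothing else in the method needs the field to stay stable.

```java
public class VolatileReadDemo {
  static class Index {
    boolean canAppendRow() { return true; }
  }

  static class Holder {
    volatile Index index;

    // Racy: the volatile field is read twice; a concurrent swap to null
    // between the null check and the method call can throw NPE.
    boolean canAppendRacy() {
      return index != null && index.canAppendRow();
    }

    // Safe without a lock: read the volatile exactly once into a local.
    boolean canAppendSafe() {
      Index local = index;
      return local != null && local.canAppendRow();
    }
  }

  public static void main(String[] args) {
    Holder h = new Holder();
    if (h.canAppendSafe()) throw new AssertionError("null index must not append");
    h.index = new Index();
    if (!h.canAppendSafe()) throw new AssertionError("non-null index should append");
    System.out.println("ok");
  }
}
```

The local-read trick only guarantees the null check and the call see the same object; if the caller also needs the swap itself to be atomic with later work, a lock (as chosen in this PR) is still required.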

@fjy fjy added this to the 0.9.1 milestone Feb 17, 2016
@navis navis force-pushed the flush-oldest-sink branch from 4f657aa to 3840815 Compare February 17, 2016 23:54
@fjy
Contributor

fjy commented Feb 18, 2016

@navis how did you create so many sinks at the same time without triggering maxPendingPersists to begin throttling ingestion?

@navis
Contributor Author

navis commented Feb 18, 2016

@fjy I used a day's log file on the local filesystem with hour segment granularity. The row count for no single hour exceeded maxRowsInMemory, so persisting never happened. Consequently, the index task threw an OOM after passing half of the day (13~14 sinks, as I remember).

I know it's not a general use case, and we could work around it by speculatively lowering maxRowsInMemory. But this small patch can lessen the pain.

@fjy
Contributor

fjy commented Feb 22, 2016

@gianm do you mind taking a look?

Contributor

Maybe this is unnecessary/unrealistic, but should this be determined based on the configured maxRowsInMemory? i.e., if the user configures maxRowsInMemory < 100000, they could in theory run into the same OOM issue.

Perhaps this isn't a practical problem though unless the per-row size is colossal and/or available memory is tiny.

Contributor Author

Changed to Math.min(config.getMaxRowsInMemory() >> 2, MAX_ROW_EXCEED_CHECK_COUNT). Thanks.
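The change above derives the check cadence from the configured limit instead of a fixed constant, which addresses the concern about small maxRowsInMemory values. A minimal sketch, assuming a value for MAX_ROW_EXCEED_CHECK_COUNT (the constant's name comes from the discussion, but its value here is an assumption):

```java
// Sketch of the check-interval computation discussed above. The constant's
// value is an assumed placeholder, not taken from the actual patch.
public class CheckInterval {
  static final int MAX_ROW_EXCEED_CHECK_COUNT = 10_000;  // assumed value

  // Rows between occupancy checks: a quarter of maxRowsInMemory, capped so
  // large configurations don't make checks too infrequent.
  static int checkInterval(int maxRowsInMemory) {
    // maxRowsInMemory >> 2 equals maxRowsInMemory / 4 for non-negative values
    return Math.min(maxRowsInMemory >> 2, MAX_ROW_EXCEED_CHECK_COUNT);
  }

  public static void main(String[] args) {
    // With a small maxRowsInMemory, the check runs every quarter of the
    // limit, so even maxRowsInMemory < 100000 is checked often enough.
    if (checkInterval(1000) != 250) throw new AssertionError();
    if (checkInterval(1_000_000) != MAX_ROW_EXCEED_CHECK_COUNT) throw new AssertionError();
    System.out.println("ok");
  }
}
```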

@navis navis force-pushed the flush-oldest-sink branch from 3840815 to 011fc93 Compare March 11, 2016 04:37
@fjy
Contributor

fjy commented Mar 14, 2016

@gianm

@navis navis force-pushed the flush-oldest-sink branch 4 times, most recently from 20eac31 to 44586bc Compare March 16, 2016 04:16
@navis navis force-pushed the flush-oldest-sink branch from 44586bc to dad4608 Compare March 31, 2016 09:11
@navis navis force-pushed the flush-oldest-sink branch from dad4608 to a4b2847 Compare April 20, 2016 00:46
@fjy fjy modified the milestones: 0.9.2, 0.9.1 Apr 27, 2016
@fjy fjy modified the milestones: 0.9.3, 0.9.2 Jun 16, 2016
Contributor

This file doesn't need to change

Contributor Author

reverted

Contributor

Floats.BYTES

Contributor Author

good

@gianm
Contributor

gianm commented Jul 26, 2016

@navis, this looks like a great change that would make Druid easier to configure. My main comments are #2459 (comment) & #2459 (comment).

@navis
Contributor Author

navis commented Jul 27, 2016

@gianm Because we are not currently using any kind of realtime ingestion (a very, very bad part of our ongoing projects) and I don't have a chance to apply this, it doesn't seem like a good idea for me to keep making patches for this kind of issue. I'm fine with this being assigned to someone else, or being closed with a 'later' state.

@fjy
Contributor

fjy commented Dec 21, 2016

I'm removing the milestone for now based on the author's comments.

@fjy fjy removed this from the 0.10.0 milestone Dec 21, 2016
@stale

stale Bot commented Feb 28, 2019

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale Bot added the stale label Feb 28, 2019
@stale

stale Bot commented Mar 7, 2019

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale Bot closed this Mar 7, 2019