Skip to content

Passing lockTimeout as a parameter for TaskLockbox.lock()#4549

Merged
gianm merged 10 commits intoapache:masterfrom
jihoonson:lock-timeout
Aug 9, 2017
Merged

Passing lockTimeout as a parameter for TaskLockbox.lock()#4549
gianm merged 10 commits intoapache:masterfrom
jihoonson:lock-timeout

Conversation

@jihoonson
Copy link
Copy Markdown
Contributor

@jihoonson jihoonson commented Jul 15, 2017

Fixes #4533.


This change is Reviewable

private static final boolean DEFAULT_GUARANTEE_ROLLUP = false;
private static final boolean DEFAULT_REPORT_PARSE_EXCEPTIONS = false;
private static final long DEFAULT_PUBLISH_TIMEOUT = 0;
private static final Period DEFAULT_LOCK_TIMEOUT = new Period("PT5m");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that intended that this is a separate constant with the one in HadoopTuningConfig?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was intended because tasks of different types use different tuningConfig implementation. But now, I'm thinking to move the lockTimeout configuration to TaskContext (#1604). It would be better because this configuration is used by all types of tasks, and thus moving to TaskContext will reduce duplicated codes in each implementation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move it and add some docs soon.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jihoonson I suggest to use HttpClient timeout as default lock timeout. Say if http client timeout is less than 5 minutes, client anyways can not get the response back even after waiting for 5 minutes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akashdw I agree on that the default lock timeout should be smaller than http client timeout for now, so 5 min is the default value. However, I think the lock timeout configuration should be a separated one because 1) the problem still exist if users override the lock timeout with some smaller value than http client timeout, and 2) the fundamental problem looks that any messages aren't sent while waiting for a lock and should be fixed by sending kind of heartbeat messages in the future. If this problem is fixed, the lock timeout can be longer than the http client timeout.

I'll note this problem in the doc. Does it make sense?

private static final Boolean defaultReportParseExceptions = Boolean.FALSE;
private static final long defaultHandoffConditionTimeout = 0;
private static final long defaultAlertTimeout = 0;
private static final Period defaultLockTimeoutMs = new Period("PT5m");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

*/
public TaskLock lock(final Task task, final Interval interval, long timeoutMs) throws InterruptedException
{
long nanos = TIME_UNIT.toNanos(timeoutMs);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Milliseconds are anyway hardcoded (parameter name timeoutMs) so IMO there is no point in constant TIME_UNIT, it could be just TimeUnit.MILLISECONDS.toNanos(timeoutMs)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@jihoonson jihoonson changed the title Passing lockTimeout as a parameter for TaskLockbox.lock() [WIP] Passing lockTimeout as a parameter for TaskLockbox.lock() Jul 17, 2017
@jihoonson jihoonson changed the title [WIP] Passing lockTimeout as a parameter for TaskLockbox.lock() Passing lockTimeout as a parameter for TaskLockbox.lock() Jul 19, 2017
@jihoonson
Copy link
Copy Markdown
Contributor Author

@leventov @akashdw thanks for looking at the first patch. It's now ready for review.

*/
public TaskLock lock(final Task task, final Interval interval, long timeoutMs) throws InterruptedException
{
long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, actual timeout should be min(timeoutMs, serverConfig.getMaxIdleTime().getMillis()), say if a task gets lock after 3 minutes and serverConfig.getMaxIdleTime().getMillis() is 2 minutes then taskAquireAction can not write the response in the out channel even after getting the lock.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the http timeout waiting for a lock is actually our bug. And your suggestion sounds a workaround to avoid this bug.
I think there is no reason the maxIdleTime and taskLockTimeout should be associated except the bug. If so, is it better to fix the bug rather than adding a workaround and telling users to learn this workaround? I think adding a caveat for this bug to the doc would be enough for now.

@akashdw @leventov @gianm any thoughts?

Copy link
Copy Markdown
Contributor

@akashdw akashdw Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jihoonson sorry for the delay in response. I totally agree with you on maxIdleTime and taskLockTimeout should not be associated. May be you can write a comment that timeout passed might not work as expected if server idle time is less than taskLockTimeout.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akashdw thanks for your understanding. I added some caveats.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@gianm gianm merged commit d5606bc into apache:master Aug 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants