Passing lockTimeout as a parameter for TaskLockbox.lock() #4549
gianm merged 10 commits into apache:master
| private static final boolean DEFAULT_GUARANTEE_ROLLUP = false; | ||
| private static final boolean DEFAULT_REPORT_PARSE_EXCEPTIONS = false; | ||
| private static final long DEFAULT_PUBLISH_TIMEOUT = 0; | ||
| private static final Period DEFAULT_LOCK_TIMEOUT = new Period("PT5m"); |
Is it intended that this is a separate constant from the one in HadoopTuningConfig?
Yes, it was intended, because tasks of different types use different tuningConfig implementations. But now I'm thinking of moving the lockTimeout configuration to TaskContext (#1604). That would be better because this configuration is used by all types of tasks, so moving it to TaskContext will reduce duplicated code in each implementation.
I'll move it and add some docs soon.
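For illustration, a minimal sketch of what a context-based lookup could look like. This is not the code from this PR: the class name, the key name "taskLockTimeout", and the plain Map standing in for the task context are all assumptions. The 5-minute default mirrors the PT5m constant in the diff above.

```java
import java.util.HashMap;
import java.util.Map;

final class LockTimeoutContext
{
  // Assumed key name; the real key is whatever the task context ends up defining.
  static final String LOCK_TIMEOUT_KEY = "taskLockTimeout";
  // 5 minutes, matching the PT5m default shown in the diff.
  static final long DEFAULT_LOCK_TIMEOUT_MS = 5 * 60 * 1000L;

  // Reads the lock timeout from a (hypothetical) context map, falling back to the default.
  static long lockTimeoutMs(Map<String, Object> context)
  {
    if (context == null) {
      return DEFAULT_LOCK_TIMEOUT_MS;
    }
    Object value = context.get(LOCK_TIMEOUT_KEY);
    if (value == null) {
      return DEFAULT_LOCK_TIMEOUT_MS;
    }
    if (value instanceof Number) {
      return ((Number) value).longValue();
    }
    // Context values deserialized from JSON may arrive as strings.
    return Long.parseLong(value.toString());
  }

  public static void main(String[] args)
  {
    Map<String, Object> context = new HashMap<>();
    System.out.println(lockTimeoutMs(context));
    context.put(LOCK_TIMEOUT_KEY, 60000);
    System.out.println(lockTimeoutMs(context));
  }
}
```

With a single lookup like this, each tuningConfig implementation would no longer need its own default constant.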
@jihoonson I suggest using the HttpClient timeout as the default lock timeout. If the HTTP client timeout is less than 5 minutes, the client cannot get the response back anyway, even after waiting for 5 minutes.
@akashdw I agree that the default lock timeout should be smaller than the HTTP client timeout for now, so 5 min is the default value. However, I think the lock timeout should be a separate configuration because 1) the problem still exists if users override the lock timeout with a value smaller than the HTTP client timeout, and 2) the fundamental problem seems to be that no messages are sent while waiting for a lock, which should be fixed in the future by sending some kind of heartbeat message. Once that is fixed, the lock timeout can be longer than the HTTP client timeout.
I'll note this problem in the doc. Does it make sense?
| private static final Boolean defaultReportParseExceptions = Boolean.FALSE; | ||
| private static final long defaultHandoffConditionTimeout = 0; | ||
| private static final long defaultAlertTimeout = 0; | ||
| private static final Period defaultLockTimeoutMs = new Period("PT5m"); |
| */ | ||
| public TaskLock lock(final Task task, final Interval interval, long timeoutMs) throws InterruptedException | ||
| { | ||
| long nanos = TIME_UNIT.toNanos(timeoutMs); |
Milliseconds are hardcoded anyway (the parameter is named timeoutMs), so IMO there is no point in a TIME_UNIT constant; it could just be TimeUnit.MILLISECONDS.toNanos(timeoutMs).
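To make the suggestion concrete, here is a rough sketch of a timed wait that converts the millisecond timeout once and then loops on Condition.awaitNanos. It is not TaskLockbox itself: the class name, the single boolean standing in for lock state, and the method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class TimedLockSketch
{
  private final ReentrantLock giant = new ReentrantLock();
  private final Condition released = giant.newCondition();
  private boolean held = false; // stand-in for the real lock bookkeeping

  // Returns true if the lock was obtained within timeoutMs, false on timeout.
  boolean acquire(long timeoutMs) throws InterruptedException
  {
    // The unit is fixed by the parameter name, so convert directly.
    long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    giant.lock();
    try {
      while (held) {
        if (nanos <= 0) {
          return false;
        }
        // awaitNanos returns an estimate of the time left, so spurious
        // wakeups do not extend the total wait.
        nanos = released.awaitNanos(nanos);
      }
      held = true;
      return true;
    }
    finally {
      giant.unlock();
    }
  }

  void release()
  {
    giant.lock();
    try {
      held = false;
      released.signalAll();
    }
    finally {
      giant.unlock();
    }
  }
}
```

In this sketch a second acquire() while the lock is held waits out the timeout and returns false; after release() the next acquire() succeeds.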
| */ | ||
| public TaskLock lock(final Task task, final Interval interval, long timeoutMs) throws InterruptedException | ||
| { | ||
| long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs); |
IMO, the actual timeout should be min(timeoutMs, serverConfig.getMaxIdleTime().getMillis()). For example, if a task gets the lock after 3 minutes and serverConfig.getMaxIdleTime().getMillis() is 2 minutes, then taskAquireAction cannot write the response to the output channel even after getting the lock.
I think the HTTP timeout while waiting for a lock is actually a bug on our side, and your suggestion sounds like a workaround for it.
I think there is no reason for maxIdleTime and taskLockTimeout to be associated other than this bug. If so, isn't it better to fix the bug than to add a workaround and ask users to learn it? I think adding a caveat about this bug to the docs would be enough for now.
@jihoonson sorry for the delay in responding. I totally agree that maxIdleTime and taskLockTimeout should not be associated. Maybe you can add a comment that the timeout passed in might not work as expected if the server idle time is less than taskLockTimeout.
@akashdw thanks for your understanding. I added some caveats.
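As a small illustration of the caveat discussed in this thread, the sketch below flags configurations where a lock wait could outlast the server's idle timeout. It is only an assumption about how such a check might look; the class and method names are hypothetical and nothing like this is part of the PR.

```java
import java.util.concurrent.TimeUnit;

final class LockTimeoutCaveat
{
  // True when a task could still be waiting for a lock after the server
  // has already given up on the idle connection.
  static boolean mayOutliveConnection(long lockTimeoutMs, long maxIdleTimeMs)
  {
    return lockTimeoutMs > maxIdleTimeMs;
  }

  public static void main(String[] args)
  {
    long lockTimeoutMs = TimeUnit.MINUTES.toMillis(5); // default from the diff
    long maxIdleTimeMs = TimeUnit.MINUTES.toMillis(2); // value from the example above
    if (mayOutliveConnection(lockTimeoutMs, maxIdleTimeMs)) {
      System.out.println("lock timeout exceeds server idle time; the response may never reach the client");
    }
  }
}
```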
Fixes #4533.