Currently, completionTimeout-based timeouts can cause problems: if the timeout is too short, tasks are killed in the middle of publishing, only to be started again from the beginning in an eternal Sisyphean struggle. Also, the task's status is SUCCESS when this happens, which is confusing.
I'd suggest the following behavior instead:
- Make the default completionTimeout longer (it is currently 30 minutes). A longer timeout could mean too many tasks running at once, but I think we could afford 90 minutes.
- Mark tasks FAILED if they time out before publishing finishes (upload + commit to metadata store).
- Mark tasks SUCCESS if they time out after publishing, but before historical handoff.
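The proposed status rule can be sketched in Java as below. This is a minimal illustration, not the actual task code: the `Phase` and `Status` enums and the `statusOnTimeout` method are hypothetical names chosen for this sketch. The only input assumed is whether publishing (upload + metadata commit) had completed when the timeout fired.

```java
// Hypothetical sketch of the proposed timeout-to-status rule.
// Phase, Status, and statusOnTimeout are illustrative names, not a real task API.
public class TimeoutPolicy {

    /** Where the task was when completionTimeout expired. */
    enum Phase {
        PUBLISHING,       // upload and/or metadata commit not yet finished
        AWAITING_HANDOFF  // published (uploaded + committed), waiting for historical handoff
    }

    enum Status { SUCCESS, FAILED }

    /** Status to report when a task is killed by completionTimeout. */
    static Status statusOnTimeout(Phase phase) {
        // Once the data is uploaded and committed, only handoff is outstanding,
        // so the task's work is durable and it is reported as SUCCESS.
        // Otherwise publishing never completed, so the task is reported as FAILED.
        return phase == Phase.AWAITING_HANDOFF ? Status.SUCCESS : Status.FAILED;
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            System.out.println(p + " -> " + statusOnTimeout(p));
        }
    }
}
```

Keying the status on "was the commit to the metadata store done" keeps a timeout during publishing visible as a failure, instead of the current confusing SUCCESS.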
May be partially obsoleted by #4178.