-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Issue on the try_number is not in the loop #30571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
| else: | ||
| time.sleep(self._get_next_poke_interval(started_at, run_duration, try_number)) | ||
| try_number += 1 | ||
| try_number += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weiwenlan It would be nice to see a test case asserting the expected poke interval to time.sleep() method for reschedule and non-reschedule modes. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@utkarsharma2 Thanks for the suggestion! I will try to add some test cases for the exponential_backoff parameter in different modes.
| else: | ||
| time.sleep(self._get_next_poke_interval(started_at, run_duration, try_number)) | ||
| try_number += 1 | ||
| try_number += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weiwenlan How this change can solve the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @hussein-awala, this is just put the try_number outside the if logic. For now, if reschedule is true, the try_number will always be 1 and the exponential_backoff will be of no use. The exponential_backoff parameter is used in the function: _get_next_poke_interval and it will use try_number to generate the backoff interval. If the try_number remains 1, the interval will not change at all, even if the exponential_backoff is set to True. Show some code scripts below.
# code in the execute()
if self.reschedule:
next_poke_interval = self._get_next_poke_interval(started_at, run_duration, try_number)
reschedule_date = timezone.utcnow() + timedelta(seconds=next_poke_interval)
if _is_metadatabase_mysql() and reschedule_date > _MYSQL_TIMESTAMP_MAX:
raise AirflowSensorTimeout(
f"Cannot reschedule DAG {log_dag_id} to {reschedule_date.isoformat()} "
f"since it is over MySQL's TIMESTAMP storage limit."
)
raise AirflowRescheduleException(reschedule_date)
else:
time.sleep(self._get_next_poke_interval(started_at, run_duration, try_number))
try_number += 1
# code about determine the poke interval
def _get_next_poke_interval(
self,
started_at: datetime.datetime | float,
run_duration: Callable[[], float],
try_number: int,
) -> float:
"""Using the similar logic which is used for exponential backoff retry delay for operators."""
if not self.exponential_backoff:
return self.poke_interval
min_backoff = int(self.poke_interval * (2 ** (try_number - 2)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When self.reschedule is True, raise AirflowRescheduleException(reschedule_date) will stop the execution, and the try_number += 1 is not reached even if it's outside the else block.
Even if we add try...finally, updating the try_number cannot solve the problem, because the try_number is resetting when we re-execute (reschedule) the sensor.
IMO, the try_number state should be stored somewhere in order to use it in the next reschedule (ex XCom)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This solution is by far not complete. I will close the PR - we have few others including #30669 from @hussein-awala and related issue as well. #18080
I beleive (correct me if I am wrong @hussein-awala @uranusjr ) but we are converging on the need to add a new entity to keep try_nums ?).
Hi team,
I found that the sensor's exponential backoff will not working under the reschedule mode. The try_number could only be added 1 in the poke mode. I trying to move it out from the if else statement.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.