Skip to content

Conversation

@weiwenlan
Copy link

@weiwenlan weiwenlan commented Apr 11, 2023

Hi team,

I found that the sensor's exponential backoff will not working under the reschedule mode. The try_number could only be added 1 in the poke mode. I trying to move it out from the if else statement.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg
Copy link

boring-cyborg bot commented Apr 11, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@weiwenlan weiwenlan changed the title first commit Issue on the try_number is not in the loop Apr 11, 2023
@weiwenlan
Copy link
Author

#30572

else:
time.sleep(self._get_next_poke_interval(started_at, run_duration, try_number))
try_number += 1
try_number += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weiwenlan It would be nice to see a test case asserting the expected poke interval to time.sleep() method for reschedule and non-reschedule modes. :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utkarsharma2 Thanks for the suggestion! I will try to add some test cases for the exponential_backoff parameter in different modes.

else:
time.sleep(self._get_next_poke_interval(started_at, run_duration, try_number))
try_number += 1
try_number += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weiwenlan How this change can solve the problem?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @hussein-awala, this is just put the try_number outside the if logic. For now, if reschedule is true, the try_number will always be 1 and the exponential_backoff will be of no use. The exponential_backoff parameter is used in the function: _get_next_poke_interval and it will use try_number to generate the backoff interval. If the try_number remains 1, the interval will not change at all, even if the exponential_backoff is set to True. Show some code scripts below.

# code in the execute()
if self.reschedule:
    next_poke_interval = self._get_next_poke_interval(started_at, run_duration, try_number)
    reschedule_date = timezone.utcnow() + timedelta(seconds=next_poke_interval)
    if _is_metadatabase_mysql() and reschedule_date > _MYSQL_TIMESTAMP_MAX:
        raise AirflowSensorTimeout(
            f"Cannot reschedule DAG {log_dag_id} to {reschedule_date.isoformat()} "
            f"since it is over MySQL's TIMESTAMP storage limit."
        )
    raise AirflowRescheduleException(reschedule_date)
else:
    time.sleep(self._get_next_poke_interval(started_at, run_duration, try_number))
    try_number += 1
# code about determine the poke interval
def _get_next_poke_interval(
        self,
        started_at: datetime.datetime | float,
        run_duration: Callable[[], float],
        try_number: int,
    ) -> float:
        """Using the similar logic which is used for exponential backoff retry delay for operators."""
        if not self.exponential_backoff:
            return self.poke_interval

        min_backoff = int(self.poke_interval * (2 ** (try_number - 2)))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When self.reschedule is True, raise AirflowRescheduleException(reschedule_date) will stop the execution, and the try_number += 1 is not reached even if it's outside the else block.

Even if we add try...finally, updating the try_number cannot solve the problem, because the try_number is resetting when we re-execute (reschedule) the sensor.

IMO, the try_number state should be stored somewhere in order to use it in the next reschedule (ex XCom)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This solution is by far not complete. I will close the PR - we have few others including #30669 from @hussein-awala and related issue as well. #18080

I beleive (correct me if I am wrong @hussein-awala @uranusjr ) but we are converging on the need to add a new entity to keep try_nums ?).

@potiuk potiuk closed this Apr 22, 2023
@potiuk potiuk added the duplicate Issue that is duplicated label Apr 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

duplicate Issue that is duplicated

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants