@michaelmicheal
Contributor


Similar to this issue and the associated PR, we are seeing MySQL ignore the ti_state index and perform a full table scan of the task_instance table in the find_zombies query, even though the ti_state index should be used to find tasks in the running state.

zombies = (
    session.query(TaskInstance, DagModel.fileloc)
    .join(LocalTaskJob, TaskInstance.job_id == LocalTaskJob.id)
    .join(DagModel, TaskInstance.dag_id == DagModel.dag_id)
    .filter(TaskInstance.state == TaskInstanceState.RUNNING)
    .filter(
        or_(
            LocalTaskJob.state != State.RUNNING,
            LocalTaskJob.latest_heartbeat < limit_dttm,
        )
    )
    .filter(TaskInstance.queued_by_job_id == self.id)
    .all()
)

Adding a MySQL index hint resolved this issue for us, so this PR adds an index hint on the above query.
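
For readers unfamiliar with index hints in SQLAlchemy, here is a minimal sketch of the general technique, not the exact diff in this PR. It assumes SQLAlchemy 1.4 or newer and uses a simplified, hypothetical stand-in for the task_instance table (only three columns, no joins). The `with_hint` call takes a `dialect_name` argument, so the hint is only rendered when the statement is compiled for MySQL and other backends are unaffected. The ORM `Query` object has a `with_hint` method with the same arguments.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select
from sqlalchemy.dialects import mysql, postgresql

metadata = MetaData()

# Simplified stand-in for Airflow's task_instance table (assumption: the
# real table has many more columns and an index named ti_state on `state`).
task_instance = Table(
    "task_instance",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("state", String(20)),
    Column("queued_by_job_id", Integer),
)

# Ask MySQL to use the ti_state index; dialect_name="mysql" keeps the hint
# out of the SQL generated for every other database.
stmt = (
    select(task_instance.c.id)
    .with_hint(task_instance, "USE INDEX (ti_state)", dialect_name="mysql")
    .where(task_instance.c.state == "running")
)

# Compile against dialect objects so no database connection is needed.
mysql_sql = str(stmt.compile(dialect=mysql.dialect()))
postgres_sql = str(stmt.compile(dialect=postgresql.dialect()))

print(mysql_sql)
print(postgres_sql)
```

Printing both strings shows the hint appearing after the table name in the MySQL statement only. Running `EXPLAIN` on the query in MySQL before and after adding the hint is one way to confirm whether the optimizer switches from a full table scan to the index.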

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Aug 15, 2022
@ashb ashb merged commit 43f9e53 into apache:main Aug 15, 2022
@michaelmicheal michaelmicheal deleted the add-mysql-zombies-index-hint branch August 15, 2022 16:19
@eladkal eladkal added this to the Airflow 2.3.5 milestone Aug 26, 2022
@eladkal eladkal added the type:bug-fix Changelog: Bug Fixes label Aug 26, 2022
@ashb ashb modified the milestones: Airflow 2.3.5, Airflow 2.4.0 Sep 8, 2022