Skip to content

Multiple Triggerer processes keeps reassigning triggers to each other when job_heartbeat_sec is higher than 30 seconds. #32121

@tirkarthi

Description

@tirkarthi

Apache Airflow version

main (development)

What happened

Airflow has job_heartbeat_sec (default 5) that was updated to 50 seconds in our environment. This caused 2 instances of triggerer process running for HA to keep updating triggerer_id since below query takes current time minus 30 seconds to query against the latest_heartbeat. This causes alive_triggerer_ids to return empty list since job_heartbeat_sec is more than 50 seconds and the current trigger running this query assigns the triggers to itself. This keeps happening moving triggers from one instance to another.

alive_triggerer_ids = [
row[0]
for row in session.query(Job.id).filter(
Job.end_date.is_(None),
Job.latest_heartbeat > timezone.utcnow() - datetime.timedelta(seconds=30),
Job.job_type == "TriggererJob",
)
]

What you think should happen instead

The heartbeat calculation should have a value based on job_heartbeat_sec rather than 30 seconds hardcoded so that queries to check triggerer processes alive are adjusted as per user settings.

How to reproduce

  1. Change job_heartbeat_sec to 50 in airflow.cfg
  2. Run 2 instances of triggerer.
  3. Make an operator that uses FileTrigger with the file being absent or any other long running task.
  4. Check triggerer_id for the trigger which keeps changing.

Operating System

Ubuntu

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions