Conversation

@amoghrajesh
Contributor


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
    Used Cursor IDE

Why This Change?

This change assists the client-server separation by moving the worker-side deadline functionality to task-sdk while keeping the DB-dependent evaluation logic in airflow-core. It follows the established pattern from the assets migration: #58993

What stays where?

In simpler terms, this PR splits the deadline code between the two packages as follows.

SDK

  • DeadlineAlert: It's the user-facing class for defining deadline alerts (no serialization methods from now on)
  • DeadlineReference: The same factory class for creating deadline references
  • ReferenceModels.*: Original reference implementations for backward compatibility

The principle here is to keep the lightweight DAG-authoring interface, with no database dependencies, in the SDK (sketched below).
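
For orientation, a minimal sketch of the shape of that SDK surface. The class and field names match the usage shown later in this thread, but the bodies are illustrative, not the PR's actual code:

from dataclasses import dataclass, field
from datetime import timedelta
from typing import Any, Callable


@dataclass
class AsyncCallback:
    # The author supplies a coroutine function (or dotted path) plus kwargs;
    # nothing here touches serialization or the database.
    callback: Callable | str
    kwargs: dict[str, Any] = field(default_factory=dict)


@dataclass
class DeadlineAlert:
    # Pure authoring container: a reference point, an interval past it,
    # and what to call when the deadline is missed.
    reference: Any  # e.g. DeadlineReference.DAGRUN_QUEUED_AT
    interval: timedelta
    callback: AsyncCallback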

Core / Serialization module

  • SerializedDeadlineAlert: Internal representation used in core after deserializing a DeadlineAlert
  • SerializedReferenceModels.*: Reference implementations with database access
  • encode_deadline_alert() / decode_deadline_alert(): Centralized functions used to serialize/deserialize deadline alerts

The principle here is to keep the serialization, deserialization, and deadline evaluation with database access in core (sketched below).
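
As a rough sketch of what "reference implementations with database access" means in practice - the class name, method name, and query here are my illustration, not the PR's exact code:

from datetime import datetime

from sqlalchemy import select
from sqlalchemy.orm import Session


class SerializedDagRunQueuedAtDeadline:
    # DB-aware reference: resolves the reference timestamp with a session,
    # which is exactly what the SDK-side classes must not do.
    def evaluate(self, *, session: Session, dag_id: str, run_id: str) -> datetime:
        from airflow.models.dagrun import DagRun

        run = session.scalars(
            select(DagRun).where(DagRun.dag_id == dag_id, DagRun.run_id == run_id)
        ).one()
        # The deadline itself is this timestamp plus the alert's interval.
        return run.queued_at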

Serialization Changes

Structure

Serialization format remains unchanged - no breaking changes to stored DAGs:

{
  "__type": "deadline_alert",
  "__var": {
    "reference": {"reference_type": "DagRunLogicalDateDeadline"},
    "interval": 3600.0,
    "callback": {
      "__classname__": "airflow.sdk.definitions.callback.AsyncCallback",
      "__version__": 0,
      "__data__": {"path": "...", "kwargs": {...}}
    }
  }
}

The flow of control is also unchanged (a sketch of the round trip follows the list):

  1. Encode (DAG Processor): DeadlineAlert → dict via encode_deadline_alert() using airflow.sdk.serde
  2. Stored as JSON in database
  3. Decode (Scheduler): dict → SerializedDeadlineAlert via decode_deadline_alert()
  4. Evaluate: SerializedReferenceModels uses database session to calculate deadlines
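
Put together, the round trip looks roughly like this. The two helper functions are the ones this PR adds; their import path and the dotted callback path are my assumptions:

from datetime import timedelta

from airflow.sdk.definitions.deadline import AsyncCallback, DeadlineAlert, DeadlineReference
from airflow.serialization.serialized_objects import (  # assumed location of the new helpers
    decode_deadline_alert,
    encode_deadline_alert,
)

alert = DeadlineAlert(
    reference=DeadlineReference.DAGRUN_QUEUED_AT,
    interval=timedelta(hours=1),
    # Dotted path here for brevity; a callable also works, per the DAG example below.
    callback=AsyncCallback("my_pkg.callbacks.on_miss", kwargs={"alert_type": "sla"}),
)

encoded = encode_deadline_alert(alert)     # step 1: runs in the DAG processor
# step 2: `encoded` is the dict that gets stored as JSON in the database
restored = decode_deadline_alert(encoded)  # step 3: runs in the scheduler
# step 4: `restored` carries a SerializedReferenceModels reference that
# evaluates the deadline against the database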

One thing of note is the callback serialization: I chose to continue using serde for this purpose because BaseSerialization cannot handle callbacks. Using serde made sense since this part of serialization runs in the DAG processor, which ultimately is not a core component and can use the task SDK. So the flow (sketched after the list):

  • Uses airflow.sdk.serde.serialize() / deserialize() for proper callback handling
  • Runs in DAG Processor context where SDK is available
  • Callbacks fully serialize with path and kwargs (no string representations)
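
A sketch of that serde round trip on just the callback. It assumes serialize/deserialize are importable from airflow.sdk.serde as named above, and that the restored object keeps its attributes:

from airflow.sdk.definitions.deadline import AsyncCallback
from airflow.sdk.serde import deserialize, serialize

cb = AsyncCallback(
    "deadline_callback.custom_async_callback",  # dotted path to the coroutine
    kwargs={"alert_type": "time_exceeded"},
)

encoded = serialize(cb)
# Expect the shape from the JSON sample above:
# {"__classname__": "airflow.sdk.definitions.callback.AsyncCallback",
#  "__version__": 0, "__data__": {"path": "...", "kwargs": {...}}}

restored = deserialize(encoded)
assert restored.kwargs == cb.kwargs  # kwargs survive intact, no string repr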

Backward Compatibility

  • Serialization format identical to main branch
  • Reference class names unchanged (e.g., DagRunLogicalDateDeadline not SerializedDagRunLogicalDateDeadline)
  • Existing serialized DAGs deserialize correctly
  • Internal API only - no user-facing changes, so nothing should break for users (see the spot check below)
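
As a hypothetical spot check of those claims: a payload in the existing on-disk format (the JSON sample above) should decode cleanly. This assumes decode_deadline_alert lives where sketched earlier and accepts the __var payload directly:

from airflow.serialization.serialized_objects import decode_deadline_alert  # assumed location

old_payload = {
    "reference": {"reference_type": "DagRunLogicalDateDeadline"},
    "interval": 3600.0,
    "callback": {
        "__classname__": "airflow.sdk.definitions.callback.AsyncCallback",
        "__version__": 0,
        "__data__": {"path": "deadline_callback.custom_async_callback", "kwargs": {}},
    },
}

decoded = decode_deadline_alert(old_payload)
assert type(decoded).__name__ == "SerializedDeadlineAlert"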


@amoghrajesh
Contributor Author

@ferruzzi thanks for your review, I have handled your comments now, let me know how it looks.

Contributor

@ferruzzi ferruzzi left a comment


Thanks for addressing my questions. I think it looks right.

@amoghrajesh
Contributor Author

Thanks @ferruzzi!

Just to gain some more confidence prior to merge, I tried it out functionally.

Used this custom callback:

async def custom_async_callback(**kwargs):
    context = kwargs.get("context", {})
    print(
        f"Deadline exceeded for Dag {context.get('dag_run', {}).get('dag_id')}!"
    )
    print(f"Context: {context}")
    print(f"Alert type: {kwargs.get('alert_type')}")

This DAG:

from datetime import timedelta

from deadline_callback import custom_async_callback

from airflow.providers.standard.operators.bash import BashOperator
from airflow.sdk import DAG
from airflow.sdk.definitions.deadline import AsyncCallback, DeadlineAlert, DeadlineReference

with DAG(
    dag_id="custom_deadline_alert",
    deadline=DeadlineAlert(
        reference=DeadlineReference.DAGRUN_QUEUED_AT,
        interval=timedelta(seconds=10),
        callback=AsyncCallback(
            custom_async_callback,
            kwargs={"alert_type": "time_exceeded"},
        ),
    ),
):
    BashOperator(task_id="example_task", bash_command="sleep 30")

And it works as I expect it to:

[Screenshot: task logs showing the custom deadline callback firing, printing the DAG run context and alert type]

@amoghrajesh amoghrajesh merged commit cd3f7d3 into apache:main Jan 30, 2026
101 checks passed
@amoghrajesh amoghrajesh deleted the deadline-alerts-decouple branch January 30, 2026 06:16