Contributor

@prabhusneha prabhusneha commented Feb 3, 2025

Related issue: #46199

Construct run_id, when logical date is null, using run_after + random string.

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues labels Feb 3, 2025
@Lee-W Lee-W self-requested a review February 4, 2025 08:50
@Lee-W Lee-W added the legacy api Whether legacy API changes should be allowed in PR label Feb 4, 2025
Comment on lines +87 to +90
run_type=DagRunType.MANUAL,
logical_date=coerced_logical_date,
data_interval=data_interval,
run_after=data_interval.end,
Member

At some point (before 3.0) I want to reduce the arguments here to just take a DagRunInfo, but that can be done separately instead.

Contributor

According to the doc, when logical date is null, data interval end should be null too, so it would not make sense to use data interval end as the run_after date. Thoughts, @uranusjr?

Member

Oh yeah you’re right. This should do coerced_logical_date or timezone.utcnow() (as currently implemented, coerced_logical_date can never be None, but it will be when everything is finished).
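
A minimal sketch of the fallback suggested above, using only the standard library. `resolve_run_after` is a hypothetical helper name, and `datetime.now(timezone.utc)` stands in for Airflow's `timezone.utcnow()`; neither is taken from this PR.

```python
from __future__ import annotations

from datetime import datetime, timezone


def resolve_run_after(logical_date: datetime | None) -> datetime:
    """Return logical_date when it is set, otherwise the current UTC time."""
    # Hypothetical helper mirroring `coerced_logical_date or timezone.utcnow()`.
    # datetime objects are always truthy, so `or` only falls through on None.
    return logical_date or datetime.now(timezone.utc)


explicit = datetime(2025, 2, 3, tzinfo=timezone.utc)
print(resolve_run_after(explicit))     # the explicit logical date is reused
print(resolve_run_after(None).tzinfo)  # the fallback is timezone-aware (UTC)
```

With this shape, run_after never depends on `data_interval.end`, which may be null when there is no logical date.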

Contributor Author

Should we include this change as a part of this PR or have a separate PR?

Member

I think we'll need to do it in this PR 🤔

Member

timezone.utcnow() seems to be a reasonable default for run_after

Collaborator

I have changed it here: #46616

run_type=DagRunType.ASSET_TRIGGERED,
logical_date=logical_date,
data_interval=data_interval,
run_after=max(logical_dates.values()),
Member

There’s a task to make asset-triggered runs have None logical_date instead, so we’ll need to change this again soon. This is good enough for now.

Comment on lines 90 to 93
  data["dag_run_id"] = DagRun.generate_run_id(
-     DagRunType.MANUAL, timezone.parse(data["logical_date"])
+     DagRunType.MANUAL,
+     timezone.parse(data["logical_date"]),
  )
Member

Do we want to pass run_after here?

Contributor

Looks like there is no change here, just a new line?

Member

@uranusjr uranusjr Feb 5, 2025

Yes, and I think that’s wrong; we should change the logic here.

Contributor Author

Is this required here? run_after was not added to this call by the PR that introduced the new run_after field (#46195).
I assumed this code path was going to be deprecated, so I did not add it here.

Collaborator

I have changed it here: #46616

Comment on lines -45 to 51

- def generate_run_id(self, logical_date: datetime) -> str:
+ def generate_run_id(self, logical_date: datetime | None, run_after: datetime | None) -> str:
+     if logical_date is None:
+         if run_after is None:
+             raise ValueError("run_after cannot be None")
+         return run_after + get_random_string()
      return f"{self}__{logical_date.isoformat()}"
Member

Maybe this function should continue to accept a single datetime value, and we do the if-else check outside instead.

Member

run_after could be None as well?

Member

It can’t

Contributor Author

Maybe if this function should continue to accept one single datetime value, and we do the if-else check outside instead.

Here we need to know whether logical_date is None before generating the random string that gets appended to run_after.
If the function takes just one argument, it cannot know when to append the random string. Are you suggesting we move this logic into the callers (Timetable.generate_run_id and DagRun.generate_run_id)?
I went with the current implementation to avoid duplicate code and multiple function calls.

Member

@uranusjr uranusjr Feb 10, 2025

I would move random string generation to DagRun.
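
One way to read this suggestion, as a rough sketch rather than the code that landed in #46616: the run-type helper only formats a prefix and a ready-made suffix, and the caller decides whether a random string is needed. `RunType`, the module-level `generate_run_id`, the `_` separator, and the use of `secrets.token_hex` in place of `get_random_string()` are all assumptions made for illustration.

```python
from __future__ import annotations

import secrets
from datetime import datetime, timezone
from enum import Enum


class RunType(str, Enum):
    """Hypothetical stand-in for DagRunType."""

    MANUAL = "manual"

    def generate_run_id(self, *, suffix: str) -> str:
        # The enum only knows how to join its own value and a suffix.
        return f"{self.value}__{suffix}"


def generate_run_id(
    *, run_type: RunType, logical_date: datetime | None, run_after: datetime
) -> str:
    """Caller-side logic: decide here whether a random string is required."""
    if logical_date is not None:
        return run_type.generate_run_id(suffix=logical_date.isoformat())
    # No logical date: run_after alone may not be unique, so add random hex.
    random_part = secrets.token_hex(4)  # assumption: replaces get_random_string()
    return run_type.generate_run_id(suffix=f"{run_after.isoformat()}_{random_part}")


when = datetime(2025, 2, 3, tzinfo=timezone.utc)
print(generate_run_id(run_type=RunType.MANUAL, logical_date=when, run_after=when))
print(generate_run_id(run_type=RunType.MANUAL, logical_date=None, run_after=when))
```

Keeping the branch in one caller means the enum method never has to accept two optional datetimes.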

Collaborator

I have changed it here: #46616

@jedcunningham jedcunningham added the AIP-83 Remove Execution Date Unique Constraint from DAG Run label Feb 4, 2025
run_type=DagRunType.MANUAL,
logical_date=logical_date,
data_interval=data_interval,
run_after=data_interval.end,
Contributor

Again, if logical date is null, then we generally won't have a data interval. Correct, @uranusjr?

Collaborator

That's right.

Collaborator

data_interval would be null as well.

Member

utcnow for this as well probably?

Collaborator

I have changed it here: #46616

Comment on lines +91 to +92
DagRunType.MANUAL,
timezone.parse(data["logical_date"]),
Member

Suggested change
- DagRunType.MANUAL,
- timezone.parse(data["logical_date"]),
+ run_type=DagRunType.MANUAL,
+ logical_date=timezone.parse(data["logical_date"]),

Comment on lines +632 to +633
DagRunType.MANUAL,
info.logical_date,
Member

Suggested change
- DagRunType.MANUAL,
- info.logical_date,
+ run_type=DagRunType.MANUAL,
+ logical_date=info.logical_date,

Collaborator

I have changed it here: #46616

  @staticmethod
- def generate_run_id(run_type: DagRunType, logical_date: datetime) -> str:
+ def generate_run_id(
+     run_type: DagRunType, logical_date: datetime | None, run_after: datetime | None = None
Member

Suggested change
- run_type: DagRunType, logical_date: datetime | None, run_after: datetime | None = None
+ *, run_type: DagRunType, logical_date: datetime | None, run_after: datetime | None = None

As the method becomes more complicated (it is hard to reason about the order of logical_date and run_after), we should probably make it keyword-only.
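
To illustrate why the bare `*` helps when two parameters share a type, here is a small sketch. `describe_run` is a hypothetical function and not part of Airflow; it only demonstrates that positional calls are rejected once the parameters are keyword-only.

```python
from __future__ import annotations

from datetime import datetime, timezone


def describe_run(*, logical_date: datetime | None, run_after: datetime) -> str:
    # Hypothetical function: everything after the bare `*` must be passed by name.
    return f"logical_date={logical_date}, run_after={run_after.isoformat()}"


now = datetime(2025, 2, 4, tzinfo=timezone.utc)

try:
    describe_run(None, now)  # positional call: the argument order is ambiguous
except TypeError as exc:
    print("rejected:", exc)

print(describe_run(logical_date=None, run_after=now))
```

Because both arguments are datetimes, a swapped positional call would otherwise pass type checks silently.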

Collaborator

I have changed it here: #46616

return self.value

- def generate_run_id(self, logical_date: datetime) -> str:
+ def generate_run_id(self, logical_date: datetime | None, run_after: datetime | None) -> str:
Member

Suggested change
- def generate_run_id(self, logical_date: datetime | None, run_after: datetime | None) -> str:
+ def generate_run_id(self, *, logical_date: datetime | None, run_after: datetime | None) -> str:

same here

Collaborator

I have changed it here: #46616

if logical_date is None:
if run_after is None:
raise ValueError("run_after cannot be None")
return run_after + get_random_string()
Member

Suggested change
- return run_after + get_random_string()
+ return f"{run_after}{get_random_string()}"
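
A quick standard-library check of why this suggestion matters: adding a `str` to a `datetime` raises `TypeError`, while an f-string formats the datetime first. The `suffix` value below is a fixed placeholder standing in for `get_random_string()`, and the `isoformat()` variant is shown only for comparison; it is not part of the suggestion.

```python
from datetime import datetime, timezone

run_after = datetime(2025, 2, 3, 12, 0, tzinfo=timezone.utc)
suffix = "abc123"  # placeholder for get_random_string()

try:
    run_after + suffix  # datetime + str is not a supported operation
except TypeError as exc:
    print("TypeError:", exc)

# The f-string uses str(run_after), which separates date and time with a space.
formatted = f"{run_after}{suffix}"
# isoformat() gives the "T"-separated form used for logical_date elsewhere.
iso_formatted = f"{run_after.isoformat()}{suffix}"

print(formatted)
print(iso_formatted)
```

The two string forms differ only in the date/time separator, which may matter if run IDs are later parsed or compared.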

Collaborator

I have changed it here: #46616

@sunank200
Collaborator

I am making the changes in #46616, as the logic has changed now.

@sunank200 sunank200 closed this Feb 10, 2025