@hussein-awala
Member



@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues labels Aug 30, 2023
@hussein-awala
Member Author

More readable and faster:

$ python -m timeit '[1, 2, 3] + [4] + [5] + [6] + [7]'
2000000 loops, best of 5: 179 nsec per loop

$ python -m timeit '[*[1, 2, 3], 4, 5, 6, 7]'
5000000 loops, best of 5: 79 nsec per loop

  # Crafting the right filter for dag_id and task_ids combo
  conditions = []
- for dag in self.subdags + [self]:
+ for dag in [*self.subdags, self]:
Member

I wonder if itertools.chain would be better here
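
For reference, a minimal sketch of the three spellings under discussion. The `FakeDag` class and its method names are hypothetical stand-ins, not Airflow's `DAG` API; the class only carries a `subdags` list so that each loop header can be exercised side by side.

```python
import itertools


class FakeDag:
    """Hypothetical stand-in for a DAG object; not Airflow's DAG class."""

    def __init__(self, dag_id, subdags=None):
        self.dag_id = dag_id
        self.subdags = subdags or []

    def ids_concat(self):
        # Original spelling: builds [self], then a second list via +.
        return [dag.dag_id for dag in self.subdags + [self]]

    def ids_unpack(self):
        # Spelling from this PR: a single list display with unpacking.
        return [dag.dag_id for dag in [*self.subdags, self]]

    def ids_chain(self):
        # Suggested alternative: lazy iteration, but the single extra
        # item still has to be wrapped in a container such as [self].
        return [dag.dag_id for dag in itertools.chain(self.subdags, [self])]


parent = FakeDag("parent", [FakeDag("parent.child_a"), FakeDag("parent.child_b")])
print(parent.ids_concat())
print(parent.ids_unpack())
print(parent.ids_chain())
```

All three visit the subdags first and the parent last; they differ only in how many intermediate objects are created along the way.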

Member

@Lee-W Lee-W Aug 31, 2023

I did some tests on it. Performance-wise, [*[1, 2, 3], 4, 5, 6, 7] appears to be the fastest option:

$ python -m timeit '[1, 2, 3] + [4] + [5] + [6] + [7]'
2000000 loops, best of 5: 179 nsec per loop

$ python -m timeit '[*[1, 2, 3], 4, 5, 6, 7]'
5000000 loops, best of 5: 69.2 nsec per loop

$ python -m timeit 'import itertools; itertools.chain([1, 2, 3], [4, 5, 6, 7])'
2000000 loops, best of 5: 177 nsec per loop

$ python -m timeit 'import itertools; [1, 2, 3] + [4] + [5] + [6] + [7]'
1000000 loops, best of 5: 242 nsec per loop

$ python -m timeit 'import itertools; [*[1, 2, 3], 4, 5, 6, 7]'
2000000 loops, best of 5: 131 nsec per loop

$ python -m timeit 'import itertools; list(itertools.chain([1, 2, 3], [4, 5, 6, 7]))'
1000000 loops, best of 5: 305 nsec per loop

Member

A big chunk of this would come from import itertools, though, which is not relevant in Airflow since the module is already imported in many places. I would not be surprised if * is still the fastest for small lists, since the itertools version still needs to build an additional list (of a single item).
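
One way to check this is to move the import into timeit's setup, which runs once per timing run and is excluded from the measured statement. The following is a rough sketch using the standard `timeit` module; the `measure` helper, the statement labels, and the loop counts are illustrative choices, and absolute numbers will vary by machine and Python version.

```python
import timeit

# Statements under test; the import lives in `setup`, so it is not timed.
STATEMENTS = {
    "concat": "[1, 2, 3] + [4] + [5] + [6] + [7]",
    "unpack": "[*[1, 2, 3], 4, 5, 6, 7]",
    "chain": "list(itertools.chain([1, 2, 3], [4, 5, 6, 7]))",
}


def measure(number=200_000, repeat=5):
    """Return the best-of-`repeat` time in nanoseconds per loop for each statement."""
    results = {}
    for name, stmt in STATEMENTS.items():
        timings = timeit.repeat(
            stmt, setup="import itertools", number=number, repeat=repeat
        )
        results[name] = min(timings) / number * 1e9
    return results


if __name__ == "__main__":
    for name, nsec in measure().items():
        print(f"{name}: {nsec:.1f} nsec per loop")
```

The command-line equivalent is the `-s` option, for example `python -m timeit -s 'import itertools' 'list(itertools.chain([1, 2, 3], [4, 5, 6, 7]))'`.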

Member

@Lee-W Lee-W left a comment

It looks good to me if @uranusjr's suggestion is applied.

@potiuk
Member

potiuk commented Sep 3, 2023

LGTM. I think using itertools for those is indeed over-the-top. I am fine with the current version :)

Member

@pierrejeambrun pierrejeambrun left a comment

LGTM

@uranusjr uranusjr merged commit 33e5d03 into apache:main Sep 5, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.7.2 milestone Oct 3, 2023
@ephraimbuddy ephraimbuddy added the type:misc/internal Changelog: Misc changes that should appear in change log label Oct 3, 2023
ephraimbuddy pushed a commit that referenced this pull request Oct 5, 2023

6 participants