Skip to content

Conversation

@mobuchowski
Copy link
Contributor

Currently _get_parsed_dag_tree uses get_tree_view which in degenerated case (like, in test test_get_dag_tree_large_dag can generate string tree representation taking multiple gigabytes of memory.

However, for what it's trying to accomplish, pretty much any temporary allocations are not necessary. This fix constructs task dependency tree without intermediate representation.

Previous memory consumption:

root@a24bae3584cb:/opt/airflow# pytest --memray tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag
=========================================================================================================================================================================== test session starts ============================================================================================================================================================================
platform linux -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow
configfile: pyproject.toml
plugins: memray-1.7.0, timeouts-1.2.1, icdiff-0.9, mock-3.14.0, rerunfailures-14.0, requests-mock-1.12.1, xdist-3.6.1, asyncio-0.23.8, anyio-4.4.0, instafail-0.5.0, cov-5.0.0, time-machine-2.15.0, custom-exit-code-0.3.0
asyncio: mode=Mode.STRICT
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 1 item

tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag PASSED                                                                                                                                                                                                                                                                                  [100%]


============================================================================================================================================================================== MEMRAY REPORT ===============================================================================================================================================================================
Allocation results for tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag at the high watermark

	 📦 Total memory allocated: 5.4GiB
	 📏 Total allocations: 23
	 📊 Histogram of allocation sizes: |▁▁█  |
	 🥇 Biggest allocating functions:
		- _safe_get_dag_tree_view:/opt/airflow/airflow/providers/openlineage/utils/utils.py:446 -> 2.7GiB
		- get_tree_view:/opt/airflow/airflow/models/dag.py:2445 -> 2.7GiB
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB


=================================================================================================================================================================== Warning summary. Total: 3, Unique: 3 ===================================================================================================================================================================
airflow: total 1, unique 1
  collect: total 1, unique 1
other: total 2, unique 2
  collect: total 2, unique 2
Warnings saved into /opt/airflow/tests/warnings.txt file.
============================================================================================================================================================================ 1 passed in 8.60s =============================================================================================================================================================================

current memory consumption:

root@2788b43cb914:/opt/airflow# pytest --memray tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag
=========================================================================================================================================================================== test session starts ============================================================================================================================================================================
platform linux -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow
configfile: pyproject.toml
plugins: memray-1.7.0, timeouts-1.2.1, icdiff-0.9, mock-3.14.0, rerunfailures-14.0, requests-mock-1.12.1, xdist-3.6.1, asyncio-0.23.8, anyio-4.4.0, instafail-0.5.0, cov-5.0.0, time-machine-2.15.0, custom-exit-code-0.3.0
asyncio: mode=Mode.STRICT
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 1 item

tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag PASSED                                                                                                                                                                                                                                                                                  [100%]


============================================================================================================================================================================== MEMRAY REPORT ===============================================================================================================================================================================
Allocation results for tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag at the high watermark

	 📦 Total memory allocated: 16.0MiB
	 📏 Total allocations: 30
	 📊 Histogram of allocation sizes: |▁█ ▃▇|
	 🥇 Biggest allocating functions:
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
		- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB


=================================================================================================================================================================== Warning summary. Total: 3, Unique: 3 ===================================================================================================================================================================
airflow: total 1, unique 1
  collect: total 1, unique 1
other: total 2, unique 2
  collect: total 2, unique 2
Warnings saved into /opt/airflow/tests/warnings.txt file.
============================================================================================================================================================================ 1 passed in 2.49s =============================================================================================================================================================================

Signed-off-by: Maciej Obuchowski <obuchowski.maciej@gmail.com>
Copy link
Contributor

@kacpermuda kacpermuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks so much better than my previous implementation 🤦 Thanks @mobuchowski 🚀 !

@mobuchowski mobuchowski merged commit 0e7c757 into apache:main Aug 15, 2024
@jlaneve
Copy link
Contributor

jlaneve commented Aug 15, 2024

Nice! Do you know if there's an issue open to track the underlying function? This fixed it for OL but not for other consumers of that function

@mobuchowski
Copy link
Contributor Author

@jlaneve just created it. #41505

Artuz37 pushed a commit to Artuz37/airflow that referenced this pull request Aug 19, 2024
…pache#41494)

Signed-off-by: Maciej Obuchowski <obuchowski.maciej@gmail.com>
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Aug 20, 2024
…pache#41494)

Signed-off-by: Maciej Obuchowski <obuchowski.maciej@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants