Closed
8 changes: 8 additions & 0 deletions .github/workflows/test-wheel-linux.yml
@@ -298,3 +298,11 @@ jobs:
CUDA_PATHFINDER_TEST_FIND_NVIDIA_HEADERS_STRICTNESS: all_must_work
CUDA_PATHFINDER_TEST_FIND_NVIDIA_BITCODE_LIB_STRICTNESS: all_must_work
run: run-tests pathfinder

- name: Upload cuda.pathfinder test info summary
Contributor

I posted #1621.

Introducing these artifacts makes something simple weirdly complex. The INFO lines are a tiny fraction of the total log (see PR #1621 description for numbers).

Contributor Author

It's still too noisy, and the mechanism that we are using to track this information lacks any real structure. It's very ad hoc and can't really be integrated into, or composed well with, another system.

Contributor

Can you point to a concrete, measurable problem (similar to the numbers I showed under #1621)?

The fraction of INFO lines in a typical CI log file is around 2.5%.

I use those INFO lines regularly when working on pathfinder, or when random issues pop up. Hiding that information in disconnected artifacts would force me to piece together information that was previously available at a glance.

Contributor Author

It's not a problem in CI anymore, it seems. It's still a big code smell though. Is that measurable? Not exactly. Does it have to be to be a valid concern? Definitely not. Closing the PR.
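For readers who want to check a figure like the 2.5% quoted in this thread against their own logs, here is a minimal sketch of one way to measure it. It assumes INFO lines start with the literal prefix `INFO test_` (the pattern the old `awk` command in `ci/tools/run-tests` matched); the function name `info_line_fraction` is hypothetical and not part of the repository. Downloaded GHA log archives may prefix each line with a timestamp, in which case the prefix check would need adjusting.

```python
def info_line_fraction(lines, prefix="INFO test_"):
    """Return (info_count, total_count, fraction) for an iterable of log lines.

    `prefix` is an assumption: it mirrors the awk pattern /^INFO test_/.
    """
    total = 0
    info = 0
    for line in lines:
        total += 1
        if line.startswith(prefix):
            info += 1
    # Guard against an empty log so we never divide by zero.
    fraction = info / total if total else 0.0
    return info, total, fraction
```

To apply it to a saved log, open the file and pass the file object directly, for example `info_line_fraction(open("log.txt", encoding="utf-8"))`, where `log.txt` is a placeholder path.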

if: ${{ !cancelled() }}
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: cuda-pathfinder-test-info-${{ inputs.host-platform }}-py${{ matrix.PY_VER }}-cuda${{ matrix.CUDA_VER }}-${{ matrix.LOCAL_CTK == '1' && 'local' || 'wheels' }}-${{ matrix.GPU }}
path: cuda_pathfinder/pathfinder-test-info-summary-*.txt
if-no-files-found: error
8 changes: 8 additions & 0 deletions .github/workflows/test-wheel-windows.yml
@@ -275,3 +275,11 @@ jobs:
CUDA_PATHFINDER_TEST_FIND_NVIDIA_BITCODE_LIB_STRICTNESS: all_must_work
shell: bash --noprofile --norc -xeuo pipefail {0}
run: run-tests pathfinder

- name: Upload cuda.pathfinder test info summary
if: ${{ !cancelled() }}
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: cuda-pathfinder-test-info-${{ inputs.host-platform }}-py${{ matrix.PY_VER }}-cuda${{ matrix.CUDA_VER }}-${{ matrix.LOCAL_CTK == '1' && 'local' || 'wheels' }}-${{ matrix.GPU }}
path: cuda_pathfinder/pathfinder-test-info-summary-*.txt
if-no-files-found: error
3 changes: 3 additions & 0 deletions .gitignore
@@ -194,5 +194,8 @@ cython_debug/
.pixi/*
!.pixi/config.toml

# Pathfinder test info log
pathfinder-test-info-summary-*.txt

# Cursor
.cursorrules
7 changes: 2 additions & 5 deletions ci/tools/run-tests
@@ -33,11 +33,8 @@ if [[ "${test_module}" == "pathfinder" ]]; then
"LD:${CUDA_PATHFINDER_TEST_LOAD_NVIDIA_DYNAMIC_LIB_STRICTNESS} " \
"FH:${CUDA_PATHFINDER_TEST_FIND_NVIDIA_HEADERS_STRICTNESS} " \
"BC:${CUDA_PATHFINDER_TEST_FIND_NVIDIA_BITCODE_LIB_STRICTNESS}"
pytest -ra -s -v --durations=0 tests/ |& tee /tmp/pathfinder_test_log.txt
# Report the number of "INFO test_" lines (including zero)
# to support quick validations based on GHA log archives.
line_count=$(awk '/^INFO test_/ {count++} END {print count+0}' /tmp/pathfinder_test_log.txt)
echo "Number of \"INFO test_\" lines: $line_count"
pytest -ra -s -v --durations=0 tests/
echo "Number of \"INFO test_\" lines: $(cat pathfinder-test-info-summary-*.txt | wc -l)"
popd
elif [[ "${test_module}" == "bindings" ]]; then
echo "Installing bindings wheel"
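The new `run-tests` line counts INFO entries with `cat pathfinder-test-info-summary-*.txt | wc -l`. As a rough illustration of the same count outside the shell, the sketch below sums the lines of every file matching that glob in a directory. The helper name `count_summary_lines` is hypothetical; only the filename pattern comes from the diff.

```python
from pathlib import Path


def count_summary_lines(directory, pattern="pathfinder-test-info-summary-*.txt"):
    """Sum the line counts of all summary files matching `pattern`."""
    total = 0
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(encoding="utf-8") as f:
            # Iterating a text file yields one item per line.
            total += sum(1 for _ in f)
    return total
```

One difference worth noting: this version returns 0 when no file matches, which is an explicit design choice here rather than a side effect of how the shell handles an unmatched glob.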
48 changes: 31 additions & 17 deletions cuda_pathfinder/tests/conftest.py
@@ -2,33 +2,47 @@
# SPDX-License-Identifier: Apache-2.0


import logging
import os

import pytest

_LOGGER_NAME = "cuda_pathfinder.test_info"


def _log_filename():
strictness = os.environ.get("CUDA_PATHFINDER_TEST_LOAD_NVIDIA_DYNAMIC_LIB_STRICTNESS", "see_what_works")
return f"pathfinder-test-info-summary-{strictness}.txt"


def pytest_configure(config):
config.custom_info = []
log_path = config.rootpath / _log_filename()
log_path.unlink(missing_ok=True)


def pytest_terminal_summary(terminalreporter, exitstatus, config):
if not config.getoption("verbose"):
Contributor

Hm, what about deleting only the verbose condition, but keeping the other two? Otherwise the file will be overwritten 99 times or so?

However, we should unconditionally remove the file when the _info_summary_handler fixture runs. (I've been really confused a few times in the past by stale output, correlating it wrongly to a more recent action.)

Contributor Author

What's getting overwritten 99 times?

Contributor Author

The _info_summary_handler fixture is scope="session", so it's created exactly once per pytest invocation regardless of how many times pytest-repeat re-runs the tests. The FileHandler opens the file once with mode="w" at session start and appends to it for the duration -- repeated tests just add more lines to the same open handle.

Contributor

Hm, that's basically spamming the filesystem.
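The behavior discussed in this thread can be checked in isolation. The sketch below is not the PR's fixture: it uses a hypothetical helper, `write_session`, and a hypothetical logger name to imitate one pytest session, opening a `logging.FileHandler` with `mode="w"` once, emitting several records, and closing it. Under that assumption, records within one session accumulate in the same file, and a second session truncates whatever the first one left behind.

```python
import logging


def write_session(log_path, messages):
    """Imitate one session: open the handler once, log messages, close it."""
    # mode="w" truncates the file when the handler opens it, not on each record.
    handler = logging.FileHandler(log_path, mode="w")
    handler.setFormatter(logging.Formatter("%(message)s"))

    logger = logging.getLogger("demo.session_handler")  # hypothetical name
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    try:
        for msg in messages:
            logger.info(msg)
    finally:
        # Mirror the fixture teardown: detach and close the handler.
        logger.removeHandler(handler)
        handler.close()


def read_lines(path):
    """Return the file's lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

Calling `write_session(path, ["run 1", "run 2", "run 3"])` and then `write_session(path, ["fresh"])` on the same path leaves only the `fresh` line in the file, which is why stale output from an earlier invocation does not survive a new session.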

return
if hasattr(config.option, "iterations"): # pytest-freethreaded runs all tests at least twice
return
if getattr(config.option, "count", 1) > 1: # pytest-repeat
return
@pytest.fixture(scope="session")
def _info_summary_handler(request):
log_path = request.config.rootpath / _log_filename()
handler = logging.FileHandler(log_path, mode="w")
handler.setFormatter(logging.Formatter("%(test_node)s: %(message)s"))

if config.custom_info:
terminalreporter.write_sep("=", "INFO summary")
for msg in config.custom_info:
terminalreporter.line(f"INFO {msg}")
logger = logging.getLogger(_LOGGER_NAME)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

yield handler

@pytest.fixture
def info_summary_append(request):
def _append(message):
request.config.custom_info.append(f"{request.node.name}: {message}")
logger.removeHandler(handler)
handler.close()

return _append

@pytest.fixture
def info_log(request, _info_summary_handler):
return logging.LoggerAdapter(
logging.getLogger(_LOGGER_NAME),
extra={"test_node": request.node.name},
)


def skip_if_missing_libnvcudla_so(libname: str, *, timeout: float) -> None:
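The `info_log` fixture in this diff relies on `logging.LoggerAdapter` to attach the test node name to every record, so that the formatter string `%(test_node)s: %(message)s` can render it. Below is a minimal standalone sketch of that mechanism, writing to an in-memory stream instead of a file. The function name `make_info_log` and the logger name are hypothetical; the format string and the `test_node` key are taken from the diff.

```python
import io
import logging


def make_info_log(stream, node_name):
    """Build an adapter whose records carry a `test_node` field."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(test_node)s: %(message)s"))

    logger = logging.getLogger("demo.test_info")  # hypothetical name
    logger.handlers.clear()  # drop handlers left over from earlier calls in this demo
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False

    # The adapter injects `extra` into every record it logs.
    return logging.LoggerAdapter(logger, extra={"test_node": node_name})


buf = io.StringIO()
log = make_info_log(buf, "test_find_binary_utilities[nvcc]")
log.info("bin_path=None")
print(buf.getvalue(), end="")
```

A record sent straight to the underlying logger, bypassing the adapter, would not carry `test_node` and so could not be formatted with this format string. That is presumably why the tests receive the adapter rather than the bare logger.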
6 changes: 3 additions & 3 deletions cuda_pathfinder/tests/test_driver_lib_loading.py
@@ -127,7 +127,7 @@ def test_load_lib_no_cache_does_not_dispatch_ctk_lib_to_driver_path(mocker):


@pytest.mark.parametrize("libname", sorted(_DRIVER_ONLY_LIBNAMES))
def test_real_load_driver_lib(info_summary_append, libname):
def test_real_load_driver_lib(info_log, libname):
"""Load a real driver library in a dedicated subprocess.

This complements the mock tests above: it exercises the actual OS
@@ -151,9 +151,9 @@ def raise_child_process_failed():
skip_if_missing_libnvcudla_so(libname, timeout=timeout)
if STRICTNESS == "all_must_work":
raise_child_process_failed()
info_summary_append(f"Not found: {libname=!r}")
info_log.info(f"Not found: {libname=!r}")
else:
abs_path = payload.abs_path
assert abs_path is not None
info_summary_append(f"abs_path={quote_for_shell(abs_path)}")
info_log.info(f"abs_path={quote_for_shell(abs_path)}")
assert os.path.isfile(abs_path)
6 changes: 3 additions & 3 deletions cuda_pathfinder/tests/test_find_bitcode_lib.py
@@ -60,17 +60,17 @@ def _located_bitcode_lib_asserts(located_bitcode_lib):

@pytest.mark.usefixtures("clear_find_bitcode_lib_cache")
@pytest.mark.parametrize("libname", SUPPORTED_BITCODE_LIBS)
def test_locate_bitcode_lib(info_summary_append, libname):
def test_locate_bitcode_lib(info_log, libname):
try:
located_lib = locate_bitcode_lib(libname)
lib_path = find_bitcode_lib(libname)
except BitcodeLibNotFoundError:
if STRICTNESS == "all_must_work":
raise
info_summary_append(f"{libname}: not found")
info_log.info(f"{libname}: not found")
return

info_summary_append(f"{lib_path=!r}")
info_log.info(f"{lib_path=!r}")
_located_bitcode_lib_asserts(located_lib)
assert os.path.isfile(lib_path)
assert lib_path == located_lib.abs_path
4 changes: 2 additions & 2 deletions cuda_pathfinder/tests/test_find_nvidia_binaries.py
@@ -21,9 +21,9 @@ def test_unknown_utility_name():


@pytest.mark.parametrize("utility_name", SUPPORTED_BINARIES)
def test_find_binary_utilities(info_summary_append, utility_name):
def test_find_binary_utilities(info_log, utility_name):
bin_path = find_nvidia_binary_utility(utility_name)
info_summary_append(f"{bin_path=!r}")
info_log.info(f"{bin_path=!r}")

assert bin_path is None or os.path.isfile(bin_path)

8 changes: 4 additions & 4 deletions cuda_pathfinder/tests/test_find_nvidia_headers.py
@@ -113,12 +113,12 @@ def _fake_cudart_canary_abs_path(ctk_root: Path) -> str:


@pytest.mark.parametrize("libname", SUPPORTED_HEADERS_NON_CTK.keys())
def test_locate_non_ctk_headers(info_summary_append, libname):
def test_locate_non_ctk_headers(info_log, libname):
hdr_dir = find_nvidia_header_directory(libname)
located_hdr_dir = locate_nvidia_header_directory(libname)
assert hdr_dir is None if not located_hdr_dir else hdr_dir == located_hdr_dir.abs_path

info_summary_append(f"{hdr_dir=!r}")
info_log.info(f"{hdr_dir=!r}")
if hdr_dir:
_located_hdr_dir_asserts(located_hdr_dir)
assert os.path.isdir(hdr_dir)
@@ -147,12 +147,12 @@ def test_supported_headers_site_packages_ctk_consistency():


@pytest.mark.parametrize("libname", SUPPORTED_HEADERS_CTK.keys())
def test_locate_ctk_headers(info_summary_append, libname):
def test_locate_ctk_headers(info_log, libname):
hdr_dir = find_nvidia_header_directory(libname)
located_hdr_dir = locate_nvidia_header_directory(libname)
assert hdr_dir is None if not located_hdr_dir else hdr_dir == located_hdr_dir.abs_path

info_summary_append(f"{hdr_dir=!r}")
info_log.info(f"{hdr_dir=!r}")
if hdr_dir:
_located_hdr_dir_asserts(located_hdr_dir)
assert os.path.isdir(hdr_dir)
6 changes: 3 additions & 3 deletions cuda_pathfinder/tests/test_find_static_lib.py
@@ -57,17 +57,17 @@ def _located_static_lib_asserts(located_static_lib):

@pytest.mark.usefixtures("clear_find_static_lib_cache")
@pytest.mark.parametrize("libname", SUPPORTED_STATIC_LIBS)
def test_locate_static_lib(info_summary_append, libname):
def test_locate_static_lib(info_log, libname):
try:
located_lib = locate_static_lib(libname)
lib_path = find_static_lib(libname)
except StaticLibNotFoundError:
if STRICTNESS == "all_must_work":
raise
info_summary_append(f"{libname}: not found")
info_log.info(f"{libname}: not found")
return

info_summary_append(f"abs_path={quote_for_shell(lib_path)}")
info_log.info(f"abs_path={quote_for_shell(lib_path)}")
_located_static_lib_asserts(located_lib)
assert os.path.isfile(lib_path)
assert lib_path == located_lib.abs_path
6 changes: 3 additions & 3 deletions cuda_pathfinder/tests/test_load_nvidia_dynamic_lib.py
@@ -111,7 +111,7 @@ def _is_expected_load_nvidia_dynamic_lib_failure(libname):
"libname",
supported_nvidia_libs.SUPPORTED_WINDOWS_DLLS if IS_WINDOWS else supported_nvidia_libs.SUPPORTED_LINUX_SONAMES,
)
def test_load_nvidia_dynamic_lib(info_summary_append, libname):
def test_load_nvidia_dynamic_lib(info_log, libname):
# Use a fresh Python subprocess for each load to isolate global dynamic
# loader state and keep the tests aligned with the canary probe model.
timeout = 120 if IS_WINDOWS else 30
@@ -133,9 +133,9 @@ def raise_child_process_failed():
skip_if_missing_libnvcudla_so(libname, timeout=timeout)
if STRICTNESS == "all_must_work" and not _is_expected_load_nvidia_dynamic_lib_failure(libname):
raise_child_process_failed()
info_summary_append(f"Not found: {libname=!r}")
info_log.info(f"Not found: {libname=!r}")
else:
abs_path = payload.abs_path
assert abs_path is not None
info_summary_append(f"abs_path={quote_for_shell(abs_path)}")
info_log.info(f"abs_path={quote_for_shell(abs_path)}")
assert os.path.isfile(abs_path) # double-check the abs_path