jhnwu3 commented Dec 29, 2025

This pull request adds a new benchmarking script to the codebase for evaluating the performance of MIMIC-IV mortality prediction with varying numbers of worker processes. The script provides detailed measurements of dataset loading time, task processing time, cache sizes, and peak memory usage, supporting multiple repeats and configurations. It is designed to help analyze the scalability and resource usage of the pipeline under different parallelization settings.

Key additions and features:

Benchmarking functionality:

  • Adds a new script benchmark_workers_n.py that benchmarks the MIMIC-IV mortality prediction pipeline across different num_workers values, measuring dataset loading time, task processing time, cache sizes, and peak memory usage.
  • Supports command-line arguments for customizing worker counts, repeat runs, dataset root, cache root, memory limits, and output CSV location.
  • Reports results per run and outputs a summary table with median metrics for each worker count.
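A minimal sketch of how the command-line handling and the per-worker median summary could fit together. The flag names (`--workers`, `--repeats`, `--output-csv`), the record keys, and the helper names `parse_args` and `summarize` are hypothetical and chosen for illustration; the actual interface of benchmark_workers_n.py may differ.

```python
import argparse
import statistics
from collections import defaultdict


def parse_args(argv=None):
    # Flag names are hypothetical; the real script may use different ones.
    parser = argparse.ArgumentParser(
        description="Benchmark a pipeline across num_workers values"
    )
    parser.add_argument("--workers", type=int, nargs="+", default=[1, 2, 4])
    parser.add_argument("--repeats", type=int, default=3)
    parser.add_argument("--output-csv", default="results.csv")
    return parser.parse_args(argv)


def summarize(runs):
    """Group per-run records by worker count and take the median of each metric."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["num_workers"]].append(run)

    summary = []
    for n in sorted(grouped):
        rows = grouped[n]
        summary.append(
            {
                "num_workers": n,
                "task_time_s": statistics.median(r["task_time_s"] for r in rows),
                "peak_rss_mb": statistics.median(r["peak_rss_mb"] for r in rows),
            }
        )
    return summary
```

Using the median rather than the mean keeps a single slow repeat (for example, a cold disk cache) from skewing the summary row for a worker count.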

Resource and cache management:

  • Includes logic to enforce memory limits (on Unix systems) and to track peak resident set size (RSS) for the process and its children using a background thread.
  • Ensures clean cache directories for each run and removes them after measurement to avoid interference between runs and disk space growth.
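The resource handling described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the names `set_memory_limit`, `process_tree_rss`, `PeakTracker`, and `fresh_cache_dir` are hypothetical, the use of `RLIMIT_AS` for the memory cap is an assumption, and the third-party `psutil` package is assumed only for the process-tree sampler.

```python
import contextlib
import shutil
import tempfile
import threading
from pathlib import Path

try:
    import resource  # available on Unix only
except ImportError:
    resource = None


def set_memory_limit(max_bytes):
    """Cap the address space on Unix; return False where unsupported."""
    if resource is None:
        return False
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return True


def process_tree_rss():
    """RSS in bytes of this process plus all of its descendants."""
    import psutil  # third-party; assumed here, not confirmed by the PR

    proc = psutil.Process()
    total = proc.memory_info().rss
    for child in proc.children(recursive=True):
        try:
            total += child.memory_info().rss
        except psutil.NoSuchProcess:
            pass  # child exited between listing and sampling
    return total


class PeakTracker:
    """Poll sample_fn from a background thread and remember the maximum."""

    def __init__(self, sample_fn, interval=0.1):
        self.sample_fn = sample_fn
        self.interval = interval
        self.peak = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.peak = max(self.peak, self.sample_fn())
            self._stop.wait(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        # One last sample after the thread has stopped.
        self.peak = max(self.peak, self.sample_fn())
        return False


@contextlib.contextmanager
def fresh_cache_dir(root):
    """Yield a new, empty cache directory under root and delete it afterwards."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    path = Path(tempfile.mkdtemp(prefix="run_", dir=root))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)
```

A run would then nest the two context managers, for example `with fresh_cache_dir(cache_root) as cache, PeakTracker(process_tree_rss) as tracker: ...`, and read `tracker.peak` once the block exits. Because the tracker samples at intervals, short memory spikes between samples can be missed.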

Usability and reporting:

  • Outputs results to

jhnwu3 merged commit cf4aed5 into master Dec 29, 2025
1 check passed
jhnwu3 deleted the add/comp_benchmark branch December 29, 2025 19:11