68 changes: 62 additions & 6 deletions README.md
@@ -1,11 +1,67 @@
# PyRanker

This package is designed to compare the performance of different methods.

## Algorithm

The Ranker class compares the performance of different methods on a set of
metrics. It takes as input a collection of CSV files, one per method, where
each file contains that method's scores for a set of subjects on a set of
metrics.

The ranking algorithm consists of the following steps:

1. **Combine CSVs and Scores**: The class first combines all the input CSV
files into a single DataFrame. This DataFrame has a hierarchical column
structure, where the top level represents the metrics and the bottom level
represents the subjects.

2. **Rank Methods**: The class then ranks the methods based on their scores for
   each metric and subject. Ties can be resolved using any of pandas' ranking
   methods: 'average', 'min', 'max', 'first', or 'dense'.

3. **Handle Metric Reversal**: For metrics where lower values are better (e.g.,
   error rates), the class can reverse the ranks so that lower scores receive
   better ranks.

4. **Aggregate Ranks**: The class then aggregates the ranks across all metrics
for each subject to get a per-subject average rank for each method.

5. **Calculate Cumulative Rank**: The per-subject average ranks are then summed
up to get a cumulative rank for each method.

6. **Determine Final Rank**: The methods are then ranked based on their
cumulative ranks to determine the final ranking.

7. **Perform Permutation Test**: Finally, the class performs a permutation test
to determine the statistical significance of the differences in the ranks
of the methods. The permutation test is a non-parametric method that does
not make any assumptions about the distribution of the data.

The output of the Ranker class is a pair of DataFrames: one containing the
final rankings of the methods, and another containing the p-values from the
permutation test.
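The aggregation steps above can be sketched with pandas. This is a minimal illustration with toy scores and hypothetical method/metric names, not PyRanker's exact implementation; it assumes "hausd" marks a lower-is-better metric, as described below.

```python
import pandas as pd

# Hypothetical inputs: one table per method, rows = subjects, columns = metrics.
frames = {
    "m1": pd.DataFrame({"dice": [0.9, 0.8], "hausdorff": [3.0, 4.0]}, index=["s1", "s2"]),
    "m2": pd.DataFrame({"dice": [0.7, 0.95], "hausdorff": [2.0, 3.5]}, index=["s1", "s2"]),
}

# Step 1: combine into one DataFrame; rows = methods, columns = (subject, metric).
combined = pd.concat({name: df.stack() for name, df in frames.items()}, axis=1).T

# Step 2: rank methods per (subject, metric); higher score -> better (lower) rank.
ranks = combined.rank(method="average", ascending=False)

# Step 3: reverse ranks for "lower is better" metrics such as Hausdorff distance.
n_methods = len(frames)
for col in ranks.columns:
    if "hausd" in str(col[1]).lower():
        ranks[col] = n_methods + 1 - ranks[col]

# Steps 4-6: per-subject average rank, cumulative rank, then the final ranking.
per_subject = ranks.T.groupby(level=0).mean().T  # average over metrics per subject
cumulative = per_subject.sum(axis=1)
final = cumulative.rank(method="average")
print(final.sort_values())
```

Here `m2` wins both lower-is-better Hausdorff comparisons plus one Dice comparison, so it ends up with the smaller cumulative rank and final rank 1.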

### Permutation Test

The permutation test is a non-parametric method for testing the statistical
significance of an observed difference between two groups. In this case, the
two groups are the ranks of two different methods.

The null hypothesis is that the two methods are equivalent, and any observed
difference in their ranks is due to chance. The alternative hypothesis is that
the two methods are not equivalent.

The test works by repeatedly shuffling the ranks between the two methods and
calculating the difference in their sums. The p-value is the proportion of
permutations that yield a difference at least as extreme as the observed one.
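One common paired scheme, swapping each subject's pair of ranks at random, can be sketched as follows. This is an illustration with a hypothetical function name, not necessarily PyRanker's exact implementation.

```python
import numpy as np

def permutation_pvalue(ranks_a, ranks_b, n_iterations=10000, seed=0):
    """Two-sided paired permutation test on the difference of rank sums.

    For each subject, the pair of ranks is randomly swapped between the two
    methods; the p-value is the fraction of permutations whose rank-sum
    difference is at least as extreme as the observed one.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(ranks_a, dtype=float)
    b = np.asarray(ranks_b, dtype=float)
    observed = abs(a.sum() - b.sum())
    count = 0
    for _ in range(n_iterations):
        swap = rng.random(a.size) < 0.5          # per-subject coin flip
        perm_a = np.where(swap, b, a)
        perm_b = np.where(swap, a, b)
        if abs(perm_a.sum() - perm_b.sum()) >= observed:
            count += 1
    return count / n_iterations

# Identical rank profiles should give a p-value of 1 (no observed difference);
# maximally different profiles give a small p-value.
p_same = permutation_pvalue([1, 2, 1, 2], [1, 2, 1, 2])
p_diff = permutation_pvalue([1, 1, 1, 1], [2, 2, 2, 2])
```

Note that with only a handful of subjects the smallest attainable p-value is bounded below (here 2/2^4 = 0.125 for four subjects), which is why meaningful significance testing needs enough subjects and iterations.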

## Installation

```sh
(base) user@location $> git clone https://github.com/mlcommons/PyRanker.git
(base) user@location $> cd PyRanker
(base) user@PyRanker $> conda create -p ./venv python=3.12 -y
(base) user@PyRanker $> conda activate ./venv
@@ -41,10 +97,10 @@ This package is designed to benchmark the performance of different methods.

2. **Metrics for reversal normalization**: a comma-separated list of metrics that need to be normalized in reverse. For metrics such as [Hausdorff Distance](https://en.wikipedia.org/wiki/Hausdorff_distance) and communication cost (used in the [FeTS Challenge](https://doi.org/10.48550/arXiv.2105.05874)) which are defined as "higher is worse", PyRanker can normalize in reverse order.
- This is checked in a case-insensitive manner, so `C,F` is equivalent to `c,f`.
- The check looks for the presence of the string anywhere in the metric header, rather than requiring an exact match. For example, passing `hausd` **will** match `hausd*` in the metric headers, case-insensitively. This allows flexibility in metric names.
- The substring itself must be present. For example, passing `dsc` **will not** match `dice*` in the metric headers.

3. **Ranking method**: the ranking method used to rank the methods. The available options are [[ref](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rank.html#pandas-dataframe-rank)]:
- `average` (default): average rank of the group
- `min`: lowest rank in the group
- `max`: highest rank in the group
@@ -73,4 +129,4 @@ To get detailed help, please run ```ranker --help```.

## Acknowledgements

This tool was partly supported by the [Informatics Technology for Cancer Research (ITCR) program](https://www.cancer.gov/about-nci/organization/cssi/research/itcr) of the [National Cancer Institute (NCI)](https://www.cancer.gov/) at the [National Institutes of Health (NIH)](https://www.nih.gov/) under award numbers [U01CA242871](https://reporter.nih.gov/search/8qcT1J34hEyj5npqmq9aEw/project-details/10009302) and [U24CA279629](https://reporter.nih.gov/search/8qcT1J34hEyj5npqmq9aEw/project-details/10932257). The content of this tool is solely the responsibility of the authors and does not represent the official views of the NIH.
14 changes: 3 additions & 11 deletions data/m1.csv
@@ -1,11 +1,3 @@
SubjectID,A,B,C,D,E,F
s001,-0.676662165,-1.406645477,0.736895876,-0.174272834,0.576927715,-0.232139845
s002,-1.182135526,0.325161174,1.265839829,0.637533468,0.717606195,-0.232249719
s003,0.393762147,-1.366917238,-1.974747205,-2.029359097,-0.91486706,-0.110356815
s004,-0.560421215,-0.916606755,-0.244361005,0.173264029,-0.018263561,-1.112137106
s005,-0.018074945,0.909978883,0.654103198,-0.412681032,0.415519864,0.415147598
s006,0.584884843,-0.365552063,-0.125284377,0.420532768,1.048717925,-0.520722918
s007,0.246445503,0.018436118,0.540072217,-0.059316335,-1.102092291,0.446401257
s008,-0.78842192,-0.634175082,0.312935264,0.272096895,-0.151559698,-2.457860693
s009,0.134775369,-0.241349035,0.711768614,-0.387514653,0.090663752,0.71284279
s010,-0.96395775,-0.663571103,0.838443773,-0.933803671,-0.722117911,-0.189414521
subjectid,A,B,C,D,E,F
s1,1,2,3,4,5,6
s2,7,8,9,10,11,12
14 changes: 3 additions & 11 deletions data/m2.csv
@@ -1,11 +1,3 @@
SubjectID,A,B,C,D,E,F
s001,-0.371449174,0.956404946,-0.959452443,-0.309927689,0.905046916,0.819083005
s002,0.935687942,0.109916076,-0.689643721,1.068025385,-1.154739305,-0.462448565
s003,-0.049420815,0.64668578,-0.318198107,0.724407035,0.583641064,-0.704724761
s004,-1.49698864,1.249697716,0.04787162,0.188726789,-0.819034985,-0.179096185
s005,2.136690703,-0.868203102,-0.78604478,0.855744592,0.857935164,0.492256653
s006,-0.355118237,0.517377129,0.928951769,0.792176927,-0.805270336,1.117546966
s007,-0.778346825,1.683369425,-0.443459427,-0.593956209,4.0971389,-0.445679171
s008,0.267208376,0.184556657,0.323158227,2.282268373,1.364794637,0.181174591
s009,-0.386538967,-0.916456619,1.271967332,-0.052378684,-1.205062795,-0.626923254
s010,0.435225064,0.91151586,-1.113652003,-0.220028617,-1.05347926,0.365272475
subjectid,A,B,C,D,E,F
s1,2,3,4,5,6,7
s2,8,9,10,11,12,13
14 changes: 3 additions & 11 deletions data/m3.csv
@@ -1,11 +1,3 @@
SubjectID,A,B,C,D,E,F
s001,-0.495294073,0.949116249,0.296072803,1.868387862,-0.272883702,-1.818801645
s002,1.216439744,0.197072557,-0.081120879,1.469343652,2.263823391,0.181492295
s003,-0.155607109,0.337023954,-0.458342088,-1.031167585,0.218811382,0.148051802
s004,-1.209131999,-0.096524866,1.197362593,-0.062309653,-0.658751113,-0.262658666
s005,0.645690766,0.899682779,-1.202114635,-0.452507338,0.178007526,-0.526872668
s006,-0.527395342,-0.585397127,0.601057827,-0.438992879,9.23E-05,2.411401279
s007,-0.781069044,-0.651766877,-0.003398167,-0.254586911,-0.048605563,1.6079838
s008,-0.005850292,1.152494476,1.064747549,-0.227608884,1.45054756,1.422734322
s009,0.796185038,-1.295533863,-0.007947827,0.624035116,-0.605764923,-0.856374829
s010,0.952854212,-1.007389474,0.686420686,1.377020745,1.221967627,-0.120206896
subjectid,A,B,C,D,E,F
s1,3,4,5,6,7,8
s2,9,10,11,12,13,14
14 changes: 3 additions & 11 deletions data/m4.csv
@@ -1,11 +1,3 @@
SubjectID,A,B,C,D,E,F
s001,0.127830235,0.543904483,0.169190618,-0.849953283,-0.563713316,0.736931479
s002,0.567418525,0.965856382,1.266015552,0.471422651,-0.758025824,-0.427404497
s003,-1.221693479,-1.121073154,-1.677648371,2.016433719,-0.087967121,-0.472855621
s004,0.954423388,-0.093452563,0.659446581,-0.190049419,-0.921771701,0.090774055
s005,0.950052283,-0.621810664,0.254520025,0.360940315,-0.483358752,-0.935151931
s006,1.455226207,-0.721900186,0.801810726,-0.641529199,0.563422873,0.772440661
s007,-1.053644931,0.098930728,0.999364504,1.029298347,-0.632529862,-1.666171306
s008,-0.671755474,0.389256225,0.697323813,-0.483432377,0.073658468,-0.233170802
s009,0.059997347,0.583152369,-1.371183183,-0.528158479,0.435198404,0.705164885
s010,-0.458500476,-1.526985622,0.370253517,0.844777527,-0.500950386,0.75340932
subjectid,A,B,C,D,E,F
s1,4,5,6,7,8,9
s2,10,11,12,13,14,15
5 changes: 5 additions & 0 deletions data/temp_output/detailed_ranks.csv
@@ -0,0 +1,5 @@
method,a_s1,b_s1,c_s1,d_s1,e_s1,f_s1,a_s2,b_s2,c_s2,d_s2,e_s2,f_s2
m1,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0
m2,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0
m3,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0
m4,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0
5 changes: 5 additions & 0 deletions data/temp_output/pvals.csv
@@ -0,0 +1,5 @@
method,m4,m3,m2,m1
m4,0.0,0.928,0.926,0.925
m3,0.0,0.0,0.928,0.926
m2,0.0,0.0,0.0,0.927
m1,0.0,0.0,0.0,0.0
5 changes: 5 additions & 0 deletions data/temp_output/ranks.csv
@@ -0,0 +1,5 @@
method,final_rank,cumulative_rank,s1_avg_rank,s2_avg_rank,a_s1,b_s1,c_s1,d_s1,e_s1,f_s1,a_s2,b_s2,c_s2,d_s2,e_s2,f_s2
m4,1.0,4.0,2.0,2.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0
m3,2.0,4.666666666666667,2.3333333333333335,2.3333333333333335,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0
m2,3.0,5.333333333333333,2.6666666666666665,2.6666666666666665,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0
m1,4.0,6.0,3.0,3.0,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0
26 changes: 21 additions & 5 deletions pyranker/cli/run.py
@@ -1,3 +1,4 @@
import os
from pathlib import Path
from typing import Optional

@@ -119,7 +120,7 @@ def __get_sorted_metrics(df: pd.DataFrame) -> list:
current_metrics = __get_sorted_metrics(current_df)
if current_metrics != metrics_base:
sanity_checks["Files_with_different_metrics"].append(filename)
except Exception as e:
except Exception:
sanity_checks["Files_that_cannot_be_read"].append(filename)

# if any of the sanity checks fail, print the problematic files and exit
@@ -168,7 +169,7 @@ def main(
"--iterations",
help="The number of iterations to perform for the permutation test.",
),
] = 1000,
] = 100000,
ranking_method: Annotated[
str,
typer.Option(
@@ -177,6 +178,14 @@
help="The method to use for ranking the methods; one of 'average', 'min', 'max', 'first', 'dense'.",
),
] = "average",
n_jobs: Annotated[
int,
typer.Option(
"-j",
"--n-jobs",
help="The number of CPU cores to use for parallel processing.",
),
] = 1,
version: Annotated[
Optional[bool],
typer.Option(
@@ -195,9 +204,9 @@
csvs_to_compare_with_full_path = get_csv_paths(input)

# basic sanity checks
assert (
len(csvs_to_compare_with_full_path) > 1
), "At least two methods are required for comparison"
assert len(csvs_to_compare_with_full_path) > 1, (
"At least two methods are required for comparison"
)
ranking_method = ranking_method.lower()
assert ranking_method in [
"average",
@@ -208,6 +217,11 @@
], "Invalid ranking method"
assert iterations > 0, "Number of iterations must be greater than 0"

# Assert that the number of jobs is not greater than the number of cores
assert n_jobs <= os.cpu_count(), (
"Number of jobs cannot be greater than the number of cores"
)

# convert the metrics_for_reversal to a list
metrics_for_reversal_list = (
metrics_for_reversal.split(",") if metrics_for_reversal else []
@@ -227,6 +241,8 @@
metrics_for_reversal=metrics_for_reversal_list,
n_iterations=iterations,
ranking_method=ranking_method,
n_jobs=n_jobs,
output_dir=outputdir,
)
ranks, pvals = ranker.get_rankings_and_pvals()
Path(outputdir).mkdir(parents=True, exist_ok=True)