68 changes: 62 additions & 6 deletions README.md
@@ -1,11 +1,67 @@
# PyRanker

This package is designed to compare the performance of different methods.

## Algorithm

The Ranker class compares the performance of different methods on a set of
metrics. It takes as input a collection of CSV files, one per method, where
each file contains that method's scores for a set of subjects on a set of
metrics.

The ranking algorithm consists of the following steps:

1. **Combine CSVs and Scores**: The class first combines all the input CSV
files into a single DataFrame. This DataFrame has a hierarchical column
structure, where the top level represents the metrics and the bottom level
represents the subjects.

2. **Rank Methods**: The class then ranks the methods based on their scores for
   each metric and subject. Ties can be resolved using any of pandas' ranking
   methods: 'average', 'min', 'max', 'first', or 'dense'.

3. **Handle Metric Reversal**: For metrics where lower values are better (e.g.,
   error rates), the class can reverse the ranks so that lower scores receive
   better ranks.

4. **Aggregate Ranks**: The class then aggregates the ranks across all metrics
for each subject to get a per-subject average rank for each method.

5. **Calculate Cumulative Rank**: The per-subject average ranks are then summed
up to get a cumulative rank for each method.

6. **Determine Final Rank**: The methods are then ranked based on their
cumulative ranks to determine the final ranking.

7. **Perform Permutation Test**: Finally, the class performs a permutation test
to determine the statistical significance of the differences in the ranks
of the methods. The permutation test is a non-parametric method that does
not make any assumptions about the distribution of the data.

The output of the Ranker class is a pair of DataFrames: one containing the
final rankings of the methods, and another containing the p-values from the
permutation test.
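The aggregation steps above can be sketched with pandas. This is a minimal illustration with toy scores and hypothetical method/metric names, not PyRanker's exact implementation; it assumes "hausd" marks a lower-is-better metric, as described below.

```python
import pandas as pd

# Hypothetical inputs: one table per method, rows = subjects, columns = metrics.
frames = {
    "m1": pd.DataFrame({"dice": [0.9, 0.8], "hausdorff": [3.0, 4.0]}, index=["s1", "s2"]),
    "m2": pd.DataFrame({"dice": [0.7, 0.95], "hausdorff": [2.0, 3.5]}, index=["s1", "s2"]),
}

# Step 1: combine into one DataFrame; rows = methods, columns = (subject, metric).
combined = pd.concat({name: df.stack() for name, df in frames.items()}, axis=1).T

# Step 2: rank methods per (subject, metric); higher score -> better (lower) rank.
ranks = combined.rank(method="average", ascending=False)

# Step 3: reverse ranks for "lower is better" metrics such as Hausdorff distance.
n_methods = len(frames)
for col in ranks.columns:
    if "hausd" in str(col[1]).lower():
        ranks[col] = n_methods + 1 - ranks[col]

# Steps 4-6: per-subject average rank, cumulative rank, then the final ranking.
per_subject = ranks.T.groupby(level=0).mean().T  # average over metrics per subject
cumulative = per_subject.sum(axis=1)
final = cumulative.rank(method="average")
print(final.sort_values())
```

Here `m2` wins both lower-is-better Hausdorff comparisons plus one Dice comparison, so it ends up with the smaller cumulative rank and final rank 1.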

### Permutation Test

The permutation test is a non-parametric method for testing the statistical
significance of an observed difference between two groups. In this case, the
two groups are the ranks of two different methods.

The null hypothesis is that the two methods are equivalent, and any observed
difference in their ranks is due to chance. The alternative hypothesis is that
the two methods are not equivalent.

The test works by repeatedly shuffling the ranks between the two methods and
calculating the difference in their sums. The p-value is the proportion of
permutations that yield a difference at least as extreme as the observed one.
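One common paired scheme, swapping each subject's pair of ranks at random, can be sketched as follows. This is an illustration with a hypothetical function name, not necessarily PyRanker's exact implementation.

```python
import numpy as np

def permutation_pvalue(ranks_a, ranks_b, n_iterations=10000, seed=0):
    """Two-sided paired permutation test on the difference of rank sums.

    For each subject, the pair of ranks is randomly swapped between the two
    methods; the p-value is the fraction of permutations whose rank-sum
    difference is at least as extreme as the observed one.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(ranks_a, dtype=float)
    b = np.asarray(ranks_b, dtype=float)
    observed = abs(a.sum() - b.sum())
    count = 0
    for _ in range(n_iterations):
        swap = rng.random(a.size) < 0.5          # per-subject coin flip
        perm_a = np.where(swap, b, a)
        perm_b = np.where(swap, a, b)
        if abs(perm_a.sum() - perm_b.sum()) >= observed:
            count += 1
    return count / n_iterations

# Identical rank profiles should give a p-value of 1 (no observed difference);
# maximally different profiles give a small p-value.
p_same = permutation_pvalue([1, 2, 1, 2], [1, 2, 1, 2])
p_diff = permutation_pvalue([1, 1, 1, 1], [2, 2, 2, 2])
```

Note that with only a handful of subjects the smallest attainable p-value is bounded below (here 2/2^4 = 0.125 for four subjects), which is why meaningful significance testing needs enough subjects and iterations.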

## Installation

```sh
(base) user@location $> git clone https://github.com/mlcommons/PyRanker.git
(base) user@location $> cd PyRanker
(base) user@PyRanker $> conda create -p ./venv python=3.12 -y
(base) user@PyRanker $> conda activate ./venv
@@ -41,10 +97,10 @@ This package is designed to benchmark the performance of different methods.

2. **Metrics for reversal normalization**: a comma-separated list of metrics that need to be normalized in reverse. For metrics such as [Hausdorff Distance](https://en.wikipedia.org/wiki/Hausdorff_distance) and communication cost (used in the [FeTS Challenge](https://doi.org/10.48550/arXiv.2105.05874)) which are defined as "higher is worse", PyRanker can normalize in reverse order.
- This is checked in a case-insensitive manner, so `C,F` is equivalent to `c,f`.
- The check looks for the presence of the string anywhere in the metric header, rather than requiring an exact match. For example, passing `hausd` **will** match `hausd*` in the metric headers, case-insensitively. This allows flexibility in metric names.
- The substring itself must be present. For example, passing `dsc` **will not** match `dice*` in the metric headers.

3. **Ranking method**: the ranking method used to rank the methods. The available options are [[ref](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rank.html#pandas-dataframe-rank)]:
- `average` (default): average rank of the group
- `min`: lowest rank in the group
- `max`: highest rank in the group
@@ -73,4 +129,4 @@ To get detailed help, please run ```ranker --help```.

## Acknowledgements

This tool was partly supported by the [Informatics Technology for Cancer Research (ITCR) program](https://www.cancer.gov/about-nci/organization/cssi/research/itcr) of the [National Cancer Institute (NCI)](https://www.cancer.gov/) at the [National Institutes of Health (NIH)](https://www.nih.gov/) under award numbers [U01CA242871](https://reporter.nih.gov/search/8qcT1J34hEyj5npqmq9aEw/project-details/10009302) and [U24CA279629](https://reporter.nih.gov/search/8qcT1J34hEyj5npqmq9aEw/project-details/10932257). The content of this tool is solely the responsibility of the authors and does not represent the official views of the NIH.
14 changes: 3 additions & 11 deletions data/m1.csv
@@ -1,11 +1,3 @@
SubjectID,A,B,C,D,E,F
s001,-0.676662165,-1.406645477,0.736895876,-0.174272834,0.576927715,-0.232139845
s002,-1.182135526,0.325161174,1.265839829,0.637533468,0.717606195,-0.232249719
s003,0.393762147,-1.366917238,-1.974747205,-2.029359097,-0.91486706,-0.110356815
s004,-0.560421215,-0.916606755,-0.244361005,0.173264029,-0.018263561,-1.112137106
s005,-0.018074945,0.909978883,0.654103198,-0.412681032,0.415519864,0.415147598
s006,0.584884843,-0.365552063,-0.125284377,0.420532768,1.048717925,-0.520722918
s007,0.246445503,0.018436118,0.540072217,-0.059316335,-1.102092291,0.446401257
s008,-0.78842192,-0.634175082,0.312935264,0.272096895,-0.151559698,-2.457860693
s009,0.134775369,-0.241349035,0.711768614,-0.387514653,0.090663752,0.71284279
s010,-0.96395775,-0.663571103,0.838443773,-0.933803671,-0.722117911,-0.189414521
subjectid,A,B,C,D,E,F
s1,1,2,3,4,5,6
s2,7,8,9,10,11,12
14 changes: 3 additions & 11 deletions data/m2.csv
@@ -1,11 +1,3 @@
SubjectID,A,B,C,D,E,F
s001,-0.371449174,0.956404946,-0.959452443,-0.309927689,0.905046916,0.819083005
s002,0.935687942,0.109916076,-0.689643721,1.068025385,-1.154739305,-0.462448565
s003,-0.049420815,0.64668578,-0.318198107,0.724407035,0.583641064,-0.704724761
s004,-1.49698864,1.249697716,0.04787162,0.188726789,-0.819034985,-0.179096185
s005,2.136690703,-0.868203102,-0.78604478,0.855744592,0.857935164,0.492256653
s006,-0.355118237,0.517377129,0.928951769,0.792176927,-0.805270336,1.117546966
s007,-0.778346825,1.683369425,-0.443459427,-0.593956209,4.0971389,-0.445679171
s008,0.267208376,0.184556657,0.323158227,2.282268373,1.364794637,0.181174591
s009,-0.386538967,-0.916456619,1.271967332,-0.052378684,-1.205062795,-0.626923254
s010,0.435225064,0.91151586,-1.113652003,-0.220028617,-1.05347926,0.365272475
subjectid,A,B,C,D,E,F
s1,2,3,4,5,6,7
s2,8,9,10,11,12,13
14 changes: 3 additions & 11 deletions data/m3.csv
@@ -1,11 +1,3 @@
SubjectID,A,B,C,D,E,F
s001,-0.495294073,0.949116249,0.296072803,1.868387862,-0.272883702,-1.818801645
s002,1.216439744,0.197072557,-0.081120879,1.469343652,2.263823391,0.181492295
s003,-0.155607109,0.337023954,-0.458342088,-1.031167585,0.218811382,0.148051802
s004,-1.209131999,-0.096524866,1.197362593,-0.062309653,-0.658751113,-0.262658666
s005,0.645690766,0.899682779,-1.202114635,-0.452507338,0.178007526,-0.526872668
s006,-0.527395342,-0.585397127,0.601057827,-0.438992879,9.23E-05,2.411401279
s007,-0.781069044,-0.651766877,-0.003398167,-0.254586911,-0.048605563,1.6079838
s008,-0.005850292,1.152494476,1.064747549,-0.227608884,1.45054756,1.422734322
s009,0.796185038,-1.295533863,-0.007947827,0.624035116,-0.605764923,-0.856374829
s010,0.952854212,-1.007389474,0.686420686,1.377020745,1.221967627,-0.120206896
subjectid,A,B,C,D,E,F
s1,3,4,5,6,7,8
s2,9,10,11,12,13,14
14 changes: 3 additions & 11 deletions data/m4.csv
@@ -1,11 +1,3 @@
SubjectID,A,B,C,D,E,F
s001,0.127830235,0.543904483,0.169190618,-0.849953283,-0.563713316,0.736931479
s002,0.567418525,0.965856382,1.266015552,0.471422651,-0.758025824,-0.427404497
s003,-1.221693479,-1.121073154,-1.677648371,2.016433719,-0.087967121,-0.472855621
s004,0.954423388,-0.093452563,0.659446581,-0.190049419,-0.921771701,0.090774055
s005,0.950052283,-0.621810664,0.254520025,0.360940315,-0.483358752,-0.935151931
s006,1.455226207,-0.721900186,0.801810726,-0.641529199,0.563422873,0.772440661
s007,-1.053644931,0.098930728,0.999364504,1.029298347,-0.632529862,-1.666171306
s008,-0.671755474,0.389256225,0.697323813,-0.483432377,0.073658468,-0.233170802
s009,0.059997347,0.583152369,-1.371183183,-0.528158479,0.435198404,0.705164885
s010,-0.458500476,-1.526985622,0.370253517,0.844777527,-0.500950386,0.75340932
subjectid,A,B,C,D,E,F
s1,4,5,6,7,8,9
s2,10,11,12,13,14,15
5 changes: 5 additions & 0 deletions data/temp_output/detailed_ranks.csv
@@ -0,0 +1,5 @@
method,a_s1,b_s1,c_s1,d_s1,e_s1,f_s1,a_s2,b_s2,c_s2,d_s2,e_s2,f_s2
m1,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0
m2,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0
m3,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0
m4,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0
5 changes: 5 additions & 0 deletions data/temp_output/pvals.csv
@@ -0,0 +1,5 @@
method,m4,m3,m2,m1
m4,0.0,0.928,0.926,0.925
m3,0.0,0.0,0.928,0.926
m2,0.0,0.0,0.0,0.927
m1,0.0,0.0,0.0,0.0
5 changes: 5 additions & 0 deletions data/temp_output/ranks.csv
@@ -0,0 +1,5 @@
method,final_rank,cumulative_rank,s1_avg_rank,s2_avg_rank,a_s1,b_s1,c_s1,d_s1,e_s1,f_s1,a_s2,b_s2,c_s2,d_s2,e_s2,f_s2
m4,1.0,4.0,2.0,2.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0
m3,2.0,4.666666666666667,2.3333333333333335,2.3333333333333335,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0
m2,3.0,5.333333333333333,2.6666666666666665,2.6666666666666665,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0
m1,4.0,6.0,3.0,3.0,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0
26 changes: 21 additions & 5 deletions pyranker/cli/run.py
@@ -1,3 +1,4 @@
import os
from pathlib import Path
from typing import Optional

@@ -119,7 +120,7 @@ def __get_sorted_metrics(df: pd.DataFrame) -> list:
current_metrics = __get_sorted_metrics(current_df)
if current_metrics != metrics_base:
sanity_checks["Files_with_different_metrics"].append(filename)
except Exception as e:
except Exception:
sanity_checks["Files_that_cannot_be_read"].append(filename)

# if any of the sanity checks fail, print the problematic files and exit
@@ -168,7 +169,7 @@ def main(
"--iterations",
help="The number of iterations to perform for the permutation test.",
),
] = 1000,
] = 100000,
ranking_method: Annotated[
str,
typer.Option(
@@ -177,6 +178,14 @@
help="The method to use for ranking the methods; one of 'average', 'min', 'max', 'first', 'dense'.",
),
] = "average",
n_jobs: Annotated[
int,
typer.Option(
"-j",
"--n-jobs",
help="The number of CPU cores to use for parallel processing.",
),
] = 1,
version: Annotated[
Optional[bool],
typer.Option(
@@ -195,9 +204,9 @@
csvs_to_compare_with_full_path = get_csv_paths(input)

# basic sanity checks
assert (
len(csvs_to_compare_with_full_path) > 1
), "At least two methods are required for comparison"
assert len(csvs_to_compare_with_full_path) > 1, (
"At least two methods are required for comparison"
)
ranking_method = ranking_method.lower()
assert ranking_method in [
"average",
@@ -208,6 +217,11 @@
], "Invalid ranking method"
assert iterations > 0, "Number of iterations must be greater than 0"

# Assert that the number of jobs is not greater than the number of cores
assert n_jobs <= os.cpu_count(), (
"Number of jobs cannot be greater than the number of cores"
)

# convert the metrics_for_reversal to a list
metrics_for_reversal_list = (
metrics_for_reversal.split(",") if metrics_for_reversal else []
@@ -227,6 +241,8 @@
metrics_for_reversal=metrics_for_reversal_list,
n_iterations=iterations,
ranking_method=ranking_method,
n_jobs=n_jobs,
output_dir=outputdir,
)
ranks, pvals = ranker.get_rankings_and_pvals()
Path(outputdir).mkdir(parents=True, exist_ok=True)