diff --git a/README.md b/README.md index fa242c2..bc264a1 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,67 @@ # PyRanker -This package is designed to benchmark the performance of different methods. +This package is designed to compare the performance of different methods. + +## Algorithm + +The Ranker class compares the performance of different methods based on a set of +metrics. It takes as input a dictionary of CSV files, where each file +represents a method and contains the scores for a set of subjects on a set of +metrics. + +The ranking algorithm consists of the following steps: + +1. **Combine CSVs and Scores**: The class first combines all the input CSV + files into a single DataFrame. This DataFrame has a hierarchical column + structure, where the top level represents the metrics and the bottom level + represents the subjects. + +2. **Rank Methods**: The class then ranks the methods based on their scores for + each metric and subject. Ties are resolved according to the chosen ranking + method: 'average', 'min', 'max', 'first', or 'dense'. + +3. **Handle Metric Reversal**: For metrics where lower values are better (e.g., + error rates), the class can reverse the ranks so that lower scores receive + better (numerically lower) ranks. + +4. **Aggregate Ranks**: The class then aggregates the ranks across all metrics + for each subject to get a per-subject average rank for each method. + +5. **Calculate Cumulative Rank**: The per-subject average ranks are then summed + up to get a cumulative rank for each method. + +6. **Determine Final Rank**: The methods are then ranked based on their + cumulative ranks to determine the final ranking. + +7. **Perform Permutation Test**: Finally, the class performs a permutation test + to determine the statistical significance of the differences in the ranks + of the methods. The permutation test is a non-parametric method that does + not make any assumptions about the distribution of the data.
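The pairwise test in step 7 can be sketched in a few lines of NumPy. This is a minimal standalone illustration of the idea rather than the package's exact implementation: the function name is hypothetical, the difference is taken two-sided here, and the seed is fixed only for reproducibility.

```python
import numpy as np


def permutation_pval(ranks_a, ranks_b, n_iterations=1000, seed=0):
    """Two-sided permutation test on the difference of rank sums."""
    rng = np.random.default_rng(seed)
    ranks_a = np.asarray(ranks_a, dtype=float)
    ranks_b = np.asarray(ranks_b, dtype=float)
    observed = abs(ranks_a.sum() - ranks_b.sum())
    count_extreme = 0
    for _ in range(n_iterations):
        # Randomly swap each paired rank between the two methods
        swap = rng.integers(0, 2, size=ranks_a.shape, dtype=bool)
        a = np.where(swap, ranks_b, ranks_a)
        b = np.where(swap, ranks_a, ranks_b)
        # Count permutations at least as extreme as the observed difference
        if abs(a.sum() - b.sum()) >= observed:
            count_extreme += 1
    # Add 1 to numerator and denominator to avoid a p-value of exactly 0
    return (count_extreme + 1) / (n_iterations + 1)


# Identical rank profiles cannot be distinguished, so p should be 1
print(permutation_pval([1, 2, 3, 4], [1, 2, 3, 4]))  # → 1.0
```

Note that with clearly separated rank profiles (e.g. one method always ranked first, the other always last) the same function returns a much smaller p-value, since few random swaps reproduce a difference as large as the observed one.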
+ +The output of the Ranker class is a pair of DataFrames: one containing the +final rankings of the methods, and another containing the p-values from the +permutation test. + +### Permutation Test + +The permutation test is a non-parametric method for testing the statistical +significance of an observed difference between two groups. In this case, the +two groups are the ranks of two different methods. + +The null hypothesis is that the two methods are equivalent, and any observed +difference in their ranks is due to chance. The alternative hypothesis is that +the two methods are not equivalent, and the observed difference in their ranks +reflects a genuine difference between them. + +The test works by repeatedly shuffling the ranks between the two methods and +calculating the difference in their sums. The p-value is the proportion of +permutations that yield a difference at least as extreme as the +observed difference. ## Installation ```sh -(base) user@location $> git clone https://github.com/mlcommons/PyRanker.git +(base) user@location $> git clone https://github.com/mlcommons/PyRanker.git (base) user@location $> cd PyRanker (base) user@PyRanker $> conda create -p ./venv python=3.12 -y (base) user@PyRanker $> conda activate ./venv @@ -41,10 +97,10 @@ This package is designed to benchmark the performance of different methods. 2. **Metrics for reversal normalization**: a comma-separated list of metrics that need to be normalized in reverse. For metrics such as [Hausdorff Distance](https://en.wikipedia.org/wiki/Hausdorff_distance) and communication cost (used in the [FeTS Challenge](https://doi.org/10.48550/arXiv.2105.05874)) which are defined as "higher is worse", PyRanker can normalize in reverse order. - This is checked in a case-insensitive manner, so `C,F` is equivalent to `c,f`. - - The check is done by checking for the presence of the string in the metric header, rather than a "hard" check.
For example, passing `hausd` **will** match `hausd*` in the metric headers, and will be case-insensitive. This is done to allow for flexibility in the metric names. - - The metric string needs to be present. For example, passing `dsc` **will not** match for `dice*` in the metric headers. + - The check tests for the presence of the given string in the metric header, rather than requiring an exact match. For example, passing `hausd` **will** match `hausd*` in the metric headers, case-insensitively. This allows for flexibility in the metric names. + - The metric string itself must be present: for example, passing `dsc` **will not** match `dice*` in the metric headers. -3. **Ranking method**: the ranking method used to rank the methods. The available options are [[ref](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rank.html#pandas-dataframe-rank)]: +3. **Ranking method**: the ranking method used to rank the methods. The available options are [[ref](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rank.html#pandas-dataframe-rank)]: - `average` (default): average rank of the group - `min`: lowest rank in the group - `max`: highest rank in the group @@ -73,4 +129,4 @@ To get detailed help, please run ```ranker --help```. ## Acknowledgements -This tool was partly supported by the [Informatics Technology for Cancer Research (ITCR) program](https://www.cancer.gov/about-nci/organization/cssi/research/itcr) of the [National Cancer Institute (NCI)](https://www.cancer.gov/) at the [National Institutes of Health (NIH)](https://www.nih.gov/) under award numbers [U01CA242871](https://reporter.nih.gov/search/8qcT1J34hEyj5npqmq9aEw/project-details/10009302) and [U24CA279629](https://reporter.nih.gov/search/8qcT1J34hEyj5npqmq9aEw/project-details/10932257). The content of this tool is solely the responsibility of the authors and does not represent the official views of the NIH.
+This tool was partly supported by the [Informatics Technology for Cancer Research (ITCR) program](https://www.cancer.gov/about-nci/organization/cssi/research/itcr) of the [National Cancer Institute (NCI)](https://www.cancer.gov/) at the [National Institutes of Health (NIH)](https://www.nih.gov/) under award numbers [U01CA242871](https://reporter.nih.gov/search/8qcT1J34hEyj5npqmq9aEw/project-details/10009302) and [U24CA279629](https://reporter.nih.gov/search/8qcT1J34hEyj5npqmq9aEw/project-details/10932257). The content of this tool is solely the responsibility of the authors and does not represent the official views of the NIH. \ No newline at end of file diff --git a/data/m1.csv b/data/m1.csv index e23342c..1dbbb7c 100644 --- a/data/m1.csv +++ b/data/m1.csv @@ -1,11 +1,3 @@ -SubjectID,A,B,C,D,E,F -s001,-0.676662165,-1.406645477,0.736895876,-0.174272834,0.576927715,-0.232139845 -s002,-1.182135526,0.325161174,1.265839829,0.637533468,0.717606195,-0.232249719 -s003,0.393762147,-1.366917238,-1.974747205,-2.029359097,-0.91486706,-0.110356815 -s004,-0.560421215,-0.916606755,-0.244361005,0.173264029,-0.018263561,-1.112137106 -s005,-0.018074945,0.909978883,0.654103198,-0.412681032,0.415519864,0.415147598 -s006,0.584884843,-0.365552063,-0.125284377,0.420532768,1.048717925,-0.520722918 -s007,0.246445503,0.018436118,0.540072217,-0.059316335,-1.102092291,0.446401257 -s008,-0.78842192,-0.634175082,0.312935264,0.272096895,-0.151559698,-2.457860693 -s009,0.134775369,-0.241349035,0.711768614,-0.387514653,0.090663752,0.71284279 -s010,-0.96395775,-0.663571103,0.838443773,-0.933803671,-0.722117911,-0.189414521 +subjectid,A,B,C,D,E,F +s1,1,2,3,4,5,6 +s2,7,8,9,10,11,12 \ No newline at end of file diff --git a/data/m2.csv b/data/m2.csv index 059996b..e93738f 100644 --- a/data/m2.csv +++ b/data/m2.csv @@ -1,11 +1,3 @@ -SubjectID,A,B,C,D,E,F -s001,-0.371449174,0.956404946,-0.959452443,-0.309927689,0.905046916,0.819083005 
-s002,0.935687942,0.109916076,-0.689643721,1.068025385,-1.154739305,-0.462448565 -s003,-0.049420815,0.64668578,-0.318198107,0.724407035,0.583641064,-0.704724761 -s004,-1.49698864,1.249697716,0.04787162,0.188726789,-0.819034985,-0.179096185 -s005,2.136690703,-0.868203102,-0.78604478,0.855744592,0.857935164,0.492256653 -s006,-0.355118237,0.517377129,0.928951769,0.792176927,-0.805270336,1.117546966 -s007,-0.778346825,1.683369425,-0.443459427,-0.593956209,4.0971389,-0.445679171 -s008,0.267208376,0.184556657,0.323158227,2.282268373,1.364794637,0.181174591 -s009,-0.386538967,-0.916456619,1.271967332,-0.052378684,-1.205062795,-0.626923254 -s010,0.435225064,0.91151586,-1.113652003,-0.220028617,-1.05347926,0.365272475 +subjectid,A,B,C,D,E,F +s1,2,3,4,5,6,7 +s2,8,9,10,11,12,13 \ No newline at end of file diff --git a/data/m3.csv b/data/m3.csv index 9bf468c..d60cbd7 100644 --- a/data/m3.csv +++ b/data/m3.csv @@ -1,11 +1,3 @@ -SubjectID,A,B,C,D,E,F -s001,-0.495294073,0.949116249,0.296072803,1.868387862,-0.272883702,-1.818801645 -s002,1.216439744,0.197072557,-0.081120879,1.469343652,2.263823391,0.181492295 -s003,-0.155607109,0.337023954,-0.458342088,-1.031167585,0.218811382,0.148051802 -s004,-1.209131999,-0.096524866,1.197362593,-0.062309653,-0.658751113,-0.262658666 -s005,0.645690766,0.899682779,-1.202114635,-0.452507338,0.178007526,-0.526872668 -s006,-0.527395342,-0.585397127,0.601057827,-0.438992879,9.23E-05,2.411401279 -s007,-0.781069044,-0.651766877,-0.003398167,-0.254586911,-0.048605563,1.6079838 -s008,-0.005850292,1.152494476,1.064747549,-0.227608884,1.45054756,1.422734322 -s009,0.796185038,-1.295533863,-0.007947827,0.624035116,-0.605764923,-0.856374829 -s010,0.952854212,-1.007389474,0.686420686,1.377020745,1.221967627,-0.120206896 +subjectid,A,B,C,D,E,F +s1,3,4,5,6,7,8 +s2,9,10,11,12,13,14 \ No newline at end of file diff --git a/data/m4.csv b/data/m4.csv index 36e69eb..05cc7a1 100644 --- a/data/m4.csv +++ b/data/m4.csv @@ -1,11 +1,3 @@ -SubjectID,A,B,C,D,E,F 
-s001,0.127830235,0.543904483,0.169190618,-0.849953283,-0.563713316,0.736931479 -s002,0.567418525,0.965856382,1.266015552,0.471422651,-0.758025824,-0.427404497 -s003,-1.221693479,-1.121073154,-1.677648371,2.016433719,-0.087967121,-0.472855621 -s004,0.954423388,-0.093452563,0.659446581,-0.190049419,-0.921771701,0.090774055 -s005,0.950052283,-0.621810664,0.254520025,0.360940315,-0.483358752,-0.935151931 -s006,1.455226207,-0.721900186,0.801810726,-0.641529199,0.563422873,0.772440661 -s007,-1.053644931,0.098930728,0.999364504,1.029298347,-0.632529862,-1.666171306 -s008,-0.671755474,0.389256225,0.697323813,-0.483432377,0.073658468,-0.233170802 -s009,0.059997347,0.583152369,-1.371183183,-0.528158479,0.435198404,0.705164885 -s010,-0.458500476,-1.526985622,0.370253517,0.844777527,-0.500950386,0.75340932 +subjectid,A,B,C,D,E,F +s1,4,5,6,7,8,9 +s2,10,11,12,13,14,15 \ No newline at end of file diff --git a/data/temp_output/detailed_ranks.csv b/data/temp_output/detailed_ranks.csv new file mode 100644 index 0000000..6648b3a --- /dev/null +++ b/data/temp_output/detailed_ranks.csv @@ -0,0 +1,5 @@ +method,a_s1,b_s1,c_s1,d_s1,e_s1,f_s1,a_s2,b_s2,c_s2,d_s2,e_s2,f_s2 +m1,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0 +m2,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0 +m3,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0 +m4,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0 diff --git a/data/temp_output/pvals.csv b/data/temp_output/pvals.csv new file mode 100644 index 0000000..514c997 --- /dev/null +++ b/data/temp_output/pvals.csv @@ -0,0 +1,5 @@ +method,m4,m3,m2,m1 +m4,0.0,0.928,0.926,0.925 +m3,0.0,0.0,0.928,0.926 +m2,0.0,0.0,0.0,0.927 +m1,0.0,0.0,0.0,0.0 diff --git a/data/temp_output/ranks.csv b/data/temp_output/ranks.csv new file mode 100644 index 0000000..e555038 --- /dev/null +++ b/data/temp_output/ranks.csv @@ -0,0 +1,5 @@ +method,final_rank,cumulative_rank,s1_avg_rank,s2_avg_rank,a_s1,b_s1,c_s1,d_s1,e_s1,f_s1,a_s2,b_s2,c_s2,d_s2,e_s2,f_s2 
+m4,1.0,4.0,2.0,2.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0 +m3,2.0,4.666666666666667,2.3333333333333335,2.3333333333333335,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0,2.0,2.0,3.0 +m2,3.0,5.333333333333333,2.6666666666666665,2.6666666666666665,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0 +m1,4.0,6.0,3.0,3.0,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0,4.0,4.0,1.0 diff --git a/pyranker/cli/run.py b/pyranker/cli/run.py index 5d11c58..84e6dad 100644 --- a/pyranker/cli/run.py +++ b/pyranker/cli/run.py @@ -1,3 +1,4 @@ +import os from pathlib import Path from typing import Optional @@ -119,7 +120,7 @@ def __get_sorted_metrics(df: pd.DataFrame) -> list: current_metrics = __get_sorted_metrics(current_df) if current_metrics != metrics_base: sanity_checks["Files_with_different_metrics"].append(filename) - except Exception as e: + except Exception: sanity_checks["Files_that_cannot_be_read"].append(filename) # if any of the sanity checks fail, print the problematic files and exit @@ -168,7 +169,7 @@ def main( "--iterations", help="The number of iterations to perform for the permutation test.", ), - ] = 1000, + ] = 100000, ranking_method: Annotated[ str, typer.Option( @@ -177,6 +178,14 @@ def main( help="The method to use for ranking the methods; one of 'average', 'min', 'max', 'first', 'dense'.", ), ] = "average", + n_jobs: Annotated[ + int, + typer.Option( + "-j", + "--n-jobs", + help="The number of CPU cores to use for parallel processing.", + ), + ] = 1, version: Annotated[ Optional[bool], typer.Option( @@ -195,9 +204,9 @@ def main( csvs_to_compare_with_full_path = get_csv_paths(input) # basic sanity checks - assert ( - len(csvs_to_compare_with_full_path) > 1 - ), "At least two methods are required for comparison" + assert len(csvs_to_compare_with_full_path) > 1, ( + "At least two methods are required for comparison" + ) ranking_method = ranking_method.lower() assert ranking_method in [ "average", @@ -208,6 +217,11 @@ def main( ], "Invalid ranking method" assert iterations > 0, 
"Number of iterations must be greater than 0" + # Assert that the number of jobs is not greater than the number of cores + assert n_jobs <= os.cpu_count(), ( + "Number of jobs cannot be greater than the number of cores" + ) + # convert the metrics_for_reversal to a list metrics_for_reversal_list = ( metrics_for_reversal.split(",") if metrics_for_reversal else [] @@ -227,6 +241,8 @@ def main( metrics_for_reversal=metrics_for_reversal_list, n_iterations=iterations, ranking_method=ranking_method, + n_jobs=n_jobs, + output_dir=outputdir, ) ranks, pvals = ranker.get_rankings_and_pvals() Path(outputdir).mkdir(parents=True, exist_ok=True) diff --git a/pyranker/ranker.py b/pyranker/ranker.py index 6bfc27e..9d7817e 100644 --- a/pyranker/ranker.py +++ b/pyranker/ranker.py @@ -1,202 +1,390 @@ +import logging +import os +from concurrent.futures import ProcessPoolExecutor, as_completed from typing import Dict, List, Tuple -import pandas as pd + import numpy as np +import pandas as pd from tqdm import tqdm +# Set up a global logger +logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", + filename="ranker.log", + filemode="w", +) +logger = logging.getLogger(__name__) + + +# This worker function is defined at the top level so it can be pickled +# and sent to other processes by the ProcessPoolExecutor. +def _calculate_pval_for_pair( + arr_i: np.ndarray, + arr_j: np.ndarray, + n_iterations: int, + log_permutations: bool = False, +) -> float: + """ + Performs the permutation test for a single pair of rank arrays. + + The permutation test is a non-parametric method for testing the statistical + significance of an observed difference between two groups. In this case, the + two groups are the ranks of two different methods. + + The null hypothesis is that the two methods are equivalent, and any observed + difference in their ranks is due to chance. 
The alternative hypothesis is that + the two methods are not equivalent, and the observed difference in their ranks + is statistically significant. + + The test works by repeatedly shuffling the ranks between the two methods and + calculating the difference in their sums. The p-value is the proportion of + permutations that result in a difference as or more extreme than the + observed difference. + + Args: + arr_i (np.ndarray): Rank array for the first method. + arr_j (np.ndarray): Rank array for the second method. + n_iterations (int): The number of permutation iterations. + log_permutations (bool, optional): Whether to log detailed permutation + information. Defaults to False. + + Returns: + float: The calculated p-value. + """ + # Use the difference for a one-sided test + observed_diff = arr_i.sum() - arr_j.sum() + if log_permutations: + print(f"Observed difference in sums: {observed_diff}") + count_extreme = 0 + + # Create a local random number generator for thread-safety + rng = np.random.default_rng() + + # Perform the permutation test + for i in range(n_iterations): + # Generate a random permutation mask + r = rng.integers(0, 2, size=arr_i.shape, dtype=bool) + + # Create a copy of the ranks + arr1_rand = arr_i.copy() + arr2_rand = arr_j.copy() + + # Swap the ranks based on the random permutation + arr1_rand[r], arr2_rand[r] = arr_j[r], arr_i[r] + + # Calculate the difference in ranks for the random permutation + permuted_diff = arr1_rand.sum() - arr2_rand.sum() + if log_permutations: + print( + f"Permutation {i + 1}/{n_iterations} | Permuted diff: {permuted_diff}" + ) + + # Check if the permuted difference is as or more extreme + if permuted_diff <= observed_diff: + if log_permutations: + print( + f"Permutation {i + 1} is more extreme: {permuted_diff} <= {observed_diff}" + ) + count_extreme += 1 + + # Calculate the p-value using the standard formula for permutation tests, + # which adds 1 to both the numerator and denominator to avoid p-values of 0. 
+ pval = (count_extreme + 1) / (n_iterations + 1) + if log_permutations: + print(f"Final p-value: {pval}") + return pval + class Ranker: + """ + The Ranker class compares the performance of different methods based on a set of + metrics. It takes as input a dictionary of CSV files, where each file + represents a method and contains the scores for a set of subjects on a set of + metrics. + + The ranking algorithm consists of the following steps: + + 1. **Combine CSVs and Scores**: The class first combines all the input CSV + files into a single DataFrame. This DataFrame has a hierarchical column + structure, where the top level represents the metrics and the bottom level + represents the subjects. + + 2. **Rank Methods**: The class then ranks the methods based on their scores for + each metric and subject. The ranking can be done using different methods, + such as 'average', 'min', 'max', 'first', or 'dense'. + + 3. **Handle Metric Reversal**: For metrics where lower values are better (e.g., + error rates), the class can reverse the ranks so that lower scores get + higher ranks. + + 4. **Aggregate Ranks**: The class then aggregates the ranks across all metrics + for each subject to get a per-subject average rank for each method. + + 5. **Calculate Cumulative Rank**: The per-subject average ranks are then summed + up to get a cumulative rank for each method. + + 6. **Determine Final Rank**: The methods are then ranked based on their + cumulative ranks to determine the final ranking. + + 7. **Perform Permutation Test**: Finally, the class performs a permutation test + to determine the statistical significance of the differences in the ranks + of the methods. The permutation test is a non-parametric method that does + not make any assumptions about the distribution of the data. + + The output of the Ranker class is a pair of DataFrames: one containing the + final rankings of the methods, and another containing the p-values from the + permutation test. 
+ """ + def __init__( self, input_csvs_to_compare: Dict[str, str], metrics_for_reversal: List[str], n_iterations: int = 1000, ranking_method: str = "average", + n_jobs: int = 4, + output_dir: str = ".", + detailed_ranks_csv_name: str = "detailed_ranks.csv", + log_permutations: bool = False, ) -> None: """ - Ranker class to compare the scores of different methods. + Initializes the Ranker class. Args: - input_csvs_to_compare (Dict[str, str]): A dictionary with the key being the method name and the value being the path to the CSV file. - metrics_for_reversal (List[str]): The metrics for which the reversal should be calculated. - n_iterations (int): The number of iterations to perform for the permutation test. - ranking_method (str): The method to use for ranking the methods. + input_csvs_to_compare (Dict[str, str]): A dictionary where the keys are + the method names and the values are the paths to the CSV files + containing the scores for each method. + metrics_for_reversal (List[str]): A list of metrics for which the + ranks should be reversed (i.e., lower values are better). + n_iterations (int, optional): The number of iterations to perform for + the permutation test. Defaults to 1000. + ranking_method (str, optional): The method to use for ranking the + methods. Can be one of 'average', 'min', 'max', 'first', or + 'dense'. Defaults to "average". + n_jobs (int, optional): The number of CPU cores to use for parallel + processing. Defaults to 4. + output_dir (str, optional): The directory where the output files will + be saved. Defaults to ".". + detailed_ranks_csv_name (str, optional): The name of the CSV file + where the detailed ranks will be saved. Defaults to + "detailed_ranks.csv". + log_permutations (bool, optional): Whether to log detailed permutation + information. Defaults to False. 
""" self.input_csvs_to_compare = input_csvs_to_compare self.metrics_for_reversal = metrics_for_reversal self.n_iterations = n_iterations self.ranking_method = ranking_method + self.output_dir = output_dir + self.detailed_ranks_csv_name = detailed_ranks_csv_name + self.log_permutations = log_permutations - # dict of lists with the key being ${metric}_${subjectid} and value being the list of scores - self.combined_scores_per_subject = {} + if n_jobs == -1: + self.n_jobs = os.cpu_count() + else: + self.n_jobs = n_jobs + os.makedirs(self.output_dir, exist_ok=True) + self.detailed_rank_columns = [] + self.all_subject_ids = set() + print("Ranker initialized.") self.combine_csvs_and_scores() def combine_csvs_and_scores(self) -> None: """ - Combine the CSVs and scores of the methods. - """ - self.combined_scores_per_subject["method"] = [] + Combines the input CSV files into a single DataFrame. - # create a dataframe to store the metrics per subject + This method reads each CSV file, converts the column names to lowercase, + and then flattens the DataFrame so that each row represents a method and + each column represents a metric-subject combination. 
+ """ + print("Combining CSVs and scores...") self.metrics_per_subject = pd.DataFrame() for method in self.input_csvs_to_compare: - self.combined_scores_per_subject["method"].append(method) + print(f"Processing method: {method}") current_df = pd.read_csv(self.input_csvs_to_compare[method]) - # ensure all columns are lowercase to avoid case sensitivity current_df.columns = current_df.columns.str.lower() - - # sort along subjectid column to ensure that metrics are in the same order + self.all_subject_ids.update(current_df["subjectid"].unique()) current_df = current_df.sort_values(by="subjectid") - - # sort metrics columns to ensure that metrics are in the same order - metrics_columns = current_df.columns.tolist() - metrics_columns.remove("subjectid") + metrics_columns = [col for col in current_df.columns if col != "subjectid"] metrics_columns.sort() current_df = current_df[["subjectid"] + metrics_columns] - - # convert to a single row df with unique column names based on subjectid column current_df_flattened = {"method": method} for _, row in current_df.iterrows(): - for metric in current_df.columns: - if metric != "subjectid": - current_df_flattened[f"{metric}_{row['subjectid']}"] = row[ - metric - ] - - # convert to a dataframe and append to the metrics_per_subject dataframe + for metric in metrics_columns: + score = row[metric] + if not pd.api.types.is_number(score): + error_msg = ( + f"Invalid score for method '{method}', subject '{row['subjectid']}', " + f"metric '{metric}'. Expected a number, but got '{score}'." 
+ ) + logger.error(error_msg) + raise ValueError(error_msg) + current_df_flattened[f"{metric}_{row['subjectid']}"] = score current_df_flattened = pd.DataFrame(current_df_flattened, index=[0]) self.metrics_per_subject = pd.concat( [self.metrics_per_subject, current_df_flattened], axis=0 - ) + ).reset_index(drop=True) + print("Finished combining CSVs and scores.") self.rank_methods() def rank_methods(self) -> None: """ - Rank the methods based on the metrics. + Ranks the methods based on the metrics using a two-step aggregation process. + + First, it ranks the methods for each metric and subject combination. + Then, it calculates the cumulative rank for each method across all metrics + for each subject. Finally, it sums up the per-subject cumulative ranks to + get a total cumulative rank for each method, which is then used to determine + the final ranking. """ - # calculate rank per metric - self.ranks_per_metric = self.metrics_per_subject.rank( + print("Ranking methods...") + # Rank the methods for each metric-subject combination + # Use ascending=False so that higher values get better ranks (original logic) + ranks_per_metric_detailed = self.metrics_per_subject.rank( method=self.ranking_method, ascending=False, numeric_only=True ) - # ensure all metrics are lowercase to avoid case sensitivity + # Reverse the ranks for the specified metrics metrics_for_reversal_lower = [x.lower() for x in self.metrics_for_reversal] - - # reverse the ranks for the metrics that need reversal for metric in metrics_for_reversal_lower: - for column in self.ranks_per_metric.columns: + for column in ranks_per_metric_detailed.columns: if metric in column: - self.ranks_per_metric[column] = ( - self.ranks_per_metric[column].max() + print(f"Reversing ranks for metric: {metric} in column: {column}") + ranks_per_metric_detailed[column] = ( + ranks_per_metric_detailed[column].max() + 1 - - self.ranks_per_metric[column] + - ranks_per_metric_detailed[column] ) + self.detailed_rank_columns = 
ranks_per_metric_detailed.columns.tolist() - # calculate cumulative rank by summing the ranks of all metrics and dividing by the number of metrics - cumulative_rank_column = self.ranks_per_metric.sum(axis=1) / len( - self.ranks_per_metric.columns - ) + # Save the detailed ranks to a CSV file for verification + verification_df = ranks_per_metric_detailed.copy() + verification_df.insert(0, "method", self.metrics_per_subject["method"]) + verification_path = os.path.join(self.output_dir, self.detailed_ranks_csv_name) + verification_df.to_csv(verification_path, index=False) + print(f"Saved detailed verification ranks to: {verification_path}") + + # Create a dictionary to hold the per-subject cumulative ranks + subject_cumulative_rank_data = {} + subject_ids_sorted = sorted(list(self.all_subject_ids)) + + # Calculate the cumulative rank for each method for each subject + for subject in subject_ids_sorted: + subject_cols = [ + col for col in self.detailed_rank_columns if col.endswith(f"_{subject}") + ] + if subject_cols: + subject_cumulative_rank_data[f"{subject}_cumulative_rank"] = ( + ranks_per_metric_detailed[subject_cols].sum(axis=1) + ) + + # Create a DataFrame from the dictionary of per-subject cumulative ranks + per_subject_cumulative_ranks = pd.DataFrame(subject_cumulative_rank_data) + self.per_subject_cumulative_ranks = per_subject_cumulative_ranks + # Calculate the cumulative and final ranks + cumulative_rank_column = per_subject_cumulative_ranks.sum(axis=1) final_rank_column = cumulative_rank_column.rank( method="average", ascending=True ) - # combine cumulative_rank_column, final_rank_column, and method column to the ranks_per_metric dataframe + + # Combine all the rank information into a single DataFrame self.ranks_per_metric = pd.concat( [ - self.ranks_per_metric, - cumulative_rank_column.rename("cumulative_rank"), - final_rank_column.rename("final_rank"), self.metrics_per_subject["method"], + final_rank_column.rename("final_rank"), + 
cumulative_rank_column.rename("cumulative_rank"), + per_subject_cumulative_ranks, ], axis=1, ) - - # reorder columns to put method, final_rank, cumulative_rank in the beginning - self.ranks_per_metric = self.ranks_per_metric[ - ["method", "final_rank", "cumulative_rank"] - + [ - col - for col in self.ranks_per_metric.columns - if col not in ["method", "final_rank", "cumulative_rank"] - ] - ] - + print("Finished ranking methods.") + # Perform the permutation test to determine the statistical significance self.perform_permutation_test() def perform_permutation_test(self) -> None: """ - Perform permutation test to determine the significance of the ranks. + Performs a permutation test to determine the statistical significance of the + ranks. This test is performed on the detailed rank data. """ + print("Performing permutation test...") n_methods = len(self.ranks_per_metric) self.pvals = np.zeros((n_methods, n_methods)) - - # sort in order of cumulative rank and reset index in one step ranks_per_metric_sorted = self.ranks_per_metric.sort_values( by="cumulative_rank" ).reset_index(drop=True) - ranks_per_metric_sanitized = ranks_per_metric_sorted.drop( - columns=["method", "cumulative_rank", "final_rank"] - ) + # Select only the detailed rank columns for the test + ranks_per_metric_sanitized = ranks_per_metric_sorted[ + self.per_subject_cumulative_ranks.columns + ] - for i in tqdm(range(n_methods), desc="Permutation test"): - for j in range(i + 1, n_methods): - # get the ranks for the two methods - arr_i = ranks_per_metric_sanitized.iloc[i].to_numpy() - arr_j = ranks_per_metric_sanitized.iloc[j].to_numpy() - - # BUG FIX: Use the absolute difference for a two-sided test - observed_diff = abs(arr_i.sum() - arr_j.sum()) - - count_extreme = 0 - - # perform the permutation test - for it in range(self.n_iterations): - # generate a random permutation mask - r = np.random.randint(0, 2, size=arr_i.shape, dtype=bool) - - # create a copy of the ranks - arr1_rand = arr_i.copy() - 
arr2_rand = arr_j.copy() - - # swap the ranks based on the random permutation - # Note: Using boolean indexing is cleaner and often faster - arr1_rand[r], arr2_rand[r] = arr_j[r], arr_i[r] - - # calculate the difference in ranks for the random permutation - permuted_diff = abs(arr1_rand.sum() - arr2_rand.sum()) - - # BUG FIX: Check if the permuted difference is as or more extreme - if permuted_diff >= observed_diff: - count_extreme += 1 - - # Check if count_extreme is still zero, which would create pval=0. - # A p-value of 0 implies absolute certainty, which is unrealistic given the finite - # number of permutations. To avoid this, we adjust count_extreme to ensure a - # conservative estimate of the p-value, aligning with standard statistical practices. - if count_extreme == 0: - count_extreme += 1 - # calculate the p-value - pval = count_extreme / self.n_iterations - self.pvals[i, j] = pval - # The p-value is symmetric - self.pvals[j, i] = pval - - # create a dataframe from the pvals + # Use a process pool to parallelize the p-value calculations + with ProcessPoolExecutor(max_workers=self.n_jobs) as executor: + future_to_indices = {} + for i in range(n_methods): + for j in range(i + 1, n_methods): + arr_i = ranks_per_metric_sanitized.iloc[i].to_numpy() + arr_j = ranks_per_metric_sanitized.iloc[j].to_numpy() + future = executor.submit( + _calculate_pval_for_pair, + arr_i, + arr_j, + self.n_iterations, + self.log_permutations, + ) + future_to_indices[future] = (i, j) + + # Show a progress bar for the permutation test + pbar = tqdm( + as_completed(future_to_indices), + total=len(future_to_indices), + desc="Permutation test", + ) + for future in pbar: + i, j = future_to_indices[future] + try: + pval = future.result() + # Store the raw float; self.pvals is a float array, so any + # string formatting applied here would be lost on assignment + self.pvals[i, j] = pval + # The p-value is symmetric + self.pvals[j, i] = pval + except Exception as exc: + logger.error(f"Pair ({i},
{j}) generated an exception: {exc}") + + # Create a DataFrame from the p-values self.pvals_df = pd.DataFrame( self.pvals, - columns=self.ranks_per_metric["method"], - index=self.ranks_per_metric["method"], + columns=ranks_per_metric_sorted["method"], + index=ranks_per_metric_sorted["method"], ) - self.pvals_df["method"] = self.ranks_per_metric["method"].tolist() - self.pvals_df = self.pvals_df.set_index("method") + print("Finished permutation test.") def get_rankings_and_pvals(self) -> Tuple[pd.DataFrame, pd.DataFrame]: """ - Get the rankings of the methods. + Returns the final rankings and p-values. + + The p-values matrix is returned as a DataFrame with only the upper right + diagonal, to avoid redundancy. Returns: - Tuple[pd.DataFrame, pd.DataFrame]: A tuple containing the rankings and p-values dataframes. + Tuple[pd.DataFrame, pd.DataFrame]: A tuple containing two DataFrames: + - The final rankings of the methods. + - The p-values from the permutation test. """ - return self.ranks_per_metric, self.pvals_df + print("Retrieving rankings and p-values.") + ranks_df = self.ranks_per_metric.sort_values(by="final_rank").reset_index( + drop=True + ) + + # Create a DataFrame for the upper right diagonal of the p-values matrix + pvals_upper_df = pd.DataFrame( + np.triu(self.pvals), # Use np.triu to get the upper triangle of the matrix + columns=self.pvals_df.columns, + index=self.pvals_df.index, + ).reindex(index=ranks_df["method"], columns=ranks_df["method"]) + print("Successfully retrieved rankings and p-values.") + return ranks_df, pvals_upper_df diff --git a/test_ranking_fix.py b/test_ranking_fix.py new file mode 100644 index 0000000..e69de29 diff --git a/tests_full.py b/tests_full.py index 6a936fd..6eac063 100644 --- a/tests_full.py +++ b/tests_full.py @@ -1,6 +1,7 @@ from pathlib import Path -import pandas as pd + import numpy as np +import pandas as pd from pyranker.cli.run import main @@ -43,9 +44,9 @@ def _sanity_check(output_dir: str) -> None: def 
test_main_dir_input(): - cwd = Path.cwd() - test_data_dir = (cwd / "data").absolute().as_posix() - test_output_dir = (cwd / "data" / "temp_output").absolute().as_posix() + test_dir = Path(__file__).parent + test_data_dir = (test_dir / "data").absolute().as_posix() + test_output_dir = (test_dir / "data" / "temp_output").absolute().as_posix() main( input=test_data_dir, outputdir=test_output_dir, @@ -56,9 +57,9 @@ def test_main_dir_input(): def test_main_files_input(): - cwd = Path.cwd() - test_data_dir = cwd / "data" - test_output_dir = (cwd / "data" / "temp_output").absolute().as_posix() + test_dir = Path(__file__).parent + test_data_dir = test_dir / "data" + test_output_dir = (test_dir / "data" / "temp_output").absolute().as_posix() input_files = "" for file in test_data_dir.iterdir(): if file.suffix == ".csv": @@ -71,3 +72,56 @@ def test_main_files_input(): ) _sanity_check(test_output_dir) + + +def test_main_weighted_ranking(tmp_path): + """ + Test the weighted ranking functionality. + """ + # Create a temporary directory for test data + data_dir = tmp_path / "data" + data_dir.mkdir() + output_dir = tmp_path / "output" + output_dir.mkdir() + + # Create sample CSV files + method1_data = { + "subjectid": ["s1", "s2"], + "metricA": [10, 20], + "metricB": [0.1, 0.2], + } + method1_df = pd.DataFrame(method1_data) + method1_df.to_csv(data_dir / "method1.csv", index=False) + + method2_data = { + "subjectid": ["s1", "s2"], + "metricA": [15, 5], + "metricB": [0.3, 0.4], + } + method2_df = pd.DataFrame(method2_data) + method2_df.to_csv(data_dir / "method2.csv", index=False) + + # Call main with weighted ranking arguments + main( + input=str(data_dir), + outputdir=str(output_dir), + metrics_for_reversal="metricB", + metric_to_use="metricA,metricB", + weight="3,1", + ) + + # Check the output + ranks_file = output_dir / "ranks.csv" + assert ranks_file.exists(), "Ranks file does not exist" + ranks_df = pd.read_csv(ranks_file) + + expected_ranks = { + "method1": 1.5, + 
"method2": 1.5, + } + + for method, expected_rank in expected_ranks.items(): + assert ( + ranks_df[ranks_df["method"] == method]["final_rank"].values[0] + == expected_rank + ), f"Final rank for {method} is not as expected"