Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -41,4 +41,4 @@ jobs:
- name: test if examples in markdown works
run: bash -x -v ci_run_examples.sh
- name: test if benchmark works
run: pip install snakemake && sed -i '1s/^/#!\/bin\/bash -x -v\n/' run_benchmark_standalone.sh && bash -x -v run_benchmark_standalone.sh examples/benchmark/demo_shared_hyper_grid.yaml && cat zoutput/benchmarks/mnist_benchmark_grid/hyperparameters.csv && cat zoutput/benchmarks/mnist_benchmark_grid/results.csv
run: pip install snakemake==7.32.0 && pip install pulp==2.7.0 && sed -i '1s/^/#!\/bin\/bash -x -v\n/' run_benchmark_standalone.sh && bash -x -v run_benchmark_standalone.sh examples/benchmark/demo_shared_hyper_grid.yaml && cat zoutput/benchmarks/mnist_benchmark_grid/hyperparameters.csv && cat zoutput/benchmarks/mnist_benchmark_grid/results.csv
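The added version pins keep `snakemake` working against a compatible `pulp` release (newer PuLP versions changed the solver-listing API, which is presumably why both are pinned). The `sed` invocation prepends a tracing shebang to the benchmark script before running it; a minimal sketch of that edit on a hypothetical stand-in file:

```shell
# stand-in for run_benchmark_standalone.sh (hypothetical demo file)
printf 'echo running benchmark\n' > demo_benchmark.sh

# insert "#!/bin/bash -x -v" as a new first line, editing in place
# (GNU sed syntax; BSD/macOS sed would need -i '')
sed -i '1s/^/#!\/bin\/bash -x -v\n/' demo_benchmark.sh

head -n 1 demo_benchmark.sh
```

With the shebang in place, re-running the script echoes every command (`-x`) and every input line (`-v`), which makes the CI logs easier to debug.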
19 changes: 19 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,19 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0 # Use the specific version of the repo
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- repo: https://github.com/pycqa/flake8
rev: 7.0.0
hooks:
- id: flake8
- repo: https://github.com/PyCQA/isort
rev: 5.13.2
hooks:
- id: isort
- repo: https://github.com/psf/black
rev: 23.12.1
hooks:
- id: black
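Once this configuration is committed, the hooks are typically activated per clone (the activation commands are shown as comments, since they need `pre-commit` installed and a git checkout). The trailing-whitespace fixer then rewrites files on commit, approximately like this hypothetical sketch:

```shell
# typical activation (assumes pre-commit is installed in the environment):
#   pre-commit install          # register the git pre-commit hook
#   pre-commit run --all-files  # apply all hooks to the entire tree once

# what the trailing-whitespace hook does, approximately:
printf 'some code   \n' > demo.txt
sed -i 's/[[:space:]]*$//' demo.txt   # GNU sed; strips trailing blanks
cat demo.txt
```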
20 changes: 10 additions & 10 deletions README.md
@@ -8,18 +8,18 @@

## Distribution shifts, domain generalization and DomainLab

Neural networks trained on data from a specific distribution (domain) usually fail to generalize to novel distributions (domains). Domain generalization aims at learning domain-invariant features by utilizing data from multiple domains (data sites, cohorts, batches, vendors) so that the learned features can generalize to new, unseen domains (distributions).

<div style="align: center; text-align:center;">
<img src="https://github.com/marrlab/DomainLab/blob/master/docs/figs/invarfeat4dg.png?raw=true" style="width:400px;"/>
</div>

DomainLab is a software platform with state-of-the-art domain generalization algorithms implemented, designed with maximal decoupling of its software components to enhance code reuse.

DomainLab decouples the following concepts or objects:
- task $M$: a combination of datasets (e.g. from distribution $D_1$ and $D_2$)
- neural network: a map $\phi$ from the input data to the feature space and a map $\varphi$ from feature space to output $\hat{y}$ (e.g. decision variable).
- model: structural risk in the form of $\ell() + \mu R()$ where
- $\ell(Y, \hat{y}=\varphi(\phi(X)))$ is the task specific empirical loss (e.g. cross entropy for classification task).
- $R(\phi(X))$ is the penalty loss to boost domain invariant feature extraction using $\phi$.
- $\mu$ is the corresponding multiplier to each penalty function factor.
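Putting these pieces together, the objective a model expresses can be sketched as follows (the sum over several penalty terms is an assumption for the case of combined models; the text above writes a single $\mu R()$):

```latex
\min_{\phi,\varphi}\;
  \underbrace{\ell\bigl(Y,\hat{y}=\varphi(\phi(X))\bigr)}_{\text{task-specific loss}}
  \;+\;
  \sum_{k}\mu_{k}\,\underbrace{R_{k}\bigl(\phi(X)\bigr)}_{\text{invariance penalty}}
```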
@@ -35,21 +35,21 @@ DomainLab makes it possible to combine models with models, trainers with models,
### Installation
For development version in Github, see [Installation and Dependencies handling](./docs/doc_install.md)

We also offer a PyPI version at https://pypi.org/project/domainlab/, which can be installed via `pip install domainlab`; creating a virtual environment for it is recommended.

### Task specification
In DomainLab, a task is a container for datasets from different domains. See details in
[Task Specification](./docs/doc_tasks.md)

### Example and usage

#### Either clone this repo and use the command line

`python main_out.py -c ./examples/conf/vlcs_diva_mldg_dial.yaml`
where the configuration file below can be downloaded [here](https://raw.githubusercontent.com/marrlab/DomainLab/master/examples/conf/vlcs_diva_mldg_dial.yaml)
```
te_d: caltech # domain name of test domain
tpath: examples/tasks/task_vlcs.py # python file path to specify the task
bs: 2 # batch size
model: dann_diva # combine model DANN with DIVA
epos: 1 # number of epochs
@@ -67,16 +67,16 @@ See example here: [Transformer as feature extractor, decorate JIGEN with DANN, t


### Benchmark different methods
DomainLab provides powerful benchmarking functionality.
To benchmark several algorithms (combinations of neural networks, models, trainers, and associated hyperparameters), a single-line command along with a benchmark configuration file is sufficient. See details in [benchmarks documentation and tutorial](./docs/doc_benchmark.md)

One could simply run
`bash run_benchmark_slurm.sh your_benchmark_configuration.yaml` to launch different experiments with the specified configuration.


For example, the following result (without any augmentation like flipping) is for the PACS dataset.

<div style="align: center; text-align:center;">
<img src="https://github.com/marrlab/DomainLab/blob/master/docs/figs/stochastic_variation_two_rows.png?raw=true" style="width:800px;"/>
</div>
where each rectangle represents one model-trainer combination, each bar inside a rectangle represents a unique hyperparameter index associated with that combination, and each dot represents a random seed.
2 changes: 0 additions & 2 deletions ci.sh
@@ -21,5 +21,3 @@ endtime=`date +%s`
runtime=$((endtime-starttime))
echo "total time used:"
echo "$runtime"


2 changes: 1 addition & 1 deletion ci_pytest_cov.sh
@@ -1,5 +1,5 @@
#!/bin/bash
export CUDA_VISIBLE_DEVICES=""
# although the garbage collector has been explicitly called, sometimes there is still a CUDA out-of-memory error
# so it is better not to use the GPU for pytest, to ensure no CUDA out-of-memory error occurs
# --cov-report term-missing to show in console file wise coverage and lines missing
4 changes: 2 additions & 2 deletions ci_run_examples.sh
@@ -6,8 +6,8 @@ set -e  # exit upon first error
# echo "#!/bin/bash -x -v" > sh_temp_example.sh
sed -n '/```shell/,/```/ p' docs/doc_examples.md | sed '/^```/ d' >> ./sh_temp_example.sh
split -l 5 sh_temp_example.sh sh_example_split
for file in sh_example_split*;
do (echo "#!/bin/bash -x -v" > "$file"_exe && cat "$file" >> "$file"_exe && bash -x -v "$file"_exe && rm -r zoutput);
done
# bash -x -v -e sh_temp_example.sh
echo "general examples done"
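The pipeline above pulls every shell fence out of `docs/doc_examples.md`, splits the result into 5-line chunks, and executes each chunk as its own traced script. The extraction step can be reproduced on a hypothetical demo file:

```shell
# build a tiny markdown file with one shell fence (hypothetical input)
printf '%s\n' 'intro text' '```shell' 'echo hi' '```' 'outro text' > demo.md

# print each ```shell fenced block inclusive of its fences, then drop the
# fence lines, mirroring the two sed stages used against docs/doc_examples.md
sed -n '/```shell/,/```/ p' demo.md | sed '/^```/ d' > extracted.sh

cat extracted.sh
```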
2 changes: 1 addition & 1 deletion data/mixed_codec/caltech/auto/text.txt
@@ -1 +1 @@
Hello World
31 changes: 18 additions & 13 deletions data/script/download_pacs.py
@@ -1,26 +1,29 @@
'this script can be used to download the pacs dataset'
"this script can be used to download the pacs dataset"
import os
import tarfile
from zipfile import ZipFile

import gdown


def stage_path(data_dir, name):
'''
"""
creates the path to data_dir/name
if it does not exist already
'''
"""
full_path = os.path.join(data_dir, name)

if not os.path.exists(full_path):
os.makedirs(full_path)

return full_path


def download_and_extract(url, dst, remove=True):
'''
"""
downloads and extracts the data behind the url
and saves it at dst
'''
"""
gdown.download(url, dst, quiet=False)

if dst.endswith(".tar.gz"):
@@ -43,17 +46,19 @@ def download_and_extract(url, dst, remove=True):


def download_pacs(data_dir):
'''
"""
download and extract dataset pacs.
Dataset is saved at location data_dir
'''
"""
full_path = stage_path(data_dir, "PACS")

download_and_extract("https://drive.google.com/uc?id=1JFr8f805nMUelQWWmfnJR3y4_SYoN5Pd",
os.path.join(data_dir, "PACS.zip"))
download_and_extract(
"https://drive.google.com/uc?id=1JFr8f805nMUelQWWmfnJR3y4_SYoN5Pd",
os.path.join(data_dir, "PACS.zip"),
)

os.rename(os.path.join(data_dir, "kfold"), full_path)

os.rename(os.path.join(data_dir, "kfold"),
full_path)

if __name__ == '__main__':
download_pacs('../pacs')
if __name__ == "__main__":
download_pacs("../pacs")
4 changes: 2 additions & 2 deletions data/ztest_files/dummy_file.py
@@ -1,4 +1,4 @@
'''
"""
I am a dummy file used in tests/test_git_tag.py
to produce a file which is not commited
'''
"""
1 change: 0 additions & 1 deletion docs/.nojekyll
@@ -1 +0,0 @@

71 changes: 30 additions & 41 deletions docs/conf.py
@@ -15,18 +15,16 @@
# In case the project was not installed
import os
import sys
import sphinx_material
from datetime import datetime

import sphinx_material

sys.path.insert(0, os.path.abspath(".."))

# -- Project information -----------------------------------------------------

project = "domainlab" # @FIXME
copyright = (
f"2021-{datetime.now().year}, Marr Lab."
""
)
copyright = f"2021-{datetime.now().year}, Marr Lab." ""

author = "Xudong Sun, et.al."

@@ -94,11 +92,11 @@
# '.md': 'recommonmark.parser.CommonMarkParser',
# }

source_suffix = ['.rst', '.md']
source_suffix = [".rst", ".md"]
source_suffix = {
'.rst': 'restructuredtext',
'.txt': 'markdown',
'.md': 'markdown',
".rst": "restructuredtext",
".txt": "markdown",
".md": "markdown",
}

# The master toctree document.
@@ -114,11 +112,13 @@
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path .
exclude_patterns = ["setup.py",
"_build",
"Thumbs.db",
".DS_Store",
"**.ipynb_checkpoints"]
exclude_patterns = [
"setup.py",
"_build",
"Thumbs.db",
".DS_Store",
"**.ipynb_checkpoints",
]

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "default"
@@ -129,10 +129,9 @@
# -- HTML theme settings ------------------------------------------------
html_short_title = "domainlab" # @FIXME
html_show_sourcelink = False
html_sidebars = {"**": ["logo-text.html",
"globaltoc.html",
"localtoc.html",
"searchbox.html"]}
html_sidebars = {
"**": ["logo-text.html", "globaltoc.html", "localtoc.html", "searchbox.html"]
}

html_theme_path = sphinx_material.html_theme_path()
html_context = sphinx_material.get_html_context()
@@ -157,40 +156,30 @@
"master_doc": False,
"nav_title": "DomainLab",
"nav_links": [
{
"href": "readme_link",
"internal": True,
"title": "Introduction"},
{
"href": "doc_tasks",
"internal": True,
"title": "Task Specification"},
{"href": "readme_link", "internal": True, "title": "Introduction"},
{"href": "doc_tasks", "internal": True, "title": "Task Specification"},
{
"href": "doc_custom_nn",
"internal": True,
"title": "Specify neural network in commandline"},
"title": "Specify neural network in commandline",
},
{
"href": "doc_MNIST_classification",
"internal": True,
"title": "Examples with MNIST"},
"title": "Examples with MNIST",
},
{
"href": "doc_examples",
"internal": True,
"title": "More commandline examples"},

{
"href": "doc_benchmark",
"internal": True,
"title": "Benchmarks tutorial"},

{
"href": "doc_output",
"internal": True,
"title": "Output Structure"},
"title": "More commandline examples",
},
{"href": "doc_benchmark", "internal": True, "title": "Benchmarks tutorial"},
{"href": "doc_output", "internal": True, "title": "Output Structure"},
{
"href": "doc_extend_contribute",
"internal": True,
"title": "Specify custom model in commandline"},
"title": "Specify custom model in commandline",
},
# {
# "href": "https://squidfunk.github.io/mkdocs-material/",
# "internal": False,
@@ -251,7 +240,7 @@

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, "domainlab", "domainlab", [author], 1)] # @FIXME
man_pages = [(master_doc, "domainlab", "domainlab", [author], 1)] # @FIXME


# -- Options for Texinfo output ----------------------------------------------
19 changes: 10 additions & 9 deletions docs/conf0.py
@@ -12,17 +12,18 @@
#
import os
import sys
sys.path.insert(0, os.path.abspath('..'))

sys.path.insert(0, os.path.abspath(".."))
sys.setrecursionlimit(1500)

# -- Project information -----------------------------------------------------

project = 'domainlab'
copyright = '2022, Xudong Sun'
author = 'Xudong Sun'
project = "domainlab"
copyright = "2022, Xudong Sun"
author = "Xudong Sun"

# The full version, including alpha/beta/rc tags
release = '0.0.0'
release = "0.0.0"


# -- General configuration ---------------------------------------------------
@@ -46,7 +47,7 @@
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
templates_path = ["_templates"]

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -59,17 +60,17 @@
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'
html_theme = "alabaster"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_static_path = ["_static"]