diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 36cdc2044..0e825f216 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -41,4 +41,4 @@ jobs:
- name: test if examples in markdown works
run: bash -x -v ci_run_examples.sh
- name: test if benchmark works
- run: pip install snakemake && sed -i '1s/^/#!\/bin\/bash -x -v\n/' run_benchmark_standalone.sh && bash -x -v run_benchmark_standalone.sh examples/benchmark/demo_shared_hyper_grid.yaml && cat zoutput/benchmarks/mnist_benchmark_grid/hyperparameters.csv && cat zoutput/benchmarks/mnist_benchmark_grid/results.csv
+ run: pip install snakemake==7.32.0 && pip install pulp==2.7.0 && sed -i '1s/^/#!\/bin\/bash -x -v\n/' run_benchmark_standalone.sh && bash -x -v run_benchmark_standalone.sh examples/benchmark/demo_shared_hyper_grid.yaml && cat zoutput/benchmarks/mnist_benchmark_grid/hyperparameters.csv && cat zoutput/benchmarks/mnist_benchmark_grid/results.csv
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 000000000..9b66a6665
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,19 @@
+repos:
+ - repo: https://github.com/pre-commit/pre-commit-hooks
+ rev: v4.5.0 # Use the specific version of the repo
+ hooks:
+ - id: trailing-whitespace
+ - id: end-of-file-fixer
+ - id: check-yaml
+ - repo: https://github.com/pycqa/flake8
+ rev: 7.0.0
+ hooks:
+ - id: flake8
+ - repo: https://github.com/PyCQA/isort
+ rev: 5.13.2
+ hooks:
+ - id: isort
+ - repo: https://github.com/psf/black
+ rev: 23.12.1
+ hooks:
+ - id: black
diff --git a/README.md b/README.md
index e7f149b31..f328ee905 100644
--- a/README.md
+++ b/README.md
@@ -8,10 +8,10 @@
## Distribution shifts, domain generalization and DomainLab
-Neural networks trained using data from a specific distribution (domain) usually fails to generalize to novel distributions (domains). Domain generalization aims at learning domain invariant features by utilizing data from multiple domains (data sites, corhorts, batches, vendors) so the learned feature can generalize to new unseen domains (distributions).
+Neural networks trained using data from a specific distribution (domain) usually fail to generalize to novel distributions (domains). Domain generalization aims at learning domain-invariant features by utilizing data from multiple domains (data sites, cohorts, batches, vendors) so that the learned features can generalize to new unseen domains (distributions).
-
+
DomainLab is a software platform with state-of-the-art domain generalization algorithms implemented, designed with maximal decoupling of different software components, thus enhancing code reuse.
@@ -19,7 +19,7 @@ DomainLab is a software platform with state-of-the-art domain generalization alg
DomainLab decouples the following concepts or objects:
- task $M$: a combination of datasets (e.g. from distribution $D_1$ and $D_2$)
- neural network: a map $\phi$ from the input data to the feature space and a map $\varphi$ from feature space to output $\hat{y}$ (e.g. decision variable).
-- model: structural risk in the form of $\ell() + \mu R()$ where
+- model: structural risk in the form of $\ell() + \mu R()$ where
- $\ell(Y, \hat{y}=\varphi(\phi(X)))$ is the task specific empirical loss (e.g. cross entropy for classification task).
- $R(\phi(X))$ is the penalty loss to boost domain invariant feature extraction using $\phi$.
- $\mu$ is the corresponding multiplier to each penalty function factor.
@@ -35,7 +35,7 @@ DomainLab makes it possible to combine models with models, trainers with models,
### Installation
For development version in Github, see [Installation and Dependencies handling](./docs/doc_install.md)
-We also offer a PyPI version here https://pypi.org/project/domainlab/ which one could install via `pip install domainlab` and it is recommended to create a virtual environment for it.
+We also offer a PyPI version at https://pypi.org/project/domainlab/, which can be installed via `pip install domainlab`; it is recommended to create a virtual environment for it.
### Task specification
In DomainLab, a task is a container for datasets from different domains. See detail in
@@ -43,13 +43,13 @@ In DomainLab, a task is a container for datasets from different domains. See det
### Example and usage
-#### Either clone this repo and use command line
+#### Either clone this repo and use the command line
`python main_out.py -c ./examples/conf/vlcs_diva_mldg_dial.yaml`
where the configuration file below can be downloaded [here](https://raw.githubusercontent.com/marrlab/DomainLab/master/examples/conf/vlcs_diva_mldg_dial.yaml)
```
te_d: caltech # domain name of test domain
-tpath: examples/tasks/task_vlcs.py # python file path to specify the task
+tpath: examples/tasks/task_vlcs.py # python file path to specify the task
bs: 2 # batch size
model: dann_diva # combine model DANN with DIVA
epos: 1 # number of epochs
@@ -67,16 +67,16 @@ See example here: [Transformer as feature extractor, decorate JIGEN with DANN, t
### Benchmark different methods
-DomainLab provides a powerful benchmark functionality.
+DomainLab provides powerful benchmarking functionality.
To benchmark several algorithms (combinations of neural networks, models, trainers and associated hyperparameters), a single command line along with a benchmark configuration file is sufficient. See details in [benchmarks documentation and tutorial](./docs/doc_benchmark.md)
-One could simply run
-`bash run_benchmark_slurm.sh your_benchmark_configuration.yaml` to launch different experiments with specified configuraiton.
+One could simply run
+`bash run_benchmark_slurm.sh your_benchmark_configuration.yaml` to launch different experiments with the specified configuration.
For example, the following result (without any augmentation like flip) is for the PACS dataset.
-
+
where each rectangle represents one model-trainer combination, each bar inside the rectangle represents a unique hyperparameter index associated with that method combination, and each dot represents a random seed.
diff --git a/ci.sh b/ci.sh
index e95f51f1b..408dc2d95 100644
--- a/ci.sh
+++ b/ci.sh
@@ -21,5 +21,3 @@ endtime=`date +%s`
runtime=$((endtime-starttime))
echo "total time used:"
echo "$runtime"
-
-
diff --git a/ci_pytest_cov.sh b/ci_pytest_cov.sh
index 0b3f2b133..c0ebf6d70 100644
--- a/ci_pytest_cov.sh
+++ b/ci_pytest_cov.sh
@@ -1,5 +1,5 @@
#!/bin/bash
-export CUDA_VISIBLE_DEVICES=""
+export CUDA_VISIBLE_DEVICES=""
# although garbage collector has been explicitly called, sometimes there is still CUDA out of memory error
# so it is better not to use GPU to do the pytest to ensure every time there is no CUDA out of memory error occurring
# --cov-report term-missing to show in console file wise coverage and lines missing
diff --git a/ci_run_examples.sh b/ci_run_examples.sh
index 15cdc0fbc..9f6b4e041 100644
--- a/ci_run_examples.sh
+++ b/ci_run_examples.sh
@@ -6,8 +6,8 @@ set -e # exit upon first error
# echo "#!/bin/bash -x -v" > sh_temp_example.sh
sed -n '/```shell/,/```/ p' docs/doc_examples.md | sed '/^```/ d' >> ./sh_temp_example.sh
split -l 5 sh_temp_example.sh sh_example_split
-for file in sh_example_split*;
-do (echo "#!/bin/bash -x -v" > "$file"_exe && cat "$file" >> "$file"_exe && bash -x -v "$file"_exe && rm -r zoutput);
+for file in sh_example_split*;
+do (echo "#!/bin/bash -x -v" > "$file"_exe && cat "$file" >> "$file"_exe && bash -x -v "$file"_exe && rm -r zoutput);
done
# bash -x -v -e sh_temp_example.sh
echo "general examples done"
diff --git a/data/mixed_codec/caltech/auto/text.txt b/data/mixed_codec/caltech/auto/text.txt
index 5e1c309da..557db03de 100644
--- a/data/mixed_codec/caltech/auto/text.txt
+++ b/data/mixed_codec/caltech/auto/text.txt
@@ -1 +1 @@
-Hello World
\ No newline at end of file
+Hello World
diff --git a/data/script/download_pacs.py b/data/script/download_pacs.py
index b05c7a4de..51c346f24 100644
--- a/data/script/download_pacs.py
+++ b/data/script/download_pacs.py
@@ -1,14 +1,16 @@
-'this script can be used to download the pacs dataset'
+"this script can be used to download the pacs dataset"
import os
import tarfile
from zipfile import ZipFile
+
import gdown
+
def stage_path(data_dir, name):
- '''
+ """
creates the path to data_dir/name
if it does not exist already
- '''
+ """
full_path = os.path.join(data_dir, name)
if not os.path.exists(full_path):
@@ -16,11 +18,12 @@ def stage_path(data_dir, name):
return full_path
+
def download_and_extract(url, dst, remove=True):
- '''
+ """
downloads and extracts the data behind the url
and saves it at dst
- '''
+ """
gdown.download(url, dst, quiet=False)
if dst.endswith(".tar.gz"):
@@ -43,17 +46,19 @@ def download_and_extract(url, dst, remove=True):
def download_pacs(data_dir):
- '''
+ """
download and extract dataset pacs.
Dataset is saved at location data_dir
- '''
+ """
full_path = stage_path(data_dir, "PACS")
- download_and_extract("https://drive.google.com/uc?id=1JFr8f805nMUelQWWmfnJR3y4_SYoN5Pd",
- os.path.join(data_dir, "PACS.zip"))
+ download_and_extract(
+ "https://drive.google.com/uc?id=1JFr8f805nMUelQWWmfnJR3y4_SYoN5Pd",
+ os.path.join(data_dir, "PACS.zip"),
+ )
+
+ os.rename(os.path.join(data_dir, "kfold"), full_path)
- os.rename(os.path.join(data_dir, "kfold"),
- full_path)
-if __name__ == '__main__':
- download_pacs('../pacs')
+if __name__ == "__main__":
+ download_pacs("../pacs")
diff --git a/data/ztest_files/dummy_file.py b/data/ztest_files/dummy_file.py
index ee817a687..e0c7faa27 100644
--- a/data/ztest_files/dummy_file.py
+++ b/data/ztest_files/dummy_file.py
@@ -1,4 +1,4 @@
-'''
+"""
I am a dummy file used in tests/test_git_tag.py
to produce a file which is not commited
-'''
+"""
diff --git a/docs/.nojekyll b/docs/.nojekyll
index 8b1378917..e69de29bb 100644
--- a/docs/.nojekyll
+++ b/docs/.nojekyll
@@ -1 +0,0 @@
-
diff --git a/docs/conf.py b/docs/conf.py
index 0d5d79a76..bfa3e6eb3 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -15,18 +15,16 @@
# In case the project was not installed
import os
import sys
-import sphinx_material
from datetime import datetime
+import sphinx_material
+
sys.path.insert(0, os.path.abspath(".."))
# -- Project information -----------------------------------------------------
project = "domainlab" # @FIXME
-copyright = (
- f"2021-{datetime.now().year}, Marr Lab."
- ""
-)
+copyright = f"2021-{datetime.now().year}, Marr Lab."
author = "Xudong Sun, et.al."
@@ -94,11 +92,11 @@
# '.md': 'recommonmark.parser.CommonMarkParser',
# }
-source_suffix = ['.rst', '.md']
+source_suffix = [".rst", ".md"]
source_suffix = {
- '.rst': 'restructuredtext',
- '.txt': 'markdown',
- '.md': 'markdown',
+ ".rst": "restructuredtext",
+ ".txt": "markdown",
+ ".md": "markdown",
}
# The master toctree document.
@@ -114,11 +112,13 @@
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path .
-exclude_patterns = ["setup.py",
- "_build",
- "Thumbs.db",
- ".DS_Store",
- "**.ipynb_checkpoints"]
+exclude_patterns = [
+ "setup.py",
+ "_build",
+ "Thumbs.db",
+ ".DS_Store",
+ "**.ipynb_checkpoints",
+]
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "default"
@@ -129,10 +129,9 @@
# -- HTML theme settings ------------------------------------------------
html_short_title = "domainlab" # @FIXME
html_show_sourcelink = False
-html_sidebars = {"**": ["logo-text.html",
- "globaltoc.html",
- "localtoc.html",
- "searchbox.html"]}
+html_sidebars = {
+ "**": ["logo-text.html", "globaltoc.html", "localtoc.html", "searchbox.html"]
+}
html_theme_path = sphinx_material.html_theme_path()
html_context = sphinx_material.get_html_context()
@@ -157,40 +156,30 @@
"master_doc": False,
"nav_title": "DomainLab",
"nav_links": [
- {
- "href": "readme_link",
- "internal": True,
- "title": "Introduction"},
- {
- "href": "doc_tasks",
- "internal": True,
- "title": "Task Specification"},
+ {"href": "readme_link", "internal": True, "title": "Introduction"},
+ {"href": "doc_tasks", "internal": True, "title": "Task Specification"},
{
"href": "doc_custom_nn",
"internal": True,
- "title": "Specify neural network in commandline"},
+ "title": "Specify neural network in commandline",
+ },
{
"href": "doc_MNIST_classification",
"internal": True,
- "title": "Examples with MNIST"},
+ "title": "Examples with MNIST",
+ },
{
"href": "doc_examples",
"internal": True,
- "title": "More commandline examples"},
-
- {
- "href": "doc_benchmark",
- "internal": True,
- "title": "Benchmarks tutorial"},
-
- {
- "href": "doc_output",
- "internal": True,
- "title": "Output Structure"},
+ "title": "More commandline examples",
+ },
+ {"href": "doc_benchmark", "internal": True, "title": "Benchmarks tutorial"},
+ {"href": "doc_output", "internal": True, "title": "Output Structure"},
{
"href": "doc_extend_contribute",
"internal": True,
- "title": "Specify custom model in commandline"},
+ "title": "Specify custom model in commandline",
+ },
# {
# "href": "https://squidfunk.github.io/mkdocs-material/",
# "internal": False,
@@ -251,7 +240,7 @@
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
-man_pages = [(master_doc, "domainlab", "domainlab", [author], 1)] # @FIXME
+man_pages = [(master_doc, "domainlab", "domainlab", [author], 1)] # @FIXME
# -- Options for Texinfo output ----------------------------------------------
diff --git a/docs/conf0.py b/docs/conf0.py
index 21f1ad1ce..c6138b9f6 100644
--- a/docs/conf0.py
+++ b/docs/conf0.py
@@ -12,17 +12,18 @@
#
import os
import sys
-sys.path.insert(0, os.path.abspath('..'))
+
+sys.path.insert(0, os.path.abspath(".."))
sys.setrecursionlimit(1500)
# -- Project information -----------------------------------------------------
-project = 'domainlab'
-copyright = '2022, Xudong Sun'
-author = 'Xudong Sun'
+project = "domainlab"
+copyright = "2022, Xudong Sun"
+author = "Xudong Sun"
# The full version, including alpha/beta/rc tags
-release = '0.0.0'
+release = "0.0.0"
# -- General configuration ---------------------------------------------------
@@ -46,7 +47,7 @@
]
# Add any paths that contain templates here, relative to this directory.
-templates_path = ['_templates']
+templates_path = ["_templates"]
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -59,7 +60,7 @@
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
-exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
# -- Options for HTML output -------------------------------------------------
@@ -67,9 +68,9 @@
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
-html_theme = 'alabaster'
+html_theme = "alabaster"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static']
+html_static_path = ["_static"]
diff --git a/docs/docDIAL.md b/docs/docDIAL.md
index 2765b5762..8b8111de8 100644
--- a/docs/docDIAL.md
+++ b/docs/docDIAL.md
@@ -7,22 +7,22 @@ The algorithm introduced in https://arxiv.org/pdf/2104.00322.pdf uses adversaria
## generating the adversarial domain
The generation of adversary images is demonstrated in figure 1.
-The task is to find an adversary image $x'$ to the natural image $x$ with $||x- x'||$ small, such that the output of a classification network $\phi$ fulfills $||\phi(x) - \phi(x')||$ big. In the example in figure 1 you can for example see, that the difference between the left and the right image of the panda is unobservable, but the classifier does still classify them differently.
+The task is to find an adversarial image $x'$ close to the natural image $x$, i.e. with $||x - x'||$ small, such that the output of a classification network $\phi$ makes $||\phi(x) - \phi(x')||$ large. In the example in figure 1 you can see that the difference between the left and the right image of the panda is imperceptible, yet the classifier still classifies them differently.
In DomainLab the adversarial images are created starting from a random perturbation of the natural image $x'_0 = x + \sigma \tilde{x}~$, $\tilde{x} \sim \mathcal{N}(0, 1)$ and using $n$ steps of gradient descent with step size $\tau$ to maximize $||\phi(x) - \phi(x')||$. In general machine learning, the generation of adversarial images is used during the training process to make networks more robust to adversarial attacks.
-
+
Figure 1: adversarial domain (Image source: Figure 1 of Explaining and Harnessing Adversarial Examples https://arxiv.org/abs/1412.6572)
## network structure
-The network consists of three parts. At first a feature extractor, which extracts the main characteristics of the images. This features are then used as the input to a label classifier and a domain classifier.
+The network consists of three parts: first, a feature extractor, which extracts the main characteristics of the images. These features are then used as the input to a label classifier and a domain classifier.
During training the network is optimized to have a low error on the classification task, while ensuring that the internal representation (output of the feature extractor) cannot discriminate between the natural and adversarial domain. This goal can be achieved by using a special loss function in combination with a gradient reversal layer.
@@ -42,7 +42,7 @@ During training the network is optimized to a have low error on the classificati
[comment]: <> ($$)
-[comment]: <> (DIAL_{CE} = CE_{nat} + \lambda ~ CE_{adv} - r / D_{nat} + D_{adv} / )
+[comment]: <> (DIAL_{CE} = CE_{nat} + \lambda ~ CE_{adv} - r / D_{nat} + D_{adv} / )
[comment]: <> ($$)
diff --git a/docs/docFishr.md b/docs/docFishr.md
index 3949c332f..08580d9fe 100644
--- a/docs/docFishr.md
+++ b/docs/docFishr.md
@@ -1,14 +1,14 @@
# Trainer Fishr
## Invariant Gradient Variances for Out-of-distribution Generalization
-The goal of the Fishr regularization technique is locally aligning the domain-level loss landscapes
+The goal of the Fishr regularization technique is to locally align the domain-level loss landscapes
around the final weights, finding a minimizer around which the inconsistencies between
the domain-level loss landscapes are as small as possible.
This is done by considering second order terms during training, matching
the variances between the domain-level gradients.
-
+
Figure 1: Fishr matches the domain-level gradient variances of the
distributions across the training domains (Image source: Figure 1 of "Fishr:
Invariant gradient variances for out-of-distribution generalization")
@@ -19,18 +19,18 @@ Invariant gradient variances for out-of-distribution generalization")
### Quantifying inconsistency between domains
Intuitively, two domains are locally inconsistent around a minimizer, if a small
perturbation of the minimizer highly affects its optimality in one domain, but only
-minimally affects its optimality in the other domain. Under certain assumptions, most importantly
+minimally affects its optimality in the other domain. Under certain assumptions, most importantly
the Hessians being positive definite, it is possible to measure the inconsistency between two domains
$A$ and $B$ with the following inconsistency score:
$$
-\mathcal{I}^\epsilon ( \theta^* ) = \text{max}_ {(A,B)\in\mathcal{E}^2} \biggl( \mathcal{R}_ B (\theta^* ) - \mathcal{R}_ {A} ( \theta^* ) + \text{max}_ {\frac{1}{2} \theta^T H_A \theta\leq\epsilon}\frac{1}{2}\theta^T H_B \theta \biggl)
+\mathcal{I}^\epsilon ( \theta^* ) = \text{max}_ {(A,B)\in\mathcal{E}^2} \biggl( \mathcal{R}_ B (\theta^* ) - \mathcal{R}_ {A} ( \theta^* ) + \text{max}_ {\frac{1}{2} \theta^T H_A \theta\leq\epsilon}\frac{1}{2}\theta^T H_B \theta \biggr)
$$
, whereby $\theta^*$ denotes the minimizer, $\mathcal{E}$ denotes the set of training domains,
$H_e$ denotes the Hessian for $e\in\mathcal{E}$, $\theta$ denote the network parameters
and $\mathcal{R}_e$ for $e\in\mathcal{E}$ denotes the domain-level ERM objective.
-The Fishr regularization method forces both terms on the right hand side
+The Fishr regularization method forces both terms on the right hand side
of the inconsistency score to become small. The first term represents the difference
between the domain-level risks and is implicitly forced to be small by applying
the Fishr regularization. For the second term it suffices to align diagonal approximations of the
@@ -64,7 +64,7 @@ $v = \frac{1}{|\mathcal{E}|}\sum_{e\in\mathcal{E}} v_e$.
### Implementation
The variance of the gradients within each domain can be computed with the
BACKPACK package (see: Dangel, Felix, Frederik Kunstner, and Philipp Hennig.
-"Backpack: Packing more into backprop." https://arxiv.org/abs/1912.10985).
+"Backpack: Packing more into backprop." https://arxiv.org/abs/1912.10985).
Further on, we use $ \textnormal{Var}(G) \approx \textnormal{diag}(H) $.
The Hessian is then approximated by the Fisher Information Matrix, which
again is approximated by an empirical estimator for computational efficiency.
diff --git a/docs/docHDUVA.md b/docs/docHDUVA.md
index d70b4dcf9..4abcb71e9 100644
--- a/docs/docHDUVA.md
+++ b/docs/docHDUVA.md
@@ -1,26 +1,24 @@
# Model HDUVA
## HDUVA: HIERARCHICAL VARIATIONAL AUTO-ENCODING FOR UNSUPERVISED DOMAIN GENERALIZATION
-HDUVA builds on a generative approach within the framework of variational autoencoders to facilitate generalization to new domains without supervision. HDUVA learns representations that disentangle domain-specific information from class-label specific information even in complex settings where domain structure is not observed during training.
+HDUVA builds on a generative approach within the framework of variational autoencoders to facilitate generalization to new domains without supervision. HDUVA learns representations that disentangle domain-specific information from class-label specific information even in complex settings where domain structure is not observed during training.
## Model Overview
-More specifically, HDUVA is based on three latent variables that are used to model distinct sources of variation and are denoted as $z_y$, $z_d$ and $z_x$. $z_y$ represents class specific information, $z_d$ represents domain specific information and $z_x$ models residual variance of the input. We introduce an additional hierarchical level and use a continuous latent representation s to model (potentially unobserved) domain structure. This means that we can encourage disentanglement of the latent variables through conditional priors without the need of conditioning on a one-hot-encoded, observed domain label. The model along with its parameters and hyperparameters is shown in Figure 1:
+More specifically, HDUVA is based on three latent variables that are used to model distinct sources of variation and are denoted as $z_y$, $z_d$ and $z_x$. $z_y$ represents class-specific information, $z_d$ represents domain-specific information and $z_x$ models residual variance of the input. We introduce an additional hierarchical level and use a continuous latent representation $s$ to model (potentially unobserved) domain structure. This means that we can encourage disentanglement of the latent variables through conditional priors without the need of conditioning on a one-hot-encoded, observed domain label. The model along with its parameters and hyperparameters is shown in Figure 1:
-
+
Figure 1: Probabilistic graphical model for HDUVA: Hierarchical Domain Unsupervised Variational Autoencoding.
-
-
Note that as part of the model a latent representation of $X$ is concatenated with $s$ and $z_d$ (dashed arrows), requiring respective encoder networks.
## Evidence lower bound and overall loss
-The ELBO of the model can be decomposed into 4 different terms:
+The ELBO of the model can be decomposed into 4 different terms:
-Likelihood: $E_{q(z_d, s|x), q(z_x|x), q(z_y|x)}\log p_{\theta}(x|s, z_d, z_x, z_y)$
+Likelihood: $E_{q(z_d, s|x), q(z_x|x), q(z_y|x)}\log p_{\theta}(x|s, z_d, z_x, z_y)$
-KL divergence weighted as in the Beta-VAE: $-\beta_x KL(q_{\phi_x}(z_x|x)||p_{\theta_x}(z_x)) - \beta_y KL(q_{\phi_y}(z_y|x)||p_{\theta_y}(z_y|y))$
+KL divergence weighted as in the Beta-VAE: $-\beta_x KL(q_{\phi_x}(z_x|x)||p_{\theta_x}(z_x)) - \beta_y KL(q_{\phi_y}(z_y|x)||p_{\theta_y}(z_y|y))$
Hierarchical KL loss (domain term): $- \beta_d E_{q_{\phi_s}(s|x), q_{\phi_d}(z_d|x, s)} \log \frac{q_{\phi_d}(z_d|x, s)}{p_{\theta_d}(z_d|s)}$
@@ -30,28 +28,28 @@ In addition, we construct the overall loss by adding an auxiliary classifier, b
## Hyperparameters loss function
-For fitting the model, we need to specify the 4 $\beta$-weights related to the the different terms of the ELBO ( $\beta_x$ , $\beta_y$, $\beta_d$, $\beta_t$) as well as $\gamma_y$.
+For fitting the model, we need to specify the 4 $\beta$-weights related to the different terms of the ELBO ( $\beta_x$ , $\beta_y$, $\beta_d$, $\beta_t$) as well as $\gamma_y$.
## Model hyperparameters
-In addition to these hyperparameters, the following model parameters can be specified:
+In addition to these hyperparameters, the following model parameters can be specified:
- `zd_dim`: size of latent space for domain-specific information
- `zx_dim`: size of latent space for residual variance
- `zy_dim`: size of latent space for class-specific information
- `topic_dim`: size of dirichlet distribution for topics $s$
-The user need to specify at least two neural networks for the **encoder** part via
+The user needs to specify at least two neural networks for the **encoder** part via
-- `npath_encoder_x2topic_h`: the python file path of a neural network that maps the image (or other
+- `npath_encoder_x2topic_h`: the python file path of a neural network that maps the image (or other
modality of data) to a one-dimensional (`topic_dim`) hidden representation serving as input to the Dirichlet encoder: `X->h_t(X)->alpha(h_t(X))`, where `alpha` is the neural network that maps the 1-d hidden layer to the Dirichlet concentration parameter.
-- `npath_encoder_sandwich_x2h4zd`: the python file path of a neural network that maps the
+
+- `npath_encoder_sandwich_x2h4zd`: the python file path of a neural network that maps the
image to a hidden representation (same size as `topic_dim`), which will be used to infer the posterior distribution of `z_d`: `topic(X), X -> [topic(X), h_d(X)] -> zd_mean, zd_scale`
Alternatively, one could use an existing neural network in DomainLab using `nname` instead of `npath`:
- `nname_encoder_x2topic_h`
- `nname_encoder_sandwich_x2h4zd`
-
## Hyperparameter for warmup
Finally, the number of epochs for hyper-parameter warm-up can be specified via the argument `warmup`.
diff --git a/docs/docJiGen.md b/docs/docJiGen.md
index cd17ac140..8830842ee 100644
--- a/docs/docJiGen.md
+++ b/docs/docJiGen.md
@@ -1,23 +1,23 @@
# Model JiGen
The JiGen method extends the understanding of the concept of spatial correlation in the
-neural network by training the network not only on a classification task, but also on solving jigsaw puzzles.
+neural network by training the network not only on a classification task, but also on solving jigsaw puzzles.
-To create a jigsaw puzzle, an image is split into $n \times n$ patches, which are then permuted.
-The goal is training the model to predict the correct permutation, which results in the permuted image.
+To create a jigsaw puzzle, an image is split into $n \times n$ patches, which are then permuted.
+The goal is to train the model to predict the correct permutation that produced the permuted image.
To solve the classification problem and the jigsaw puzzle in parallel, the permuted and
the original images are first fed into a convolutional network for feature extraction and then given
to two classifiers, one being the image classifier and the other the jigsaw classifier.
-For the training of both classification networks, a cross-entropy loss is used. The total loss is then
+For the training of both classification networks, a cross-entropy loss is used. The total loss is then
given by the loss of the image classification task plus the loss of the jigsaw task, whereby the
jigsaw loss is weighted by a hyperparameter.
Another hyperparameter denotes the probability of shuffling the patches of one instance from the training
data set, i.e. the relative ratio.
The advantage of this method is that it does not require domain labels, as the jigsaw puzzle can be
-solved despite missing domain labels.
+solved despite missing domain labels.
### Model parameters
The following hyperparameters can be specified:
@@ -29,4 +29,3 @@ Furthermore, the user can specify a custom grid length via `grid_len`.
_Reference_: Carlucci, Fabio M., et al. "Domain generalization by solving jigsaw puzzles."
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
-
diff --git a/docs/docMatchDG.md b/docs/docMatchDG.md
index 8fb0495e7..96eea99f1 100644
--- a/docs/docMatchDG.md
+++ b/docs/docMatchDG.md
@@ -9,28 +9,28 @@ The authors of the paper motivate their approach by looking at the data-generati
-
+
Figure 1: Structural causal model for the data-generating process. Observed variables are shaded; dashed arrows denote correlated nodes. Object may not be observed. (Image source: Figure 2 of Domain Generalization using Causal Matching https://arxiv.org/pdf/2006.07500.pdf)
## Network
-Before defining the network, one needs to define three sets:
-- $\mathcal{X}$: image space with $x \in \mathcal{X}$
+Before defining the network, one needs to define three sets:
+- $\mathcal{X}$: image space with $x \in \mathcal{X}$
- $\mathcal{C}$: causal feature space with $x_C \in \mathcal{C}$
-- $\mathcal{Y}$: label space with $y \in \mathcal{Y}$
+- $\mathcal{Y}$: label space with $y \in \mathcal{Y}$
-For the classification the goal is to classify an object only based on its causal features $x_C$, hence we define a network $h: \mathcal{C} \rightarrow \mathcal{Y}$. Since $x_C$ for an image $x$ is unknown, one needs to learn a representation function $\phi: \mathcal{X} \rightarrow \mathcal{C}$. By assumption for two images $x_j^{(d)}$ and $x_k^{(d')}$ of the same class, but from different domains $\text{ dist}\left(\phi(x_j^{(d)}), \phi(x_k^{(d')})\right)$ is small to enforce that the features in $\phi(x) \in \mathcal{C}$ are affected by the associated object and not the domain. This motivates the definition of a match function $\Omega: \mathcal{X} \times \mathcal{X} \rightarrow \{0, 1\}$,
+For the classification, the goal is to classify an object based only on its causal features $x_C$, hence we define a network $h: \mathcal{C} \rightarrow \mathcal{Y}$. Since $x_C$ for an image $x$ is unknown, one needs to learn a representation function $\phi: \mathcal{X} \rightarrow \mathcal{C}$. By assumption, for two images $x_j^{(d)}$ and $x_k^{(d')}$ of the same class but from different domains, $\text{dist}\left(\phi(x_j^{(d)}), \phi(x_k^{(d')})\right)$ is small, enforcing that the features in $\phi(x) \in \mathcal{C}$ are affected by the associated object and not the domain. This motivates the definition of a match function $\Omega: \mathcal{X} \times \mathcal{X} \rightarrow \{0, 1\}$,
$$
\Omega(x_j, x_k) = \begin{cases}
1 \quad & \text{$x_j$ and $x_k$ correspond to the same object} \\
0 & \text{otherwise}
-\end{cases}
+\end{cases}
$$
-by using
+by using
$$
\sum_{\substack{\Omega(x_j, x_k) = 1,\\ d \neq d'}} \text{dist}\left(\phi(x_j^{(d)}), \phi(x_k^{(d')})\right) = 0.
@@ -38,7 +38,7 @@ $$
Together the networks form the desired classifier $f = h \circ \phi : \mathcal{X} \rightarrow \mathcal{Y}$.
-
+
## Training
**Initialisation:** first of all, match pairs of same-class data points from different domains are constructed. Given a data point, another data point with the same label from a different domain is selected randomly. The matching across domains is done relative to a base domain, which is chosen as the domain with the highest number of samples for that class. This leads to a matched data matrix $\mathcal{M}$ of size $(N', K)$, with $N'$ the sum of the base-domain sizes over all classes and $K$ the number of domains.
@@ -60,7 +60,7 @@ $$
\underset{h, \phi}{\text{arg min}} ~ \sum_{d \in D} \sum_{i=1}^{n_d} ~ l\left(h(\phi(x_i^{(d)})), y_i^{(d)}\right) + \gamma_{\text{reg}} \sum_{\substack{\Omega(x_j, x_k) = 1,\\ d \neq d'}} \text{dist}\left(\phi(x_j^{(d)}), \phi(x_k^{(d')})\right).
$$
-The training of $h$ and $\phi$ is performed from scratch. The trained network $\phi^*$ from phase 1 is only used to update the matched data matrix using yielding $\Omega$.
+The training of $h$ and $\phi$ is performed from scratch. The trained network $\phi^*$ from phase 1 is only used to update the matched data matrix, yielding an updated $\Omega$.
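The phase-2 objective above combines the classification loss with the weighted matching penalty. A minimal numeric sketch (hypothetical names, with predicted class probabilities and a precomputed penalty standing in for the networks):

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class under predicted probabilities."""
    return -math.log(probs[label])

def phase2_objective(batch, gamma_reg, match_penalty_value):
    """Classification loss summed over the batch plus the weighted match penalty.

    `batch` is a list of (predicted_probs, label); `match_penalty_value` is
    assumed to be computed over the matched pairs as in the equation above."""
    cls_loss = sum(cross_entropy(probs, y) for probs, y in batch)
    return cls_loss + gamma_reg * match_penalty_value
```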
---
diff --git a/docs/doc_MNIST_classification.md b/docs/doc_MNIST_classification.md
index 7cf9d65b2..a979b66f3 100644
--- a/docs/doc_MNIST_classification.md
+++ b/docs/doc_MNIST_classification.md
@@ -1,10 +1,10 @@
# colored MNIST classification
-We include in the DomainLab package colored verion of MNIST where the color corresponds to the domain and digit corresponds to the semantic concept that we want to classify.
+We include in the DomainLab package a colored version of MNIST, where the color corresponds to the domain and the digit corresponds to the semantic concept that we want to classify.
## colored MNIST dataset
-We provide 10 different colored version of the MNIST dataset with numbers 0 to 9 as 10 different domains. The digit and background are colored differently, thus a domain correspond to a 2-color combination.
-An extraction of digit 0 to 9 from domain 0 is shown in Figure 1.
+We provide 10 differently colored versions of the MNIST dataset, with the numbers 0 to 9 as 10 different domains. The digit and background are colored differently, thus a domain corresponds to a 2-color combination.
+An extraction of digits 0 to 9 from domain 0 is shown in Figure 1.
digits 0 - 9:
@@ -21,7 +21,7 @@ digits 0 - 9: >
+ <>
# ... you may like to add more shared samples here like:
# gamma_y, gamma_d, zy_dim, zd_dim
@@ -109,13 +109,13 @@ Shared params:
Task_Diva_Dial:
# set the method to be used, if model is skipped the Task will not be executed
- model: diva
+ model: diva
# select a trainer to be used, if trainer is skipped adam is used
# options: "dial" or "mldg"
trainer: dial
-
- # Here we can also set task specific hyperparameters
+
+ # Here we can also set task specific hyperparameters
# which shall be fixed among all experiments.
  # If not set, the default values will be used.
zd_dim: 32
@@ -133,11 +133,11 @@ Task_Diva_Dial:
# define task specific hyperparameter sampling
hyperparameters:
<>
-
- # add constraints for your sampled hyperparameters,
+
+ # add constraints for your sampled hyperparameters,
  # by using their names in a python expression.
- # You can use all hyperparameters defined in the hyperparameter section of
- # the current task and the shared hyperparameters specified in the shared
+ # You can use all hyperparameters defined in the hyperparameter section of
+ # the current task and the shared hyperparameters specified in the shared
# section of the current task
constraints:
- 'zx_dim <= zy_dim'
@@ -161,14 +161,14 @@ For filling in the sampling description for the into the `Shared params` and the
1. uniform samples in the interval [min, max]
```yaml
tau: # name of the hyperparameter
- min: 0.01
+ min: 0.01
max: 1
distribution: uniform # name of the distribution
##### for grid search #####
num: 3 # number of grid points created for this hyperparameter
```
-2. loguniform samples in the interval [min, max]. This is usefull if the interval spans over multiple magnitudes.
+2. loguniform samples in the interval [min, max]. This is useful if the interval spans multiple orders of magnitude.
```yaml
gamma_y: # name of the hyperparameter
min: 1e4
@@ -182,14 +182,14 @@ gamma_y: # name of the hyperparameter
1. normal samples with mean and standard deviation
```yaml
pperm: # name of the hyperparameter
- mean: 0.5
+ mean: 0.5
std: 0.2
distribution: normal # name of the distribution
##### for grid search #####
num: 3 # number of grid points created for this hyperparameter
```
-2. lognormal samples with mean and standard deviation. This is usefull if the interval spans over multiple magnitudes.
+2. lognormal samples with mean and standard deviation. This is useful if the interval spans multiple orders of magnitude.
```yaml
gamma_y: # name of the hyperparameter
mean: 1e5
@@ -205,7 +205,7 @@ choose the values of the hyperparameter from a predefined list. If one uses grid
```yaml
nperm: # name of the hyperparameter
distribution: categorical # name of the distribution
- datatype: int
+ datatype: int
values: # concrete values to choose from
- 30
- 31
@@ -250,15 +250,15 @@ it is possible to have all sorts of combinations:
1. a task which includes shared and task specific sampled hyperparameters
```yaml
Task_Name:
- model: ...
+ model: ...
...
- # specify sections from the Shared params section
+ # specify sections from the Shared params section
shared:
- ...
# specify task specific hyperparameter sampling
hyperparameters:
- ...
+ ...
  # add the constraints to the hyperparameters section
constraints:
- '...' # constraints using params from the hyperparameters and the shared section
@@ -267,7 +267,7 @@ Task_Name:
2. Only task specific sampled hyperparameters
```yaml
Task_Name:
- model: ...
+ model: ...
...
# specify task specific hyperparameter sampling
@@ -281,10 +281,10 @@ Task_Name:
3. Only shared sampled hyperparameters
```yaml
Task_Name:
- model: ...
+ model: ...
...
- # specify sections from the Shared params section
+ # specify sections from the Shared params section
shared:
- ...
# add the constraints as a standalone section to the task
@@ -295,6 +295,6 @@ Task_Name:
4. No hyperparameter sampling: all hyperparameters are either fixed to a user-defined value or to the default value. Since no hyperparameters are sampled, no constraints are needed.
```yaml
Task_Name:
- model: ...
+ model: ...
...
```
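The sampling distributions and constraint expressions described above can be sketched in a few lines of Python. This is a hypothetical helper for illustration only (a subset of the distributions, not the benchmark's actual sampler); constraints are evaluated as python expressions over the sampled names, as in `'zx_dim <= zy_dim'`:

```python
import math
import random

def sample_param(spec, rng):
    """Draw one value according to a sampling spec like those in the yaml above."""
    dist = spec["distribution"]
    if dist == "uniform":
        return rng.uniform(spec["min"], spec["max"])
    if dist == "loguniform":
        # uniform in log space, useful across orders of magnitude
        return math.exp(rng.uniform(math.log(spec["min"]), math.log(spec["max"])))
    if dist == "categorical":
        return rng.choice(spec["values"])
    raise ValueError(f"unknown distribution: {dist}")

def sample_config(hyperparameters, constraints, seed=0, max_tries=1000):
    """Rejection-sample a configuration until all constraint expressions hold."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        cfg = {name: sample_param(spec, rng)
               for name, spec in hyperparameters.items()}
        # constraints are python expressions over the hyperparameter names
        if all(eval(expr, {}, cfg) for expr in constraints):
            return cfg
    raise RuntimeError("no sample satisfied the constraints")
```

Samples violating a constraint are simply rejected and redrawn, so tight constraints increase the number of draws needed.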
diff --git a/docs/doc_diva.md b/docs/doc_diva.md
index b02d0dc97..b083e7fca 100644
--- a/docs/doc_diva.md
+++ b/docs/doc_diva.md
@@ -2,21 +2,21 @@
## Domain Invariant Variational Autoencoders
DIVA addresses the domain generalization problem with a variational autoencoder
-with three latent variables, using three independent encoders.
+with three latent variables, using three independent encoders.
By encouraging the network to store the domain,
class and residual features each in one of the latent spaces, the class-specific information
-is disentangled.
+is disentangled.
In order to obtain marginally independent latent variables, the densities of the domain
and class latent spaces are conditioned on the domain and the class, respectively. These densities are then
parameterized by learnable parameters. During training, all three latent variables are fed into a single decoder
-reconstructing the input image.
+reconstructing the input image.
Additionally, two classifiers are trained, predicting the domain and class label
from the respective latent variable.
This leads to an overall large network. However, during inference only the class encoder and classifier
-are used.
+are used.
DIVA can improve the classification accuracy also in a semi-supervised setting, where class labels
are missing for some data or domains. This is an advantage, as prediction
@@ -30,7 +30,7 @@ decreased performance.
### Model parameters
The following hyperparameters can be specified:
-- `zd_dim`: size of latent space for domain-specific information
+- `zd_dim`: size of latent space for domain-specific information
- `zx_dim`: size of latent space for residual variance
- `zy_dim`: size of latent space for class-specific information
- `gamma_y`: multiplier for y classifier ($\alpha_y$ of eq. (2) in paper below)
diff --git a/docs/doc_examples.md b/docs/doc_examples.md
index da5812c4a..21d0b2eb2 100755
--- a/docs/doc_examples.md
+++ b/docs/doc_examples.md
@@ -171,7 +171,7 @@ python main_out.py --te_d 0 1 2 --tr_d 3 7 --task=mnistcolor10 --model=diva --nn
### Set hyper-parameters for trainer as well
```shell
python main_out.py --te_d 0 1 2 --tr_d 3 7 --task=mnistcolor10 --model=diva --nname=conv_bn_pool_2 --nname_dom=conv_bn_pool_2 --gamma_y=7e5 --gamma_d=1e5 --trainer=dial --dial_steps_perturb=1
-```
+```
## Meta Learning Domain Generalization
```shell
diff --git a/docs/doc_install.md b/docs/doc_install.md
index e8c077d68..c819f6e50 100644
--- a/docs/doc_install.md
+++ b/docs/doc_install.md
@@ -4,7 +4,7 @@
`conda create --name domainlab_py39 python=3.9`
-then
+then
`conda activate domainlab_py39`
@@ -15,13 +15,13 @@ Suppose you have cloned the repository and have changed directory to the cloned
```norun
pip install -r requirements.txt
```
-then
+then
`python setup.py install`
#### Dependencies management
- [python-poetry](https://python-poetry.org/) and use the configuration file `pyproject.toml` in this repository.
-
+
### Install Release
-It is strongly recommended to create a virtual environment first, then
+It is strongly recommended to create a virtual environment first, then
- Install via `pip install domainlab`
diff --git a/docs/doc_output.md b/docs/doc_output.md
index 61c67e17a..345bb79f0 100644
--- a/docs/doc_output.md
+++ b/docs/doc_output.md
@@ -4,16 +4,16 @@ By default, this package generates outputs into a folder `zoutput` relative to t
The output structure is similar to the one below. ([] means the folder might or might not exist, texts inside () are comments)
-```
+```text
zoutput/
├── aggrsts (aggregation of results)
│ ├── task1_test_domain1_tagName.csv
│ ├── task2_test_domain3_tagName.csv
-│
-│
+│
+│
├── [gen] (counterfactual image generation, only exist for generative models with "--gen" specified)
│ ├── [task1_test_domain1]
-│
+│
└── saved_models (persisted pytorch model)
├── task1_algo1_git-commit-hashtag1_seed_1_instance_wise_predictions.txt (instance wise prediction of the model)
├── [task1_algo1_git-commit-hashtag1_seed_1.model] (only exist if with command line argument "--keep_model")
diff --git a/docs/doc_tasks.md b/docs/doc_tasks.md
index 2eec7e460..3d3f0d73d 100644
--- a/docs/doc_tasks.md
+++ b/docs/doc_tasks.md
@@ -2,10 +2,10 @@
The package offers various ways to specify a domain generalization task (where to find the data, which domains to use for training and which for testing) according to the user's needs.
-For all thress ways covered below, the user has to prepare a python file to feed via argument `--tpath` (means task path) into DomainLab. We provide example python files in our repository [see all examples here for specifying domain generalization task](https://github.com/marrlab/DomainLab/tree/master/examples/tasks) so that the user could follow the example to create their own domain generalization task specification. We provide inline comment to explain what each line is doing, as well as below in this documentation.
+For all three ways covered below, the user has to prepare a python file to feed into DomainLab via the argument `--tpath` (task path). We provide example python files in our repository [see all examples here for specifying a domain generalization task](https://github.com/marrlab/DomainLab/tree/master/examples/tasks), so that the user can follow them to create their own domain generalization task specification. We provide inline comments to explain what each line is doing, as well as the documentation below.
## Possibility 1: Specify train and test domain dataset directly
-The most straightforward way to specify a domain generalization task is, if you have already a [PyTorch Dataset](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) class for each domain: you could define a dictionary with the key being name for domain, and the value being the PyTorch Dataset you created corresponding to that domain (train and validation or only training)
+The most straightforward way to specify a domain generalization task, if you already have a [PyTorch Dataset](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) class for each domain, is to define a dictionary whose keys are the domain names and whose values are the PyTorch Datasets you created for those domains (train and validation, or training only).
[See an example python file here](https://github.com/marrlab/DomainLab/blob/master/examples/tasks/task_dset_custom.py)
To train an ERM (Empirical Risk Minimization) network on this task:
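The shape of such a dictionary can be sketched as below. The toy classes and domain names here are hypothetical placeholders; in practice each value would be a real `torch.utils.data.Dataset` implementing `__len__` and `__getitem__`:

```python
# Toy stand-ins for PyTorch Datasets, illustrating only the dictionary shape.
class ToyDataset:
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]

# One entry per domain: key = domain name, value = (train, validation) datasets.
dict_domain_to_dset = {
    "photo":  (ToyDataset([("img0", 0)]), ToyDataset([("img1", 1)])),
    "sketch": (ToyDataset([("img2", 0)]), ToyDataset([("img3", 1)])),
}
```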
@@ -28,24 +28,24 @@ python main_out.py --te_d=sketch --tpath=examples/tasks/demo_task_path_list_smal
In this mode, we assume there are structured folders where each folder contains all data from one domain, and each domain folder contains subfolders corresponding to different classes. See examples below.
### Data organization
-To give an example, suppose we have a classification task to classify between car, dog, human, chair and bird and there are 3 data sources (domains) with folder name "folder_a", "folder_b" and "folder_c" respectively as shown below.
+To give an example, suppose we have a classification task to classify between car, dog, human, chair and bird, and there are 3 data sources (domains) with folder names "folder_a", "folder_b" and "folder_c" respectively, as shown below.
In each folder, the images are organized in sub-folders by their class. For example, "/path/to/3rd_domain/folder_c/dog" folder contains all the images of class "dog" from the 3rd domain.
-It might be the case that across the different data sources the same class is named differently. For example, in the 1st data source, the class dog is stored in sub-folder named
+It might be the case that across the different data sources the same class is named differently. For example, in the 1st data source, the class dog is stored in a sub-folder named
"hund", in the 2nd data source in a sub-folder named "husky", and in the 3rd data source in a sub-folder named "dog".
It might also be the case that some classes exist in one data source but not in another. For example, the folder "/path/to/2nd_domain/folder_b" does not have a sub-folder for class "human".
Folder structure of the 1st domain:
-```
+```text
── /path/to/1st_domain/folder_a
├── auto
├── hund
├── mensch
├── stuhl
└── vogel
-
+
```
Folder structure of the 2nd domain:
@@ -56,7 +56,7 @@ Folder structure of the 2nd domain:
├── sit
└── husky
```
-Folder structure of the 3rd domain:
+Folder structure of the 3rd domain:
```
── /path/to/3rd_domain/folder_c
@@ -146,7 +146,7 @@ of domain information so only a unique transform (composition) is allowed.
isize: domainlab.tasks.ImSize(image channel, image height, image width)
-dict_domain2imgroot: a python dictionary with keys as user specified domain names and values
+dict_domain2imgroot: a python dictionary with keys as user specified domain names and values
as the absolute path to each domain's data.
taskna: user defined task name
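Concretely, the fields described above could look as follows. This is a minimal sketch; the domain names and paths reuse the placeholder folders from the example above and are not part of DomainLab:

```python
# Maps user-specified domain names to the absolute path of each domain's data.
dict_domain2imgroot = {
    "1st_domain": "/path/to/1st_domain/folder_a",
    "2nd_domain": "/path/to/2nd_domain/folder_b",
    "3rd_domain": "/path/to/3rd_domain/folder_c",
}

# User-defined task name.
taskna = "demo_task"
```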
diff --git a/docs/doc_usage_cmd.md b/docs/doc_usage_cmd.md
index 40250d7b5..b6320328b 100644
--- a/docs/doc_usage_cmd.md
+++ b/docs/doc_usage_cmd.md
@@ -6,7 +6,7 @@ Suppose you have cloned the repository and have the dependencies ready, change d
To train a domain invariant model on the vlcs_mini task
```shell
-python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --config=examples/yaml/demo_config_single_run_diva.yaml
+python main_out.py --te_d=caltech --tpath=examples/tasks/task_vlcs.py --config=examples/yaml/demo_config_single_run_diva.yaml
```
where `--tpath` specifies the path of a user specified python file which defines the domain generalization task [see here](../examples/tasks/task_vlcs.py), `--te_d` specifies the test domain name (or index starting from 0), `--config` specifies the configurations of the domain generalization algorithms, [see here](../examples/yaml/demo_config_single_run_diva.yaml)
diff --git a/docs/figs/tikz_hduva.svg b/docs/figs/tikz_hduva.svg
index 159e7eb91..e1149b73c 100644
--- a/docs/figs/tikz_hduva.svg
+++ b/docs/figs/tikz_hduva.svg
@@ -531,4 +531,4 @@
transform="translate(-110.686)"
id="g514" />
\ No newline at end of file
+ id="g516" />
diff --git a/docs/index.html b/docs/index.html
index 3af30f6c1..fd99d5a8c 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -19,7 +19,7 @@
-
+
@@ -35,6 +35,6 @@
font-family: "Roboto Mono", "Courier New", Courier, monospace
}
- Welcome to domainlab’s documentation! — libdg documentation
+ Welcome to domainlab’s documentation! — libdg documentation