Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -41,4 +41,4 @@ jobs:
- name: test if examples in markdown works
run: bash -x -v ci_run_examples.sh
- name: test if benchmark works
run: pip install snakemake && sed -i '1s/^/#!\/bin\/bash -x -v\n/' run_benchmark_standalone.sh && bash -x -v run_benchmark_standalone.sh examples/benchmark/demo_shared_hyper_grid.yaml && cat zoutput/benchmarks/mnist_benchmark_grid/hyperparameters.csv && cat zoutput/benchmarks/mnist_benchmark_grid/results.csv
run: pip install snakemake==7.32.0 && pip install pulp==2.7.0 && sed -i '1s/^/#!\/bin\/bash -x -v\n/' run_benchmark_standalone.sh && bash -x -v run_benchmark_standalone.sh examples/benchmark/demo_shared_hyper_grid.yaml && cat zoutput/benchmarks/mnist_benchmark_grid/hyperparameters.csv && cat zoutput/benchmarks/mnist_benchmark_grid/results.csv
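The added version pins keep `snakemake` working against a compatible `pulp` release (newer PuLP versions changed the solver-listing API, which is presumably why both are pinned). The `sed` invocation prepends a tracing shebang to the benchmark script before running it; a minimal sketch of that edit on a hypothetical stand-in file:

```shell
# stand-in for run_benchmark_standalone.sh (hypothetical demo file)
printf 'echo running benchmark\n' > demo_benchmark.sh

# insert "#!/bin/bash -x -v" as a new first line, editing in place
# (GNU sed syntax; BSD/macOS sed would need -i '')
sed -i '1s/^/#!\/bin\/bash -x -v\n/' demo_benchmark.sh

head -n 1 demo_benchmark.sh
```

With the shebang in place, re-running the script echoes every command (`-x`) and every input line (`-v`), which makes the CI logs easier to debug.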
19 changes: 19 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,19 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0 # Use the specific version of the repo
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- repo: https://github.com/pycqa/flake8
rev: 7.0.0
hooks:
- id: flake8
- repo: https://github.com/PyCQA/isort
rev: 5.13.2
hooks:
- id: isort
- repo: https://github.com/psf/black
rev: 23.12.1
hooks:
- id: black
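Once this configuration is committed, the hooks are typically activated per clone (the activation commands are shown as comments, since they need `pre-commit` installed and a git checkout). The trailing-whitespace fixer then rewrites files on commit, approximately like this hypothetical sketch:

```shell
# typical activation (assumes pre-commit is installed in the environment):
#   pre-commit install          # register the git pre-commit hook
#   pre-commit run --all-files  # apply all hooks to the entire tree once

# what the trailing-whitespace hook does, approximately:
printf 'some code   \n' > demo.txt
sed -i 's/[[:space:]]*$//' demo.txt   # GNU sed; strips trailing blanks
cat demo.txt
```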
20 changes: 10 additions & 10 deletions README.md
@@ -8,18 +8,18 @@

## Distribution shifts, domain generalization and DomainLab

Neural networks trained on data from a specific distribution (domain) usually fail to generalize to novel distributions (domains). Domain generalization aims at learning domain-invariant features by utilizing data from multiple domains (data sites, cohorts, batches, vendors) so that the learned features can generalize to new, unseen domains (distributions).

<div style="align: center; text-align:center;">
<img src="https://github.com/marrlab/DomainLab/blob/master/docs/figs/invarfeat4dg.png?raw=true" style="width:400px;"/>
</div>

DomainLab is a software platform with state-of-the-art domain generalization algorithms implemented, designed with maximal decoupling of its software components to enhance code reuse.

DomainLab decouples the following concepts or objects:
- task $M$: a combination of datasets (e.g. from distribution $D_1$ and $D_2$)
- neural network: a map $\phi$ from the input data to the feature space and a map $\varphi$ from feature space to output $\hat{y}$ (e.g. decision variable).
- model: structural risk in the form of $\ell() + \mu R()$ where
- $\ell(Y, \hat{y}=\varphi(\phi(X)))$ is the task specific empirical loss (e.g. cross entropy for classification task).
- $R(\phi(X))$ is the penalty loss to boost domain invariant feature extraction using $\phi$.
- $\mu$ is the corresponding multiplier to each penalty function factor.
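Putting these pieces together, the objective a model expresses can be sketched as follows (the sum over several penalty terms is an assumption for the case of combined models; the text above writes a single $\mu R()$):

```latex
\min_{\phi,\varphi}\;
  \underbrace{\ell\bigl(Y,\hat{y}=\varphi(\phi(X))\bigr)}_{\text{task-specific loss}}
  \;+\;
  \sum_{k}\mu_{k}\,\underbrace{R_{k}\bigl(\phi(X)\bigr)}_{\text{invariance penalty}}
```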
@@ -35,21 +35,21 @@ DomainLab makes it possible to combine models with models, trainers with models,
### Installation
For development version in Github, see [Installation and Dependencies handling](./docs/doc_install.md)

We also offer a PyPI version at https://pypi.org/project/domainlab/, which can be installed via `pip install domainlab`; creating a virtual environment for it is recommended.

### Task specification
In DomainLab, a task is a container for datasets from different domains. See details in
[Task Specification](./docs/doc_tasks.md)

### Example and usage

#### Either clone this repo and use the command line

`python main_out.py -c ./examples/conf/vlcs_diva_mldg_dial.yaml`
where the configuration file below can be downloaded [here](https://raw.githubusercontent.com/marrlab/DomainLab/master/examples/conf/vlcs_diva_mldg_dial.yaml)
```
te_d: caltech # domain name of test domain
tpath: examples/tasks/task_vlcs.py # python file path to specify the task
bs: 2 # batch size
model: dann_diva # combine model DANN with DIVA
epos: 1 # number of epochs
@@ -67,16 +67,16 @@ See example here: [Transformer as feature extractor, decorate JIGEN with DANN, t


### Benchmark different methods
DomainLab provides powerful benchmarking functionality.
To benchmark several algorithms (combinations of neural networks, models, trainers, and associated hyperparameters), a single-line command along with a benchmark configuration file is sufficient. See details in [benchmarks documentation and tutorial](./docs/doc_benchmark.md)

One could simply run
`bash run_benchmark_slurm.sh your_benchmark_configuration.yaml` to launch different experiments with the specified configuration.


For example, the following result (without any augmentation like flipping) is for the PACS dataset.

<div style="align: center; text-align:center;">
<img src="https://github.com/marrlab/DomainLab/blob/master/docs/figs/stochastic_variation_two_rows.png?raw=true" style="width:800px;"/>
</div>
where each rectangle represents one model-trainer combination, each bar inside a rectangle represents a unique hyperparameter index associated with that combination, and each dot represents a random seed.
2 changes: 0 additions & 2 deletions ci.sh
@@ -21,5 +21,3 @@ endtime=`date +%s`
runtime=$((endtime-starttime))
echo "total time used:"
echo "$runtime"


2 changes: 1 addition & 1 deletion ci_pytest_cov.sh
@@ -1,5 +1,5 @@
#!/bin/bash
export CUDA_VISIBLE_DEVICES=""
# although the garbage collector has been explicitly called, sometimes there is still a CUDA out-of-memory error
# so it is better not to use the GPU for pytest, to ensure no CUDA out-of-memory error occurs
# --cov-report term-missing to show in console file wise coverage and lines missing
4 changes: 2 additions & 2 deletions ci_run_examples.sh
@@ -6,8 +6,8 @@ set -e  # exit upon first error
# echo "#!/bin/bash -x -v" > sh_temp_example.sh
sed -n '/```shell/,/```/ p' docs/doc_examples.md | sed '/^```/ d' >> ./sh_temp_example.sh
split -l 5 sh_temp_example.sh sh_example_split
for file in sh_example_split*;
do (echo "#!/bin/bash -x -v" > "$file"_exe && cat "$file" >> "$file"_exe && bash -x -v "$file"_exe && rm -r zoutput);
done
# bash -x -v -e sh_temp_example.sh
echo "general examples done"
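The pipeline above pulls every shell fence out of `docs/doc_examples.md`, splits the result into 5-line chunks, and executes each chunk as its own traced script. The extraction step can be reproduced on a hypothetical demo file:

```shell
# build a tiny markdown file with one shell fence (hypothetical input)
printf '%s\n' 'intro text' '```shell' 'echo hi' '```' 'outro text' > demo.md

# print each ```shell fenced block inclusive of its fences, then drop the
# fence lines, mirroring the two sed stages used against docs/doc_examples.md
sed -n '/```shell/,/```/ p' demo.md | sed '/^```/ d' > extracted.sh

cat extracted.sh
```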
2 changes: 1 addition & 1 deletion data/mixed_codec/caltech/auto/text.txt
@@ -1 +1 @@
Hello World
31 changes: 18 additions & 13 deletions data/script/download_pacs.py
@@ -1,26 +1,29 @@
'this script can be used to download the pacs dataset'
"this script can be used to download the pacs dataset"
import os
import tarfile
from zipfile import ZipFile

import gdown


def stage_path(data_dir, name):
'''
"""
creates the path to data_dir/name
if it does not exist already
'''
"""
full_path = os.path.join(data_dir, name)

if not os.path.exists(full_path):
os.makedirs(full_path)

return full_path


def download_and_extract(url, dst, remove=True):
'''
"""
downloads and extracts the data behind the url
and saves it at dst
'''
"""
gdown.download(url, dst, quiet=False)

if dst.endswith(".tar.gz"):
@@ -43,17 +46,19 @@ def download_and_extract(url, dst, remove=True):


def download_pacs(data_dir):
'''
"""
download and extract dataset pacs.
Dataset is saved at location data_dir
'''
"""
full_path = stage_path(data_dir, "PACS")

download_and_extract("https://drive.google.com/uc?id=1JFr8f805nMUelQWWmfnJR3y4_SYoN5Pd",
os.path.join(data_dir, "PACS.zip"))
download_and_extract(
"https://drive.google.com/uc?id=1JFr8f805nMUelQWWmfnJR3y4_SYoN5Pd",
os.path.join(data_dir, "PACS.zip"),
)

os.rename(os.path.join(data_dir, "kfold"), full_path)

os.rename(os.path.join(data_dir, "kfold"),
full_path)

if __name__ == '__main__':
download_pacs('../pacs')
if __name__ == "__main__":
download_pacs("../pacs")
4 changes: 2 additions & 2 deletions data/ztest_files/dummy_file.py
@@ -1,4 +1,4 @@
'''
"""
I am a dummy file used in tests/test_git_tag.py
to produce a file which is not commited
'''
"""
1 change: 0 additions & 1 deletion docs/.nojekyll
@@ -1 +0,0 @@

71 changes: 30 additions & 41 deletions docs/conf.py
@@ -15,18 +15,16 @@
# In case the project was not installed
import os
import sys
import sphinx_material
from datetime import datetime

import sphinx_material

sys.path.insert(0, os.path.abspath(".."))

# -- Project information -----------------------------------------------------

project = "domainlab" # @FIXME
copyright = (
f"2021-{datetime.now().year}, Marr Lab."
""
)
copyright = f"2021-{datetime.now().year}, Marr Lab." ""

author = "Xudong Sun, et.al."

@@ -94,11 +92,11 @@
# '.md': 'recommonmark.parser.CommonMarkParser',
# }

source_suffix = ['.rst', '.md']
source_suffix = [".rst", ".md"]
source_suffix = {
'.rst': 'restructuredtext',
'.txt': 'markdown',
'.md': 'markdown',
".rst": "restructuredtext",
".txt": "markdown",
".md": "markdown",
}

# The master toctree document.
@@ -114,11 +112,13 @@
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path .
exclude_patterns = ["setup.py",
"_build",
"Thumbs.db",
".DS_Store",
"**.ipynb_checkpoints"]
exclude_patterns = [
"setup.py",
"_build",
"Thumbs.db",
".DS_Store",
"**.ipynb_checkpoints",
]

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "default"
@@ -129,10 +129,9 @@
# -- HTML theme settings ------------------------------------------------
html_short_title = "domainlab" # @FIXME
html_show_sourcelink = False
html_sidebars = {"**": ["logo-text.html",
"globaltoc.html",
"localtoc.html",
"searchbox.html"]}
html_sidebars = {
"**": ["logo-text.html", "globaltoc.html", "localtoc.html", "searchbox.html"]
}

html_theme_path = sphinx_material.html_theme_path()
html_context = sphinx_material.get_html_context()
@@ -157,40 +156,30 @@
"master_doc": False,
"nav_title": "DomainLab",
"nav_links": [
{
"href": "readme_link",
"internal": True,
"title": "Introduction"},
{
"href": "doc_tasks",
"internal": True,
"title": "Task Specification"},
{"href": "readme_link", "internal": True, "title": "Introduction"},
{"href": "doc_tasks", "internal": True, "title": "Task Specification"},
{
"href": "doc_custom_nn",
"internal": True,
"title": "Specify neural network in commandline"},
"title": "Specify neural network in commandline",
},
{
"href": "doc_MNIST_classification",
"internal": True,
"title": "Examples with MNIST"},
"title": "Examples with MNIST",
},
{
"href": "doc_examples",
"internal": True,
"title": "More commandline examples"},

{
"href": "doc_benchmark",
"internal": True,
"title": "Benchmarks tutorial"},

{
"href": "doc_output",
"internal": True,
"title": "Output Structure"},
"title": "More commandline examples",
},
{"href": "doc_benchmark", "internal": True, "title": "Benchmarks tutorial"},
{"href": "doc_output", "internal": True, "title": "Output Structure"},
{
"href": "doc_extend_contribute",
"internal": True,
"title": "Specify custom model in commandline"},
"title": "Specify custom model in commandline",
},
# {
# "href": "https://squidfunk.github.io/mkdocs-material/",
# "internal": False,
@@ -251,7 +240,7 @@

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, "domainlab", "domainlab", [author], 1)] # @FIXME
man_pages = [(master_doc, "domainlab", "domainlab", [author], 1)] # @FIXME


# -- Options for Texinfo output ----------------------------------------------
19 changes: 10 additions & 9 deletions docs/conf0.py
@@ -12,17 +12,18 @@
#
import os
import sys
sys.path.insert(0, os.path.abspath('..'))

sys.path.insert(0, os.path.abspath(".."))
sys.setrecursionlimit(1500)

# -- Project information -----------------------------------------------------

project = 'domainlab'
copyright = '2022, Xudong Sun'
author = 'Xudong Sun'
project = "domainlab"
copyright = "2022, Xudong Sun"
author = "Xudong Sun"

# The full version, including alpha/beta/rc tags
release = '0.0.0'
release = "0.0.0"


# -- General configuration ---------------------------------------------------
@@ -46,7 +47,7 @@
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
templates_path = ["_templates"]

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -59,17 +60,17 @@
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'
html_theme = "alabaster"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_static_path = ["_static"]