
This is the official implementation for the work

Tri Nguyen, Shahana Ibrahim, and Xiao Fu. "Noisy Label Learning with Instance-Dependent Outliers: Identifiability via Crowd Wisdom." The Thirty-eighth Annual Conference on Neural Information Processing Systems,

which was accepted as a Spotlight at NeurIPS 2024.

Paper: arXiv
NeurIPS poster and slide page: https://neurips.cc/virtual/2024/poster/95831

Requirement

Set up a virtual environment with the required packages. All runs were conducted with Python 3.9.

mkdir coinnet
cd coinnet
python -m venv localenv
source localenv/bin/activate
git clone https://github.com/ductri/COINNet src
pip install -r src/requirement.txt

General instruction

Configurations in the conf directory are read and parsed by Hydra. They cover the hyperparameter and data settings. You can leave most of the settings intact, except:
- If you want to use wandb for logging and experiment management, modify project_name and entity_name in the function utils.py:create_wandb_wrapper.
- The configs related to the data location.

To override any configuration, either modify these config files or pass command-line arguments. Please refer to the Hydra documentation for details.

Clone and set up the project structure: assuming you are in coinnet, clone this repo, rename it, and place it at coinnet/src, then create the supporting directories: mkdir coinnet/data/, mkdir coinnet/lightning_saved_models.
All logging and monitoring is handled by wandb unless you turn it off, so you may want to set project_name and entity_name to tell wandb where to upload the logs. Both are found in the function utils.py:create_wandb_wrapper.
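
As a minimal sketch of the directory setup described above, the supporting directories can also be created from Python. The helper name `prepare_layout` is hypothetical and not part of this repository; it only assumes the layout named in this section (src, data, lightning_saved_models under one root).

```python
from pathlib import Path

# Supporting directories named in this section.
SUPPORT_DIRS = ("data", "lightning_saved_models")


def prepare_layout(root):
    """Create the supporting directories under `root`.

    Returns True if `root`/src already exists (i.e. the repo was cloned there),
    False otherwise. Hypothetical helper, for illustration only.
    """
    root = Path(root)
    for name in SUPPORT_DIRS:
        # exist_ok=True makes the call safe to repeat.
        (root / name).mkdir(parents=True, exist_ok=True)
    return (root / "src").is_dir()
```

Calling `prepare_layout(".")` from inside coinnet creates both directories and reports whether the clone at src is in place.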

Datasets

Download the machine-annotation data used to produce Table 1 from: google drive. Then put the files in coinnet/data/.
Download the CIFAR-10N dataset from http://www.noisylabels.com/
For ImageNet-15N, we provide 2 pkl files, both of which can be loaded using pickle:
- imagenet15/clip_feature_M=100.pkl: a dictionary with the following keys:
  - feature: a matrix of size (2514, 2048); row i is the feature vector extracted from CLIP for sample i.
  - noisy_label: a matrix of size (2514, 100); row i holds the labels given to sample i by the 100 annotators. Labels are indexed from 0 to 14, and label -1 is reserved for missing annotations.
  - true_label: a vector of size (2514); element i is the true label of sample i.
  - idx_2_classname: a mapping from label index to label name, as defined by the original ImageNet dataset.
- imagenet15/clip_feature_M=100_test.pkl: same format as clip_feature_M=100.pkl, but should be used for testing.
- In case you want to use your own feature extractor, please refer to imagenet15/imagenet15_M=100.pkl. This pickle file contains a list of 2514 items, each a dictionary with the original ImageNet file name, true label, and noisy label. You will need to download the ImageNet dataset yourself.
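
To illustrate the dictionary format described above, here is a minimal sketch that loads one of the pickle files and aggregates the annotator labels by majority vote while skipping the -1 (missing) entries. The helper names `load_split`, `majority_vote`, and `vote_accuracy` are hypothetical and not part of this repository. Majority voting is used here only as a simple sanity check on the data; it is not the COINNet method. The sketch assumes every label entry can be converted with `int()`.

```python
import pickle
from collections import Counter


def load_split(path):
    """Load one ImageNet-15N pickle file and return the object it stores."""
    with open(path, "rb") as f:
        return pickle.load(f)


def majority_vote(noisy_label, missing=-1):
    """Aggregate annotator labels per sample, ignoring missing entries.

    Returns one label per row. A row with no annotation at all gets `missing`.
    Ties go to the label that appears first in the row.
    """
    voted = []
    for row in noisy_label:
        counts = Counter(int(label) for label in row if int(label) != missing)
        voted.append(counts.most_common(1)[0][0] if counts else missing)
    return voted


def vote_accuracy(data):
    """Fraction of samples whose majority-vote label equals the true label."""
    voted = majority_vote(data["noisy_label"])
    truth = [int(t) for t in data["true_label"]]
    return sum(v == t for v, t in zip(voted, truth)) / len(truth)
```

For example, `data = load_split("imagenet15/clip_feature_M=100.pkl")` followed by `vote_accuracy(data)` gives a quick baseline number, and `data["idx_2_classname"]` maps a voted index back to its class name.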

Running COINNet on CIFAR-10N:

Download noisy labels: wget http://ucsc-real.soe.ucsc.edu:1995/files/cifar-10-100n-main.zip
Unzip the archive and move the file cifar-10-100n-main/data/CIFAR-10_human.pt to ./data/cifar10n/CIFAR-10_human.pt
python my_training.py data=cifar10n
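
The download and unzip steps above can also be scripted. The following is a minimal sketch using only the Python standard library; the function names `download_archive` and `place_human_labels` are hypothetical and not part of this repository, and the sketch assumes the archive keeps the internal path given above.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# URL and archive member path as given in the steps above.
URL = "http://ucsc-real.soe.ucsc.edu:1995/files/cifar-10-100n-main.zip"
MEMBER = "cifar-10-100n-main/data/CIFAR-10_human.pt"


def download_archive(zip_path, url=URL):
    """Fetch the CIFAR-N archive (the equivalent of the wget step)."""
    urllib.request.urlretrieve(url, zip_path)
    return Path(zip_path)


def place_human_labels(zip_path, root="."):
    """Copy CIFAR-10_human.pt from the archive to <root>/data/cifar10n/."""
    target = Path(root) / "data" / "cifar10n" / "CIFAR-10_human.pt"
    target.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Stream the single member out instead of extracting the whole tree.
        with archive.open(MEMBER) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

A typical use would be `place_human_labels(download_archive("cifar-10-100n-main.zip"))`, run from the directory that contains data/.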

An example of a run can be found at: wandb-logging

For other datasets, please take a look at conf/data/ for the corresponding configs.
