Coreset Selection with Feature Distance and Quality Weighting

This repository provides a PyTorch-based implementation of a greedy coreset selection algorithm. The method incrementally selects samples that maximize diversity in feature space, optionally weighted by a per-sample quality score.

Overview

Given:

- A set of feature vectors
- (Optionally) a quality score for each sample

The algorithm:

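1. Starts from an initial sample (index 0)
2. Iteratively selects the next sample that maximizes the objective function, i.e. diversity in feature space, optionally weighted by the per-sample quality score, until `size` samples have been chosen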
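
The loop described above can be sketched as follows. This is a minimal illustration, not the code in `coreset.py`: the function name `greedy_select` is hypothetical, and the objective used here (minimum Euclidean distance to the already selected samples, multiplied by the quality score) is an assumption, since the exact objective is not spelled out in this README.

```python
import torch


def greedy_select(features, quality=None, size=3):
    """Hypothetical sketch of greedy, quality-weighted farthest-point selection.

    features: (N, D) float tensor; quality: (N,) float tensor or None.
    Assumed objective: (min distance to the selected set) * quality.
    """
    n = features.shape[0]
    size = min(size, n)
    if quality is None:
        quality = torch.ones(n, dtype=features.dtype, device=features.device)

    selected = [0]  # start from index 0, as described above
    chosen = torch.zeros(n, dtype=torch.bool, device=features.device)
    chosen[0] = True
    # Distance from every sample to its nearest selected sample so far.
    min_dist = torch.cdist(features, features[0:1]).squeeze(1)

    while len(selected) < size:
        score = min_dist * quality
        score[chosen] = float("-inf")  # never pick the same sample twice
        idx = int(torch.argmax(score))
        selected.append(idx)
        chosen[idx] = True
        # Update nearest-selected distances with the newly added sample.
        new_dist = torch.cdist(features, features[idx:idx + 1]).squeeze(1)
        min_dist = torch.minimum(min_dist, new_dist)
    return selected


feats = torch.tensor([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [0.0, 5.0]])
print(greedy_select(feats, size=3))  # [0, 2, 3]
```

Keeping a running `min_dist` vector means each iteration needs only one distance computation against the newly added sample, rather than against the whole selected set.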

Typical use cases:

- Dataset subset selection
- Diversity-aware sampling
- Quality-aware coreset construction

Usage

from coreset import execute_coreset_selection

selected_files = execute_coreset_selection(
    feature_dict_path="diversity.pt",
    quality_dict_path="nisqa.pt",
    size=4000,
    device=None  # automatically uses CUDA if available
)

Function Arguments

| Argument | Type | Description |
| --- | --- | --- |
| `feature_dict_path` | `str` | Path to a `.pt` file containing `{key: feature_tensor}` |
| `quality_dict_path` | `str` | Path to a `.pt` file containing `{key: quality_score}` |
| `size` | `int` | Number of samples to select |
| `device` | `torch.device` or `None` | Computation device; if `None`, CUDA is used when available |
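
The `device=None` behaviour noted in the usage example can be illustrated with a short sketch. The helper name `resolve_device` is hypothetical, and the actual logic in `coreset.py` may differ.

```python
import torch


def resolve_device(device=None):
    """Hypothetical helper: use CUDA when available, otherwise fall back to CPU."""
    if device is None:
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Accept either a torch.device or a string such as "cpu".
    return device if isinstance(device, torch.device) else torch.device(device)
```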

Return Value

List[str]: Keys corresponding to the selected samples, in the order they were selected.

Input Format

Feature Dictionary

{
    "file_001": torch.Tensor([...]),
    "file_002": torch.Tensor([...]),
    ...
}

All feature tensors must have the same shape.
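
One way to check the shape requirement before running selection is sketched below. The helper name `stack_features` is hypothetical and is not part of `coreset.py`. It also assumes each feature can be flattened to a vector.

```python
import torch


def stack_features(feature_dict):
    """Hypothetical helper: verify a common shape, then stack into an (N, D) matrix."""
    keys = list(feature_dict.keys())
    if not keys:
        raise ValueError("Feature dictionary is empty")
    shapes = {tuple(feature_dict[k].shape) for k in keys}
    if len(shapes) != 1:
        raise ValueError(f"Feature tensors have differing shapes: {sorted(shapes)}")
    # Flatten each feature so the result is always two-dimensional.
    matrix = torch.stack([feature_dict[k].flatten() for k in keys])
    return keys, matrix
```

Returning `keys` alongside the matrix keeps the row order explicit, so selected row indices can be mapped back to file keys.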

Quality Dictionary (`nisqa.pt`)

{
    "file_001": float,
    "file_002": float,
    ...
}

All keys must match those in the feature dictionary.
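
The snippet below sketches how two small input files in this layout could be written with `torch.save`, reloaded, and checked for matching keys. The helper `check_keys_match` is hypothetical, and the feature and quality values are made up purely for illustration.

```python
import os
import tempfile

import torch


def check_keys_match(feature_dict, quality_dict):
    """Hypothetical helper: raise if the two dictionaries do not share the same keys."""
    missing = set(feature_dict) - set(quality_dict)
    extra = set(quality_dict) - set(feature_dict)
    if missing or extra:
        raise KeyError(
            f"missing quality scores: {sorted(missing)}; unknown keys: {sorted(extra)}"
        )


# Toy data in the documented layout (values are illustrative only).
features = {"file_001": torch.randn(8), "file_002": torch.randn(8)}
quality = {"file_001": 3.2, "file_002": 4.1}

with tempfile.TemporaryDirectory() as tmp:
    feature_path = os.path.join(tmp, "diversity.pt")
    quality_path = os.path.join(tmp, "nisqa.pt")
    torch.save(features, feature_path)
    torch.save(quality, quality_path)
    loaded_features = torch.load(feature_path)
    loaded_quality = torch.load(quality_path)

check_keys_match(loaded_features, loaded_quality)  # passes silently when keys agree
```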

License

MIT License
