LibOPFcpp

C++17 implementation of OPF classifier in one header. For a Python binding, check PyOPF.

How to cite

The algorithm is described in:

J. P. Papa, A. X. Falcão, and Celso T. N. Suzuki. Supervised pattern classification based on optimum-path forest. International Journal of Imaging Systems and Technology, 19(2):120-131, 2009.

Bibtex:

@article{papa2009,
 author = {Papa, J. P. and Falc\~{a}o, A. X. and Suzuki, C. T. N.},
 title = {Supervised Pattern Classification Based on Optimum-path Forest},
 journal = {International Journal of Imaging Systems and Technology},
 issue_date = {June 2009},
 volume = {19},
 number = {2},
 month = jun,
 year = {2009},
 issn = {0899-9457},
 pages = {120--131},
 numpages = {12},
 doi = {10.1002/ima.v19:2},
 publisher = {John Wiley \& Sons, Inc.},
 address = {New York, NY, USA},
 keywords = {graph-search algorithms, image analysis, image foresting transform, pattern recognition, supervised learning},
}

Usage

The library was designed to look like scikit-learn. The two main classes are SupervisedOPF and UnsupervisedOPF, and it contains one constructor and two functions:

Supervised

Constructor

SupervisedOPF(bool precomputed=false, distance_function<T> distance=euclidean_distance<T>);

Inputs: precomputed: true if it will receive precomputed distance matrices. False otherwise. distance: distance function, case precomputed == false.

Functions

void fit(const Mat<T> &train_data, const std::vector<int> &labels);

Fits the classifier.

Inputs:

train_data: if precomputed == false, a matrix where the rows are the samples and the columns are the features. If precomputed == true, a distance matrix of size n_samples x n_samples.
labels: an int vector with the ground truths.

std::vector<int> predict(const Mat<T> &test_data);`

Predicts label values for given data.

Inputs:

test_data: if precomputed == false, a matrix where the rows are the samples and the columns are the features. If precomputed == true, a distance matrix of size n_test_samples x n_train_samples.

Returns:

A vector of n_test_samples integers corresponding to predicted classes.

Unsupervised

Constructor

UnsupervisedOPF(int k=5, bool anomaly=false, float thresh=.1, bool precomputed=false, distance_function<T> distance=euclidean_distance<T>);

Inputs:

k: The number of clusters
anomaly: True if it is a problem of anomaly detection, false otherwise
threshold: Threshold value for clustering
precomputed: true if it will receive precomputed distance matrices. False otherwise.
distance: distance function, case precomputed == false.

Functions

void fit(const Mat<T> &train_data);

Fits the classifier.

Inputs:

train_data: if precomputed == false, a matrix where the rows are the samples and the columns are the features. If precomputed == true, a distance matrix of size n_samples x n_samples.

std::vector<int> predict(const Mat<T> &test_data);`

Predicts label values for given data.

Inputs:

test_data: if precomputed == false, a matrix where the rows are the samples and the columns are the features. If precomputed == true, a distance matrix of size n_test_samples x n_train_samples.

Returns:

A vector of n_test_samples integers corresponding to predicted classes.

The Mat class

The class Mat<T> was heavily inspired by OpenCV's Mat, although it is not nearly as powerful. It implements RAII memory management when built with all constructors but Mat(T* data, size_t rows, size_t cols);.

Its constructors include:

Mat();
Mat(Mat<T>& other);
Mat(const Mat<T>& other);
Mat(size_t rows, size_t cols);
Mat(size_t rows, size_t cols, T val);
Mat(std::shared_ptr<T>& data, size_t rows, size_t cols);
// Does not use RAII.
Mat(T* data, size_t rows, size_t cols);

For general data access and manipulation, the following functions:

// access a row's content, similar to OpenCV's m.ptr<T>(i)
T* row(size_t i);

// m.at(i, j), like OpenCV Mat's m.at<T>(i, j)
T& at(size_t i, size_t j);

// m[i][j] element access
T* operator[](size_t i);

// similar to above, but const
const T* operator[](size_t i) const;

// assign operator does not copy matrix contents
Mat<T>& operator=(const Mat<T>& other);

// copies matrix contents
Mat<T> copy();

// releases content (subtracts the reference counter)
void release();

Example usage

The library works in a scikit-learn manner:

opf::SupervisedOPF<float> opf;
opf.fit(train_data, train_labels);
preds = opf.predict(test_data);

The following example uses a dataset contained in data/. Note that most of the code is preamble, declaration, data reading and pre-processing.

#include <iostream>

#include "libopfcpp/OPF.hpp"
#include "libopfcpp/util.hpp" // General utilities

using namespace std;

int main()
{
    opf::Mat<float> data, train_data, test_data;
    vector<int> labels, train_labels, test_labels;

    // Read the data however you prefer
    // There are a few helper functions in util.hpp. This one reads a .dat file
    // into a opf::Mat object.
    opf::read_mat_labels("data/digits.dat", data, labels); // <-- util.hpp

    opf::StratifiedShuffleSplit sss(0.5); // in util.hpp
    pair<vector<int>, vector<int>> splits = sss.split(labels);

    opf::index_by_list<float>(data, splits.first, train_data); // <-- util.hpp
    opf::index_by_list<float>(data, splits.second, test_data);

    opf::index_by_list<int>(labels, splits.first, train_labels);
    opf::index_by_list<int>(labels, splits.second, test_labels);

    // Fit and predict
    opf::SupervisedOPF<float> opf;
    opf.fit(train_data, train_labels);
    vector<int> preds = opf.predict(test_data);

    // And print accuracy
    float acc = opf::accuracy(test_labels, preds); // <-- util.hpp
    cout << "Accuracy: " << acc*100 << "%" << endl;
}

Compile with:

g++ samples/example.cpp -std=c++1y -o example -Iinclude

Or run:

make example

Output varies, since it is a random shuffle:

Accuracy: 98.5491%

OPF in the literature

João Paulo Papa, Silas Evandro Nachif Fernandes, and Alexandre Xavier Falcão. 2017. Optimum-Path Forest based on k-connectivity. Pattern Recognition Letters, 87 (February 2017), 117-126.

Leonardo Marques Rocha, Fábio A. M. Cappabianco, and Alexandre Xavier Falcão. 2009. Data clustering as an optimum-path forest problem with applications in image analysis. Int J. International Journal of Imaging Systems and Technology. 19, 2 (June 2009), 50-68.

João P. Papa, Alexandre X. Falcão, Victor Hugo C. de Albuquerque, João Manuel R.S. Tavares. Efficient supervised optimum-path forest classification for large datasets. Pattern Recognition, Volume 45, Issue 1, (2012), 512-520.

Name		Name	Last commit message	Last commit date
Latest commit History 68 Commits
data		data
include/libopfcpp		include/libopfcpp
samples		samples
tools		tools
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LibOPFcpp

How to cite

Usage

Supervised

Constructor

Functions

Unsupervised

Constructor

Functions

The Mat class

Example usage

OPF in the literature

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LibOPFcpp

How to cite

Usage

Supervised

Constructor

Functions

Unsupervised

Constructor

Functions

The Mat class

Example usage

OPF in the literature

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages