benwing edited this page Jan 4, 2015 · 2 revisions

The data-instance file used by lktrain and lkpredict looks as follows:

Iris-setosa | sepal-length:4.7 sepal-width:3.2 petal-length:1.6 petal-width:0.2
Iris-versicolor | sepal-length:5.0 sepal-width:2.3 petal-length:3.3 petal-width:1.0
...
Iris-versicolor | sepal-length:6.0 sepal-width:2.7 petal-length:5.1 petal-width:1.6
...

That is, each line holds one data instance: the correct label, then a vertical bar surrounded by spaces, then the features. Each feature is written as a name and a value separated by a colon. The colon and value can be omitted, in which case the value defaults to 1.0.
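
To make the layout concrete, here is a minimal Python sketch of a reader for one such line. It is not the parser used by lktrain or lkpredict; the function name `parse_instance` is hypothetical, and the sketch assumes that labels and feature names contain no whitespace, `|`, or `:` characters.

```python
def parse_instance(line):
    """Parse 'label | name:value name ...' into (label, features).

    Hypothetical helper for illustration only; assumes the label and
    feature names contain no whitespace, '|' or ':' characters.
    """
    head, bar, tail = line.partition("|")
    if not bar:
        raise ValueError(f"missing '|' separator: {line!r}")
    label = head.strip()
    features = {}
    for token in tail.split():
        name, colon, value = token.partition(":")
        # A bare feature name (no colon and value) defaults to 1.0.
        features[name] = float(value) if colon else 1.0
    return label, features


label, features = parse_instance(
    "Iris-setosa | sepal-length:4.7 sepal-width:3.2 is-small"
)
```

Here `is-small` is a made-up feature with no value, included only to show the 1.0 default.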

An optional importance weight can be specified after the label, e.g.

Iris-setosa 1.5 | sepal-length:4.7 sepal-width:3.2 petal-length:1.6 petal-width:0.2
Iris-versicolor 0.1 | sepal-length:5.0 sepal-width:2.3 petal-length:3.3 petal-width:1.0
...
Iris-versicolor 4 | sepal-length:6.0 sepal-width:2.7 petal-length:5.1 petal-width:1.6
...

The weight scales the data instance's contribution during training. For an integral weight, the effect is similar to duplicating the data instance that many times.
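
The sketch below illustrates the weight field and the duplication analogy in Python. It rests on assumptions that this page does not confirm: that an omitted weight behaves like 1.0, and that the weight multiplies a per-instance loss in a summed training objective. The names `split_label_and_weight` and `weighted_total` are hypothetical, and the loss values are made up for the arithmetic.

```python
def split_label_and_weight(head):
    """Split the text before the '|' into (label, weight).

    Hypothetical helper; assumes an omitted weight means 1.0.
    """
    parts = head.split()
    if len(parts) == 1:
        return parts[0], 1.0
    if len(parts) == 2:
        return parts[0], float(parts[1])
    raise ValueError(f"expected 'label' or 'label weight': {head!r}")


def weighted_total(losses, weights):
    """Sum per-instance losses, each scaled by its importance weight."""
    return sum(w * l for l, w in zip(losses, weights))


label, weight = split_label_and_weight("Iris-versicolor 4")

# Illustrative per-instance losses (not real results): the first
# instance carries weight 4, the second weight 1.
with_weights = weighted_total([0.5, 0.25], [4.0, 1.0])
# The same total arises from listing the first instance four times.
with_copies = weighted_total([0.5, 0.5, 0.5, 0.5, 0.25], [1.0] * 5)
```

Under this summed objective the two totals coincide. For a learner that processes instances one at a time, the result may differ, which is consistent with the effect being described as similar rather than identical.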
