Skip to content

Using an ImageClassificationTrainer with a dataset that only contains one class of images #4660

@antoniovs1029

Description

@antoniovs1029

Issue

  • What did you do?
    I tried to train a model that uses the ImageClassificationTrainer, using a dataset that only contains images labeled as 'dog'.

  • What happened?
    I got a System.InvalidOperationException: 'Metadata KeyValues does not exist' while fitting the model, which isn't very informative about the problem. If I didn't know my dataset only contained one label, or if I were to split the dataset in a way that the training set had only one label, then getting that exception wouldn't help to fix the problem.

Also notice that the exception is thrown after training is done, so an user would have already invested time on it before noticing that something is wrong.

  • What did you expect?
    Probably an exception which tells me that there is only one class in my dataset, as done by other multiclass trainers such as the Multiclass LightGBM which throws a "System.InvalidOperationException: 'LightGBM Error, code is -1, error message is 'Number of classes should be specified and greater than 1 for multiclass training'.'", or the LinearMulticlassModelParametersBase which throws a System.ArgumentOutOfRangeException: 'Must be at least 2. Parameter name: numClasses' in here.

If I changed my dataset to include other labels, then the exception is gone and it works as expected.

Source code / logs

dataset.zip

using Microsoft.ML.Data;
using Microsoft.ML.Vision;

namespace Microsoft.ML.Samples
{
    public class ModelInput
    {
        [ColumnName("Label"), LoadColumn(0)]
        public string Label { get; set; }


        [ColumnName("ImageSource"), LoadColumn(1)]
        public string ImageSource { get; set; }
    }

    public static class Program
    {
        // private static string TRAIN_DATA_FILEPATH = @"C:\Users\anvelazq\Desktop\issue19\dogs-cats-horses.tsv"; // This one works because it has multiple classes
        private static string TRAIN_DATA_FILEPATH = @"C:\Users\anvelazq\Desktop\issue19\only-dogs.tsv"; // This one doesn't work because it has only one class

        private static MLContext mlContext = new MLContext(seed: 1);

        public static void Main()
        {
            IDataView trainingDataView = mlContext.Data.LoadFromTextFile<ModelInput>(
                                            path: TRAIN_DATA_FILEPATH,
                                            hasHeader: true,
                                            separatorChar: '\t',
                                            allowQuoting: true,
                                            allowSparse: false);

            var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Label")
                                      .Append(mlContext.Transforms.LoadRawImageBytes("ImageSource_featurized", null, "ImageSource"))
                                      .Append(mlContext.Transforms.CopyColumns("Features", "ImageSource_featurized"));

            var trainer = mlContext.MulticlassClassification.Trainers.ImageClassification(new ImageClassificationTrainer.Options() { LabelColumnName = "Label", FeatureColumnName = "Features" })
                                      .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
            var trainingPipeline = dataProcessPipeline.Append(trainer);

            var model = trainingPipeline.Fit(trainingDataView); // System.InvalidOperationException: 'Metadata KeyValues does not exist'

            var transformed = model.Transform(trainingDataView);
            var transformedPreview = transformed.Preview();
        }
    }
}

Nugets used:

image

Metadata

Metadata

Assignees

Labels

P0Priority of the issue for triage purpose: IMPORTANT, needs to be fixed right away.bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions