The metadata states what the network produces as its output, but if the postprocessing transform sequence substantially changes the shape and format of that output, what the metadata describes will no longer match what the bundle actually returns. One example is a bundle whose network produces a number of channels that do not correspond directly to classes, and whose postprocessing then transforms them into an output with a different number of channels representing one-hot encoded multiclass labels.
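A minimal NumPy sketch of this kind of mismatch follows. Everything in it is a hypothetical illustration rather than part of any bundle format or library API: the `metadata` dictionary, the two overlapping-region channels and the `postprocess` function. The network is assumed to emit two logit channels, one for an outer region and one for an inner region contained within it. Postprocessing thresholds these channels and converts them into three one-hot class channels (background, outer-only, inner), so the channel count recorded in the metadata no longer describes the final output.

```python
import numpy as np

# Hypothetical metadata: it records only the raw network output (2 channels).
metadata = {"outputs": {"pred": {"num_channels": 2, "format": "logits"}}}


def postprocess(logits: np.ndarray) -> np.ndarray:
    """Hypothetical postprocessing: 2 region-logit channels -> 3 one-hot classes.

    Channel 0 is assumed to mark an "outer" region and channel 1 an "inner"
    region nested inside it; neither channel is a class on its own.
    """
    masks = logits > 0.0  # equivalent to sigmoid(logits) > 0.5
    outer, inner = masks[0], masks[1]

    # Build an integer label map: 0 = background, 1 = outer only, 2 = inner.
    labels = np.zeros(logits.shape[1:], dtype=np.int64)
    labels[outer] = 1
    labels[outer & inner] = 2

    # One-hot encode with the class axis first, matching the channel-first layout.
    classes = np.arange(3).reshape((-1,) + (1,) * labels.ndim)
    return (classes == labels[None]).astype(np.uint8)


# A tiny 2x2 "image" of raw network output, shape (channels=2, H=2, W=2).
logits = np.array([[[2.0, -1.0], [3.0, 1.5]],
                   [[-2.0, -1.0], [0.5, -0.5]]])

out = postprocess(logits)
declared = metadata["outputs"]["pred"]["num_channels"]
print(out.shape[0], declared)  # 3 2: the final output has more channels than declared
```

Anyone relying on the metadata alone would expect two channels of region logits, but actually receives three channels of one-hot labels with different semantics.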