Hello, how many languages does the training data cover? And what does the amount of data in each language look like?