Follow up to #3940
We are trying to make it possible to keep indices even when there isn't enough data to actually train them. An earlier PR adds a `train: bool` parameter that lets the user choose whether to defer training an index. If `train=False`, we just create an empty index, and it will show `num_indexed_rows: 0` in stats. The user can later call `optimize_indices()` to do the actual training.
Vector indices are more complex because IVF partitions need a minimum amount of data to train.
Here is the behavior I think we want:
- Can pass `train=False` to create an empty index. No IVF or PQ will be trained. Will save metadata as empty array.
- If you call `create_index(..., train=True)` or `optimize_indices()`:
  - < 256 non-null vectors -> same as `train=False`
  - 256 <= number of non-null vectors < `num_partitions * 256` -> train on smaller number of partitions
  - number of non-null vectors >= `num_partitions * 256` -> train full index
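
To make the thresholds concrete, here is a rough sketch of the decision logic in Python. The names `TrainingPlan` and `plan_vector_index_training` are hypothetical and only for illustration; they are not existing APIs. The rule for picking the "smaller number of partitions" (`num_non_null // 256`) is also an assumption, since the proposal doesn't pin down how the reduced count is chosen.

```python
from dataclasses import dataclass

# Minimum non-null vectors per IVF partition, taken from the proposal above.
MIN_ROWS_PER_PARTITION = 256


@dataclass(frozen=True)
class TrainingPlan:
    """Hypothetical result type, for illustration only."""
    train: bool          # False means: create an empty, untrained index
    num_partitions: int  # 0 when nothing is trained


def plan_vector_index_training(
    num_non_null: int, num_partitions: int, train: bool = True
) -> TrainingPlan:
    """Decide how to train a vector index given the available data."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")

    # train=False, or too little data: behave like train=False (empty index).
    if not train or num_non_null < MIN_ROWS_PER_PARTITION:
        return TrainingPlan(train=False, num_partitions=0)

    # Enough data for some partitions, but not for all requested ones.
    if num_non_null < num_partitions * MIN_ROWS_PER_PARTITION:
        # Assumption: use as many partitions as the data can fill.
        reduced = num_non_null // MIN_ROWS_PER_PARTITION
        return TrainingPlan(train=True, num_partitions=reduced)

    # Enough data to train the full index as requested.
    return TrainingPlan(train=True, num_partitions=num_partitions)
```

For example, with `num_partitions=4`, 100 non-null vectors would yield an empty index, 1000 would train on a reduced partition count, and 1024 or more would train the full index.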