We'd like to support the experience where users can set their desired index configuration before adding data, and have that configuration remain sensible as they add or delete rows. Currently, users set a fixed num_partitions value, which is only helpful within a certain range of row counts.
Instead, we should have them set a target_partition_size parameter, which can scale appropriately as their dataset size changes. optimize_indices should automatically handle retraining IVF when num_partitions has drifted far enough from the ideal / requested value.
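A minimal sketch of what the drift check inside optimize_indices might look like. The function name, the drift_factor parameter, and the factor-of-2 threshold are all assumptions for illustration, not the actual implementation:

```python
def should_retrain(current_partitions: int, ideal_partitions: int,
                   drift_factor: float = 2.0) -> bool:
    # Hypothetical drift check: retrain IVF when the current partition count
    # has drifted from the ideal count by more than drift_factor in either
    # direction. drift_factor=2.0 is an assumed default.
    if current_partitions <= 0:
        return True
    ratio = ideal_partitions / current_partitions
    return ratio >= drift_factor or ratio <= 1.0 / drift_factor
```

With a factor of 2, growing a 1,000-partition index to an ideal of 2,500 partitions would trigger a retrain, while drifting to 1,100 would not.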

The default for target_partition_size should be 4096. That works well for datasets with fewer than 10 million rows. After that, we start to get too many partitions. So the calculation needs to be:
num_partitions = min( num_rows / target_partition_size, sqrt(num_rows), max_partitions )
Related issues