Description
The current primary objectstore runs into scalability issues on very large instances because it stores all objects in a single bucket, and not all S3/Swift implementations cope well with several million objects in one bucket.
To work around this, the plan is to add the ability to balance objects across multiple buckets.
Balance methods
There are two main options for handling the balancing:
1. Per user balancing
Either each user gets their own bucket (we need to ensure that the objectstore in use can handle a very large number of buckets) or users are spread evenly over N buckets (larger but fewer buckets).
Either way, all of a user's files stay in a single bucket. The disadvantage is that, since users probably don't have the same usage patterns, files end up distributed unevenly over the buckets.
2. Per file balancing
Unlike 1., this doesn't keep a user's files together. Instead it spreads all files evenly over N buckets (something like $bucketId = $fileId % $numberOfBuckets). This makes sure all files are spread evenly no matter what the usage pattern is: even if a single user has tens of millions of files, they will still balance.
Personally I favor 2. since it's a simpler solution and I feel that it solves the problem (the storage not handling very large buckets well) better, although it's not without its downsides.
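To make the difference between the two options concrete, here is a minimal sketch of both bucket selection rules. The function names `bucketForUser` and `bucketForFile` are hypothetical and not existing Nextcloud APIs, and hashing the user id with `crc32` is only one assumed way of spreading users over N buckets.

```php
<?php

// Hypothetical helpers for illustration only, not existing Nextcloud APIs.

/** Per user balancing: hash the user id so users spread over N buckets. */
function bucketForUser(string $userId, int $numberOfBuckets): int {
    // abs() guards against negative crc32() results on 32-bit PHP builds.
    return abs(crc32($userId)) % $numberOfBuckets;
}

/** Per file balancing: incremental file ids are spread over N buckets. */
function bucketForFile(int $fileId, int $numberOfBuckets): int {
    return $fileId % $numberOfBuckets;
}

// A single heavy user with 1000 files (ids 1..1000) and 4 buckets:
// per user balancing would put all of them into bucketForUser('alice', 4),
// while per file balancing spreads them evenly.
$numberOfBuckets = 4;
$perFileCounts = array_fill(0, $numberOfBuckets, 0);
for ($fileId = 1; $fileId <= 1000; $fileId++) {
    $perFileCounts[bucketForFile($fileId, $numberOfBuckets)]++;
}
print_r($perFileCounts); // each of the 4 buckets holds 250 files
```

With per file balancing the bucket only depends on the file id, so no per-user state is needed to find an object.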
Changing balance methods
A thing that should be taken into account is whether we want to support changing balance methods (like increasing the number of buckets used) on an existing system.
Since moving all objects around according to the updated balancing scheme is not practical, we would need some way for existing files/users to keep using the old scheme while new ones use the updated scheme.
For per-user balancing, this can simply be done by storing the bucket id per user. For per-file balancing, storing the bucket for each file is probably not practical.
One way to handle per-file balancing is to instead store the balancing scheme used per range of file ids. Since ids are incremental, we only need to store that files 1 to 100000 use 10 buckets and all newer files use 20.
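The range-based idea could look roughly like the sketch below. The `$schemes` array layout and the function name `bucketForFileId` are assumptions made for illustration. The issue does not specify how or where the ranges would be stored.

```php
<?php

// Hypothetical configuration: each entry means "file ids from 'firstId'
// onward use 'buckets' buckets". Entries must be sorted by ascending firstId.
$schemes = [
    ['firstId' => 1,      'buckets' => 10],
    ['firstId' => 100001, 'buckets' => 20],
];

/** Pick the bucket using the scheme that was active when the file id was issued. */
function bucketForFileId(int $fileId, array $schemes): int {
    $numberOfBuckets = null;
    foreach ($schemes as $scheme) {
        if ($fileId < $scheme['firstId']) {
            break; // later ranges start after this file id
        }
        $numberOfBuckets = $scheme['buckets'];
    }
    if ($numberOfBuckets === null) {
        throw new InvalidArgumentException("No balancing scheme covers file id $fileId");
    }
    return $fileId % $numberOfBuckets;
}

echo bucketForFileId(99999, $schemes), "\n";  // old scheme: 99999 % 10 = 9
echo bucketForFileId(100015, $schemes), "\n"; // new scheme: 100015 % 20 = 15
```

Existing objects never move, because a file id below 100001 always resolves through the 10-bucket scheme. One consequence of this sketch is that new files are still written to the original ten buckets as well as the ten new ones, so the older buckets stay fuller.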