Conversation
|
I would rather the definition be better defined as either the approximate number of entries or the approximate disk space used. |
|
Yeah, we don't usually care about the size of the datastore. We care about the disk usage. I was thinking something like:
type PersistentDatastore interface {
	DiskUsage() (int64, error)
}
Where datastores could optionally implement this. Implementing this interface wouldn't mean that the datastore necessarily persists its data, just that it understands the concept and can report any disk space it thinks it's using. I got stuck with this interface trying to figure out a more generic "stats" interface but I think we can just go with something like the above and deal with other stats later. |
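For readers unfamiliar with the optional-interface pattern described here, this is a minimal sketch of how a caller could detect whether a datastore opted in. The `Datastore` stand-in, `memStore`, `fileStore`, and `reportsDiskUsage` are hypothetical names used only for illustration; they are not part of the proposal.

```go
package main

import "fmt"

// Datastore is a one-method stand-in for the real interface (assumption).
type Datastore interface {
	Has(key string) (bool, error)
}

// PersistentDatastore is the optional interface as proposed in the comment.
type PersistentDatastore interface {
	DiskUsage() (int64, error)
}

// memStore is a hypothetical in-memory store that does not opt in.
type memStore struct{ data map[string][]byte }

func (m *memStore) Has(key string) (bool, error) {
	_, ok := m.data[key]
	return ok, nil
}

// fileStore is a hypothetical on-disk store that tracks its own byte count.
type fileStore struct{ bytesOnDisk int64 }

func (f *fileStore) Has(key string) (bool, error) { return false, nil }
func (f *fileStore) DiskUsage() (int64, error)    { return f.bytesOnDisk, nil }

// reportsDiskUsage uses a type assertion to check for the optional method.
func reportsDiskUsage(d Datastore) bool {
	_, ok := d.(PersistentDatastore)
	return ok
}

func main() {
	fmt.Println("memStore opts in:", reportsDiskUsage(&memStore{}))
	fmt.Println("fileStore opts in:", reportsDiskUsage(&fileStore{bytesOnDisk: 4096}))
}
```

The cost of this pattern, raised in the next comment, is that any wrapper hides the method unless it forwards it explicitly.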
|
ok, but if a persistent datastore is wrapped in any of the wrapper ones, then your "optional" method is gone, and you need to cast (which makes the interface less useful and not very helpful in the case of triages etc). We could just force everyone to report |
It would not hurt allowing to know this information though. In a datastore where all entries are roughly the same size (thinking ipfs chunks here) it can offer a pretty good ballpark of the total size with little effort... |
Datastore wrappers would have to handle this case. We're already using this pattern for, e.g., GC.
If we don't agree on what size means, it is an entirely useless metric. What if I have a mapped datastore that delegates to two datastores, one that treats size as the number of objects and another that treats size as bytes? What about caches? |
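To make the delegation concern concrete, here is a rough sketch of a datastore that sums the usage of its children. It only produces a meaningful total if every child reports the same unit (bytes here). The `mount` and `child` types are hypothetical, and skipping children that do not report usage is an assumption, not something settled at this point in the thread.

```go
package main

import "fmt"

// Datastore is a minimal stand-in for the real interface (assumption).
type Datastore interface {
	Name() string
}

// PersistentDatastore reports disk usage in bytes.
type PersistentDatastore interface {
	Datastore
	DiskUsage() (uint64, error)
}

// child is a hypothetical datastore with a fixed on-disk size.
type child struct {
	name  string
	bytes uint64
}

func (c child) Name() string               { return c.name }
func (c child) DiskUsage() (uint64, error) { return c.bytes, nil }

// mount is a hypothetical datastore that delegates to several children.
type mount struct{ children []Datastore }

func (m mount) Name() string { return "mount" }

// DiskUsage adds up the children's usage. Children that do not implement
// PersistentDatastore are skipped (assumed policy); errors are bubbled up.
func (m mount) DiskUsage() (uint64, error) {
	var total uint64
	for _, c := range m.children {
		pd, ok := c.(PersistentDatastore)
		if !ok {
			continue
		}
		du, err := pd.DiskUsage()
		if err != nil {
			return 0, err
		}
		total += du
	}
	return total, nil
}

func main() {
	m := mount{children: []Datastore{child{"blocks", 100}, child{"keys", 250}}}
	total, err := m.DiskUsage()
	fmt.Println(total, err)
}
```

If one child counted entries and another counted bytes, the sum above would be meaningless, which is the argument for fixing the unit in the interface.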
What I mean is, exposing the number of elements under a different method too (
ok I will proceed that way then |
|
How is this looking? |
type PersistentDatastore interface {
	Datastore

	DiskUsage() (uint64, error)
There was a problem hiding this comment.
Should have a doc explicitly saying what the unit here is (i.e. bytes).
There was a problem hiding this comment.
I would think that would be obvious, but it can't hurt.
There was a problem hiding this comment.
yeah you're totally right, my bad
kevina
left a comment
There was a problem hiding this comment.
If the DiskUsage method is not available an error should be returned. See inlined comments.
	}
	log.Printf("%s: DiskUsage: %d\n", d.Name, du)
	return du, err
}
There was a problem hiding this comment.
This function should return an error if the child is not a PersistentDatastore.
I would also restructure it to avoid having to declare du and err up front.
Something like
pd, ok := d.child.(PersistentDatastore)
if !ok {
return 0, errors.New("Unimplemented")
}
du, err := pd.DiskUsage()
if err != nil {
return 0, err
}
log.Printf("%s: DiskUsage: %d\n", d.Name, du)
return du, nil
But use a better error message.
There was a problem hiding this comment.
I see why a batch operation for example, should return an error if the underlying datastore cannot perform it at all. But DiskUsage? This implies that mixing Persistent and non PersistentDatastores in a wrapper and asking for DiskUsage should probably result in error too (as returning error here would mean doing it everywhere for behavior consistency).
If the datastore is not persistent, it seems natural to say (from a wrapper point of view) that the DiskUsage is 0, while keeping the error result to actually signal for real errors when obtaining real disk usage. In the current approach wrappers never produce errors on DiskUsage(), they just bubble them.
There was a problem hiding this comment.
If the underlying datastore uses disk storage but doesn't implement this method, then a result of 0 without an error would be misleading.
Maybe we should just require this method be implemented, or perhaps a more generic Stat() method.
@Stebalien thoughts?
There was a problem hiding this comment.
If a datastore uses the disk, it should report it. If it doesn't, that's a bug in the datastore. However, I don't want to force every datastore to implement this method (it's not really necessary).
I'd like to go with a more generic stat but I couldn't come up with a concrete, nice solution that'll please everyone so, unless you can think of a way to make that work (that's both efficient and idiomatic) we might as well go with this.
There was a problem hiding this comment.
Okay, I am uncomfortable with the idea that not implementing this method means the datastore doesn't use any disk space, but I do agree that will be easier.
I'll take another look, but I think this LGTM then.
callback/callback.go
Outdated
	if pd, ok := c.D.(ds.PersistentDatastore); ok {
		return pd.DiskUsage()
	}
There was a problem hiding this comment.
See comments from previous review.
I would also refactor the logic into a helper function.
(Now GitHub is annoying me. This is the third time I've written this comment!)
There was a problem hiding this comment.
Would you create a separate util module just for this helper?
There was a problem hiding this comment.
I would just put it in the top-level package; it would look something like
func DiskUsage(d Datastore) (uint64, error) {
	// Datastore is an interface, so take it by value; a type assertion
	// on a pointer to an interface would not compile.
	pd, ok := d.(PersistentDatastore)
	if !ok {
		return 0, errors.New("Unimplemented")
	}
	du, err := pd.DiskUsage()
	if err != nil {
		return 0, err
	}
	return du, nil
}
And then in the wrapper methods just use:
return DiskUsage(c.D)
There was a problem hiding this comment.
I agree with having a helper function to avoid casting everywhere. I disagree with treating a datastore that does not implement this interface as an error.
There was a problem hiding this comment.
(Note this will likely be overkill, if we decide not to return an error.)
coalesce/coalesce.go
Outdated
	if pd, ok := d.child.(ds.PersistentDatastore); ok {
		return pd.DiskUsage()
	}
	return 0, nil
This adds a PersistentDatastore interface which allows datastores to report DiskUsage(). It implements the interface on all wrapper types, which return 0 if the wrapped datastore does not provide this method. Related: ipfs/kubo#4550
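As a rough illustration of the behavior this description summarizes, the sketch below combines a top-level helper with a wrapper that forwards to it. It follows the "return 0 when not implemented" choice stated above. The one-method `Datastore` stand-in and the `memStore`, `diskStore`, and `wrapper` types are hypothetical and only serve the example.

```go
package main

import "fmt"

// Datastore is a one-method stand-in for the real interface (assumption).
type Datastore interface {
	Has(key string) (bool, error)
}

// PersistentDatastore lets a datastore report its disk usage in bytes.
type PersistentDatastore interface {
	Datastore
	DiskUsage() (uint64, error)
}

// DiskUsage returns the usage of d if it implements PersistentDatastore,
// and 0 with a nil error otherwise. Real errors are bubbled up unchanged.
func DiskUsage(d Datastore) (uint64, error) {
	pd, ok := d.(PersistentDatastore)
	if !ok {
		return 0, nil
	}
	return pd.DiskUsage()
}

// memStore is a hypothetical in-memory store without DiskUsage.
type memStore struct{}

func (memStore) Has(string) (bool, error) { return false, nil }

// diskStore is a hypothetical store that knows its on-disk size.
type diskStore struct{ used uint64 }

func (diskStore) Has(string) (bool, error)     { return false, nil }
func (s diskStore) DiskUsage() (uint64, error) { return s.used, nil }

// wrapper is a hypothetical pass-through wrapper type.
type wrapper struct{ child Datastore }

func (w *wrapper) Has(key string) (bool, error) { return w.child.Has(key) }

// DiskUsage implements PersistentDatastore by delegating to the helper.
func (w *wrapper) DiskUsage() (uint64, error) { return DiskUsage(w.child) }

func main() {
	fmt.Println(DiskUsage(&wrapper{child: memStore{}}))
	fmt.Println(DiskUsage(&wrapper{child: diskStore{used: 42}}))
}
```

The trade-off debated in the review still applies: a disk-backed store that forgets to implement the method is indistinguishable from one that uses no disk.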
|
Thanks for your feedback and patience everyone! Ready for another round... |
kevina
left a comment
There was a problem hiding this comment.
One minor change. Otherwise LGTM.
We need to make sure that this new method is implemented on all Datastores before this method is used. (I would still have preferred an error be returned if it wasn't implemented.)
basic_ds.go
Outdated
// DiskUsage implements the PersistentDatastore interface.
func (d *LogDatastore) DiskUsage() (uint64, error) {
	du, err := DiskUsage(d.child)
	log.Printf("%s: DiskUsage: %d\n", d.Name, du)
There was a problem hiding this comment.
This should probably not report anything if an error was returned.
There was a problem hiding this comment.
The rest of the methods there log the calls regardless of results. I can log the call and not the result though (just did that).
Consistent with the rest of the calls.
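A minimal sketch of what "log the call and not the result" could look like follows. The `LogDatastore` here is a simplified stand-in with the `Name` and `child` fields seen in the diff; the helper, the `Has` method, and `failingStore` are assumptions added to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Datastore is a one-method stand-in for the real interface (assumption).
type Datastore interface {
	Has(key string) (bool, error)
}

// PersistentDatastore lets a datastore report its disk usage in bytes.
type PersistentDatastore interface {
	Datastore
	DiskUsage() (uint64, error)
}

// DiskUsage returns 0 when d does not implement PersistentDatastore.
func DiskUsage(d Datastore) (uint64, error) {
	if pd, ok := d.(PersistentDatastore); ok {
		return pd.DiskUsage()
	}
	return 0, nil
}

// LogDatastore is a simplified stand-in for the logging wrapper.
type LogDatastore struct {
	Name  string
	child Datastore
}

func (d *LogDatastore) Has(key string) (bool, error) {
	log.Printf("%s: Has %s\n", d.Name, key)
	return d.child.Has(key)
}

// DiskUsage logs that the call happened, then delegates. Nothing about the
// result is logged, so a failed call never prints a misleading number.
func (d *LogDatastore) DiskUsage() (uint64, error) {
	log.Printf("%s: DiskUsage\n", d.Name)
	return DiskUsage(d.child)
}

// failingStore is a hypothetical child whose DiskUsage always fails.
type failingStore struct{}

func (failingStore) Has(string) (bool, error) { return false, nil }
func (failingStore) DiskUsage() (uint64, error) {
	return 0, errors.New("disk usage unavailable")
}

func main() {
	d := &LogDatastore{Name: "log", child: failingStore{}}
	_, err := d.DiskUsage()
	fmt.Println("error bubbled up:", err != nil)
}
```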
|
Will someone be so kind as to merge? |
|
@hsanjuan I believe that would be @whyrusleeping's or @Stebalien's job. |
|
@hsanjuan are you going to put together the implementations? |
|
@Stebalien yeah that's my intention |
Each implementation of a datastore should decide how its size
is measured and how accurate the result is.
For example, some datastores may consider Size as the number of entries,
others as the disk space they use etc. Some implementations may
opt to improve the performance of the operation by caching or
by reducing the accuracy of the calculation.
Related: ipfs/kubo#4550
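The caching option mentioned above could be sketched roughly as follows: an expensive size computation is reused for a fixed time window, trading accuracy for speed. `cachedUsage`, `newCachedUsage`, and the time-to-live policy are hypothetical and not taken from any datastore implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cachedUsage memoizes an expensive usage computation for a fixed window.
type cachedUsage struct {
	mu      sync.Mutex
	compute func() (uint64, error) // the expensive measurement
	ttl     time.Duration          // how long a cached value stays fresh
	value   uint64
	fetched time.Time
	valid   bool
}

// newCachedUsage builds a cache around compute with the given time-to-live.
func newCachedUsage(ttl time.Duration, compute func() (uint64, error)) *cachedUsage {
	return &cachedUsage{ttl: ttl, compute: compute}
}

// DiskUsage returns the cached value while it is fresh and recomputes it
// otherwise. Errors are returned as-is and are never cached.
func (c *cachedUsage) DiskUsage() (uint64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.valid && time.Since(c.fetched) < c.ttl {
		return c.value, nil
	}
	v, err := c.compute()
	if err != nil {
		return 0, err
	}
	c.value, c.fetched, c.valid = v, time.Now(), true
	return v, nil
}

func main() {
	calls := 0
	c := newCachedUsage(time.Minute, func() (uint64, error) {
		calls++ // stands in for walking a directory tree
		return 4096, nil
	})
	c.DiskUsage()
	c.DiskUsage()
	fmt.Println("expensive computations:", calls)
}
```

Within the time-to-live window the reported figure can lag behind the real usage, which is the accuracy cost the commit message alludes to.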