Skip to content

Add Size() method to datastores#74

Merged
Stebalien merged 2 commits intomasterfrom
feat/size
Jan 26, 2018
Merged

Add Size() method to datastores#74
Stebalien merged 2 commits intomasterfrom
feat/size

Conversation

@hsanjuan
Copy link
Contributor

The each implementation of a datastore should decide on their size
is measured and how accurate the result is.

i.e. Some datastores may consider Size as the number of entries,
others as the disk space they use etc. Some implementations may
opt to improve the performance of the operation by caching or
by reducing the accuracy of the calculation.

Related: ipfs/kubo#4550

@hsanjuan hsanjuan self-assigned this Jan 18, 2018
@hsanjuan hsanjuan requested a review from Stebalien January 18, 2018 17:32
@ghost ghost added the status/in-progress In progress label Jan 18, 2018
@kevina
Copy link
Contributor

kevina commented Jan 18, 2018

I would rather the definition be better defined as either the approximate number of entries or the approximate disk space used.

@Stebalien
Copy link
Member

Yeah, we don't usually care about the size of the datastore. We care about the disk usage. I was thinking something like:

type PersistentDatastore interface {
    DiskUsage() (int64, error)
}

Where datastores could optionally implement this. Implementing this interface wouldn't mean that the datastore necessarily persists its data, just that it understands the concept and can report any disk space it thinks it's using.

I got stuck with this interface trying to figure out a more generic "stats" interface but I think we can just go with something like the above and deal with other stats later.

@hsanjuan
Copy link
Contributor Author

ok, but if a persistent datastore is wrapped in any of the wrapper ones, then your "optional" method is gone, and you need to cast (which makes the interface less useful and not very helpful in the case of triages etc).

We could just force everyone to report DiskUsage and those that don't care about it should just say 0...

@hsanjuan
Copy link
Contributor Author

we don't usually care about the size of the datastore

It would not hurt allowing to know this information though. In a datastore where all entries are roughly the same size (thinking ipfs chunks here) it can offer a pretty good ballpark of the total size with little effort...

@Stebalien
Copy link
Member

ok, but if a persistent datastore is wrapped in any of the wrapper ones, then your "optional" method is gone, and you need to cast (which makes the interface less useful and not very helpful in the case of triages etc).

Datastore wrappers would have to handle this case. We're already using this pattern for, e.g., GC.

It would not hurt allowing to know this information though. In a datastore where all entries are roughly the same size (thinking ipfs chunks here) it can offer a pretty good ballpark of the total size with little effort...

If we don't agree on what size means, it is an entirely useless metric. What if I have a mapped datastore that delegates to two datastores, one that treats size as the number of objects and another that treats size as bytes? What about caches?

@hsanjuan
Copy link
Contributor Author

If we don't agree on what size means, it is an entirely useless metric.

What I mean is, exposing the number of elements under a different method too (Size and DiskUsage for example), since we're at it.

Datastore wrappers would have to handle this case. We're already using this pattern for, e.g., GC.

ok I will proceed that way then

@hsanjuan
Copy link
Contributor Author

How is this looking?

Copy link
Member

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM, one comment

type PersistentDatastore interface {
Datastore

DiskUsage() (uint64, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have a doc explicitly saying what the unit here is (i.e. bytes).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that would be obvious, but it can't hurt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah you're totally right, my bad

Copy link
Contributor

@kevina kevina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the DiskUsage method is not available an error should be returned. See inlined comments.

}
log.Printf("%s: DiskUsage: %d\n", d.Name, du)
return du, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should return an error if the child is not a PersistentDatastore.

I would also restructure it to avoid having to declare du and err up front.

Something like

pd, ok := d.child.(PersistentDatastore)
if !ok {
  return 0, errors.New("Unimplemented")
}
du, err := pd.DiskUsage()
if err != nil {
  return 0, err
}
log.Printf("%s: DiskUsage: %d\n", d.Name, du)
return du, nil

But use a better error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why a batch operation for example, should return an error if the underlying datastore cannot perform it at all. But DiskUsage? This implies that mixing Persistent and non PersistentDatastores in a wrapper and asking for DiskUsage should probably result in error too (as returning error here would mean doing it everywhere for behavior consistency).

If the datastore is not persistent, it seems natural to say (from a wrapper point of view) that the DiskUsage is 0, while keeping the error result to actually signal for real errors when obtaining real disk usage. In the current approach wrappers never produce errors on DiskUsage(), they just bubble them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the underlying datastore uses disk storage but doesn't implement this method, then a result of 0 without an error would be misleading.

Maybe we should just require this method be implemented, or perhaps a more generic Stat() method.

@Stebalien thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a datastore uses the disk, it should report it. If it doesn't, that's a bug in the datastore. However, I don't want to force every datastore to implement this method (it's not really necessary).

I'd like to go with a more generic stat but I couldn't come up with a concrete, nice solution that'll please everyone so, unless you can think of a way to make that work (that's both efficient and idiomatic) we might as well go with this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I am unconformable with the idea of unimplementable this method means it doesn't use any disk space, but I do agree that will be easier.

I take another look but I think this LGTM then.

if pd, ok := c.D.(ds.PersistentDatastore); ok {
return pd.DiskUsage()
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments from previous review.

I would also refactor the logic info a helper function.

(Now GitHub is annoying me, This is the third time I wrote this comment.!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you create a separate util module just for this helper?

Copy link
Contributor

@kevina kevina Jan 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just put in in the top-level package and give it would look something like

func DiskUsage(d *Datastore) (uint64, error) {
	pd, ok := d.(PersistentDatastore)
	if !ok {
		return 0, errors.New("Unimplemented")
	}
	du, err := pd.DiskUsage()
	if err != nil {
		return 0, err
	}
	return du, nil
}

And then in the wrapper methods just use:

return DiskUsage(c.D)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with having a helper function to avoid casting everywhere. I disagree with not implementing this interface an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note this will likely be overkill, if we decide not to return an error.)

if pd, ok := d.child.(ds.PersistentDatastore); ok {
return pd.DiskUsage()
}
return 0, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Etc.

This adds a PersistentDatastore interface which allows datastores
to report DiskUsage().

It implementes the interface on all wrapper types, which return
0 if the wrapped datastore does not provide this method.

Related: ipfs/kubo#4550
@hsanjuan
Copy link
Contributor Author

Thanks for your feedback and patience everyone! Ready for another round...

@Stebalien
Copy link
Member

@kevina, @magik6k LGTY?

Copy link
Contributor

@kevina kevina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor change. Otherwise LGTM.

We need to make sure that this new method is implemented on all Datastores before this method is used. (I would still have preferred an error be returned if it wasn't implemented.)

basic_ds.go Outdated
// DiskUsage implements the PersistentDatastore interface.
func (d *LogDatastore) DiskUsage() (uint64, error) {
du, err := DiskUsage(d.child)
log.Printf("%s: DiskUsage: %d\n", d.Name, du)
Copy link
Contributor

@kevina kevina Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably not report anything if an error was returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of methods there log the calls regardless of results. I can log the call and not the result though (just did that)

Consistent with the rest of calls.
Copy link
Contributor

@kevina kevina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@hsanjuan
Copy link
Contributor Author

Will someone be so kind to merge?

@kevina
Copy link
Contributor

kevina commented Jan 26, 2018

@hsanjuan I believe that would be @whyrusleeping or @Stebalien job.

@Stebalien Stebalien merged commit 501cc61 into master Jan 26, 2018
@ghost ghost removed the status/in-progress In progress label Jan 26, 2018
@Stebalien Stebalien deleted the feat/size branch January 26, 2018 19:14
@Stebalien
Copy link
Member

@hsanjuan are you going to put together the implementations?

@hsanjuan
Copy link
Contributor Author

@Stebalien yeah that's my intention

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants