
Multiple operations instead of means #33

@bnlawrence

In working through the implications of implementing means in chunks, it is notable that once missing data is in play, we need to return two numbers from the reduce_chunk method: the sum and the count. The mean over all chunks has to be weighted by the number of valid values that actually contributed in each chunk, so a simple average of per-chunk means would be wrong whenever chunks contain different amounts of missing data.
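To make the weighting issue concrete, here is a minimal sketch. The helper names `reduce_chunk_sum` and `combine_mean` are hypothetical, and representing missing values as `None` is an assumption made purely for illustration; the real reduce_chunk may handle missing data quite differently.

```python
from typing import Optional, Sequence, Tuple


# Hypothetical stand-in for reduce_chunk; missing values are None here,
# which is an assumption for illustration only.
def reduce_chunk_sum(chunk: Sequence[Optional[float]]) -> Tuple[float, int]:
    """Return (sum, count) over the non-missing values in one chunk."""
    valid = [v for v in chunk if v is not None]
    return sum(valid), len(valid)


def combine_mean(partials: Sequence[Tuple[float, int]]) -> float:
    """Combine per-chunk (sum, count) pairs into one overall mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("no valid values in any chunk")
    return total / count


# Two chunks of equal size, but the first has two missing values.
chunks = [[1.0, 2.0, None, None], [3.0, 4.0, 5.0, 6.0]]
partials = [reduce_chunk_sum(c) for c in chunks]

# Weighted by count: 21 / 6
weighted = combine_mean(partials)

# Unweighted average of per-chunk means: (1.5 + 4.5) / 2
naive = sum(s / n for s, n in partials) / len(partials)
```

Here `weighted` and `naive` differ (3.5 versus 3.0), which is why the count has to travel with the sum.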

There are a number of ways we could implement this:

  1. Always return (X, N), where X is the expected operation, and N the number of values contributing
  2. Only return (X, N) when required (e.g. for means), otherwise return (X, None) or (X,)
  3. Return X, except when it needs to be (X, N)
  4. Something else.
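As a rough illustration of option 1, the sketch below always returns a pair, so every caller unpacks the result the same way and simply ignores N when it is not needed. The signature shown is an assumption, not the existing reduce_chunk interface, and missing values are again modelled as `None` for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


# Hypothetical signature for option 1: always return (X, N).
def reduce_chunk(
    chunk: Sequence[Optional[float]],
    method: Callable[[Sequence[float]], float],
) -> Tuple[float, int]:
    """Apply method to the non-missing values and report how many there were."""
    valid = [v for v in chunk if v is not None]
    # Note: min/max raise ValueError on an empty sequence; a real
    # implementation would need a policy for all-missing chunks.
    return method(valid), len(valid)


chunk = [4.0, None, 1.0, 7.0]
total, n = reduce_chunk(chunk, sum)     # a mean needs both values
smallest, _ = reduce_chunk(chunk, min)  # a minimum can discard the count
```

With options 2 and 3 the caller would instead have to know, per method, what shape of result to expect.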

The "something else" option could be slightly more interesting: would it be a smart idea to allow a series of methods to be chained over the same chunk, returning a series of results, as a lightweight sort of caching?

Obvious use cases would be:

  • mean = sum, count
  • range = min, max
  • sqmean = sum(squares), sum, count

This could be facilitated by handing over not just "a method" but a list of one or more methods, and expecting back a list of the same length, with one result per method.
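A minimal sketch of the list-of-methods idea follows. The name `reduce_chunk_multi` and the helper `sum_squares` are hypothetical, missing values are modelled as `None` for illustration, and the reading of the third use case as "mean of squares" is an assumption.

```python
from typing import Callable, List, Optional, Sequence


# Hypothetical multi-method variant: one pass to select valid values,
# then one result per requested method, in the same order.
def reduce_chunk_multi(
    chunk: Sequence[Optional[float]],
    methods: Sequence[Callable[[Sequence[float]], float]],
) -> List[float]:
    valid = [v for v in chunk if v is not None]
    return [m(valid) for m in methods]


def sum_squares(values: Sequence[float]) -> float:
    return sum(v * v for v in values)


methods = [sum_squares, sum, len]
chunks = [[1.0, 2.0, None, None], [3.0, 4.0, 5.0, 6.0]]
per_chunk = [reduce_chunk_multi(c, methods) for c in chunks]

# All three of these results are additive across chunks, so the
# per-chunk values can be combined column by column with sum().
sumsq, total, count = (sum(col) for col in zip(*per_chunk))

mean = total / count
mean_of_squares = sumsq / count  # assumed reading of "sqmean"
```

Note that the combining step depends on the method: sums and counts add across chunks, whereas the min and max needed for a range would have to be combined with `min` and `max` respectively, so a real design would probably need to pair each method with its own combiner.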

Labels

excalibur: Needs discussion by the excalibur team
