Skip to content

sum_of_multiples: add tests for logic and input validation#139

Closed
mmakaay wants to merge 1 commit intoexercism:masterfrom
mmakaay:sum_of_multiples
Closed

sum_of_multiples: add tests for logic and input validation#139
mmakaay wants to merge 1 commit intoexercism:masterfrom
mmakaay:sum_of_multiples

Conversation

@mmakaay
Copy link
Copy Markdown

@mmakaay mmakaay commented Oct 21, 2014

Extend the test set with the following test cases:

  • Trigger a bug in an inclusion/exclusion algorithm implementation
    that I did and which passed the already existing tests. It captures
    a possible thinking error when implementing this type of solution.
  • Feed the factors in reverse order (high to low).
  • Feed the same factor multiple times.
  • Feed a zero factor (should result in a ValueError).
  • Feed a negative factor (should result in a ValueError).

@mmakaay
Copy link
Copy Markdown
Author

mmakaay commented Oct 21, 2014

Oh, I added some input validation cases to the tests as well, but the example must be upgraded for that too.

@sjakobi
Copy link
Copy Markdown
Contributor

sjakobi commented Oct 21, 2014

Oh, I added some input validation cases to the tests as well, but the example must be upgraded for that too.

Right.

I'm not sure whether the additional test cases really improve the exercises or rather distract. I also find it debatable whether the test cases that you declare invalid really need to be invalid.

The lines in your commit message still are way too long BTW.

@mmakaay
Copy link
Copy Markdown
Author

mmakaay commented Oct 21, 2014

Way to long as "too verbose" or "longer than 72 characters"? And is this about the subject line or the body text? I have wrapped my lines at 72 chars and looking at the commit message, that seems to have worked. In the message at the top, the body lines have been merged.

I wonder in what way test cases would distract. They all describe distinct inputs that should be handled by the code and should result in correct output. A coder that produces correct code right away gets green output. A coder that misses a detail, is notified by a red test. Being a TDD addict might trouble my view on this of course ;-)

Maybe feeding the factors in reverse order is not needed, but I do remember that during development I had to adapt my code for handling it, therefore I copied it from my test set.

The error cases seem pretty vital to me. I do not think that these outputs are what a user of the produced module would expect:

>>> SumOfMultiples(0).to(10)
ZeroDivisionError: integer division or modulo by zero
>>> SumOfMultiples(-3).to(10)
18

Of course, for the second one you could argue that 3, 6 and 9 are multiples of -3, but the readme states "If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9". To me, this communicates the intent that the module is about natural numbers and not about integers.

@mmakaay
Copy link
Copy Markdown
Author

mmakaay commented Oct 21, 2014

Here's a good example of what I am mainly afraid of when accepting negative factors (this is a recently submitted solution for sum-of-multiples):

class SumOfMultiples:
    def __init__(self, *numbers):
        self.sum = 0
        self.takeaway = 1
        if not numbers:
            self.multiples = [3, 5]
        else:
            self.multiples = list(numbers)
    def to(self, number):
        for a in self.multiples:
            a1 = a
            self.takeaway *= a
            while a1 < (number):
                self.sum += a1
                a1 += a
        b = self.takeaway
        while b < (number):
            self.sum -= b
            b += self.takeaway
        return self.sum

The while a1 < (number) check will loop indefinitely, since the code will do -3, -6, -9, etc. never reaching 10.

@mmakaay mmakaay changed the title sum_of_multiples: extend tests for algorithm logic and input validation sum_of_multiples: add tests for logic and input validation Oct 22, 2014
@mmakaay
Copy link
Copy Markdown
Author

mmakaay commented Oct 22, 2014

I gave the changes some thought and applied a few changes.

  • When using a zero factor, no ValueError is expected anymore. Reading up on natural numbers, apparently zero being part of them is debatable. I kept the test, but changed it to expect a sensible outcome now. I think it is better than allowing a division by zero error.
  • The test that uses repeated factors as input, now uses (3, 3, 6, 6). What I mainly hope to gain from that, is that coders discover that they can collapse that into simply (3).

I squashed the changes into a single commit (thanks to your excellent guide explaining exactly how to do this).

Comment thread sum-of-multiples/example.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall ever seeing 0 or 1 as factors in a number.

@mmakaay
Copy link
Copy Markdown
Author

mmakaay commented Oct 22, 2014

They aren't factors when you're talking about extracting prime factors for a number. But this exercise is not working with prime factors, is it? In this case, the factor means "half of a multiplication". The other factor in the multiplication would be n in range(1, ...), as long as n * factor < the provided number.

The number 1 could be input for the SumOfMultiples class. If I would do:

SumOfMultiples(1),to(5)

Then I assume that I should get:

1 * 1 + 1 * 2 + 1 * 3 + 1 * 4 = 10

1 not being a prime factor of 10 (or any number) is not related to the question that has to be answered.
0 is a special case, since all multiples of 0 are below 10. When not considering 0 a case to raise an exception for, then we have to state that it should result in 0 * ∞ = 0.

@kytrinyx
Copy link
Copy Markdown
Member

this exercise is not working with prime factors, is it

Good point. I was thinking prime factors, but there's no reason why it would have to be.

I'm unsure about the zero case.

@pminten do you have any thoughts about the mathematical underpinnings of this problem?

@pminten
Copy link
Copy Markdown

pminten commented Oct 23, 2014

@kytrinyx Uh, this isn't really my area of expertise, but I can add some info.

When we talk about factoring numbers we usually talk about the principle/theory/whatever that any whole number over 1 can be expressed as the multiplication of primes. Numbers can often also be expressed as multiplication of other-than-prime numbers but the problem with that is that you don't really get a unique solution. For example 12 = 2 * 6 = 3 * 4 = 3 * 2 * 2. But there's no other way to factor 12 into prime numbers since prime numbers can't be divided into whole numbers themselves.

However this exercise, judging by the description in x-common prime numbers are not relevant. The question that's asked is merely to implement an algorithm like:

sum(i for i in range(1, n) if i % 3 == 0 or i % 5 == 0)

Or put imperatively:

sum = 0
for i in range(1, n):
  if i % 3 == 0 or i % 5 == 0:
    sum += i

@pminten
Copy link
Copy Markdown

pminten commented Oct 23, 2014

Btw, the API of this exercise is quite over complicated. It would be much more natural to require a single function instead of an object. Teaching people to use objects for such trivial calculations is I fear teaching them bad design.

@mmakaay
Copy link
Copy Markdown
Author

mmakaay commented Oct 23, 2014

If you only consider the slow short version of the code, you are right. I implemented a computational solution, which computes the outcome for any input factors in a few steps. For that implementation I was quite happy with the class.

One small learning experience about the class design as used here, would be that it shows that you can configure a class on construction and keep reusing it afterwards.

@pminten
Copy link
Copy Markdown

pminten commented Oct 23, 2014

sum(i for i in range(1, n) if any(i % j == 0 for j in divisors))

I agree that configuring an object on construction and reusing it afterwards is useful, in fact it's the most useful part of having objects. However a design should fit the exercise, if an experienced programmer in a language would not use objects for a particular task the tests shouldn't require objects.

@mmakaay
Copy link
Copy Markdown
Author

mmakaay commented Oct 23, 2014

I'll show you what I came up with below, so you can see why a class worked well for me. You are still very right though. A class structure is not the most Pythonic approach for this case. And if a simple function interface were defined for the exercise, I could of course still have added a class for encapsulating the logic when that seemed fit to me. I therefore have no real preference for function or class here. I was just focusing on some unit test additions, to narrow down some ambiguities in the problem field.

    from itertools import combinations
    from fractions import gcd

    class SumOfMultiples:

        def __init__(self, *factors):
            factors = factors or (3, 5)
            self._validate_factors(factors)
            self._factors = self._filter_redundant_factors(factors)

        def _validate_factors(self, factors):
            for factor in factors:
                if factor <= 0:
                    raise ValueError(
                        "Invalid factor: %s (factors must be > 0)" % factor)

        def _filter_redundant_factors(self, factors):
            """Deletes duplicate factors and factors that can be divided by a
            lower factor. These factors will always produce duplicate multiples
            that all must be subtracted from the total. So better not use them
            in the first place.
            """
            return [
                factor for factor in set(factors)
                if not any(
                    other_factor for other_factor in factors
                    if other_factor < factor and not factor % other_factor
                )
            ]

        def to(self, to):
            """Computes the sum of multiples for all configured factors up to the
            value of the 'to' parameter. Duplicate multiples are only counted
            once in the sum.
            """
            return sum(
                plus_or_minus * sum_of_multiples
                for plus_or_minus, sum_of_multiples
                in self._steps(to)
            )

        def _steps(self, to):
            plus_or_minus = +1

            sum_of_multiples = self._sum_of_multiples(self._factors, to)
            yield (plus_or_minus, sum_of_multiples)

            for combi_length in range(2, len(self._factors) + 1):
                plus_or_minus = -plus_or_minus
                factors = self._lcm_for_factor_combinations(combi_length, to)
                sum_of_multiples = self._sum_of_multiples(factors, to)
                yield (plus_or_minus, sum_of_multiples)

        def _lcm_for_factor_combinations(self, combi_len, to):
            return [
                factor for factor in [
                    self._lcm(*factors)
                    for factors in combinations(self._factors, combi_len)
                ] if factor < to
            ]

        def _lcm(self, *factors):
            if len(factors) == 1:
                return factors[0]
            elif len(factors) == 2:
                return (factors[0] * factors[1]) // gcd(*factors)
            else:
                lcm = factors[0]
                for factor in factors[1::]:
                    lcm = self._lcm(lcm, factor)
                return lcm

        def _sum_of_multiples(self, factors, to):
            total = 0
            for factor in factors:
                c = (to - 1) // factor
                total += factor * (c * c + c) // 2
            return total

@kytrinyx
Copy link
Copy Markdown
Member

a design should fit the exercise, if an experienced programmer in a language would not use objects for a particular task the tests shouldn't require objects.

I agree. I have simplified some of the exercises in the Ruby track to not require objects, and intend to keep going in that direction where it makes sense.

Overall I think that in most language tracks a lot could be done to improve the design of the exercises / test suites, as the first pass really was more about quantity than quality.

@mmakaay
Copy link
Copy Markdown
Author

mmakaay commented Oct 24, 2014

I can do that update as well if you like in this pull branch.

About the correct way to handle zero and negative numbers:
The code that @pminten wrote, will raise a division by zero error when using a zero in the input. To me, it is good design for any kind of software component to not let exceptions that stem from implementation details bubble up. They mean nothing to the user of the component. So what do you guys see as the expected behavior?

  • ValueError explaining not to use 0
  • Count the sum ofultiples for 0 as 0

@sjakobi
Copy link
Copy Markdown
Contributor

sjakobi commented Oct 24, 2014

I agree with @pminten's point that this should be a functional exercise that doesn't require a class.

I have opened an issue for this.

@sjakobi
Copy link
Copy Markdown
Contributor

sjakobi commented Oct 24, 2014

I find that thinking about the exercise as a mathematical function helps clarifying the problem:

  • If the input numbers are all 0, the output is 0 because there is no number but 0 that can be expressed as 0 * x.
  • In principle there's no problem with negative inputs - positive numbers too can be expressed as <some negative factor> * <some other negative factor>.

I think it's more important to ask whether the space of possible inputs should be narrowed down to some extent although the problem description doesn't explicitely require it. These are some ideas to limit the set of possible inputs for this exercise:

  • The input numbers are unique.
  • The input numbers are non-negative (allowing for 0).
  • The input numbers are positive (prohibiting 0).
  • The input numbers are sorted in ascending order.

I believe that guaranteeing these properties for the input may allow the users to focus on the core of problem and may motivate them to come up with smart solutions for it.

What do you think about these limitations?

@kytrinyx
Copy link
Copy Markdown
Member

I think that given the above observations, "the input numbers are non-negative" works well. The problem description has natural numbers, and there's no universal agreement on whether or not zero is included in that.

I also like the constraints given that the input numbers will be uniq and sorted in ascending order. That will remove some pretty arbitrary argument munging that I think is less than interesting.

@pminten
Copy link
Copy Markdown

pminten commented Oct 24, 2014

Explicit is better than implicit. Depending on how performance critical
and API-like the function is I'd either go for:

  • Explicit check throwing ValueError (most clean solution, also takes most
    space in code and requires extra computation)
  • A quick assert at the start of the function (can be disabled if needed)
  • Nothing (let it fail with ZeroDivisionError)

However what I'd definitely do is add a note to the function documentation
that a precondition of the function is that all divisors are more than 0.

I can do that update as well if you like in this pull branch.

About the correct way to handle zero and negative numbers:
The code that @pminten wrote, will raise a division by zero error when
using a zero in the input. To me, it is good design for any kind of
software component to not let exceptions that stem from implementation
details bubble up. They mean nothing to the user of the component. So what
do you guys see as the expected behavior?

  • ValueError explaining not to use 0
  • Count the sum ofultiples for 0 as 0

Reply to this email directly or view it on GitHub:
#139 (comment)

@mmakaay
Copy link
Copy Markdown
Author

mmakaay commented Oct 24, 2014

Exceptions are best for writing tests, especially when creating custom exception classes. Especially since I discovered (through exercises on exercism) that they are not so heavy as in most other languages. So for my code, I prefer raising exception.

But am I right that the general idea here is to only provide green input to the function from the exercise tests, to let the coder decide how far he/she wants to go for input validation?
In that case I will cleanup the tests that I added for this.

The negative factor input is still interesting for cases where the implementation ends in an endless loop (like the one I posted above). But for those, I can still nitpick in the comment thread for code that gets this wrong :) It is not really possible to add a test for it, without fixating the direction of the solution (since documenting how code should behave is documented in a unit test).

@kytrinyx
Copy link
Copy Markdown
Member

for those, I can still nitpick in the comment thread for code that gets this wrong

Yeah, this is one place where the discussions are invaluable. It's often during conversation that I have insights. If a test just "made me" do something, I often don't think very hard about it.

@mmakaay
Copy link
Copy Markdown
Author

mmakaay commented Oct 25, 2014

Perfect, then I think I've got a good feel for what you want to archieve with the tests now. I will update the changes to this solution and push it into this pull (that sounds weird :). I just finished the Zebra puzzle in the Python tree, so I've got some time to spend now for making improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants