pangram: rework tests (discussion) #893
214a54e to d15d739
@@ -12,57 +12,81 @@
],
Don't forget to change the version accordingly, the current one is (semantic versioning):
"version": "1.1.0",
There was a problem hiding this comment.
Funny you brought this up right at the moment I amended this change ^^
I changed it to 2.0.0, because the structure changed completely. Is this correct, or should I use 1.2.0?
There was a problem hiding this comment.
Here are the guidelines governing test versioning: https://github.com/exercism/problem-specifications#test-data-versioning
Since it appears that (correct me if I'm wrong):
- there's no new property
- the existing property has not been renamed
- there are no new keys
- no key types have been changed
we don't need a major version bump here, and v1.2.0 should suffice (minor vs patch because inputs/outputs were changed).
There was a problem hiding this comment.
Thanks for the guide. Changed it to 1.2.
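The rule of thumb applied in this thread can be sketched in a few lines. This is only an illustration, not the official tooling: the function names `suggest_bump` and `bump` are hypothetical, and treating everything that is neither a structural change nor an input/output change as a patch is an assumption. The linked guidelines remain the authority.

```python
def suggest_bump(structural_change: bool, data_change: bool) -> str:
    """Pick the version part to bump, following the reasoning in this thread."""
    if structural_change:
        # New or renamed property, new keys, or changed key types.
        return "major"
    if data_change:
        # Inputs or expected outputs were changed.
        return "minor"
    # Assumption: anything else (e.g. wording only) is a patch.
    return "patch"


def bump(version: str, part: str) -> str:
    """Return the MAJOR.MINOR.PATCH string with the given part incremented."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


# The situation discussed above: same structure, different inputs/outputs.
print(bump("1.1.0", suggest_bump(structural_change=False, data_change=True)))  # 1.2.0
```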
224c3f3 to f08b482
…with suitable edge cases: exercism#893
f08b482 to 2f0ae2f
"description": "handles an undefined message as empty message",
"property": "isPangram",
"input": "the quick brown fox jumps over the lazy dog",
"input": "null",
The description is misleading and the input being the string "null" here is confusing.
What are you trying to test?
There was a problem hiding this comment.
I need to change this to null without quotation marks instead.
My reasoning is the same as I wrote here:
#895 (comment)
For everyone else, here's the quote to prevent jumping around:
I see what you mean.
But in my experience, test suites should check for edge cases. null checks are especially important (at least in almost all languages).
Every piece of educational material advises checking not only the usual inputs, but all possible (equivalent kinds of) inputs. null checks, empty checks and others are always at the top of the example lists given. Just take any arbitrary course about QA and this will be part of it.
I think this is especially important because we are in an educational context. The users should learn from the get-go what it means to write (arguably good) tests (first), because it is still a major problem, even among seasoned programmers.
From my own business experience I know that new programmers are completely lost when it comes to writing tests. But experienced programmers also fall into the traps of "I'll do it later", "This is a trivial method", "I can't test this", "There are too many dependencies" etc. Just take any talk or tutorial about writing tests or even TDD and you will see the same old counter-arguments or questions again and again from the comments or the audience. There is a reason for this: writing good tests is hard! Very hard, to be exact, and it is a skill that needs exercise. And where better to begin than right from the start?
{
"description": "recognizes a lower case pangram",
"property": "isPangram",
"input": "thequickbrownfoxjumpsoverthelazydog",
Why not just test against the complete alphabet?
There was a problem hiding this comment.
Good point, this thought never occurred to me. ^^
The input was already there so I reused it.
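For illustration, here is a minimal Python sketch of what "testing against the complete alphabet" could look like. The function name `is_pangram` and the set-based approach are assumptions made for this example; they are not taken from the PR or from any track.

```python
import string


def is_pangram(sentence: str) -> bool:
    """Return True if every letter a-z occurs at least once, ignoring case."""
    return set(string.ascii_lowercase) <= set(sentence.lower())


# The plain alphabet is the smallest lower-case pangram.
print(is_pangram(string.ascii_lowercase))                 # True
# The input that was reused in the diff above.
print(is_pangram("thequickbrownfoxjumpsoverthelazydog"))  # True
```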
"description": "recognizes a missing character 'h'",
"property": "isPangram",
"input": "a quick movement of the enemy will jeopardize five gunboats",
"input": "fiveboxingwizardsjumpquicklyatit",
What is this testing? Why is 'h' different from 'x' and more important than 'y'?
There was a problem hiding this comment.
This is the core functionality of this exercise, so I'd argue that we should test it with more than just one possible failing input.
The original version of the test had the following structure:
{
"description": "missing character 'x'",
"property": "isPangram",
"input": "a quick movement of the enemy will jeopardize five gunboats",
"expected": false
},
{
"description": "another missing character 'x'",
"property": "isPangram",
"input": "the quick brown fish jumps over the lazy dog",
"expected": false
},
So I guess, checking for 'h' instead of 'x' twice is already an improvement.
I'd rather like to add even more test cases, maybe about 3-5 in total. However, because the schema does not allow for multiple "expected" entries, I decided to keep it at two.
There was a problem hiding this comment.
What if it doesn't catch a missing 'a' ?
Either the 'h' test is not necessary or you're missing another 24 tests.
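To make the "another 24 tests" remark concrete, the following sketch generates one case per letter, each input lacking exactly that letter. It is hypothetical and not part of the PR: the helper name `missing_letter_cases` is made up, and the case layout simply mirrors the `description`/`property`/`input`/`expected` keys quoted in this thread.

```python
import string


def is_pangram(sentence: str) -> bool:
    """Reference check used only to validate the generated cases."""
    return set(string.ascii_lowercase) <= set(sentence.lower())


def missing_letter_cases():
    """Yield 26 cases; each input is the alphabet with one letter removed."""
    for letter in string.ascii_lowercase:
        yield {
            "description": f"missing character '{letter}'",
            "property": "isPangram",
            "input": string.ascii_lowercase.replace(letter, ""),
            "expected": False,
        }


cases = list(missing_letter_cases())
print(len(cases))                                                  # 26
print(all(is_pangram(c["input"]) is c["expected"] for c in cases))  # True
```

Whether 26 generated cases are worth the noise in a canonical test file is exactly the trade-off under discussion here; the sketch only shows that they are cheap to produce.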
There was a problem hiding this comment.
@Insti Exactly! That is my problem I'm trying to balance.
But it does not help to check for x in two cases like the original suite does atm.
If you want, then I'll do 26 test cases for every character.
But just one is simply not a reasonable option for testing the core functionality.
Here's the gist of what I learned from my QA class (already several years back, though...):
See, to test reasonably, you have to build equivalence classes of test data. (see https://en.wikipedia.org/wiki/Equivalence_partitioning)
For each equivalence class you introduce a bunch of data points to test for the edge cases and to test representatively. You can't and shouldn't test all possible data permutations, but you should include enough data points to be reasonably sure that your code does what it should do. Only one data point for each equivalence class is just not reasonable. This is even more important in black-box or grey-box testing, where we don't know the implementation, or know only parts or generalizations of it.
For example, say we build the following equivalence classes:
1. testing null
2. testing invalid pangrams
3. testing valid pangrams
4. testing agnostic behaviour to non-a-zA-Z input
(Class 4 is up for debate, it might as well be included into 2 and 3 directly. There is no one way, other class structures are also reasonable.)
Now we introduce edge cases to the classes.
Edge cases are data points that represent the logical edges of, or transitions between, the equivalence classes. The chief suspects, so to speak. The bare minimum:
1. testing null input(s):
   a) input null
   b) input undefined (e.g. for JS)
2. testing invalid pangrams:
   a) "" (empty input)
   b) "abc[...]xy" (missing single z)
   c) "bcd[...]xyz" (missing single a)
   d) "AbC[...]xY" (mix-case missing single z)
3. testing valid pangrams:
   a) "abc[...]xyz" (perfect pangram)
   b) "abc[...]xyzabc[...]xyz" (pangram with every char twice)
   c) "AbC[...]xYz" (mix-case perfect pangram)
4. testing agnostic behaviour to non-a-zA-Z input:
   a) "a bc def ghij [...]xyz" (valid pangram with spaces)
   b) "a bc def ghij [...]xy" (invalid pangram with spaces)
   c) "a1b!c?[...]x.y,z;" (valid pangram with some random special chars)
   d) "a1b!c?[...]x.y,;" (invalid pangram with some random special chars)
Now we introduce some further representative random data where reasonable:
1. testing null input(s):
   a) input null
   b) input undefined (e.g. for JS)
2. testing invalid pangrams:
   a) "" (empty input)
   b) "abc[...]xy" (missing single z)
   c) "bcd[...]xyz" (missing single a)
   d) "AbC[...]xY" (mix-case missing single z)
   e) "The quick brown[...]" (random sentence with one missing char, e.g. h)
   f) "Five boxing wizards[...]" (random sentence with one missing char, e.g. t)
   g) "A quick movement[...]" (random sentence with several missing chars, e.g. b, d, y)
3. testing valid pangrams:
   a) "abc[...]xyz" (perfect pangram)
   b) "abc[...]xyzabc[...]xyz" (pangram with every char twice)
   c) "AbC[...]xYz" (mix-case perfect pangram)
   d) "The quick brown[...]" (random pangram)
   e) "Five boxing wizards[...]" (random pangram)
   f) "A quick movement[...]" (random pangram)
4. testing agnostic behaviour to non-a-zA-Z input:
   a) "a Bc Def Ghij [...]xyz" (valid pangram with spaces)
   b) "a Bc Def Ghij [...]xy" (missing z with spaces)
   c) "A1b!C?[...]x.Y,z;" (valid pangram with some typical ASCII sentence chars)
   d) "A1b!C?[...]x.Y,;" (missing z with some typical ASCII sentence chars)
   e) "Victor_jagt-zwölf.B0xkämpfer >qu3r<über;den'großen" Sylter#Deich!" (valid pangram with extended ASCII-chars)
   f) "Victor_jagt-zwölf.B0xkämpfer >qu3r< über;einen'großen" Sylter#Teich!" (missing d, with extended ASCII-chars)
   g) "Few quips galvanized the ① mock 😮 jury box 🗃️." (valid pangram with non-ASCII chars)
   h) "Few quips gal℣anized the ① mock 😮 jury box 🗃️." (missing v with non-ASCII chars)
The issue now is to find the right balance for the number of additional representative data points. There must be enough cases to be reasonably sure the method does what it should.
So, we could add more, testing for all sorts of missing chars, but I think it should not have only one that tests only for a missing x. That is not reasonable.
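As a rough sketch of how the equivalence classes above could be organised in code, the cases can be grouped in a table keyed by class and run against one implementation. Everything here is illustrative: `is_pangram`, `CASES` and `run` are hypothetical names, the elided "[...]" inputs are spelled out by me, and the null class is left out because its expected behaviour is language-specific.

```python
import string

ALPHABET = string.ascii_lowercase


def is_pangram(sentence: str) -> bool:
    """Return True if every letter a-z occurs at least once, ignoring case."""
    return set(ALPHABET) <= set(sentence.lower())


# One entry per equivalence class; each holds (input, expected) pairs.
CASES = {
    "invalid pangrams": [
        ("", False),                               # empty input
        (ALPHABET[:-1], False),                    # missing single z
        (ALPHABET[1:], False),                     # missing single a
        ("AbCdEfGhIjKlMnOpQrStUvWxY", False),      # mixed case, missing z
    ],
    "valid pangrams": [
        (ALPHABET, True),                          # perfect pangram
        (ALPHABET * 2, True),                      # every char twice
        ("AbCdEfGhIjKlMnOpQrStUvWxYz", True),      # mixed-case perfect pangram
    ],
    "non-letter input is ignored": [
        ("a bc def ghij klmno pqrstu vwxyz", True),
        ("a bc def ghij klmno pqrstu vwxy", False),
        ("a1b!c?defghijklmnopqrstuvwx.y,z;", True),
        ("a1b!c?defghijklmnopqrstuvwx.y,;", False),
    ],
}


def run(cases):
    """Return (class name, input) for every case whose result is unexpected."""
    failures = []
    for group, pairs in cases.items():
        for text, expected in pairs:
            if is_pangram(text) is not expected:
                failures.append((group, text))
    return failures


print(run(CASES))  # []
```

Grouping by class makes it visible at a glance when one class has only a single data point, which is the concern raised above.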
"description": "ignores other characters in incorrect pangram",
"property": "isPangram",
"input": "7h3 qu1ck brown fox jumps ov3r 7h3 lazy dog",
"input": "Victor_jagt-zwölf.B0xkämpfer >qu3r< über;einen'großen\" Sylter#Teich!",
This has the same description as the previous test.
Non ASCII test cases should not be included, see: #428
There was a problem hiding this comment.
The descriptions seem to be the same at first glance, but they are not. One is for (correct) pangrams and one for (incorrect) non-pangrams. I'll try to make this more readable.
OK, I'll remove non-ASCII tests.
There was a problem hiding this comment.
I like what you're trying to do here, and think it's good to look to improve the test case ordering.
Some suggestions:
Add the non-alphabet character exclusion test(s) early so you can use spaces in the subsequent tests.
Ensure that test descriptions are unique.
Be certain what you are testing for in each test and ensure that it is possible for the test case to fail if all the previous tests pass.
Be very wary of removing/changing existing tests, they are all there for a reason. (Re-ordering is generally fine.)
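The "descriptions must be unique" suggestion is easy to check mechanically. The following is a small sketch under assumptions: the helper name `duplicate_descriptions` and the sample descriptions are made up for illustration, the two inputs are taken from the diffs in this review, and the case layout follows the keys quoted in this thread.

```python
from collections import Counter


def duplicate_descriptions(cases):
    """Return the descriptions that occur more than once, sorted."""
    counts = Counter(case["description"] for case in cases)
    return sorted(desc for desc, n in counts.items() if n > 1)


# Hypothetical sample: two cases accidentally share one description.
sample_cases = [
    {"description": "ignores other characters", "property": "isPangram",
     "input": "\"Five quacking Zephyrs jolt my wax bed.\"", "expected": True},
    {"description": "ignores other characters", "property": "isPangram",
     "input": "7h3 qu1ck brown fox jumps ov3r 7h3 lazy dog", "expected": False},
    {"description": "recognizes a lower case pangram", "property": "isPangram",
     "input": "thequickbrownfoxjumpsoverthelazydog", "expected": True},
]

print(duplicate_descriptions(sample_cases))  # ['ignores other characters']
```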
"expected": false
},
{
"description": "ignores other characters in correct pangram",
"correct" is not the right word here - it's a pangram or a non-pangram.
There was a problem hiding this comment.
I agree, but because of the context I worded it extra clearly to make it distinct from the complementary non-pangram input. Sometimes I feel it might be better to be redundantly clear instead.
But I have to change it anyway, because of bad readability:
#893 (comment)
@Insti Please see my replies above. (What a pity they get hidden after a new commit.) Regarding one of your suggestions:
I see what you want to achieve here. It would increase the readability of subsequent test cases. Regards:
Yes, that's true. But I had a feeling it is redundant to test multiple times for characters to ignore. Basically every character besides a-zA-Z should be ignored, so in this particular case it made sense to me to merge the cases into one.
2f0ae2f to 8d9b0a1
Hi @Vankog, thanks for taking on board the feedback and arguing for the cases you disagree with. ❤️ The main requirement from the description is:
It would be nice to use sentences in the examples we're testing. Otherwise why not just sort everything alphabetically so the student can more easily see which letters are missing? (This is also undesirable.) "the quick brown fox jumps over the lazy dog" is clearly a sentence.
It also makes your PR harder to review and agree to merge.
Thanks, it is nice to see involved people. I made some replies above. The sentence argument was also something that occurred to me, and I considered it. In a way it is a hard trade-off: either using a single long word all the time or introducing spaces early.
I understand. Well, there is a reason why I worded the title of this discussion "rework" and the nucleotide one "refactor".
"description": "returns false for an undefined/null message as argument.",
"property": "isPangram",
"input": "the quick brown fox jumps over the lazy dog",
"input": null,
If null is tested as an input, I am thinking it should be an error, rather than to return false. The reason for this belief is that if we consider that the function under test must only accept strings, well null is not a string, so this is in error.
(Of course, I imagine languages for which strings are non-nullable will simply skip this test)
This comment must not be read as an endorsement or rejection of the statement "null should be tested as an input".
There was a problem hiding this comment.
Yeah, I agree. I had the same thought but had not changed it yet because of the ongoing discussion. I will probably change it to expect a throw in an upcoming commit.
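A minimal sketch of the "null is an error" reading, in Python, where `None` plays the role of null. Choosing `TypeError` is an assumption made for this example; the thread only says the call should be an error, and each language track would pick its own mechanism.

```python
import string


def is_pangram(sentence):
    """Return True for pangrams; reject anything that is not a string."""
    if not isinstance(sentence, str):
        # Assumption: a non-string argument is a caller error, not "False".
        raise TypeError("is_pangram expects a string")
    return set(string.ascii_lowercase) <= set(sentence.lower())


try:
    is_pangram(None)
except TypeError as exc:
    print(f"rejected: {exc}")  # rejected: is_pangram expects a string

# The empty string is still a valid argument and simply not a pangram.
print(is_pangram(""))  # False
```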
"description": "recognizes a pangram even if additional characters other than a-z are present.",
"property": "isPangram",
"input": "\"Five quacking Zephyrs jolt my wax bed.\"",
"input": "Victor_jagt-zwölf.B0xkämpfer >qu3r<über;den'großen\" Sylter#Deich!",
can you confirm whether or not ö ä ü and ß are in ASCII, given the declaration of intent in #893 (comment) of
OK, I'll remove non-ASCII tests.
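The question can be answered mechanically. ASCII covers only code points 0 to 127, so a short check lists every character of an input that falls outside that range. The helper name `non_ascii_chars` is hypothetical; the sample string is the input from the diff above.

```python
def non_ascii_chars(text: str) -> list:
    """Return the distinct characters of text that are outside ASCII (0-127)."""
    return sorted({ch for ch in text if ord(ch) > 127})


sample = "Victor_jagt-zwölf.B0xkämpfer >qu3r<über;den'großen\" Sylter#Deich!"
print(non_ascii_chars(sample))  # ['ß', 'ä', 'ö', 'ü']
```

So ö, ä, ü and ß are not ASCII, and this input would fall under the "no non-ASCII test cases" rule referenced earlier.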
"description": "recognizes missing characters (e.g. 'd') even if additional characters other than a-z are present.",
"property": "isPangram",
"input": "the quick brown fox jumps over with lazy FX",
"input": "Victor_jagt-zwölf.B0xkämpfer >qu3r< über;einen'großen\" Sylter#Teich!",
can you confirm whether or not ö ä ü and ß are in ASCII, given the declaration of intent in #893 (comment) of
OK, I'll remove non-ASCII tests.
Are there any that need a response that I've not responded to? PRs get complicated when they contain many different conversation threads and Github doesn't help when it starts hiding conversations. (Another reason for small focused PRs.)
@Insti hm... besides the input validation discussion I think this discussion thread is the last: Here is the equivalent code in the overall diff: Yeah, keeping track of all the replies is currently horrible. I always have to check from top to bottom.
@Vankog are you interested in continuing to work on these pull requests? (If no further progress is made, I will close this on or after the 8th October.)
I do. I was just waiting for the sub-discussions to conclude so we can finally decide what should or shouldn't be done.
aaa359f to 2927bb5
OK, new suggestion.
63a09d8 to c0c6f01
c0c6f01 to 2748b6f
petertseng
left a comment
I am thinking this progression allows the student to take the smaller steps in the right order, compared to 1.1.0, so it achieves its goal of improving in that regard.
I have one comment about a class of incorrect solution that I think could be worth testing.
I express no opinion on changed descriptions.
"referential transparency (i.e. evaluating, a function/method gives the same value for same arguments every time.)",
"etc.",
"",
"'error' and 'null' should be treated according to the languages' specifics and possibilities."
this is not strictly necessary, since there are no longer any error or null used in this file (unless I missed one, sorry!)
"description": "missing letters replaced by numbers",
"description": "recognizes a missing character in mixed-case, e.g. 'Y'.",
"property": "isPangram",
"input": "7h3 qu1ck brown fox jumps ov3r 7h3 lazy dog",
one interesting thing about this input is that the number of unique non-space characters in it is exactly 26, which is the intent of #852 .
Now it looks like we are testing mixed-case before non-alphabetics, all right, so we wouldn't have the letters -> numbers case, that seems fine.
If you believe that is still a useful bug to catch, then it would be useful to have a case like "Aabcdefghijklmnopqrstuvwxy"
The criterion is not met by "ThE FiVe bOxInG WiZaRdS JuMp qUiCkL" because the number of unique non-space characters in it is 28.
There was a problem hiding this comment.
@petertseng Thanks for this hint.
However, I am not sure I understand the intention of this bugfix or the changed testcase. Is it to test for mixed-case, but with 26 distinct chars?
If so, I think we should add a testcase with exactly that intent to prevent future misunderstandings/changes. I added this as a ToDo item below: #893 (comment)
There was a problem hiding this comment.
Is it to test for mixed-case, but with 26 distinct chars?
Yes. The test case that got replaced is: lowercase + numbers, with 26 distinct non-space chars.
I was thinking it would get replaced with: lowercase + uppercase, with 26 distinct non-space chars.
I think you understood correctly.
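To illustrate the class of incorrect solution discussed in this thread, here is a sketch of a buggy implementation that counts distinct non-space characters instead of checking letters, next to a correct one. Both function names are hypothetical, and the buggy variant is my reading of the bug described here, not code quoted from #852.

```python
import string


def is_pangram_buggy(sentence: str) -> bool:
    """Incorrect: treats any 26 distinct non-space characters as a pangram."""
    return len(set(sentence.replace(" ", ""))) == 26


def is_pangram(sentence: str) -> bool:
    """Correct: every letter a-z must occur, ignoring case."""
    return set(string.ascii_lowercase) <= set(sentence.lower())


inputs = [
    "7h3 qu1ck brown fox jumps ov3r 7h3 lazy dog",  # lowercase + numbers
    "Aabcdefghijklmnopqrstuvwxy",                    # lowercase + uppercase
    "ThE FiVe bOxInG WiZaRdS JuMp qUiCkL",           # 28 distinct characters
]

for text in inputs:
    distinct = len(set(text.replace(" ", "")))
    print(distinct, is_pangram_buggy(text), is_pangram(text))
# 26 True False
# 26 True False
# 28 False False
```

The first two inputs expose the bug because the buggy version wrongly answers True. The third does not, since both versions answer False, which matches the remark that it does not meet the criterion.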
As a reviewer this PR has gotten to a stage where reviewing it is very difficult. I recommend this PR be closed and new separate PRs be opened for the different changes.
2748b6f to ed25b5c
Tasks:
Closing this one too.
…with suitable edge cases: exercism#893
As proposed in #893. Currently two cases check for a missing 'x'. This is changed to 'h' to test more thoroughly. This improves the TDD progression and forces the user to check for more than just 'x'.
Hi,
I'm currently contributing some bits and pieces to the JS and ECMAScript tracks, and while doing so I came across some issues with some of the test files. When I proposed some changes, the maintainers asked me to contribute to the canonical track first, before they can be addressed in the particular track. So here I am. :-)
In this PR I'd like to discuss the test spec for the pangram exercise in particular.
First I have some assumptions:
The pangram test spec is the first test that I took a closer look at and at least to me there seem to be some things that could be improved.
Here is the current one:
https://github.com/exercism/problem-specifications/blob/fba1aef6b2237f504bfdd0bf575fd49489f12523/exercises/pangram/canonical-data.json
The following things came to me:
I am far from an expert in this field. To be fair, I'm even a learner who needs practice in TDD. However, I am particularly interested in good TDD design and the clean code movement. And I think I can add some value to this discussion from talks, tutorials and videos about testable code and clean code in general.
Therefore, I have a suggested version of the pangram test spec you can see in this PR.
Here is the file as a whole:
https://github.com/Vankog/problem-specifications/blob/pangram-test-rework/exercises/pangram/canonical-data.json
I think I provided a good sliming progression, adding one aspect after another. Also I tried to think about the edge cases more carefully and what needs to be tested, adding some and merging others.
Here is a concrete implementation of this proposal in JS that I proposed in a PR over there:
exercism/javascript#406
What do you think?