Conversation
```diff
 def replacement_show():
     # form a closure on test_case and tolerance
-    test_case.check_graphic(tol=tol)
+    test_case.check_graphic()
```
In the hope of refraining from a culture of tweaking graphical testing tolerances, I've closed the door to passing a tol through to check_graphic.
```python
with fail_any_deprecation_warnings():
    with add_examples_to_path():
        import atlantic_profiles
        with show_replaced_by_check_graphic(self, tol=14.0):
```
This covers up an issue in the use of twiny ... changing the tolerance of an individual test is not healthy.
```python
        result = func(*args, **kwargs)
        return result
    return decorated_func
```
A simple decorator to facilitate the use of a re-entrant lock to provide thread-safe plotting. A re-entrant lock is required here as iris.plot.contourf calls iris.plot.contour ...
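A minimal sketch of the pattern, assuming a module-level `threading.RLock` (the `_locker` name follows the diff; the plotting bodies below are placeholders, not the real iris.plot implementations):

```python
import functools
import threading

# Re-entrant: contourf calls contour, so the same thread must be able
# to acquire the lock a second time without deadlocking.
_lock = threading.RLock()

def _locker(func):
    """Serialise calls to the decorated plotting function."""
    @functools.wraps(func)
    def decorated_func(*args, **kwargs):
        with _lock:
            result = func(*args, **kwargs)
        return result
    return decorated_func

@_locker
def contour(*args, **kwargs):
    return 'contour'  # placeholder body

@_locker
def contourf(*args, **kwargs):
    # Nested locked call: this would deadlock with a plain threading.Lock.
    return contour(*args, **kwargs)
```

With a non-reentrant `threading.Lock`, the nested `contour` call inside `contourf` would block forever on the lock its own thread already holds.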
```python
@_locker
def contour(cube, *args, **kwargs):
```
Use the lock for all public api plotting functions ...
I wonder whether this will have any unexpected behavioural impacts in normal usage of iris.plot? I must admit that I expected that the lock would be applied within the scope of the tests only.
I'm inclined to agree... it feels like this should be done when testing. We are mostly wrapping matplotlib calls, which I guess aren't thread safe either, so it doesn't seem consistent to lock our wrappers.
@dkillick @ajdawson I see your point, and I was in two minds whether to lock at this level. To be honest, I expected that discussion to have happened during the review 😉 ...
Regardless of testing, iris plotting is not thread-safe for users thanks to mpl, which is why I locked the wrappers at the plotting level. The problem lies deep within iris plotting, in the calls to iris.plot._replace_axes_with_cartopy_axes.
Locking at the graphics-test level makes complete sense, as it ensures that graphics tests are atomic (as I've already implemented), so at a minimum we definitely need to keep that level of locking. The plotting-level locking may help the user in a threaded context, but I'm less convinced of this ... So we may be able to compromise and drop the plotting-level locking ... I'll take a look and see how it behaves. But I'd be more than happy to do that.
```python
_DEFAULT_IMAGE_TOLERANCE = 10.0
# Threading non re-entrant blocking lock to ensure thread-safe plotting.
_lock = threading.Lock()
```
A non re-entrant lock is required here to ensure there is no cross-pollination between graphics tests, i.e. one test picking up another test's plot figures or axes.
```python
with open(result_fname, 'rb') as fi:
    sha1 = hashlib.sha1(fi.read())
return sha1
```
This hot-fix didn't work and needs to be purged.
At this point the plot was already wrong, i.e. another thread had already clobbered either the plot title or the axes title(s), so re-saving the clobbered plot image was never going to yield the correct sha.
```python
h = hexstr[i*2:i*2+2]
v = int("0x" + h, 16)
l.append([v & 2**i > 0 for i in range(8)])
return imagehash.ImageHash(np.array(l))
```
This fix has already been merged into imagehash but has not yet been released to PyPI ... we can remove this function once the fix is available via PyPI, or even conda-forge!
```python
buffer = io.BytesIO()
figure = plt.gcf()
figure.savefig(buffer, format='png')
buffer.seek(0)
```
Memory buffers are our friend ... only save to disk on failure, rather than save to disk and delete on success.
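The idea, sketched with a hypothetical `check_image` helper (the `render` callable stands in for `figure.savefig(buffer, format='png')`; names here are illustrative, not the real framework API):

```python
import io
import os

def check_image(render, expected_bytes, failure_dir):
    """Render into an in-memory buffer; persist to disk only on failure."""
    buffer = io.BytesIO()
    render(buffer)          # e.g. figure.savefig(buffer, format='png')
    buffer.seek(0)
    result = buffer.read()
    if result == expected_bytes:
        return True         # success: the disk is never touched
    # Failure: keep the rendered image around for later inspection.
    with open(os.path.join(failure_dir, 'result.png'), 'wb') as fo:
        fo.write(result)
    return False
```

On success nothing is written, so there is no temporary file to create and delete; on failure the image survives for debugging.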
```diff
 distances = [e - phash for e in expected]

-if sha1.hexdigest() not in expected:
+if np.all([hd > tol for hd in distances]):
```
The result image fails to meet expectation iff it is dissimilar to every one of our registered expected images, i.e. its hamming distance to each expected hash exceeds the tolerance.
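In other words (hypothetical helpers for illustration; real `imagehash.ImageHash` objects implement `-` as exactly this hamming distance):

```python
def hamming(h1, h2):
    """Count of differing bits between two equal-length bit sequences."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

def image_fails(result_hash, expected_hashes, tol):
    """Fail iff the result is dissimilar to *every* expected hash."""
    distances = [hamming(e, result_hash) for e in expected_hashes]
    return all(hd > tol for hd in distances)
```

A single sufficiently similar expected image is enough to pass, which is what lets several old per-platform reference images collapse into one perceptual hash.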
```python
@tests.skip_plot
class IntegrationTest(tests.GraphicsTest):
    def setUp(self):
        super(IntegrationTest, self).setUp()
```
We need to make sure that GraphicsTest.setUp is called, as it acquires the non re-entrant lock that protects the plot figures for a test.
@bjlittle do we need to follow this pattern for all instances of test classes inheriting from tests.GraphicsTest? If so, you'll also need to update the following:
- In `lib/iris/tests`: `test_analysis.TestRotatedPole`, `test_coordsystem.Test_LambertConformal`, `test_mapping.TestBasic`
- In `test_plot`: `Test1dPlotMultiArgs`, `TestMissingCoord`, `TestMissingCS`, `TestAttributePositive`, `TestPlotOtherCoordSystems`
- All affected test classes in `lib/iris/tests/integration/test_grib_load.py`
- Test class in `lib/iris/tests/integration/test_netcdftime.py`
@dkillick Thanks. Yes, we do indeed need to follow a pattern here ...
A GraphicsTest subclass must call the GraphicsTest.setUp method (in order to acquire the non-reentrant lock) if the subclass in question overrides the inherited setUp method. If the subclass doesn't specialise setUp then, thanks to inheritance, the right thing happens: the unittest framework calls the (inherited) GraphicsTest.setUp for the subclass. The same behaviour applies to GraphicsTest.tearDown, which releases the lock.
In all of the test cases that you identified above, there is no need to explicitly call GraphicsTest.setUp, simply because the subclass does not specialise the setUp method.
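The rule can be seen with a toy unittest example (illustrative class names; the `lock_acquired` flag stands in for the real lock acquisition):

```python
import unittest

class GraphicsTest(unittest.TestCase):
    def setUp(self):
        # The real implementation acquires the shared plotting lock here.
        self.lock_acquired = True

class NoSetUpOverride(GraphicsTest):
    # No setUp here: unittest calls the inherited GraphicsTest.setUp.
    def test_lock(self):
        self.assertTrue(self.lock_acquired)

class WithSetUpOverride(GraphicsTest):
    def setUp(self):
        # An override MUST chain up, or the lock is never taken.
        super(WithSetUpOverride, self).setUp()
        self.extra = True

    def test_lock(self):
        self.assertTrue(self.lock_acquired and self.extra)
```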
Note that you incorrectly mentioned test_mapping.TestBasic:

```python
class TestBasic(tests.GraphicsTest):
    def setUp(self):
        super(TestBasic, self).setUp()
        self.cube = iris.tests.stock.realistic_4d()
```

which does apply the pattern, and so does test_plot.Test1dPlotMultiArgs:
```python
class Test1dPlotMultiArgs(tests.GraphicsTest):
    # tests for iris.plot using multi-argument calling convention
    def setUp(self):
        super(Test1dPlotMultiArgs, self).setUp()
        self.cube1d = _load_4d_testcube()[0, :, 0, 0]
        self.draw_method = iplt.plot
```

So, I'd argue that this is a non-issue, unless I'm missing something ...
> I'd argue that this is a non-issue

Agreed. I was sure there must be a pattern to it, but I couldn't work out what the pattern was! Good ol' inheritance 😉
```python
import codecs
import contextlib
from glob import glob
import hashlib

from PIL import Image
import filelock
import imagehash
```
... long live imagehash! 😉
```python
h = hexstr[i*2:i*2+2]
v = int("0x" + h, 16)
l.append([v & 2**i > 0 for i in range(8)])
return imagehash.ImageHash(np.array(l))
```
Again this will disappear ...
```diff
 rdir = os.path.join(os.getcwd(), 'iris_image_test_output')

-for expected, actual, diff in _failed_images_iter():
+for expected, actual, diff in step_over_diffs(rdir, 'similar', False):
```
This fixes the failed-test runner `--print-failed-images` capability, so that we can see graphical failures that occurred remotely on travis again 😄
Whoop! All the tests passed first time! 🍻

🍻 indeed *marqh clinks glasses with @bjlittle*
**DPeterK** left a comment
It's possible this PR was merged a little hastily. Here are my extra review comments, which I'll reference in a new issue to increase the chance they're acted upon.
```yaml
fi

# Perceptual image hashing (TBD: push recipe to conda-forge!)
- conda install pip
```
FWIW you don't need to install pip - it's already available on Travis.
As part of the conda environment (and installing to it), though?
pip is installed by default when you create a new conda environment (e.g. https://travis-ci.org/SciTools/iris/jobs/170017955#L253) so it doesn't need to be done separately. The result of this line is a no-op (https://travis-ci.org/SciTools/iris/jobs/170017955#L464).
This PR provides a more stable and scalable approach to the new graphics testing framework, replacing cryptographic SHA-based image hashes with perceptual image hashes.
It relies on the support of the new perceptual image hash repository test-iris-imagehash.
Perceptual image hashes allow checking the degree of similarity between images by measuring the hamming distance (a count of bit differences) between image hashes.
The major benefit of this perceptual image hash approach is that it is simple and scalable (it has already almost halved the number of required reference images), and it makes us less sensitive to changes in the scientific software stack dependencies.
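To illustrate the principle (a toy average-hash on raw pixel lists; the real framework uses imagehash's DCT-based perceptual hashes, but the comparison works the same way):

```python
def average_hash(pixels):
    """Toy perceptual hash: bit i is set iff pixel i exceeds the mean."""
    mean = sum(pixels) / float(len(pixels))
    return [p > mean for p in pixels]

def hamming(h1, h2):
    """Hamming distance: the count of differing bits."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

original = average_hash([10, 200, 30, 220])
tweaked = average_hash([12, 198, 33, 219])    # small rendering change
different = average_hash([200, 10, 220, 30])  # genuinely different image

# A small rendering change barely moves the hash, while a different
# image flips many bits, so a distance tolerance separates the two
# cases where an exact sha comparison would reject both.
```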
This PR also addresses the issues raised in #2195. Normally, I would have pushed those changes in a separate PR, but there is clearly a major overlap here with regards to graphical testing. In short, the problem simply relates to iris plotting not being thread-safe, and our travis-ci testing uses nose, which highlighted this problem in a typically dreaded, sporadic threaded way.

Closes #2195