Change MfList.fmt_string for floats to use numpy's string formatter #322
|
Hey Mike, thanks for looking into this. I see there is one failure, which I've looked into a little, and I'm not sure how to move forward with it. The following is the first line of the river package for the original secp model, what flopy writes now, and what flopy writes with your proposed change (I removed some whitespace for clarity):

It looks like the conductance value (and some others) might be problematic with your proposed changes. But it's not entirely clear why, since a single-precision version of MODFLOW wouldn't be able to store all those digits anyway. I like where you are headed with this. Any thoughts?
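To make the "wouldn't be able to store all those digits" point concrete, here is a small sketch using numpy's `finfo`: float32 (Fortran REAL) carries only about 6-7 significant decimal digits, so any extra digits in a text file are discarded when the value is read. The number `1.23456789` is purely illustrative, not a value from secp.

```python
import numpy as np

def float_digits(dtype):
    """Approximate number of decimal digits a float type holds (numpy's finfo.precision)."""
    return np.finfo(dtype).precision

print(float_digits(np.float32))   # 6
print(float_digits(np.float64))   # 15

# Two strings that differ only beyond float32 precision parse to the same REAL value
same_real = np.float32('1.23456789') == np.float32('1.2345679')
print(same_real)                  # True
```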
|
If I remember right, we had a more general formatter, but some people were reporting different results after loading, writing, and rerunning a model. The culprit was the float formatter in MfList...
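A sketch of how a too-short general format can change a value over one load/write/rerun cycle. The `'{:g}'` format (6 significant digits) is only an illustrative stand-in; this thread doesn't say which general formatter flopy actually used. Nine significant digits (`'{:.9g}'`) are always enough to recover a float32 exactly.

```python
import numpy as np

def round_trips(value, fmt):
    """True if writing a float32 with `fmt` and reading it back as float32
    reproduces exactly the same value."""
    text = fmt.format(float(value))
    return np.float32(text) == value

x = np.float32(2**24)                 # 16777216.0, exactly representable in float32
print('{:g}'.format(float(x)))        # 1.67772e+07
print(round_trips(x, '{:g}'))         # False: reads back as 16777200.0
print(round_trips(x, '{:.9g}'))       # True
```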
|
I've been inspecting a few versions of secp too, and there are very small differences in the output solutions. GMS natively uses double-precision floats, so the original formatted files have more than sufficient precision. Ideally, rewritten files would allow a model to reproduce an exact flow solution, but in either case I see head differences of around 1e-6. The only way to "see" how these REALs are represented is in binary form, using numpy's .tostring() method (called .tobytes() in newer numpy):

```python
import itertools
import numpy as np

strings = ['225.39999389648', '2.2539999E+02', '225.4']

def process(text, dtype):
    """Parse `text` as `dtype`; return the value, its str() form and its raw bytes as hex."""
    val = dtype(text)
    hexx = '0x' + val.tobytes().hex()  # Python 3 replacement for .tostring().encode('hex')
    return val, str(val), hexx

for a_in, b_in in itertools.combinations(strings, 2):
    a, a_str, a_hex = process(a_in, np.float32)
    b, b_str, b_hex = process(b_in, np.float32)
    eq = '==' if a_hex == b_hex else '!='
    print('A: {} {} {} B: {} {}'.format(a_str, a_hex, eq, b_str, b_hex))
# On a little-endian machine:
# A: 225.4 0x66666143 == B: 225.4 0x66666143
# A: 225.4 0x66666143 == B: 225.4 0x66666143
# A: 225.4 0x66666143 == B: 225.4 0x66666143
```

This example shows that all combinations of these numbers are identical as float32, so Fortran's REAL should also see them as the same. (Repeating the example with float64 makes each of them different.) I'm certain that somewhere in one of the several stress packages for secp there is a minor binary difference in one of these combinations that makes a difference to the simulation. I'll need to dig around a bit more to think up some new ideas...
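To check the float64 remark directly, here is a short sketch that parses the same three strings as float32 and as float64 and counts how many distinct bit patterns result. `count_distinct` is a hypothetical helper written just for this comparison.

```python
import numpy as np

strings = ['225.39999389648', '2.2539999E+02', '225.4']

def count_distinct(texts, dtype):
    """Number of distinct bit patterns obtained by parsing `texts` as `dtype`."""
    return len({dtype(t).tobytes() for t in texts})

for dtype in (np.float32, np.float64):
    print(dtype.__name__, count_distinct(strings, dtype))
# float32 1
# float64 3
```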
|
I've finally made sense of the Travis CI results. There was one success with numpy >= 1.14.0, and most of the other failures were with older numpy versions. A very subtle but good change happened in that release, related to formatting floating-point numbers:

The longer version of the Dragon4 implementation (primarily a comp-sci read) is described here. Since Steele and White (1990), several low-level floating-point formatters have been implemented and used across most of the compilers and software we use today, and they sometimes differ subtly in their output. This means that older numpy versions may have formatted float32 values a "bit off". E.g., comparing a value from line 246 of the original secp.chd GMS MODFLOW file:

With older numpy versions, here is where it formats a "bad" version that has a different value:

And with numpy 1.14.0, here is a "good" version, which is the same both formatted and in binary:

These differences matter to FORTRAN REAL types, so the flow solution is different for the secp example. So what does this mean? We can possibly switch to improved floating-point formatting for numpy >= 1.14, which looks better and preserves floating-point precision at the binary level. But older versions of numpy should stick to the current floating-point formatting.
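A minimal sketch of the version switch described above, assuming values are stored as float32. `LEGACY_FMT`, `numpy_at_least` and `format_stress_value` are hypothetical names, and `'{:15.7E}'` is an assumed stand-in for the current MfList float format, not copied from flopy/utils/util_list.py. For numpy >= 1.14 it uses `np.format_float_scientific`, which (with its default `unique=True`) produces the shortest Dragon4 text that identifies the value within its own dtype.

```python
import numpy as np

# Hypothetical legacy fixed-width format (8 significant digits); an assumed
# stand-in for the current MfList float format, not copied from flopy.
LEGACY_FMT = '{:15.7E}'

def numpy_at_least(major, minor):
    """True if the installed numpy is at least version major.minor."""
    parts = np.__version__.split('.')[:2]
    return (int(parts[0]), int(parts[1])) >= (major, minor)

def format_stress_value(value, width=15):
    """Format one value (stored as float32) for a fixed-width stress record."""
    value = np.float32(value)
    if numpy_at_least(1, 14):
        # Dragon4 shortest text that uniquely identifies the float32 value
        return np.format_float_scientific(value).rjust(width)
    # Older numpy: keep the legacy formatting unchanged
    return LEGACY_FMT.format(float(value))

print(repr(format_stress_value(-8773.9)))   # '    -8.7739e+03' (numpy >= 1.14)

# Check that the written text reads back to the identical float32 bits
rng = np.random.RandomState(0)
samples = (rng.standard_normal(1000) * 1e4).astype(np.float32)
ok = all(np.float32(format_stress_value(v).strip()).tobytes() == v.tobytes()
         for v in samples)
print(ok)   # True with numpy >= 1.14
```

For float32, the shortest scientific form should be at most 15 characters (sign, up to 9 significant digits, and a two-digit exponent), so it fits a 15-character field. `rjust` pads but never truncates, so a narrower field or a wider dtype would need its own check.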
|
I've rewritten this commit to adjust the formatting string depending on the numpy version. Versions before numpy 1.14 keep the current floating-point formatting. All Travis CI tests (except the usual Python nightly) now pass; each job uses a different version of numpy, currently:
|
Stress data written by flopy inflates floating-point precision rather than preserving the native precision of the data type. This is due to a C-style format string tucked deep within flopy/utils/util_list.py. For example, a well-package flux rate that is originally -8773.9 is effectively written by flopy as -8773.9004, introducing a difference of 0.0004. The good news is that these differences do not matter to single-precision floats (i.e. REAL or float32), but they are visible when formatted, and they become real differences when the values are read in double precision (float64).
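The -8773.9 example can be reproduced in a few lines. This sketch assumes the inflation comes from a fixed 8-significant-digit scientific format (mimicked here with Python's `'{:.7E}'`); the exact format string in util_list.py may differ.

```python
import numpy as np

rate = np.float32(-8773.9)            # well flux stored as single precision

# Assumed legacy style: fixed 8 significant digits (like a '%15.7E' C format)
old_text = '{:.7E}'.format(float(rate))
new_text = str(rate)                  # numpy >= 1.14: shortest text for float32

print(old_text, new_text)             # -8.7739004E+03 -8773.9
print(np.float32(old_text) == np.float32(new_text))   # True: same REAL value
print(abs(float(old_text) - float(new_text)))         # about 0.0004 as float64
```

As float32 both strings give identical bits, but read as float64 they differ by about 0.0004.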
As shown above (and in this PR), the suggested fix is to format each float with numpy's own string formatter and pad it to the fixed field width. Numpy supports a wide range of float types with different precisions, and it generally formats each of them to a string in a way that honors that precision.
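A quick illustration of numpy formatting "in a way that honors their precision": the same value, 1/3, stored in three numpy float types. With numpy >= 1.14, `str()` gives the shortest text that still identifies the value within its own dtype, so lower-precision types get shorter strings.

```python
import numpy as np

third = 1.0 / 3.0
for dtype in (np.float16, np.float32, np.float64):
    v = dtype(third)
    text = str(v)                      # shortest text that identifies v within its own dtype
    assert dtype(text) == v            # parsing the text back gives the same value
    print(dtype.__name__, text)
# float16 0.3333
# float32 0.33333334
# float64 0.3333333333333333
```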
There are possible consequences to this PR. For instance, files written by flopy will look different. E.g., a well package file that previously looked like this:
would then look like this: