Conversation
66d2f5b to
b1aefcf
Compare
|
@franzpoeschel I am sorry, but this PR will affect your current json backend please feel free to raise any questions in issues that might occur from this change! |
8cf1fa1 to
46eef88
Compare
Oh yes, I'm excited 😵 |
46eef88 to
56f91d6
Compare
| case DT::UINT64: | ||
| case DT::VEC_UINT64: | ||
| case DT::ULONG: | ||
| case DT::VEC_ULONG: |
There was a problem hiding this comment.
We're very likely going to have problems with ADIOS in this regard.
From what I can see, they do NOT make the destinction that causes problems with OSX and MSVC. Rather, they treat the bit-widths as I did initally (see adios_types.h, which explicitly lists sizes of the datatypes).
As e.g. int may now be either 16 or 32 bit, and ADIOS assumes a size of 32, this might explode in our face.
There was a problem hiding this comment.
Yep, I know but that's ADIOS' portability problem not ours: ornladios/ADIOS#187
(Side note: ADIOS1 only compiles on Linux and OSX, not on MSVC.)
There was a problem hiding this comment.
Uh wait, you mean we will not be able to store a (u)int64_t since ADIOS has no long long... Yeah... Well.
There was a problem hiding this comment.
that's ADIOS' portability problem not ours
So the baseline here is to rely on the C standard and ignore potentially faulty behaviour in one of our backends. Not ideal, but probably the only thing we can should do.
you mean we will not be able to store a (u)int64_t
We most likely can, unless we're on exotic platforms (MSVC, anyone?) where LONG and ULONG are 32 Bits wide... Which again comes back to the standard compliance stated above.
There was a problem hiding this comment.
What I try to say is: if adios_long is defined as 8 byte on all platforms, we will implement it as such and convert properly in the backend from whatever matches (int or long or long long).
There was a problem hiding this comment.
The fileformat is responsible for portability, e.g. HDF5 stores automatically platform information with the stored types
Hi, just so I get this right: When reading e.g. a long from some stored file, I will also have to take into account the platform that wrote the value and use the reading platform's datatype that corresponds with the actual length of the value stored, so possibly not a long but an int for example?
If so, it's good you mention this, because floating point datatypes already have the same issue and I ignored that so far.
There was a problem hiding this comment.
I will also have to take into account the platform
My idea is to handle it exactly the other way around. Our backends, such as HDF5, are portable across platforms. If HDF5 stores for example a long (aka int64_t) on x86-64 Linux it does store in the back automatically the type of the platform. If you read this file back on x86-64 Windows, the HDF type presented to you will be long long (still aka int64_t) since int (and long) are int32_t there. The precision of the data is preserved.
JSON
Let's migrate that discussion here: #65
|
Note: The leftover HDF5 CI errors on OSX and MSVC are "just" the Note2: we might have to fake types in the ADIOS1 backend on OSX in case it's really assuming fixed sizes behind it's |
src/ParticlePatches.cpp
Outdated
|
|
||
| using DT = Datatype; | ||
| if( DT::UINT64 != *dOpen.dtype ) | ||
| if( DT::ULONG != *dOpen.dtype ) |
There was a problem hiding this comment.
Correct me if I'm wrong, but this does not strictly enforce the standard, ULONG may be 32 or 64.
if( determineDatatype< uint64_t >() != *dOpen.dtype )There was a problem hiding this comment.
exactly, that's the leftover from my replace marathon and your suggestion is correct!
src/Series.cpp
Outdated
| IOHandler->enqueue(IOTask(this, aRead)); | ||
| IOHandler->flush(); | ||
| if( *aRead.dtype == DT::UINT32 ) | ||
| if( *aRead.dtype == DT::UINT ) |
There was a problem hiding this comment.
if( *aRead.dtype == determineDatatype< uint32_t >() )| else if( brc.getDatatype() == DT::UINT ) | ||
| return py::dtype("uint32"); | ||
| else if( brc.getDatatype() == DT::UINT64 ) | ||
| else if( brc.getDatatype() == DT::ULONG ) |
There was a problem hiding this comment.
This is now probably not precise enough as ULONG may be 32 or 64.
Numpy provides a number of compatible C types, but not all of the applicable ones:
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.scalars.html#built-in-scalar-types
There was a problem hiding this comment.
I see you recorgnized the problem in RecordComponent.cpp already. 👍
There was a problem hiding this comment.
Yep, pls feel free to annotate any line you catch :)
test/SerialIOTest.cpp
Outdated
| RecordComponent& e_positionOffset_x = e_positionOffset["x"]; | ||
| REQUIRE(e_positionOffset_x.unitSI() == 2.599999993753294e-07); | ||
| REQUIRE(e_positionOffset_x.getDatatype() == Datatype::INT32); | ||
| REQUIRE(e_positionOffset_x.getDatatype() == Datatype::INT); |
There was a problem hiding this comment.
Same problem for all datatypes read back from a file.
INT may be 16, 32 or 64.
REQUIRE(e_positionOffset_x.getDatatype() == determineDatatype< int32_t >());10bc435 to
9d421be
Compare
| return py::dtype("int"); | ||
| // missing in numpy: covered by uint or ulonglong | ||
| // else if( brc.getDatatype() == DT::LONG ) | ||
| // return py::dtype("long"); |
There was a problem hiding this comment.
Now this is weird.
https://docs.scipy.org/doc/numpy/user/basics.types.html
int_| Default integer type (same as Clong; normally eitherint64orint32)
https://docs.scipy.org/doc/numpy-1.14.2/reference/arrays.scalars.html
int_| compatible: Python int |'l'
https://docs.python.org/3/c-api/long.html
(Note that there's two meanings of longhere: (a) C long (b) Pythons's 'bignum' integer of arbitrary precision)
PyLong_FromLong(long v)
Note the lacking PyLong_FromInt(int v)
So there seems to be an equivalent of long in int_, but no counterpart to unsigned long.
There was a problem hiding this comment.
https://docs.scipy.org/doc/numpy/user/basics.types.html#array-types-and-conversions-between-types
Additionally to
intcthe platform dependent C integer typesshort,long,longlongand their unsigned versions are defined.
Quick test:
In [1]: import numpy as np
In [2]: np.dtype('long')
Out[2]: dtype('int64')Haven't figured out how unsigned long works yet.
There was a problem hiding this comment.
Docs are wrong. The C integer type
longis callednp.int_, and the integer typeunsigned longis callednp.uint
So there we have it:
C int/ unsigned int is Python intc/uintc
C long/unsigned long is Python int_/ uint
In [1]: import numpy as np
In [2]: np.dtype('short')
Out[2]: dtype('int16')
In [3]: np.dtype('intc')
Out[3]: dtype('int32')
In [4]: np.dtype('int_')
Out[4]: dtype('int64')
In [5]: np.dtype('longlong')
Out[5]: dtype('int64')There was a problem hiding this comment.
Wow, as fancy as my float-confusion in Python xD Thanks a ton for checking!
There was a problem hiding this comment.
Ah yes, should be fixed now together with my float question in numpy's docs: numpy/numpy#11837
| else if( r.getDatatype() == Datatype::SHORT ) dtype = py::dtype("short"); | ||
| else if( r.getDatatype() == Datatype::INT ) dtype = py::dtype("int"); | ||
| // missing in numpy: covered by int or longlong | ||
| // else if( r.getDatatype() == Datatype::LONG ) dtype = py::dtype("long"); |
| else if( a.dtype().is(py::dtype("int")) ) | ||
| r.storeChunk( offset, extent, shareRaw( (int*)a.mutable_data() ) ); | ||
| // missing in numpy: covered by int or longlong | ||
| // else if( a.dtype().is(py::dtype("long")) ) |
| REQUIRE(Datatype::INT == a.dtype); | ||
| a = Attribute(static_cast< int >(0)); | ||
| REQUIRE(Datatype::LONG == a.dtype); | ||
| } |
There was a problem hiding this comment.
Scary thought:
There might be platforms where
sizeof(short) > 2umaking it impossible to write or read 16 Bit wide data (as int16_t does not alias to any of our provided integer types).
There was a problem hiding this comment.
Hm, that would be weird. But if a platform does not implement a 2 byte fundamental integer type, there will also be no alias int16_t for it.
There was a problem hiding this comment.
Just to throw out this SO answer again, this is present for char on certain (albeit rare) systems.
anokfireball
left a comment
There was a problem hiding this comment.
First of all, excuse the avalanche of comments.
Python dtypes need adjustment and a few questions need to be resolved.
Refactor the integer type system to support all fundamental C/C++ integer types: `short`, `int`, `long`, `long long` plus the unsigned versions of those. Fixed size ints are aliases of the above, leading to issues on OSX and MSVC since there is no fixed int alias for "long" in the old design.
176d37d to
01e54fd
Compare
|
(just pushing again since travis forgot to report its success state) |
01e54fd to
92dceaa
Compare
92dceaa to
0cdcb65
Compare
Besides returning true for the same types, identical implementations on some platforms, e.g. if long and long long are the same or double and long double will also return true. Affected by https://stackoverflow.com/questions/44515148/why-is-operator-overload-of-enum-ambiguous-in-msvc on MSVC.
0cdcb65 to
159d8f4
Compare
Refactor the integer type system to support all fundamental C/C++ integer types:
short,int,long,long longplus the unsigned versions of those.Fixed size and "least size" ints are aliases of the above, leading to issues on MSVC since there is no fixed int alias for
longin the old design.Another detail is that
uint64_tis an alias forlongon 64bit Linux but an alias forlong longon 64bit OSX. On both platforms,longandlong longare 64 bit.User Changes
Users will see no changes as long as they use helpers such as
determineDatatype<T>(). Otherwise, they must be aware thatDatatypes such asDT::INT32are now in one ofDT::SHORT,DT::INT, etc. Types such asint32_tcan still be used for attributes and data chunks, since they are simple aliases.Backend Changes
Instead of implementing types such as
INT16,INT32,INT64, ... one now needs to implement the four typesshort,int,longandlong long. Throw runtime errors on missing types, e.g. ADIOS 1.13.1 does not supportlong longyet.The fileformat is responsible for portability, e.g. HDF5 stores automatically platform information with the stored types, making sure a stored
longon one platform is anint32_ton one and andint64_ton another platform (and will not show up aslongon both, potentially).Related
Related and depending issues are #330 and #333.