aaronspring (Collaborator) commented Apr 7, 2022

  • implement bp.jl_bitround
  • test against bp.xr_bitround
  • align with numcodecs.bitround

Closes #25
deals with #27

@aaronspring aaronspring self-assigned this Apr 7, 2022
@aaronspring aaronspring added the enhancement New feature or request label Apr 7, 2022
aaronspring (Collaborator, Author) commented Apr 7, 2022

bp.xr_bitround and bp.jl_bitround yield identical results except for keepbits=23, where they differ:

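import xarray as xr
import bitinformation_pipeline as bp  # assumed import behind the `bp` alias

ds = xr.tutorial.load_dataset("air_temperature")
v = list(ds.data_vars)[0]
i = 23  # keepbits
(bp.xr_bitround(ds[v], i) - bp.jl_bitround(ds[v], i)).squeeze().isel(time=0).plot()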

[Screenshot 2022-04-07 at 13.58.51: plot of the difference computed above]

xr.testing.assert_equal(bp.xr_bitround(ds[v],i),ds[v]) # passes
xr.testing.assert_equal(bp.jl_bitround(ds[v],i),ds[v]) # fails
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [86], in <cell line: 3>()
      1 xr.testing.assert_equal(bp.xr_bitround(ds[v],i),ds[v])
----> 3 xr.testing.assert_equal(bp.jl_bitround(ds[v],i),ds[v])

    [... skipping hidden 1 frame]

File /work/mh0727/m300524/conda-envs/bitinfo/lib/python3.10/site-packages/xarray/testing.py:81, in assert_equal(a, b)
     79 assert type(a) == type(b)
     80 if isinstance(a, (Variable, DataArray)):
---> 81     assert a.equals(b), formatting.diff_array_repr(a, b, "equals")
     82 elif isinstance(a, Dataset):
     83     assert a.equals(b), formatting.diff_dataset_repr(a, b, "equals")

AssertionError: Left and right DataArray objects are not equal

Differing values:
L
    array([[[241.20001, 242.5    , ..., 235.5    , 238.6    ],
            [243.79999, 244.5    , ..., 235.29999, 239.29999],
            ...,
            [295.90002, 296.2    , ..., 295.90002, 295.2    ],
            [296.29004, 296.79004, ..., 296.79004, 296.60004]],
    
           [[242.1    , 242.70001, ..., 233.6    , 235.79999],
            [243.6    , 244.1    , ..., 232.5    , 235.70001],
            ...,
            [296.2    , 296.7    , ..., 295.5    , 295.10004],
            [296.29004, 297.2    , ..., 296.40002, 296.60004]],
    
           ...,
    
           [[245.79001, 244.79001, ..., 243.98999, 244.79001],
            [249.89001, 249.29001, ..., 242.48999, 244.29001],
            ...,
            [296.29004, 297.19   , ..., 295.09003, 294.39   ],
            [297.79004, 298.39   , ..., 295.49   , 295.19   ]],
    
           [[245.09   , 244.29001, ..., 241.48999, 241.79001],
            [249.89001, 249.29001, ..., 240.29001, 241.69   ],
            ...,
            [296.09003, 296.89   , ..., 295.69   , 295.19   ],
            [297.69   , 298.09003, ..., 296.19   , 295.69   ]]], dtype=float32)
R
    array([[[241.2    , 242.5    , ..., 235.5    , 238.59999],
            [243.79999, 244.5    , ..., 235.29999, 239.29999],
            ...,
            [295.9    , 296.19998, ..., 295.9    , 295.19998],
            [296.29   , 296.79   , ..., 296.79   , 296.6    ]],
    
           [[242.09999, 242.7    , ..., 233.59999, 235.79999],
            [243.59999, 244.09999, ..., 232.5    , 235.7    ],
            ...,
            [296.19998, 296.69998, ..., 295.5    , 295.1    ],
            [296.29   , 297.19998, ..., 296.4    , 296.6    ]],
    
           ...,
    
           [[245.79   , 244.79   , ..., 243.98999, 244.79   ],
            [249.89   , 249.29   , ..., 242.48999, 244.29   ],
            ...,
            [296.29   , 297.19   , ..., 295.09   , 294.38998],
            [297.79   , 298.38998, ..., 295.49   , 295.19   ]],
    
           [[245.09   , 244.29   , ..., 241.48999, 241.79   ],
            [249.89   , 249.29   , ..., 240.29   , 241.68999],
            ...,
            [296.09   , 296.88998, ..., 295.69   , 295.19   ],
            [297.69   , 298.09   , ..., 296.19   , 295.69   ]]], dtype=float32)

EDIT: I think the data with just one decimal is responsible.

@aaronspring aaronspring marked this pull request as ready for review April 7, 2022 12:04
milankl (Collaborator) commented Apr 7, 2022

Can you link me to the Python code for numcodecs.bitround? It's not in master yet, right? In Ryan's version, it seems a simple escape is used for keepbits=23 (because no rounding should take place):

def encode(self, buf):
    if self.keepbits == 23:
        return buf

I can easily add that to BitInformation.jl too, because at the moment we have:

julia> bitstring.(A,:split)
10-element Vector{String}:
 "0 01111101 00100011110101011001100"
 "1 01111101 11111100000100010010110"
 "0 01111111 00100110111101011010101"
 "0 01111100 00000101000001101001010"
 "1 01111111 10100000111000010100000"
 "0 01111100 01011110111011000110101"
 "0 01111010 01100000001000101011100"
 "1 01111110 00101101011101001110111"
 "1 01111101 10000010000111001000111"
 "1 01111101 11000011000111110011100"

julia> bitstring.(round(A,22),:split)    # keepbits=22, all correct
10-element Vector{String}:
 "0 01111101 00100011110101011001100"    # no rounding
 "1 01111101 11111100000100010010110"    # no rounding
 "0 01111111 00100110111101011010100"    # round to zero=even (tie)
 "0 01111100 00000101000001101001010"    # no rounding
 "1 01111111 10100000111000010100000"    # no rounding
 "0 01111100 01011110111011000110100"    # round to zero=even (tie)
 "0 01111010 01100000001000101011100"    # no rounding
 "1 01111110 00101101011101001111000"    # round away from zero (with carry)
 "1 01111101 10000010000111001001000"    # round away from zero (with carry)
 "1 01111101 11000011000111110011100"    # no rounding

julia> bitstring.(round(A,23),:split)    # keepbits=23
10-element Vector{String}:
 "0 01111101 00100011110101011001100"    # no rounding, correct
 "1 01111101 11111100000100010010110"    # no rounding, correct
 "0 01111111 00100110111101011010110"    # round away from zero, incorrect
 "0 01111100 00000101000001101001010"    # no rounding, correct
 "1 01111111 10100000111000010100000"    # no rounding, correct
 "0 01111100 01011110111011000110110"    # round away from zero, incorrect
 "0 01111010 01100000001000101011100"    # no rounding, correct
 "1 01111110 00101101011101001111000"    # round away from zero, incorrect
 "1 01111101 10000010000111001001000"    # round away from zero, incorrect
 "1 01111101 11000011000111110011100"    # no rounding, correct

This means that for the edge case of keepbits=23, some values are still rounded away from zero, which obviously shouldn't happen. I'll work out how best to deal with that in a patch release; the ball is in my court here. But that shouldn't stop us if you want to make BitInformation.round the default.
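
For reference, the round-to-nearest, ties-to-even behaviour discussed in this comment, including the keepbits=23 escape, can be sketched for a single float32 in pure Python. This is only a minimal sketch under stated assumptions, not the BitInformation.jl or numcodecs implementation: the helper names (`f32_to_bits`, `bits_to_f32`, `split_bits`, `bitround_scalar`) are hypothetical, it works on one scalar at a time, and NaN/Inf are not treated specially.

```python
import struct

MANTISSA_BITS = 23  # explicit mantissa bits of an IEEE 754 float32


def f32_to_bits(x: float) -> int:
    """Reinterpret x (cast to float32) as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def split_bits(x: float) -> str:
    """Format x as 'sign exponent mantissa', like bitstring(x, :split) in the thread."""
    s = format(f32_to_bits(x), "032b")
    return f"{s[0]} {s[1:9]} {s[9:]}"


def bitround_scalar(x: float, keepbits: int) -> float:
    """Round x (as float32) to `keepbits` mantissa bits, ties to even (hypothetical helper)."""
    if not 0 <= keepbits <= MANTISSA_BITS:
        raise ValueError("keepbits must be in 0..23 for float32")
    if keepbits == MANTISSA_BITS:
        # Escape: nothing is discarded, so the value must come back unchanged.
        return bits_to_f32(f32_to_bits(x))
    drop = MANTISSA_BITS - keepbits          # number of trailing bits to discard
    bits = f32_to_bits(x)
    half_minus_one = (1 << (drop - 1)) - 1   # just under half of the last kept bit
    last_kept = (bits >> drop) & 1           # adding this makes ties round to even
    bits += half_minus_one + last_kept
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF  # zero the discarded bits
    return bits_to_f32(bits)


# Third value from the Julia session above; its last mantissa bit is set.
pattern = "0 01111111 00100110111101011010101"
x = bits_to_f32(int(pattern.replace(" ", ""), 2))
print(split_bits(bitround_scalar(x, 22)))  # 0 01111111 00100110111101011010100 (tie to even)
print(split_bits(bitround_scalar(x, 23)))  # 0 01111111 00100110111101011010101 (unchanged)
```

Without the early return, `drop` would be 0 and `1 << (drop - 1)` would be a negative shift, which Python rejects with a `ValueError`; that is why the escape is written as an explicit special case here.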

observingClouds (Owner) commented

This is awesome! I guess keepbits=23 is handled differently in the proposed numcodecs implementation and in BitInformation.jl. See here. The suggested numcodecs implementation returns the input as is, and BitInformation.jl should do the same according to this test. It seems the issue needs to be fixed upstream.

observingClouds (Owner) commented

Thanks @milankl. The codec is indeed not yet merged and I just have that in my own numcodecs development branch to make it available. With zarr-developers/numcodecs#290 being merged, we could also think of having bitround as an external filter which gets automatically registered with numcodecs. I played a little bit with that here.

milankl (Collaborator) commented Apr 7, 2022

This will be addressed with milankl/BitInformation.jl#37, which can be released as v0.5.1 later today.

milankl (Collaborator) commented Apr 7, 2022

BitInformation.jl v0.5.1 is released.

@aaronspring aaronspring merged commit 7496237 into main Apr 7, 2022
@aaronspring aaronspring deleted the jl_bitround branch April 7, 2022 19:35

Development

Successfully merging this pull request may close these issues.

Bitround in python or julia
