Handle numpy pure scalar types #62

Closed

stefanoborini wants to merge 7 commits into master from feature/handle-numpy-scalar-types-0408-1648

Conversation

@stefanoborini
Contributor

spawned from enthought/mayavi#61
The state pickler/unpickler could not handle numpy scalar types,
which were forced to None. This meant that trait classes with a
Float() trait holding a numpy.float64 could not be pickled correctly
(to be precise, they were pickled, but with the content silently set to None).

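The failure mode can be sketched as follows (illustrative code only, not the actual apptools pickler; `np.float32` is used because, unlike `np.float64`, it is not a subclass of Python's built-in `float`):

```python
import numpy as np

# Hypothetical sketch of the bug: a pickler dispatch that only knows
# plain Python scalars silently maps numpy scalar types to None.
def naive_dispatch(value):
    if isinstance(value, (int, float, complex, str, bytes)):
        return value
    return None  # np.float32, np.int64, ... fall through here


assert naive_dispatch(1.5) == 1.5
assert naive_dispatch(np.float32(1.5)) is None  # silently lost

# All numpy scalar types derive from numpy.generic, so a single
# isinstance check recovers them:
def fixed_dispatch(value):
    if isinstance(value, np.generic):
        return float(value)  # illustrative; the real fix pickles the scalar
    return naive_dispatch(value)


assert fixed_dispatch(np.float32(1.5)) == 1.5
```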
@stefanoborini
Contributor Author

Failures were already present in master and are due to Python 2.6 and 3.4 incompatibilities.
@dpinte do we still care about 2.6 support? I might have to fix master for 3.4 and merge back onto this branch to get green.

@dpinte
Member

dpinte commented Apr 11, 2016

@stefanoborini I think we can start dropping the 2.6 support in apptools

@stefanoborini
Contributor Author

@dpinte ok.

stefanoborini added a commit that referenced this pull request Apr 18, 2016
As agreed with @dpinte in pull request #62, we are dropping 2.6 support for apptools.
This will not bring master back to green, but it's one of the contributing factors.
@stefanoborini
Contributor Author

Strangely enough, tests pass on my machine with python 3.4.

        # want to recover their true type, and the only way of doing so
        # is to consider them as objects.
        if isinstance(value, numpy.generic):
            return id(value)
Member

@itziakos itziakos Jun 8, 2016

There is a small problem with using id here. The same id can be re-used once the original object is garbage collected. So if, during the pickling process, the numpy scalars are garbage collected, the registry might end up pointing to the wrong object.
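The concern can be illustrated with a small CPython experiment (id reuse is an implementation detail, so the collision half is only described in a comment rather than asserted):

```python
# Distinct *live* objects are guaranteed to have distinct ids:
kept = [object() for _ in range(100)]
assert len({id(obj) for obj in kept}) == 100

# Once an object dies, however, CPython may hand its id to a new
# object. Here each temporary dies immediately, so its id is free to
# be reused by the next allocation:
reused = [id(object()) for _ in range(100)]
# len(set(reused)) is typically far smaller than 100, so a registry
# keyed on the id of a dead object can silently point at an unrelated
# live object.
```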

Member

@itziakos itziakos Jun 10, 2016

I have updated the pickling _do_... method to keep the object alive in the cache (for the duration of the pickling process). This should be enough to avoid the id-reuse case.
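A minimal sketch of that keep-alive pattern (hypothetical `Registry` class for illustration; in the diff, the role of `keep_alive` is played by `self._misc_cache`):

```python
class Registry:
    """Sketch of an id-keyed registry that pins registered objects."""

    def __init__(self):
        self._refs = {}        # id -> placeholder for the pickled state
        self._keep_alive = []  # pins objects so their ids stay unique

    def register(self, value):
        key = id(value)
        self._refs[key] = repr(value)
        # Holding a reference for the registry's lifetime guarantees
        # that no other object can be allocated at the same id.
        self._keep_alive.append(value)
        return key


registry = Registry()
key = registry.register(3.14159)
assert key == id(registry._keep_alive[-1])
```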

Member

Is there a simple way to avoid the caching altogether? The current caching code is broken (#66), and I don't think it makes a lot of sense to be caching simple immutable scalars in the first place: the size of the "reference" dictionary generated is going to be comparable to the size of the object dictionary.

Comment thread on apptools/persistence/state_pickler.py (Outdated)
    def _do_numpy_generic_type(self, value):
        idx = self._register(value)
        self._misc_cache.append(value)
        data = base64_encode(pickle.dumps(value))
Member

Pickling numpy scalars is quite costly (in space). I wonder if there is a better way.

Member

Use pickle.HIGHEST_PROTOCOL.

Member

But otherwise, not so much. You'll end up reproducing exactly what pickle does anyway, if you want to be general.

Member

        data = base64_encode(value.dumps())
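For a sense of scale, a sketch comparing the options discussed in this thread (exact byte counts vary across numpy and pickle versions, so only round-tripping and the base64 overhead are checked; `value.dumps()` is just a shorthand for pickling the scalar and has been deprecated in newer numpy releases, so `pickle.dumps` is used here):

```python
import base64
import pickle

import numpy as np

value = np.float64(3.14159)

blobs = {
    "default protocol": pickle.dumps(value),
    "HIGHEST_PROTOCOL": pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL),
}

for name, blob in blobs.items():
    # Every variant must round-trip to an equal scalar.
    assert pickle.loads(blob) == value
    # base64 encoding (as base64_encode does in the diff) adds roughly
    # a third on top of the pickle payload.
    assert len(base64.b64encode(blob)) >= len(blob)

# For comparison, the raw payload of a float64 is only 8 bytes; the
# pickle blobs are much larger because they also record the type and
# reconstruction machinery.
assert len(value.tobytes()) == 8
```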

@codecov-io

codecov-io commented Jun 10, 2016

Current coverage is 80.36%

Merging #62 into master will increase coverage by 0.06%

@@           master        #62   diff @@
========================================
  Files         108        108          
  Lines        4586       4609    +23   
  Methods         0          0          
  Branches      732        712    -20   
========================================
+ Hits         3683       3704    +21   
- Misses        744        757    +13   
+ Partials      159        148    -11   

Powered by Codecov. Last updated by bb38492...daf1b79

@itziakos
Member

@prabhuramachandran can you have a look? Is it OK to merge this PR?

@stefanoborini
Contributor Author

Imho, it's good for me. Good spot.

            return id(value)

        try:
            key = hash(value)
Member

The use of hash(value) as a key is broken: it assumes that hash is one-to-one, which it is not. This is orthogonal to the issue fixed by this PR; I'll open a separate issue.
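For instance (the first assertion is guaranteed by Python's cross-type numeric hashing; the second is CPython-specific):

```python
# Equal values must hash equal, and 1 == 1.0 == True, so three
# distinct objects of three distinct types share one hash and would
# collide as registry keys.
assert hash(1) == hash(1.0) == hash(True)

# CPython additionally reserves -1 as an internal error sentinel, so
# hash(-1) silently becomes -2:
assert hash(-1) == hash(-2)
```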

Member

I'll open a separate issue.

#66

@rahulporuri rahulporuri deleted the feature/handle-numpy-scalar-types-0408-1648 branch October 27, 2020 16:33

6 participants