Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion python/pyspark/broadcast.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
from tempfile import NamedTemporaryFile

from pyspark.cloudpickle import print_exec
from pyspark.serializers import protocol

if sys.version < '3':
import cPickle as pickle
Expand Down Expand Up @@ -78,7 +79,7 @@ def __init__(self, sc=None, value=None, pickle_registry=None, path=None):

def dump(self, value, f):
try:
pickle.dump(value, f, 2)
pickle.dump(value, f, protocol)
except pickle.PickleError:
raise
except Exception as e:
Expand Down
2 changes: 1 addition & 1 deletion python/pyspark/serializers.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@
from itertools import izip as zip
else:
import pickle
protocol = 3
protocol = min(pickle.HIGHEST_PROTOCOL, 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only change the protocol for Broadcast? Otherwise we should also make sure that the Pyrolite can support protocol 4, which is used by Spark SQL.

Copy link
Contributor Author

@singularperturbation singularperturbation Oct 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had only identified this as an issue for Broadcast and the serializer in the SparkContext (I wasn't using Spark SQL in my pyspark application when I came across this issue).

It looks like the highest supported version of Pickle (for writing) is Protocol 2 in Pyrolite (https://github.com/irmen/Pyrolite/blob/master/java/src/main/java/net/razorvine/pickle/Pickler.java#L36)
From the project README:

Pickle protocol version support: reading: 0,1,2,3,4; writing: 2.
Pyrolite can read all pickle protocol versions  (0 to 4, so this includes
the latest additions made in Python 3.4).
Pyrolite always writes pickles in protocol version 2. There are no plans on 
including protocol version 1 support. Protocols 3 and 4 contain some nice new
features which may eventually be utilized, but for now, only version 2 is used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like this part of the change would be ok then since Pyrolite can read v4 just fine.

xrange = range

from pyspark import cloudpickle
Expand Down