Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must apply the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/14586898/

Date: 2020-09-13 20:37:30  Source: igfitidea

pandas.DataFrame.load/save between python2 and python3: pickle protocol issues

python, pandas

Asked by mathtick

I haven't figured out how to pickle-load/save pandas DataFrames between Python 2 and 3. There is a 'protocol' option in the pickler that I've played with unsuccessfully, but I'm hoping someone has a quick idea for me to try. Here is the code that produces the error:

python2.7

>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a2')
>>> a = pandas.DataFrame.load('a2')
>>> a = pandas.DataFrame.load('a3')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
    return com.load(path)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
    return pickle.load(f)
ValueError: unsupported pickle protocol: 3

python3

>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a3')
>>> a = pandas.DataFrame.load('a3')
>>> a = pandas.DataFrame.load('a2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
    return com.load(path)
  File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
    return pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)

Maybe expecting pickle to work between Python versions is a bit optimistic?
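The first traceback is about the file header, not the DataFrame itself: a pickle written with protocol 2 or higher begins with a PROTO opcode (byte 0x80) followed by the protocol number, and Python 2's pickle module only knows protocols 0 to 2. A minimal standard-library sketch for checking which protocol a pickle declares (the helper name `declared_protocol` is made up for illustration and is not part of pandas):

```python
import pickle
import pickletools


def declared_protocol(data):
    """Return the protocol a pickle announces, or None for protocol 0/1.

    Protocol 2+ pickles start with a PROTO opcode carrying the protocol
    number; protocol 0 and 1 pickles have no PROTO opcode at all.
    """
    # PROTO, when present, is always the first opcode in the stream.
    opcode, arg, _pos = next(pickletools.genops(data))
    return arg if opcode.name == "PROTO" else None


if __name__ == "__main__":
    for proto in range(pickle.HIGHEST_PROTOCOL + 1):
        blob = pickle.dumps({"a": [1.0, 2.0]}, protocol=proto)
        print(proto, declared_protocol(blob))
```

Run on the bytes of the 'a3' file from the session above, it should report 3, which is exactly the protocol the Python 2 error complains about.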

Accepted answer by ben.dichter

I had the same problem. You can change the protocol of the dataframe pickle file with the following function in python3:

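import pickle

def change_pickle_protocol(filepath, protocol=2):
    # Load the pickled object with the running interpreter (Python 3)...
    with open(filepath, 'rb') as f:
        obj = pickle.load(f)
    # ...then overwrite the same file using a protocol Python 2 can read (<= 2).
    with open(filepath, 'wb') as f:
        pickle.dump(obj, f, protocol=protocol)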

Then you should be able to open it in Python 2 without problems.
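The function above handles files going from Python 3 to Python 2. The question's other error, the UnicodeDecodeError in Python 3, comes from the opposite direction: Python 3 decodes Python 2 `str` objects using the `encoding` argument of `pickle.load`/`pickle.loads`, which defaults to ASCII, and raw byte data such as NumPy buffers is usually not ASCII. A standard-library sketch; the byte string below is a hand-built stand-in for a Python 2 pickle (not a real pandas file), and `load_py2_pickle` is a hypothetical helper:

```python
import pickle

# Hand-built stand-in for bytes written by Python 2: PROTO 2, then a
# SHORT_BINSTRING opcode ('U') holding one raw byte 0xf4, then STOP.
PY2_STYLE_PICKLE = b"\x80\x02U\x01\xf4."


def load_py2_pickle(data, encoding="latin1"):
    """Unpickle Python 2 data in Python 3.

    Python 2 `str` objects are decoded with `encoding` (ASCII by default
    in pickle.loads). 'latin1' maps every byte to one character without
    loss; 'bytes' keeps them as bytes objects instead.
    """
    return pickle.loads(data, encoding=encoding)


if __name__ == "__main__":
    try:
        pickle.loads(PY2_STYLE_PICKLE)  # default encoding="ASCII"
    except UnicodeDecodeError as exc:
        print("default:", exc)
    print("latin1:", repr(load_py2_pickle(PY2_STYLE_PICKLE)))
    print("bytes:", repr(load_py2_pickle(PY2_STYLE_PICKLE, encoding="bytes")))
```

With the default encoding this reproduces the message from the question ('ascii' codec can't decode byte 0xf4 in position 0). `encoding='latin1'` is the usual choice when the pickle holds NumPy data, since NumPy's `np.load` documentation lists it as safe for numerical data.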

Answer by ragesz

If you use pandas.DataFrame.to_pickle(), make the following modifications to the pandas source code to be able to set the pickle protocol:

1) In the source file /pandas/io/pickle.py (before modifying it, copy the original file to /pandas/io/pickle.py.ori), search for the following lines:

def to_pickle(obj, path):

pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)

Change these lines to:

def to_pickle(obj, path, protocol=pkl.HIGHEST_PROTOCOL):

pkl.dump(obj, f, protocol=protocol)

2) In the source file /pandas/core/generic.py (before modifying it, copy the original file to /pandas/core/generic.py.ori), search for the following lines:

def to_pickle(self, path):

return to_pickle(self, path)

Change these lines to:

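def to_pickle(self, path, protocol=-1):  # a negative protocol makes pickle.dump use HIGHEST_PROTOCOL; None would fall back to the default protocol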

return to_pickle(self, path, protocol)

3) Restart your Python kernel if it is running, then save your DataFrame using any available pickle protocol (0, 1, 2, 3, 4):

# Python 2.x can read this
df.to_pickle('my_dataframe.pck', protocol=2)

# protocol will be the highest available (4 on Python 3.4-3.7, 5 on 3.8+); Python 2.x cannot read this
df.to_pickle('my_dataframe.pck')

4) After a pandas upgrade, repeat steps 1 and 2.

5) (Optional) Ask the developers to add this capability to official releases (otherwise your code will throw an exception in any Python environment without these changes).
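Steps 1 and 2 only thread a `protocol` argument through to `pickle.dump`, and the default matters when doing that: `pickle.dump` treats `protocol=None` as `pickle.DEFAULT_PROTOCOL` and any negative value as `pickle.HIGHEST_PROTOCOL`, so a wrapper that forwards `None` does not produce the highest protocol. A stand-alone sketch of the same pattern without patching pandas (`save_pickle` and `file_protocol` are hypothetical names, and a plain dict stands in for a DataFrame):

```python
import os
import pickle
import tempfile


def save_pickle(obj, path, protocol=pickle.HIGHEST_PROTOCOL):
    """Hypothetical stand-alone version of the patched to_pickle helper."""
    with open(path, "wb") as f:
        # pickle.dump reads protocol=None as DEFAULT_PROTOCOL and any
        # negative value as HIGHEST_PROTOCOL.
        pickle.dump(obj, f, protocol=protocol)


def file_protocol(path):
    """Return the protocol byte after a leading PROTO opcode, else None (protocol 0/1)."""
    with open(path, "rb") as f:
        head = f.read(2)
    return head[1] if head[:1] == b"\x80" else None


if __name__ == "__main__":
    data = {"col": [1.0, 2.0, 3.0]}  # stand-in for a DataFrame
    path = os.path.join(tempfile.mkdtemp(), "my_dataframe.pck")
    for proto in (2, None, -1):
        save_pickle(data, path, protocol=proto)
        print(proto, file_protocol(path))
```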

Have a nice day!

Answer by user3197748

You can override the highest protocol available for the pickle package:

import pickle as pkl
import pandas as pd
if __name__ == '__main__':
    # HIGHEST_PROTOCOL is a module-level constant defined in the standard pickle module
    pkl.HIGHEST_PROTOCOL = 2
    # 'foo.pkl' was saved in pickle protocol 4
    df = pd.read_pickle(r"C:\temp\foo.pkl")

    # 'foo_protocol_2.pkl' will be saved in pickle protocol 2
    # and can be read in pandas with Python 2
    df.to_pickle(r"C:\temp\foo_protocol_2.pkl")

This is definitely not an elegant solution but it does the work without changing pandas code directly.
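Whether this override has any effect depends on when the installed pandas reads `pickle.HIGHEST_PROTOCOL`: a lookup inside a function body happens at call time and sees the patched value, while a default argument such as `protocol=pickle.HIGHEST_PROTOCOL` is evaluated once, when the module is imported (and pandas is imported before the override in the snippet above). A standard-library sketch of that difference; both function names are hypothetical:

```python
import io
import pickle


def dump_read_at_call_time(obj):
    # The constant is looked up every time this function runs.
    buf = io.BytesIO()
    pickle.dump(obj, buf, protocol=pickle.HIGHEST_PROTOCOL)
    return buf.getvalue()


def dump_default_bound_at_def_time(obj, protocol=pickle.HIGHEST_PROTOCOL):
    # The default was evaluated once, when this function was defined.
    buf = io.BytesIO()
    pickle.dump(obj, buf, protocol=protocol)
    return buf.getvalue()


if __name__ == "__main__":
    original = pickle.HIGHEST_PROTOCOL
    pickle.HIGHEST_PROTOCOL = 2  # the override from the answer above
    try:
        print(dump_read_at_call_time([1])[:2])          # b'\x80\x02'
        print(dump_default_bound_at_def_time([1])[:2])  # still the original highest protocol
    finally:
        pickle.HIGHEST_PROTOCOL = original  # undo the global patch
```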

UPDATE: I found that newer versions of pandas let you specify the pickle protocol in the .to_pickle function: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_pickle.html
DataFrame.to_pickle(path, compression='infer', protocol=4)
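A short usage sketch for that parameter, assuming a pandas version whose `DataFrame.to_pickle` accepts `protocol` (the file name is arbitrary). Protocol 2 is the highest one Python 2's pickle module can read:

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 10))
path = os.path.join(tempfile.mkdtemp(), "a2.pkl")

# '.pkl' means compression='infer' picks no compression, so the file is a plain pickle.
df.to_pickle(path, protocol=2)

with open(path, "rb") as f:
    print(f.read(2))  # b'\x80\x02': PROTO opcode followed by protocol 2

restored = pd.read_pickle(path)
print(restored.equals(df))  # True
```

The protocol only controls the pickle format; the reading side still needs a pandas version that can rebuild the objects stored in the file.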
