pandas: dumping and loading with dill (pickle) in two different files

Question

Asked by achimneyswallow

I think this is basic for anyone who knows how to work with pickle. However, I still can't get it right after trying for a few hours. I have the following code:

In the first file

import pandas as pd

names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]

records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby('name').mean()

def name_score_function(record):
    if record in names:
        return(means.loc[record, 'score'])

import dill as pickle
with open('name_model.pkl', 'wb') as file:
    pickle.dump(means, file)

The second file

I would like to load what I have in the first file and make the score of a person (i.e. John, Mary, Suzanne) callable via a function name_model(record):

import dill as pickle
B = pickle.load('name_model.pkl')

def name_model(record):
    if record in names:
        return(means.loc[record, 'score'])

Here it shows the error:

File "names.py", line 21, in <module>
B = pickle.load('name_model.pkl')
File "/opt/conda/lib/python2.7/site-packages/dill/dill.py", line 197, in load
pik = Unpickler(file)
File "/opt/conda/lib/python2.7/site-packages/dill/dill.py", line 356, in __init__
StockUnpickler.__init__(self, *args, **kwds)
File "/opt/conda/lib/python2.7/pickle.py", line 847, in __init__
self.readline = file.readline
AttributeError: 'str' object has no attribute 'readline'

I know the error comes from my lack of understanding of pickle. I would humbly accept your opinions to improve this code. Thank you!!

UPDATE: The more specific thing I would like to achieve:

I would like to be able to use the function that I write in the first file and dump it, and then read it in the second file and be able to use this function to query the mean score of any person in the records.

Here is what I have:

import pandas as pd

names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]

records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby('name').mean()

def name_score_function(record):
    if record in names:
        return(means.loc[record, 'score'])

B = name_score_function(record)

import dill as pickle
with open('name_model.pkl', 'wb') as file:
    pickle.dump(B, file)

with open('name_model.pkl', 'rb') as file:
    B = pickle.load(f)

def name_model(record):
   return B(record)

print(name_model("John"))

When I execute this code, I get this error:

File "test.py", line 13, in <module>
B = name_score_function(record)
NameError: name 'record' is not defined

I highly appreciate your assistance and patience.

Answer 1

Answered by achimneyswallow

Thank you. It looks like the following can solve the problem.

import pandas as pd

names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]

records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby('name').mean()

import dill as pickle
with open('name_model.pkl', 'wb') as file:
    pickle.dump(means, file)

with open('name_model.pkl', 'rb') as file:
    B = pickle.load(file)

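def name_score_function(record):
    # Look the name up in the loaded DataFrame B rather than in names/means,
    # so the function also works in a file where only the pickle is available.
    if record in B.index:
        return B.loc[record, 'score']

print(name_score_function("John"))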
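
To use the saved means from a separate script, the lookup has to go through the object that was loaded from the pickle file, since the variables of the first script do not exist there. The following is a minimal sketch of that split, shown in one block for convenience. It assumes the standard-library pickle module (the question imports dill under the same name, and its dump and load are used in the same way), a temporary directory as a hypothetical file location, and a hypothetical variable name loaded_means.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical location for the pickle file; any path both scripts can reach works.
path = os.path.join(tempfile.mkdtemp(), "name_model.pkl")

# --- first file: compute the per-name means and save them ---
names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]
records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby("name").mean()

with open(path, "wb") as f:
    pickle.dump(means, f)

# --- second file: only the pickle file is needed from here on ---
with open(path, "rb") as f:
    loaded_means = pickle.load(f)


def name_model(record):
    """Return the mean score for a name, or None if the name is unknown."""
    if record in loaded_means.index:
        return loaded_means.loc[record, "score"]
    return None


print(name_model("John"))
```

Checking membership against loaded_means.index instead of the names list keeps the second part independent of anything defined in the first.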

Answer 2

Answered by eafit

Hmm. You need to read it the same way you wrote it, nesting it inside an open clause:

import dill as pickle
with open('name_model.pkl', 'rb') as f:
    B = pickle.load(f)
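
The same open-then-load pattern can also cover the UPDATE in the question, where the goal is to dump the function itself. The NameError there comes from calling name_score_function(record) at module level; passing the function object without parentheses avoids it. The following is a rough sketch, assuming dill is installed. It passes dill's recurse=True option, which asks dill to pickle the globals the function refers to (here means) along with it; whether that option is strictly needed may depend on the dill version and on how the scripts are run. The temporary path is a hypothetical location.

```python
import os
import tempfile

import dill
import pandas as pd

names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]
records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby("name").mean()


def name_score_function(record):
    """Return the mean score for a name, or None if the name is unknown."""
    if record in means.index:
        return means.loc[record, "score"]
    return None


# Hypothetical location for the pickle file.
path = os.path.join(tempfile.mkdtemp(), "name_model.pkl")

# Dump the function object itself: no parentheses, no argument.
with open(path, "wb") as f:
    dill.dump(name_score_function, f, recurse=True)

# Load it back through an open file object, not a path string.
with open(path, "rb") as f:
    B = dill.load(f)


def name_model(record):
    return B(record)


print(name_model("John"))
```

If pickling the function turns out to be fragile across environments, saving only the means DataFrame and redefining the small lookup function in the second file is the simpler option.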
