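Pandas DataFrame apply() ValueError: too many values to unpack (expected 2)
Disclaimer: this page is a Chinese-English translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/35373223/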
Asked by Irek Rybark
I just started poking around Python, and while I am very excited, it seems I am still far from Pythonic thinking.
Here is an example of an approach that has the word 'suboptimal' written all over it. While it is sufficient for my relatively small dataset, I am wondering how I could write it better.
```python
import pandas as pd
from pandas import DataFrame

# create sample log data frame
lg = pd.DataFrame(['Access violation at address 00A97...',
                   'Try to edit the splines or change...',
                   'Access violation at address 00F2B...',
                   'Please make sure the main electro...'], columns=['lg_msg'])

# define message classification
err_messages = [['Access violation', 'ACC-VIOL', 'PROG'],
                ['Please make sure th', 'ELE-NOT-PLACED', 'MOD'],
                ['Try to edit the splines', 'TRY-EDIT-SPLINES', 'MOD']]

# lookup code
def message_code(msg_text):
    for msg in err_messages:
        if msg_text.startswith(msg[0]):
            return msg[1]
    return ''

# lookup type
def message_type(msg_text):
    for msg in err_messages:
        if msg_text.startswith(msg[0]):
            return msg[2]
    return ''

lg['msg_code'] = lg['lg_msg'].apply(lambda x: message_code(x))
lg['msg_type'] = lg['lg_msg'].apply(lambda x: message_type(x))
```
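A small side note, not raised in the original thread: Series.apply accepts any callable, so the lambda wrappers in the last two lines add nothing, and `lg['lg_msg'].apply(message_code)` should behave the same. This still traverses the column twice; it only removes one layer of indirection. A minimal sketch, where `shout` is a hypothetical stand-in for the lookup functions:

```python
import pandas as pd

def shout(text):
    # Hypothetical stand-in for message_code / message_type
    return text.upper()

s = pd.Series(['access violation', 'try to edit'])

# Passing the function object directly vs. wrapping it in a lambda
direct = s.apply(shout)
wrapped = s.apply(lambda x: shout(x))
print(direct.equals(wrapped))  # True
```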
I tried creating a single function to calculate the log entry code and type at once:
```python
def message_code_type(msg_text):
    for msg in err_messages:
        if msg_text.startswith(msg[0]):
            return (msg[1], msg[2])
    return ('', '')

lg['msg_code'], lg['msg_type'] = lg['lg_msg'].apply(lambda x: message_code_type(x))
```
but I am getting:
```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-18-72f97d857539> in <module>()
----> 1 lg['msg_code'], lg['msg_code'] = lg['lg_msg'].apply(lambda x: message_code_type(x))
ValueError: too many values to unpack (expected 2)
```
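Why the error occurs: applying a tuple-returning function with Series.apply produces a single Series whose elements are tuples. Unpacking `a, b = series` iterates over that Series' rows, so with four rows Python receives four values where the left-hand side expects two. A minimal sketch reproducing this with a hypothetical four-row Series (not the question's data):

```python
import pandas as pd

# A tuple-returning function applied element-wise gives ONE Series of tuples
pairs = pd.Series(['a', 'b', 'c', 'd']).apply(lambda s: (s.upper(), s * 2))
print(pairs.tolist())  # [('A', 'aa'), ('B', 'bb'), ('C', 'cc'), ('D', 'dd')]

error_message = None
try:
    # Unpacking iterates over the Series' 4 rows, not over the 2 tuple fields
    first, second = pairs
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # a "too many values to unpack" message
```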
Is there any way to avoid traversing the DataFrame twice?
Any feedback will be appreciated.
```
import sys
print(sys.version)
3.5.1 |Anaconda 2.4.0 (64-bit)| (default, Jan 29 2016, 15:01:46) [MSC v.1900 64 bit (AMD64)]
pd.__version__
'0.17.1'
```
Answered by Kevin
Try this, using izip from the itertools module:
```python
from itertools import izip  # Python 2 only; Python 3 removed izip in favor of the built-in zip

lg['msg_code'], lg['msg_type'] = izip(*lg['lg_msg'].apply(lambda x: message_code_type(x)))
```

```
In [21]: lg
Out[21]:
                                 lg_msg          msg_code msg_type
0  Access violation at address 00A97...          ACC-VIOL     PROG
1  Try to edit the splines or change...  TRY-EDIT-SPLINES      MOD
2  Access violation at address 00F2B...          ACC-VIOL     PROG
3  Please make sure the main electro...    ELE-NOT-PLACED      MOD
```
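How the star-unpacking trick works: `*` passes each row's tuple as a separate argument, and zip (or izip in Python 2) regroups the i-th element of every tuple. Four (code, type) pairs are transposed into one tuple of codes and one tuple of types, which is exactly two values for the two targets. A minimal pure-Python sketch using the pairs from the output above:

```python
# Four (code, type) pairs, as message_code_type returns them for the sample log
pairs = [('ACC-VIOL', 'PROG'),
         ('TRY-EDIT-SPLINES', 'MOD'),
         ('ACC-VIOL', 'PROG'),
         ('ELE-NOT-PLACED', 'MOD')]

# zip(*pairs) is zip(pairs[0], pairs[1], pairs[2], pairs[3]):
# it yields all first elements, then all second elements
codes, types = zip(*pairs)
print(codes)  # ('ACC-VIOL', 'TRY-EDIT-SPLINES', 'ACC-VIOL', 'ELE-NOT-PLACED')
print(types)  # ('PROG', 'MOD', 'PROG', 'MOD')
```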
Sorry, that's for Python 2.7; in Python 3 you should just be able to use the built-in zip:
```python
lg['msg_code'], lg['msg_type'] = zip(*lg['lg_msg'].apply(lambda x: message_code_type(x)))
```

```
                                 lg_msg          msg_code msg_type
0  Access violation at address 00A97...          ACC-VIOL     PROG
1  Try to edit the splines or change...  TRY-EDIT-SPLINES      MOD
2  Access violation at address 00F2B...          ACC-VIOL     PROG
3  Please make sure the main electro...    ELE-NOT-PLACED      MOD
```
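For reference, a self-contained Python 3 sketch of the zip approach, followed by one alternative that is not from the original answer: turning the list of tuples into a two-column DataFrame that keeps the original index and joining it back. It reuses the question's sample data; the rules are stored as tuples and the function unpacks each rule by name, which is assumed to behave the same as the question's indexed version.

```python
import pandas as pd

# Sample data and (prefix, code, type) rules, as in the question
lg = pd.DataFrame(['Access violation at address 00A97...',
                   'Try to edit the splines or change...',
                   'Access violation at address 00F2B...',
                   'Please make sure the main electro...'], columns=['lg_msg'])
err_messages = [('Access violation', 'ACC-VIOL', 'PROG'),
                ('Please make sure th', 'ELE-NOT-PLACED', 'MOD'),
                ('Try to edit the splines', 'TRY-EDIT-SPLINES', 'MOD')]


def message_code_type(msg_text):
    """Return (code, type) for the first rule whose prefix matches, else ('', '')."""
    for prefix, code, msg_type in err_messages:
        if msg_text.startswith(prefix):
            return code, msg_type
    return '', ''


# Single apply() pass: a Series of (code, type) tuples
pairs = lg['lg_msg'].apply(message_code_type)

# Option 1 (the answer above): transpose with zip and assign both columns
lg['msg_code'], lg['msg_type'] = zip(*pairs)

# Option 2 (an alternative, not from the answer): expand the tuples into a
# two-column DataFrame that keeps lg's index, then join it back
expanded = pd.DataFrame(pairs.tolist(), index=lg.index,
                        columns=['msg_code', 'msg_type'])
lg2 = lg[['lg_msg']].join(expanded)

print(lg)
print(lg2)
```

Both options call the lookup function only once per row; the extra pass that zip or `tolist()` makes is over the already-computed tuples in plain Python, which should be cheap for a dataset of this size.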