Pandas DataFrame apply() ValueError: too many values to unpack (expected 2)

Question

Asked by Irek Rybark

I just started poking around Python, and while I am very excited, it seems I am still far from Pythonic thinking.

Here is an example of an approach that has 'suboptimal' written all over it. While it is sufficient for my relatively small dataset, I am wondering how I can write it in a better way.

import pandas as pd
from pandas import DataFrame

# create sample log data frame
lg = pd.DataFrame(['Access violation at address 00A97...',
                   'Try to edit the splines or change...',
                   'Access violation at address 00F2B...',
                   'Please make sure the main electro...'], columns=['lg_msg'])

# define message classification
err_messages = [['Access violation', 'ACC-VIOL', 'PROG'],
                ['Please make sure th', 'ELE-NOT-PLACED', 'MOD'],
                ['Try to edit the splines', 'TRY-EDIT-SPLINES', 'MOD']]                

# lookup code
def message_code(msg_text):
    for msg in err_messages:
        if msg_text.startswith(msg[0]):
            return msg[1]
    return ''

# lookup type
def message_type(msg_text):
    for msg in err_messages:
        if msg_text.startswith(msg[0]):
            return msg[2]
    return ''               

lg['msg_code'] = lg['lg_msg'].apply(lambda x:  message_code(x))
lg['msg_type'] = lg['lg_msg'].apply(lambda x:  message_type(x))

I tried creating a single function to compute the log entry code and type at once:

def message_code_type(msg_text):
    for msg in err_messages:
        if msg_text.startswith(msg[0]):
            return (msg[1], msg[2])
    return ('', '')

lg['msg_code'], lg['msg_type'] = lg['lg_msg'].apply(lambda x:  message_code_type(x))

but I am getting:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-72f97d857539> in <module>()
----> 1 lg['msg_code'], lg['msg_code'] = lg['lg_msg'].apply(lambda x:  message_code_type(x))

ValueError: too many values to unpack (expected 2)

Is there any way to avoid traversing the DataFrame twice?

Any feedback will be appreciated.

import sys
print(sys.version)
3.5.1 |Anaconda 2.4.0 (64-bit)| (default, Jan 29 2016, 15:01:46) [MSC v.1900 64 bit (AMD64)]

pd.__version__
'0.17.1'

Answer 1

Answered by Kevin

Try this using izip from the itertools module:

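# Python 2.7 only: itertools.izip does not exist in Python 3
from itertools import izip
# Note: both targets are 'msg_code' as written, so the type values overwrite the codes (see the output)
lg['msg_code'], lg['msg_code'] = izip(*lg['lg_msg'].apply(lambda x:  message_code_type(x)))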

In [21]: lg
Out[21]:
                                 lg_msg  msg_code
0  Access violation at address 00A97...      PROG
1  Try to edit the splines or change...       MOD
2  Access violation at address 00F2B...      PROG
3  Please make sure the main electro...       MOD

Sorry, that's for Python 2.7; on Python 3 you should be able to just use the built-in zip:

lg['msg_code'], lg['msg_type'] = zip(*lg['lg_msg'].apply(lambda x:  message_code_type(x)))

                                 lg_msg          msg_code  msg_type
0  Access violation at address 00A97...          ACC-VIOL      PROG
1  Try to edit the splines or change...  TRY-EDIT-SPLINES       MOD
2  Access violation at address 00F2B...          ACC-VIOL      PROG
3  Please make sure the main electro...    ELE-NOT-PLACED       MOD
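
As a rough sketch of why the original assignment fails and why zip helps: apply() returns a Series with one (code, type) tuple per row, so unpacking it into two names tries to split its four rows, not the two fields. zip(*...) transposes those four pairs into two sequences of four values. The names pairs, unpack_error, codes and types below are illustrative only and do not appear in the question; the sample data is copied from it.

```python
import pandas as pd

# Sample data and lookup table copied from the question
lg = pd.DataFrame(['Access violation at address 00A97...',
                   'Try to edit the splines or change...',
                   'Access violation at address 00F2B...',
                   'Please make sure the main electro...'], columns=['lg_msg'])

err_messages = [['Access violation', 'ACC-VIOL', 'PROG'],
                ['Please make sure th', 'ELE-NOT-PLACED', 'MOD'],
                ['Try to edit the splines', 'TRY-EDIT-SPLINES', 'MOD']]

def message_code_type(msg_text):
    # Return (code, type) for the first matching prefix, else two empty strings
    for prefix, code, kind in err_messages:
        if msg_text.startswith(prefix):
            return (code, kind)
    return ('', '')

# One pass over the column: a Series holding one 2-tuple per row (4 rows here)
pairs = lg['lg_msg'].apply(message_code_type)

# Unpacking iterates over the 4 rows, not the 2 fields, which raises the error
try:
    code_col, type_col = pairs
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

# zip(*pairs) transposes 4 pairs into 2 tuples of 4 values each
codes, types = zip(*pairs)
lg['msg_code'] = list(codes)
lg['msg_type'] = list(types)
```

The column is still traversed only once, since apply() runs a single time and zip only rearranges its result.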
