pandas: apply a lambda to each row using multiple columns

Question

Asked by muni

I am creating a sample dataframe:

import pandas as pd

tp = pd.DataFrame({'source': ['a', 's', 'f'],
                   'target': ['b', 'n', 'm'],
                   'count': [0, 8, 4]})

I then want to create a column 'col' based on a condition on the 'target' column: if 'target' matches the condition, 'col' takes the value of 'source'; otherwise it takes a default value, as below:

tp['col'] = tp.apply(lambda row:row['source'] if row['target'] in ['b','n'] else 'x')

But it's throwing me this error: KeyError: ('target', 'occurred at index count')

How can I make it work, without defining a function?

Answer 1

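As per @Zero's comment, you need to use axis=1 to tell Pandas you want to apply a function to each row. The default is axis=0, which passes each column to the function instead; a column is indexed by row labels, so looking up 'target' in it raises the KeyError.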

tp['col'] = tp.apply(lambda row: row['source'] if row['target'] in ['b', 'n'] else 'x',
                     axis=1)
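
To make the difference between the two axes concrete, here is a minimal sketch on the question's sample frame. The names has_target_axis0 and has_target_axis1 are only illustrative; they record whether the label 'target' is available to the lambda in each mode.

```python
import pandas as pd

tp = pd.DataFrame({'source': ['a', 's', 'f'],
                   'target': ['b', 'n', 'm'],
                   'count': [0, 8, 4]})

# axis=0 (the default): the lambda receives one column at a time.
# Each column is indexed by the row labels 0, 1, 2, so 'target' is not a valid key.
has_target_axis0 = tp.apply(lambda s: 'target' in s.index, axis=0)

# axis=1: the lambda receives one row at a time.
# Each row is indexed by the column names, so row['target'] works.
has_target_axis1 = tp.apply(lambda s: 'target' in s.index, axis=1)

print(has_target_axis0.tolist())  # [False, False, False]
print(has_target_axis1.tolist())  # [True, True, True]

# With axis=1 the original lambda runs as intended.
tp['col'] = tp.apply(
    lambda row: row['source'] if row['target'] in ['b', 'n'] else 'x',
    axis=1)
print(tp['col'].tolist())  # ['a', 's', 'x']
```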

However, for this specific task, you should use vectorised operations rather than a row-wise apply, which calls a Python function once per row. For example, using numpy.where:

tp['col'] = np.where(tp['target'].isin(['b', 'n']), tp['source'], 'x')

pd.Series.isin returns a Boolean Series, which tells numpy.where whether to select the second or the third argument for each row.
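
As a self-contained sketch of the vectorised version, the snippet below builds the question's sample frame, shows the intermediate Boolean mask, and checks that the result agrees with the row-wise apply. The names mask and expected are only illustrative.

```python
import numpy as np
import pandas as pd

tp = pd.DataFrame({'source': ['a', 's', 'f'],
                   'target': ['b', 'n', 'm'],
                   'count': [0, 8, 4]})

# Boolean Series: True where 'target' is one of the accepted values.
mask = tp['target'].isin(['b', 'n'])
print(mask.tolist())  # [True, True, False]

# Where mask is True take 'source', otherwise fall back to the default 'x'.
tp['col'] = np.where(mask, tp['source'], 'x')
print(tp['col'].tolist())  # ['a', 's', 'x']

# Sanity check against the row-wise apply from the first part of the answer.
expected = tp.apply(
    lambda row: row['source'] if row['target'] in ['b', 'n'] else 'x',
    axis=1)
print((tp['col'] == expected).all())  # True
```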
